
SDL
Simple Directmedia Layer Forums

Lack of batching for RenderCopy/RenderCopyEx

.3lite

Joined: 25 Oct 2013

Posts: 38

Posted: Sat Aug 01, 2015 4:09 pm

Hey guys

I wanted to add yet another rendering back-end to my game, alongside pure Direct3D and OpenGL: SDL2's internal renderer. The reason is that SDL2 has a great software renderer, which might come in handy sooner or later. However, given the lack of batch drawing in SDL_RenderCopy and SDL_RenderCopyEx, I expect a performance hit when rendering text glyphs and a few other small things where batching is all but mandatory.

Perhaps it's time to add functions that batch several rectangles into one primitive draw call? It would be much the same as the already available SDL_RenderDrawPoints, SDL_RenderDrawRects, and SDL_RenderFillRects. Or am I missing something, and there is a specific reason it hasn't been implemented?

.3lite

Joined: 25 Oct 2013

Posts: 38

Posted: Sat Aug 01, 2015 4:36 pm

I mean, I know that batching won't change anything for the software renderer, but it would be nice to have nonetheless. After all, why not use the available Direct3D and OpenGL back-ends when possible?


Sik

Joined: 26 Nov 2011

Posts: 905

Posted: Sat Aug 01, 2015 6:46 pm

2015-08-01 13:09 GMT-03:00, .3lite:

Quote:

Perhaps it's time to create some functions to batch few rectangles within
one primitive draw call? It's pretty much the same as with already available
functions of SDL_RenderDrawPoints, SDL_RenderDrawRects, and
SDL_RenderFillRects. Perhaps I'm missing something and there is a specific
reason behind not implementing it?

Mostly that people tried to automatically batch the calls to the
existing functions and somehow expected that to improve all programs
(even though those programs were most likely changing textures
constantly).

A new function would probably be the best and easiest option (actually,
an equivalent for the scaling/rotation variant would be neat too). It
can even be made to just fall back to the non-batched functions in
backends where batching isn't implemented yet (worst case it just
takes about the same amount of time, best case it improves by a lot).
_______________________________________________
SDL mailing list

http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


Ethan Lee

Guest

Posted: Sat Aug 01, 2015 7:20 pm

Odds are it'd look a lot like the XNA SpriteBatch when using SpriteSortMode.Deferred:

https://msdn.microsoft.com/en-us/library/microsoft.xna.framework.graphics.spritebatch.aspx
https://msdn.microsoft.com/en-us/library/microsoft.xna.framework.graphics.spritesortmode.aspx
https://github.com/flibitijibibo/FNA/blob/master/src/Graphics/SpriteBatch.cs

You could either require that all copies in a single batch use one texture, like RenderCopyBatched(SDL_Texture*, SDL_Rect**), or do RenderBatchBegin/RenderBatchEnd hints that try to generate batches of RenderCopy calls on SDL's end at the cost of having things split up more than you might expect.
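
The Begin/End variant can be sketched as a queue that is flushed whenever the texture changes, which is also why batches may end up "split up more than you might expect". This is only an illustration under assumptions: `Rect` and `Texture` are stand-ins for `SDL_Rect` and `SDL_Texture`, and `Batch`, `batch_begin`, `batch_copy`, `batch_end`, `FlushFn` and `count_draw_calls` are hypothetical names, not part of SDL.

```c
#include <stddef.h>

/* Stand-ins for SDL_Rect and SDL_Texture so the sketch compiles without SDL. */
typedef struct { int x, y, w, h; } Rect;
typedef struct { int id; } Texture;

#define BATCH_MAX 256

/* Hypothetical callback that submits one batch to the backend. */
typedef void (*FlushFn)(const Texture *tex, const Rect *src, const Rect *dst,
                        int count, void *userdata);

typedef struct {
    const Texture *texture;   /* texture shared by every queued copy */
    Rect src[BATCH_MAX];
    Rect dst[BATCH_MAX];
    int count;
    FlushFn flush;
    void *userdata;
} Batch;

void batch_begin(Batch *b, FlushFn flush, void *userdata)
{
    b->texture = NULL;
    b->count = 0;
    b->flush = flush;
    b->userdata = userdata;
}

/* Submit whatever is queued, if anything. */
void batch_end(Batch *b)
{
    if (b->count > 0) {
        b->flush(b->texture, b->src, b->dst, b->count, b->userdata);
        b->count = 0;
    }
}

/* Queue one copy; a texture change or a full buffer forces a flush first. */
void batch_copy(Batch *b, const Texture *tex, Rect src, Rect dst)
{
    if (b->count > 0 && (tex != b->texture || b->count == BATCH_MAX)) {
        batch_end(b);
    }
    b->texture = tex;
    b->src[b->count] = src;
    b->dst[b->count] = dst;
    b->count++;
}

/* Example flush that only counts draw calls; a real backend would draw here. */
void count_draw_calls(const Texture *tex, const Rect *src, const Rect *dst,
                      int count, void *userdata)
{
    (void)tex; (void)src; (void)dst; (void)count;
    ++*(int *)userdata;
}
```

With this scheme, copies that alternate between two textures produce one flush per copy, so the caller still has to group draws by texture to see any benefit.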

-Ethan


.3lite

Joined: 25 Oct 2013

Posts: 38

Posted: Sat Aug 01, 2015 7:39 pm

To be honest, guys, I'm expecting something much simpler. Take the following functions as an example:

Code:

SDL_RenderCopies(SDL_Renderer* renderer,
                 SDL_Texture* texture,
                 const SDL_Rect** srcrect,
                 const SDL_Rect** dstrect)

Code:

SDL_RenderCopiesEx(SDL_Renderer* renderer,
                   SDL_Texture* texture,
                   const SDL_Rect** srcrect,
                   const SDL_Rect** dstrect,
                   const double angle,
                   const SDL_Point* center,
                   const SDL_RendererFlip flip)

The only difference between these functions and their SDL_RenderCopy and SDL_RenderCopyEx equivalents is that they take arrays of source and destination rectangles. The rest can be taken care of by the programmer. That way the implementation should be straightforward and should take no more than 10 minutes for both OpenGL and Direct3D. I could do it myself, yes, but I would like to stay up to date with SDL2 itself, and I don't like modifying external libraries I rely on.

D3D_RenderCopy uses Direct3D's DrawPrimitiveUP, which is almost ready for batch drawing: just append the vertices of the remaining rectangles. The desktop OpenGL renderer uses old immediate mode (draw arrays would be much better), but batching is easy to implement there as well. All the renderers in SDL2 are pretty much ready for batching copies of the same texture; the new functions would need only minor changes relative to SDL_RenderCopy and SDL_RenderCopyEx.
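
To make "just append the vertices" concrete, here is a minimal sketch of expanding N source/destination rectangle pairs into one vertex array. It is not SDL's actual backend code: `Rect` stands in for `SDL_Rect`, while `Vertex` and `build_quads` are hypothetical names, and details such as half-pixel offsets or color are left out.

```c
#include <stddef.h>

typedef struct { int x, y, w, h; } Rect;       /* stand-in for SDL_Rect */
typedef struct { float x, y, u, v; } Vertex;   /* position + texture coordinates */

/* Writes 6 vertices (two triangles) per rectangle pair into out and returns
   the number of vertices written. out must have room for count * 6 entries.
   tex_w and tex_h are the texture size used to normalise the source rects. */
size_t build_quads(const Rect *src, const Rect *dst, int count,
                   int tex_w, int tex_h, Vertex *out)
{
    size_t n = 0;
    for (int i = 0; i < count; ++i) {
        float x0 = (float)dst[i].x;
        float y0 = (float)dst[i].y;
        float x1 = (float)(dst[i].x + dst[i].w);
        float y1 = (float)(dst[i].y + dst[i].h);
        float u0 = (float)src[i].x / (float)tex_w;
        float v0 = (float)src[i].y / (float)tex_h;
        float u1 = (float)(src[i].x + src[i].w) / (float)tex_w;
        float v1 = (float)(src[i].y + src[i].h) / (float)tex_h;

        Vertex tl = { x0, y0, u0, v0 };
        Vertex tr = { x1, y0, u1, v0 };
        Vertex bl = { x0, y1, u0, v1 };
        Vertex br = { x1, y1, u1, v1 };

        /* Two triangles per quad: (tl, tr, bl) and (tr, br, bl). */
        out[n++] = tl; out[n++] = tr; out[n++] = bl;
        out[n++] = tr; out[n++] = br; out[n++] = bl;
    }
    return n;
}
```

The resulting array could then be handed to a single triangle-list draw (for example DrawPrimitiveUP in Direct3D 9 or glDrawArrays in OpenGL) instead of one call per rectangle.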


Eric Wing

Guest

Posted: Sat Aug 01, 2015 10:21 pm


I agree that it should be simpler than XNA and I personally like this
line of thinking.

I think the goal of the batching API should be stated explicitly so
everybody is on the same page. In my opinion, the goal should be to
allow performance optimizations and that's it. (XNA conflates multiple
things...performance plus read-my-mind-do-everything-I-want which
ultimately makes things more complicated.) Convenience wrappers can
always be written on the outside, but you can't wrap around
API/performance bottlenecks and expect them to go faster.

Additionally, I suspect that a good SIMD backend could make the
software renderer go a lot faster too. (Watch Handmade Hero for a
great demonstration of how he made a chunky software renderer go to
60fps at 1080p using SSE2.)

I would suggest constraining the API as much as possible for speed. To
make SIMD or any vectorization go fast, you generally want
predictable data layouts and no branches.

So for example, with

Quote:

SDL_RenderCopiesEx(SDL_Renderer* renderer,
SDL_Texture* texture,
const SDL_Rect** srcrect,
const SDL_Rect** dstrect,
const double angle,
const SDL_Point* center,
const SDL_RendererFlip flip)

I might suggest that individual array elements for src/dst rect must
always have values and can't be NULL, so that the code that is
shuffling things into registers doesn't need to check for NULL all
the time.

Along that line of thought, we may want explicit array sizes for
srcrect/dstrect as additional parameters. Algorithms may want to
compute up front how they are going to deal with the odd cases where
the number of objects doesn't divide evenly into the wide
registers. This has the additional convenience that a user who
already has a large array of rects, but only needs a subset, doesn't
have to make a new copy.
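
As a rough illustration of why an explicit count helps, the sketch below splits the work into whole groups plus a scalar tail. It uses plain C rather than real SIMD intrinsics; `Rect` stands in for `SDL_Rect`, and `LANES` and `translate_rects` are made-up names for this example only.

```c
typedef struct { int x, y, w, h; } Rect;   /* stand-in for SDL_Rect */

#define LANES 4   /* assumed number of rectangles handled per wide step */

/* Moves count rectangles by (dx, dy). Knowing count up front lets the
   function decide how many whole groups there are before entering the loop. */
void translate_rects(Rect *rects, int count, int dx, int dy)
{
    int full = count - (count % LANES);   /* elements covered by whole groups */
    int i = 0;

    /* Main loop: fixed-size groups, no per-element checks. A vectorised
       version would process each group in one step. */
    for (; i < full; i += LANES) {
        for (int k = 0; k < LANES; ++k) {
            rects[i + k].x += dx;
            rects[i + k].y += dy;
        }
    }

    /* Tail: the leftover elements that do not fill a whole group. */
    for (; i < count; ++i) {
        rects[i].x += dx;
        rects[i].y += dy;
    }
}
```

Because the function takes a pointer and a count, a caller holding a large array can pass a subset such as `translate_rects(rects + 2, 7, 100, 0)` without copying anything.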

srcrect or dstrect arrays themselves being NULL/empty probably could
be handled efficiently by separating into specialized versions early
before entering into the inner loops.
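
One way to read "separating into specialized versions early" is a single NULL check outside the loops, as in this sketch. The names `resolve_src`, `fill_whole` and `copy_given` are hypothetical, `Rect` stands in for `SDL_Rect`, and treating a NULL array as "use the whole texture for every copy" is an assumption borrowed from how a NULL srcrect behaves in SDL_RenderCopy.

```c
#include <stddef.h>

typedef struct { int x, y, w, h; } Rect;   /* stand-in for SDL_Rect */

/* Specialised loop: no source array given, every copy uses the whole texture. */
static void fill_whole(Rect whole, int count, Rect *out)
{
    for (int i = 0; i < count; ++i) {
        out[i] = whole;
    }
}

/* Specialised loop: source array given, no NULL checks inside the loop. */
static void copy_given(const Rect *src, int count, Rect *out)
{
    for (int i = 0; i < count; ++i) {
        out[i] = src[i];
    }
}

/* The only branch on NULL happens here, once, before either inner loop. */
void resolve_src(const Rect *src, int count, Rect whole, Rect *out)
{
    if (src == NULL) {
        fill_whole(whole, count, out);
    } else {
        copy_given(src, count, out);
    }
}
```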

I'm a little ambivalent about flip. It seems like for performance, the
user should have pre-oriented the texture. On the other hand, since it
is already in the core SDL API, consistency is nice, and I don't think
this needs to incur a noticeable cost as it can also be separated out
into specialized versions early.

-Eric


.3lite

Joined: 25 Oct 2013

Posts: 38

Posted: Sun Aug 02, 2015 9:52 am

Eric Wing wrote:

I agree. In fact, I made a small mistake in my example: I meant an array of objects, not an array of pointers, plus the number of elements in the array.

That is:

Code:

SDL_RenderCopies(SDL_Renderer* renderer,
                 SDL_Texture* texture,
                 const SDL_Rect* srcrect,
                 const SDL_Rect* dstrect,
                 int count)

Code:

SDL_RenderCopiesEx(SDL_Renderer* renderer,
                   SDL_Texture* texture,
                   const SDL_Rect* srcrect,
                   const SDL_Rect* dstrect,
                   const double angle,
                   const SDL_Point* center,
                   const SDL_RendererFlip flip,
                   int count)

That way, instead of providing an array of pointers, you pass the address of an array of objects.

Code:

SDL_Rect srcRects[100];
SDL_Rect dstRects[100];
// fill them
SDL_RenderCopies(renderer, texture, srcRects, dstRects, 100);
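
For back-ends without a batched path, the fallback suggested earlier in the thread could be as simple as a loop over the single-copy call. The sketch below uses stand-in types so that it compiles without SDL: `Rect`, `Renderer`, `Texture` and `render_copy` are placeholders for `SDL_Rect`, `SDL_Renderer`, `SDL_Texture` and `SDL_RenderCopy`, and `render_copies` is a hypothetical name for the proposed function.

```c
#include <stddef.h>

typedef struct { int x, y, w, h; } Rect;    /* stand-in for SDL_Rect */
typedef struct { int copies; } Renderer;    /* stand-in for SDL_Renderer; counts calls */
typedef struct { int id; } Texture;         /* stand-in for SDL_Texture */

/* Stand-in for SDL_RenderCopy: returns 0 on success, negative on failure. */
int render_copy(Renderer *r, Texture *t, const Rect *src, const Rect *dst)
{
    (void)src; (void)dst;
    if (r == NULL || t == NULL) {
        return -1;
    }
    r->copies++;
    return 0;
}

/* Non-batched fallback: one single-copy call per rectangle pair.
   A NULL src or dst array is forwarded as NULL for every element. */
int render_copies(Renderer *r, Texture *t,
                  const Rect *src, const Rect *dst, int count)
{
    for (int i = 0; i < count; ++i) {
        const Rect *s = src ? &src[i] : NULL;
        const Rect *d = dst ? &dst[i] : NULL;
        int rc = render_copy(r, t, s, d);
        if (rc != 0) {
            return rc;   /* stop at the first failure */
        }
    }
    return 0;
}
```

This costs about the same as calling the single-copy function in a loop by hand, so a back-end can adopt real batching later without changing the caller.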

I believe that these two small functions would take SDL2's renderer to a new level. Who knows how many games were forced to abandon SDL2 in favor of a pure Direct3D or OpenGL implementation due to performance issues caused by the lack of batching?


Jared Maddox

Guest

Posted: Sun Aug 02, 2015 8:07 pm

Quote:

Date: Sat, 01 Aug 2015 19:39:48 +0000
From: ".3lite"

To be honest, guys, I'm expecting something much simpler. Take the
following functions as an example:

Code:
SDL_RenderCopies(SDL_Renderer* renderer,
SDL_Texture* texture,
const SDL_Rect** srcrect,
const SDL_Rect** dstrect)

Code:
SDL_RenderCopiesEx(SDL_Renderer* renderer,
SDL_Texture* texture,
const SDL_Rect** srcrect,
const SDL_Rect** dstrect,
const double angle,
const SDL_Point* center,
const SDL_RendererFlip flip)

Was center meant to point to an array, or just a single SDL_Point? C
has supported pass-by-value of structures for long enough that I
haven't easily been able to find out when: I think it was during the
K&R/C89 switch-over. I wouldn't use pass-by-value for *_Renderer or
*_Texture, but mostly because they might have hidden data.

Quote:

Date: Sat, 1 Aug 2015 15:21:04 -0700
From: Eric Wing

On 8/1/15, .3lite wrote:

<snip: see my reply above>

Quote:

I would suggest constraining the API as much as possible for speed. To
make SIMD or any vectorization to go fast, you generally want
predictable data layouts and no branches.

So for example, with

Quote:

SDL_RenderCopiesEx(SDL_Renderer* renderer,
SDL_Texture* texture,
const SDL_Rect** srcrect,
const SDL_Rect** dstrect,
const double angle,
const SDL_Point* center,
const SDL_RendererFlip flip)

I don't think these would be too bad, since for both the source and
destination you're looking at a mandatory memory read, mandatory
comparison, optional single-instruction jump, and optional constant
write to the same destination as the mandatory memory read. As long as
you prep your src/dst null-replacement SDL_Rect instances, sized to the
texture & renderer, beforehand, you should be fine (after all, srcrect
and dstrect are likely to point to non-contiguous SDL_Rects, since
gather/store can potentially cut down on the total data transfers, and
the cache thrashing was likely to happen during copy operations as
well).
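
The per-element check described here might look like the following for the pointer-array variant. This is a sketch only: `Rect` stands in for `SDL_Rect`, and `resolve_rects` and the `fallback` parameter are hypothetical names for the prepared null-replacement rectangle.

```c
#include <stddef.h>

typedef struct { int x, y, w, h; } Rect;   /* stand-in for SDL_Rect */

/* Resolves an array of possibly-NULL rectangle pointers into a contiguous
   array. Each element costs one read, one comparison and, for NULL entries,
   a substitution with the prepared fallback rectangle. */
void resolve_rects(const Rect **rects, int count,
                   const Rect *fallback, Rect *out)
{
    for (int i = 0; i < count; ++i) {
        const Rect *r = rects[i] ? rects[i] : fallback;
        out[i] = *r;
    }
}
```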

Quote:

Along that line of thought, we may want explicit array sizes for
srcrect/dstrect as additional parameters.

Agreed, especially since the only other way to find the end of the
array is to go hunting for a null-pointer.

Quote:

srcrect or dstrect arrays themselves being NULL/empty probably could
be handled efficiently by separating into specialized versions early
before entering into the inner loops.

Yeah, when I write a function with pointer args it usually starts off like this:

int func( type *arg )
{
    if( arg )
    {
        /* the real work goes here and returns on success */
    }

    return( -1 );
}

A quick optimization for even the least optimizing compilers is easy
to figure out.

.3lite

Joined: 25 Oct 2013

Posts: 38

Posted: Sun Aug 02, 2015 8:34 pm

Jared Maddox wrote:

Actually, it is meant to be a single SDL_Point. All the rendering back-ends in SDL2 use transformation matrices, and changing the transformation matrix always requires a state change, which flushes the current batch.

The batch is meant to be simple, efficient, and it should be up to the programmer to implement any kind of batching he likes.


rbanke

Joined: 16 Sep 2013

Posts: 2

Posted: Mon Aug 03, 2015 1:34 am

.3lite wrote:

Eric Wing wrote:

I agree. In fact, I made a small mistake in my example: I meant an array of objects, not an array of pointers, plus the number of elements in the array.

That is:

Code:

SDL_RenderCopies(SDL_Renderer* renderer,
SDL_Texture* texture,
const SDL_Rect* srcrect,
const SDL_Rect* dstrect,
int count)

Code:

SDL_RenderCopiesEx(SDL_Renderer* renderer,
SDL_Texture* texture,
const SDL_Rect* srcrect,
const SDL_Rect* dstrect,
const double angle,
const SDL_Point* center,
const SDL_RendererFlip flip,
int count)

That way, instead of providing an array of pointers, you pass the address of an array of objects.

Code:

SDL_Rect srcRects[100];
SDL_Rect dstRects[100];
// fill them
SDL_RenderCopies(renderer, texture, srcRects, dstRects, 100);

I would love for something along these lines to be added. The game I'm working on at the moment uses only two textures for all of its assets, and between drawing glyphs for text and drawing sprites, I'm seeing performance take dives while testing just our tilemap, GUI and text. Being able to draw all of that with two calls instead of several thousand sounds much nicer ;)

Samote

Joined: 16 Feb 2015

Posts: 1

Posted: Mon Aug 03, 2015 3:03 pm

That would be awesome!

The SDL forums have moved to discourse.libsdl.org. This is just a read-only archive of the previous forums, to keep old links working.
