SDL_Renderer: Reducing the number of render calls

kometbomb

Joined: 17 Mar 2010

Posts: 8

Posted: Fri Aug 08, 2014 5:39 am

Hello folks,

I tried to look for discussion about this but could not find any, pardon me if this has been discussed over and over again. Here goes.

Would it make sense to change the SDL_Renderer so that each RenderCopy() etc. call would actually add the drawn quad into a vertex buffer object (or whatever it is that people use in 2014) and then draw the big bunch of quads whenever RenderPresent() is called or when the used texture changes between RenderCopy calls and so on (it at least USED to be important to change the texture as few times as possible)? The way the current code does it looks quite inefficient to me, unless modern hardware and drivers do something similar behind the scenes. It especially looks funny in the GLES driver with an absolutely minimal VBO of two triangles.

Generally, my idea works like this:

1. Frame starts, the quad buffer is zeroed.

2. RenderCopy() with texture 1, added to the buffer
3. Another RenderCopy() with texture 1, added to the buffer

4. RenderCopy with texture 2, the buffer is sent to the GPU and is zeroed and the new quad is added to the buffer

5. ... more calls

6. RenderPresent() sends the buffer if there's anything there

7. Go to 1
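
The steps above can be sketched in plain C as follows. This is only an illustration under stated assumptions: `Quad`, `QuadBatch` and the `batch_*` functions are hypothetical names, not part of SDL, and the flush merely counts draw calls where a real backend would upload the vertex data and issue the draw.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; none of these exist in SDL. */
typedef struct { float x, y, w, h; } Quad;

#define BATCH_CAPACITY 256

typedef struct {
    Quad   quads[BATCH_CAPACITY];
    size_t count;      /* quads queued but not yet sent */
    int    texture;    /* texture id shared by every queued quad, -1 if none */
    int    draw_calls; /* number of flushes so far, for measuring only */
} QuadBatch;

/* Step 1: the frame starts and the quad buffer is emptied. */
void batch_begin(QuadBatch *b)
{
    b->count = 0;
    b->texture = -1;
    b->draw_calls = 0;
}

/* Send everything queued as one draw call. A real backend would upload
 * the vertices to the GPU here; this sketch only counts the call. */
void batch_flush(QuadBatch *b)
{
    if (b->count == 0)
        return;
    b->draw_calls++;
    b->count = 0;
}

/* Steps 2 to 5: queue a quad, flushing first if the texture changes
 * or the buffer is full. */
void batch_copy(QuadBatch *b, int texture, Quad q)
{
    assert(texture >= 0);
    if (texture != b->texture || b->count == BATCH_CAPACITY) {
        batch_flush(b);
        b->texture = texture;
    }
    b->quads[b->count++] = q;
}

/* Step 6: presenting sends whatever is still queued. */
void batch_present(QuadBatch *b)
{
    batch_flush(b);
}
```

With this sketch, two copies from texture 1 followed by one copy from texture 2 and a present result in two draw calls instead of three.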

Have there been plans for something like this or is the consensus that if one needs more performance, OpenGL etc. should be used directly?

-Tero


Jonny D

Joined: 12 Sep 2009

Posts: 932

Posted: Fri Aug 08, 2014 2:11 pm

This general idea has been discussed and it is good. It does take a bit of work, though, as SDL would have to take care to flush the VBO whenever a state change is issued.

The SDL rendering subsystem and API is very good for porting old projects, but if you really need more performance or flexibility at the moment, either look to OpenGL directly or SDL_gpu, which wraps OpenGL in a 2D API with this optimization already implemented.

Jonny D


kometbomb

Joined: 17 Mar 2010

Posts: 8

Posted: Fri Aug 08, 2014 4:19 pm

OK, thank you for the tip, SDL_gpu seems promising. Though, what I really like about the vanilla SDL 2.0 renderer is that it also supports Direct3D, which in my experience has better support in some drivers (I used to get crashes when changing display modes under OpenGL/SDL). That of course might just be because of my limited experience.

One thing that might be quite easy to improve about the SDL_Renderer is the SDL_RenderFillRects() routine, because it seems all implementations just do the same as RenderFillRect() many times over and miss the chance to build a bigger VBO that contains all the rectangles. And it's not hard to guess I mentioned this because it would be a good base for a similar function that takes two more parameters: source rects and the texture. :)

Then it would be up to the user to build the rectangle lists and keep track of state changes.

-Tero


Sik

Joined: 26 Nov 2011

Posts: 905

Posted: Fri Aug 08, 2014 4:21 pm

2014-08-08 13:09 GMT-03:00, Tero Lindeman:

Quote:

One thing that might be quite easy to improve about the SDL_Renderer is the SDL_RenderFillRects() routine because it seems all implementations are just doing the same as RenderFillRect() many times over and missing the chance to build a bigger VBO that contains all the rectangles. And it's not hard to guess I mentioned this because it would be a good base for a similar function that takes two more parameters: source rects and the texture. :)

Then it would be up to the user to build the rectangle lists and keep track of state changes.

Good point, though honestly I doubt it's commonly used, it's likely
most programs just call SDL_RenderFillRect several times. Same deal
with SDL_RenderDrawRect(s).

There's also SDL_RenderDrawLines, where the same situation applies.
However, that one may be more worth looking into, because rendering
multiple lines together in a single batch is actually pretty useful
(e.g. if you're rendering a wireframe or a grid or something like
that).

How bad is this, anyway? These calls barely cause a state change, in
contrast with SDL_RenderCopy, which has a rather heavy state change
(changing the texture has a much more severe penalty). You'd still need
a large number of blits to actually cause a slowdown, but even then.
_______________________________________________
SDL mailing list

http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


kometbomb

Joined: 17 Mar 2010

Posts: 8

Posted: Fri Aug 08, 2014 5:49 pm

The truth indeed is that it is not that bad to have a bunch of calls, at least on a computer bought after 2004. But I have a tiny suspicion this might be a relevant worry on Android and other less powerful platforms.

I think a comparable situation for RenderCopyMany vs. the RenderDrawLines routine is a tile-based map engine where you have a screen full of 16x16 rects. Or, as in the case that prompted me to start talking about this, a font renderer that takes all the characters from one texture filled with characters. So it probably wouldn't be completely useless to have such a built-in routine, if the lines routine is considered useful.

-Tero


Jared Maddox

Guest

Posted: Sun Aug 10, 2014 9:32 am

I spent so long looking for the link, I accidentally deleted my workspace file.


There have been suggestions of this before, and I've spent the time to
dig up the most recent instance that I actually remember.
Here's the month:
http://lists.libsdl.org/pipermail/sdl-libsdl.org/2013-April/date.html
And here's the first message, which was honestly not very informative:
http://lists.libsdl.org/pipermail/sdl-libsdl.org/2013-April/653855.html
And here's my suggestion for an algorithm:
http://lists.libsdl.org/pipermail/sdl-libsdl.org/2013-April/088109.html

tl;dr: If you want to buffer, then you probably want to batch, and for
SDL's Renderer API that means that you need to be careful not to
accidentally perform render B to a point when render A was supposed to
be done to that point first. This can be handled by grouping renders
into host nodes, and simply pushing a new node every time that you
have a "rendering collision". For some reason I was thinking that a
tree implementation was needed, so if you want a tree for it I can
provide you with one that I've been needing to write tests for (there
were objections to involving an external library).
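
One possible reading of the "host node" idea, sketched in plain C. Everything here is an assumption made for illustration: `Render`, `Node`, `NodeList` and the functions are hypothetical names, a "rendering collision" is taken to mean overlapping destination rectangles, and the check is deliberately conservative (it ignores whether the two renders share a texture). Renders inside one node never overlap, so a backend would be free to reorder them by texture before drawing; nodes themselves must be drawn in order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration; not part of SDL. */
typedef struct { int x, y, w, h; } Rect;
typedef struct { int texture; Rect dst; } Render;

#define MAX_RENDERS 64
#define MAX_NODES   16

typedef struct { Render renders[MAX_RENDERS]; size_t count; } Node;
typedef struct { Node nodes[MAX_NODES]; size_t count; } NodeList;

void nodelist_init(NodeList *list)
{
    assert(list != NULL);
    list->count = 0;
}

/* True if the two rectangles share at least one pixel. */
bool rects_overlap(Rect a, Rect b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

/* True if r would draw over something already queued in this node. */
bool node_collides(const Node *n, Render r)
{
    for (size_t i = 0; i < n->count; i++) {
        if (rects_overlap(n->renders[i].dst, r.dst))
            return true;
    }
    return false;
}

/* Queue a render, pushing a new node on a collision (or when the
 * current node is full). Returns false if no node is available. */
bool nodelist_add(NodeList *list, Render r)
{
    Node *cur = list->count ? &list->nodes[list->count - 1] : NULL;
    if (cur == NULL || cur->count == MAX_RENDERS || node_collides(cur, r)) {
        if (list->count == MAX_NODES)
            return false;
        cur = &list->nodes[list->count++];
        cur->count = 0;
    }
    cur->renders[cur->count++] = r;
    return true;
}
```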

krux

Joined: 07 Aug 2014

Posts: 5

Posted: Sun Aug 10, 2014 4:31 pm

I would love to have an SDL_RenderCopies that takes an array of source rects and an array of target rects, but only one source texture, so that I can build up my own texture atlas. And don't underestimate the performance increase here. On modern hardware there is practically no difference between rendering 200 triangles and 2 triangles when you draw them in one single draw call. So the potential here is at least a factor of 100 if it is done right. And please, no tree-structured buffering. Don't treat the programmer as too stupid to set up batched rendering properly.
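
A rough sketch of what the core of such a call might do, in plain C. SDL_RenderCopies does not exist in SDL; `build_copies`, `Rect` and `Vertex` are hypothetical names used only for illustration. The function turns parallel arrays of source and target rects into one vertex array (two triangles per rect) that a backend could submit in a single draw call, assuming all source rects refer to one atlas texture of size `tex_w` x `tex_h` pixels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; Rect mirrors the layout of an integer rect. */
typedef struct { int x, y, w, h; } Rect;
typedef struct { float x, y, u, v; } Vertex; /* position + texture coord */

/* Writes 6 vertices (2 triangles) per rect pair into out, which must
 * have room for 6 * count vertices. Returns the number written. */
size_t build_copies(const Rect *src, const Rect *dst, size_t count,
                    int tex_w, int tex_h, Vertex *out)
{
    size_t n = 0;
    assert(tex_w > 0 && tex_h > 0);
    for (size_t i = 0; i < count; i++) {
        /* Destination corners in pixels. */
        float x0 = (float)dst[i].x;
        float y0 = (float)dst[i].y;
        float x1 = x0 + (float)dst[i].w;
        float y1 = y0 + (float)dst[i].h;
        /* Source corners normalized to the 0..1 texture range. */
        float u0 = (float)src[i].x / (float)tex_w;
        float v0 = (float)src[i].y / (float)tex_h;
        float u1 = (float)(src[i].x + src[i].w) / (float)tex_w;
        float v1 = (float)(src[i].y + src[i].h) / (float)tex_h;
        Vertex quad[6] = {
            {x0, y0, u0, v0}, {x1, y0, u1, v0}, {x1, y1, u1, v1},
            {x0, y0, u0, v0}, {x1, y1, u1, v1}, {x0, y1, u0, v1},
        };
        for (int k = 0; k < 6; k++)
            out[n++] = quad[k];
    }
    return n;
}
```

For a tile map or a bitmap font, the caller would fill the two rect arrays once per frame and hand the resulting vertex array to the backend together with the single atlas texture.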

kometbomb

Joined: 17 Mar 2010

Posts: 8

Posted: Mon Aug 11, 2014 8:43 am

Thanks for the links to the earlier discussions.

IMHO a tree-based approach would be overkill compared to a very simple "batch polys until the texture/blending mode changes or the user wants to read pixel data" style batching, considering the worst case performance would be close to what it is now. SDL_RenderCopies with a single texture would be a nice solution that sits between the two extremes. I think I'll experiment with this and post results.

krux

Joined: 07 Aug 2014

Posts: 5

Posted: Tue Aug 12, 2014 2:12 am

Nice to have somebody who is willing to spend some time here. I also spent some time thinking about how this could be done right. If a geometry shader is available and the GPU can do integers, then all vertex creation could be done in the geometry shader, passing only the raw arrays of rectangles. But sadly that's not an option for OpenGL ES.


Jeffrey Carpenter

Guest

Posted: Wed Aug 13, 2014 9:31 pm

Quote:

On Aug 8, 2014, at 11:21, Sik the hedgehog wrote:

2014-08-08 13:09 GMT-03:00, Tero Lindeman:

Quote:

Then it would be up to the user to build the rectangle lists and keep track of state changes.

If you use a function that calls SDL_RenderLine to draw dithered, linear-gradient-filled backgrounds, and then use it for three widgets of roughly 320x240 each (around 720 draw calls per frame, if I remember right), you can easily start seeing performance issues. A difference of roughly 15 to 30 fps or more can be seen.

This doesn't matter much to me on my Macbook (Intel Graphics 3000 with plenty of fps to spare), but certainly matters a great deal on my older single core AMD64 windev box (Geforce 6200), where I've seen fps drop as low as 4fps, and unable to peak greater than 10fps.

I certainly have slight concern with the performance of the code under iOS, but unfortunately, that test won't see the light of day for some time.

Admittedly, I haven't tried very hard at optimizing the function, nor are any of these tests done on a release build, so you might have to take my comment with a grain of salt? :-)

P.S. Sorry for my brevity, I'm emailing from my phone.

Cheers!


Sam Lantinga

Joined: 10 Sep 2009

Posts: 1765

Posted: Thu Aug 14, 2014 5:57 am

Don't use SDL_RenderLine for gradient backgrounds. Instead you should have a pre-filled gradient texture. If you want a single color that can change dynamically, you can make the texture greyscale and use texture color mod to tint it. If you want it to be multi-colored dynamically, you could render lines into a render target and then use that as your gradient. The texture doesn't need to be very big, either.
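
The greyscale-plus-color-mod idea can be modelled on the CPU in plain C. This is a sketch of the arithmetic only, not SDL code: `Color`, `fill_grey_gradient` and `modulate` are hypothetical names. In SDL the ramp would live in a texture and the tint would be set with SDL_SetTextureColorMod, which multiplies each source channel by the modulation value divided by 255; exact rounding on a given backend may differ from the integer math used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for illustration. */
typedef struct { uint8_t r, g, b; } Color;

/* Fill a one-pixel-wide greyscale ramp: 0 in the first row, 255 in the
 * last. This is done once, not every frame. */
void fill_grey_gradient(uint8_t *pixels, size_t height)
{
    assert(height >= 2);
    for (size_t y = 0; y < height; y++)
        pixels[y] = (uint8_t)(y * 255 / (height - 1));
}

/* Model of color modulation: every channel of the (grey) source pixel
 * is scaled by mod / 255. */
Color modulate(uint8_t grey, Color mod)
{
    Color out = {
        (uint8_t)(grey * mod.r / 255),
        (uint8_t)(grey * mod.g / 255),
        (uint8_t)(grey * mod.b / 255),
    };
    return out;
}
```

Changing the gradient's color then only means changing the modulation value; the ramp itself is never redrawn, so the per-frame cost is one textured quad instead of hundreds of line calls.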

This will save you a huge amount of performance. The SDL line and point API functions are not designed for massive numbers of calls, just some additional work to fill in gaps between textures or do a little background decoration.

If you need something like a particle system or vector art, you should probably use OpenGL directly.

Cheers!


Jeffrey Carpenter

Guest

Posted: Fri Aug 15, 2014 2:33 am


Thanks a lot for the suggestions -- it helps confirm what I had in mind for optimization once the time rolls around for it (I have the luxury of simply ignoring the issue for the time being). It is nice to know that I was thinking in the right direction :-)

When I discovered the performance issue (a few months ago), I experimented with rendering the gradient fill once to a render target and then rendering from that, and sure enough, the performance improved dramatically for me. At that time, the thought occurred to me that I could probably just use a regular texture for this.

I hadn't thought of using a greyscale texture -- interesting idea, I might just have to give it a shot... texture color modulation works wonders for bitmap fonts....

Quote:

This will save you a huge amount of performance. The SDL line and point API functions are not designed for massive numbers of calls, just some additional work to fill in gaps between textures or do a little background decoration.

Indeed, the API has been wonderful for my other needs! (mostly as a bits and pieces decorator).

Cheers,
Jeffrey Carpenter


The SDL forums have moved to discourse.libsdl.org. This is just a read-only archive of the previous forums, to keep old links working.
