The SDL forums have moved to discourse.libsdl.org.
This is just a read-only archive of the previous forums, to keep old links working.


SDL_Renderer: Reducing the number of render calls
kometbomb


Joined: 17 Mar 2010
Posts: 8
Hello folks,


I tried to look for discussion about this but could not find any; pardon me if this has been discussed over and over again. Here goes.

Would it make sense to change the SDL_Renderer so that each RenderCopy() (and similar) call adds the drawn quad to a vertex buffer object (or whatever it is that people use in 2014), and the whole bunch of quads is then drawn whenever RenderPresent() is called, or when the texture changes between RenderCopy() calls, and so on? (It at least USED to be important to change the texture as few times as possible.) The way the current code does it looks quite inefficient to me, unless modern hardware and drivers do something similar behind the scenes. It looks especially funny in the GLES driver, with an absolutely minimal VBO of two triangles.


Generally, my idea works like this:

1. Frame starts, the quad buffer is zeroed.
2. RenderCopy() with texture 1, added to the buffer
3. Another RenderCopy() with texture 1, added to the buffer
4. RenderCopy() with texture 2: the buffer is sent to the GPU and zeroed, and the new quad is added to the buffer
5. ... more calls
6. RenderPresent() sends the buffer if there's anything there
7. Go to 1
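
The steps above can be sketched in plain C as follows. This is only an illustration of the proposed scheme, not SDL code: `QuadBatch`, `batch_copy` and the other names are hypothetical, textures are reduced to integer ids, and "sending the buffer to the GPU" is simulated by counting flushes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUADS 1024

/* Hypothetical stand-ins; none of these names are part of SDL. */
typedef struct { int x, y, w, h; } Rect;
typedef struct { Rect src, dst; } Quad;

typedef struct {
    int    texture_id;        /* texture shared by every queued quad */
    size_t count;             /* quads currently queued */
    size_t flushes;           /* simulated draw calls issued so far */
    Quad   quads[MAX_QUADS];
} QuadBatch;

/* Step 1: a frame starts with an empty buffer. */
static void batch_begin_frame(QuadBatch *b)
{
    b->texture_id = -1;
    b->count = 0;
    b->flushes = 0;
}

/* Send the queued quads as one draw call, then empty the buffer.
   A real backend would upload b->quads to a vertex buffer here. */
static void batch_flush(QuadBatch *b)
{
    if (b->count == 0)
        return;
    b->flushes++;
    b->count = 0;
}

/* Steps 2-5: queue a quad, flushing first if the texture changed
   or the buffer is full. */
static void batch_copy(QuadBatch *b, int texture_id, Rect src, Rect dst)
{
    assert(texture_id >= 0);
    if (texture_id != b->texture_id || b->count == MAX_QUADS)
        batch_flush(b);
    b->texture_id = texture_id;
    b->quads[b->count].src = src;
    b->quads[b->count].dst = dst;
    b->count++;
}

/* Step 6: presenting sends whatever is still queued. */
static void batch_present(QuadBatch *b)
{
    batch_flush(b);
}
```

With this sketch, two copies from texture 1 followed by one copy from texture 2 and a present result in two flushes instead of three separate draws. A real implementation would also have to flush on other state changes (blend mode, render target, pixel reads), which is not modelled here.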


Have there been plans for something like this or is the consensus that if one needs more performance, OpenGL etc. should be used directly?


-Tero
SDL_Renderer: Reducing the number of render calls
Jonny D


Joined: 12 Sep 2009
Posts: 932
This general idea has been discussed and it is good.  It does take a bit of work, though, as SDL would have to take care to flush the VBO whenever a state change is issued.

The SDL rendering subsystem and API is very good for porting old projects, but if you really need more performance or flexibility at the moment, either look to OpenGL directly or SDL_gpu, which wraps OpenGL in a 2D API with this optimization already implemented.


Jonny D









_______________________________________________
SDL mailing list

http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

SDL_Renderer: Reducing the number of render calls
kometbomb


Joined: 17 Mar 2010
Posts: 8
OK, thank you for the tip, SDL_gpu seems promising. Though, what I really like about the vanilla SDL 2.0 renderer is that it also supports Direct3D, which in my experience has better support in some drivers (I used to get crashes when changing display modes under OpenGL/SDL). That of course might just be because of my limited experience.

One thing that might be quite easy to improve about the SDL_Renderer is the SDL_RenderFillRects() routine, because it seems all implementations are just doing the same as RenderFillRect() many times over and missing the chance to build a bigger VBO that contains all the rectangles. And it's not hard to guess I mentioned this because it would be a good base for a similar function that takes two more parameters: source rects and the texture. :) Then it would be up to the user to build the rectangle lists and keep track of state changes.

-Tero
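
As a rough illustration of the "bigger VBO that contains all the rectangles" idea, here is a minimal sketch in plain C that expands a list of rectangles into one vertex array (two triangles per rectangle) that a backend could submit in a single draw call. The `Rect` and `Vertex` types and the `rects_to_triangles` helper are hypothetical and are not taken from the SDL sources.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y, w, h; } Rect;   /* same fields as SDL_Rect */
typedef struct { float x, y; } Vertex;

/* Expand `count` rectangles into 6 vertices each (two triangles).
   `out` must have room for count * 6 vertices.
   Returns the number of vertices written. */
static size_t rects_to_triangles(const Rect *rects, size_t count,
                                 Vertex *out, size_t out_capacity)
{
    size_t n = 0;
    assert(out_capacity >= count * 6);
    for (size_t i = 0; i < count; i++) {
        float x0 = (float)rects[i].x;
        float y0 = (float)rects[i].y;
        float x1 = (float)(rects[i].x + rects[i].w);
        float y1 = (float)(rects[i].y + rects[i].h);
        /* triangle 1: top-left, top-right, bottom-left */
        out[n++] = (Vertex){x0, y0};
        out[n++] = (Vertex){x1, y0};
        out[n++] = (Vertex){x0, y1};
        /* triangle 2: top-right, bottom-right, bottom-left */
        out[n++] = (Vertex){x1, y0};
        out[n++] = (Vertex){x1, y1};
        out[n++] = (Vertex){x0, y1};
    }
    return n;
}
```

The resulting array could then be uploaded once and drawn as a triangle list, instead of issuing one draw per rectangle.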

SDL_Renderer: Reducing the number of render calls
Sik


Joined: 26 Nov 2011
Posts: 905
2014-08-08 13:09 GMT-03:00, Tero Lindeman:
Quote:
One thing that might be quite easy to improve about the SDL_Renderer is the
SDL_RenderFillRects() routine because it seems all implementations are just
doing the same as RenderFillRect() many times over and missing the chance to
build a bigger VBO that contains all the rectangles. And it's not hard to
guess I mentioned this because it would be a good base for a similar function
that takes two more parameters: source rects and the texture. :) Then it would
be up to the user to build the rectangle lists and keep track of state
changes.

Good point, though honestly I doubt it's commonly used, it's likely
most programs just call SDL_RenderFillRect several times. Same deal
with SDL_RenderDrawRect(s).

There's also SDL_RenderDrawLines, where the same situation applies.
However, that one may be more worth looking into, because rendering
multiple lines together in a single batch is actually pretty useful
(e.g. if you're rendering a wireframe or a grid or something like
that).

How bad is this, anyway? They barely cause a state change, in contrast
with SDL_RenderCopy, which has a rather heavy state change (changing
the texture has a much more severe penalty). You'll still need a large
number of blits to actually cause a slowdown, but even then.
SDL_Renderer: Reducing the number of render calls
kometbomb


Joined: 17 Mar 2010
Posts: 8
The truth indeed is that it is not that bad to have a bunch of calls, at least on a computer that was bought after 2004. But I have a tiny suspicion this might be a relevant worry on Android and other less powerful platforms.

I think a comparable situation for RenderCopyMany vs. the RenderDrawLines routine is a tile-based map engine where you have a screen full of 16x16 rects. Or, as in the case that prompted me to start talking about this, a font renderer that takes all the characters from one texture filled with characters. So, it probably wouldn't be completely useless to have such a built-in routine, if the lines routine is considered useful.

-Tero
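
To make the font-renderer case concrete, here is a small sketch in plain C of how a caller might build the source and destination rectangle lists for a string when all glyphs live in one texture. The atlas layout (16x16 pixel cells, 16 glyphs per row, indexed by character code) and the `layout_text` helper are assumptions made for the example; they do not come from the thread or from SDL.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y, w, h; } Rect;   /* same fields as SDL_Rect */

/* Assumed atlas layout: fixed-size cells, 16 glyphs per row. */
enum { GLYPH_W = 16, GLYPH_H = 16, ATLAS_COLS = 16 };

/* Fill the src/dst rect lists for `text`, drawn left to right at (x, y).
   Returns the number of glyphs written (at most `capacity`). */
static size_t layout_text(const char *text, int x, int y,
                          Rect *src, Rect *dst, size_t capacity)
{
    size_t n = 0;
    assert(text && src && dst);
    for (; text[n] != '\0' && n < capacity; n++) {
        unsigned char c = (unsigned char)text[n];
        /* source cell inside the atlas texture */
        src[n] = (Rect){ (c % ATLAS_COLS) * GLYPH_W,
                         (c / ATLAS_COLS) * GLYPH_H,
                         GLYPH_W, GLYPH_H };
        /* destination cell on screen */
        dst[n] = (Rect){ x + (int)n * GLYPH_W, y, GLYPH_W, GLYPH_H };
    }
    return n;
}
```

Because every glyph comes from the same texture, the two lists are exactly the input a single batched copy routine of the kind discussed here would need. The same approach applies to a screen of 16x16 map tiles.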

SDL_Renderer: Reducing the number of render calls
Jared Maddox
Guest

I spent so long looking for the link, I accidentally deleted my workspace file.




There have been suggestions of this before, and I've spent the time to
dig up the most recent instance that I actually remember.
Here's the month:
http://lists.libsdl.org/pipermail/sdl-libsdl.org/2013-April/date.html
And here's the first message, which was honestly not very informative:
http://lists.libsdl.org/pipermail/sdl-libsdl.org/2013-April/653855.html
And here's my suggestion for an algorithm:
http://lists.libsdl.org/pipermail/sdl-libsdl.org/2013-April/088109.html

tl;dr: If you want to buffer, then you probably want to batch, and for
SDL's Renderer API that means that you need to be careful not to
accidentally perform render B to a point when render A was supposed to
be done to that point first. This can be handled by grouping renders
into host nodes, and simply pushing a new node every time that you
have a "rendering collision". For some reason I was thinking that a
tree implementation was needed, so if you want a tree for it I can
provide you with one that I've been needing to write tests for (there
were objections to involving an external library).
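
One possible reading of the tl;dr above, sketched in plain C: renders are appended to the current node, and a new node is pushed as soon as a render's destination overlaps a render already queued in that node. Within a node nothing overlaps, so a backend could reorder those renders (for example by texture) without changing the picture. The `NodeQueue` type and its functions are hypothetical, and the linked algorithm may differ in its details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RENDERS 256

typedef struct { int x, y, w, h; } Rect;

typedef struct {
    Rect   dst[MAX_RENDERS];    /* destination of each queued render */
    int    node[MAX_RENDERS];   /* node index assigned to each render */
    size_t count;               /* renders queued so far */
    size_t node_start;          /* first render of the current node */
    int    current_node;
} NodeQueue;

static bool rects_overlap(Rect a, Rect b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

static void queue_init(NodeQueue *q)
{
    q->count = 0;
    q->node_start = 0;
    q->current_node = 0;
}

/* Queue a render and return the node it landed in. */
static int queue_render(NodeQueue *q, Rect dst)
{
    assert(q->count < MAX_RENDERS);
    for (size_t i = q->node_start; i < q->count; i++) {
        if (rects_overlap(q->dst[i], dst)) {
            /* "rendering collision": push a new node */
            q->current_node++;
            q->node_start = q->count;
            break;
        }
    }
    q->dst[q->count] = dst;
    q->node[q->count] = q->current_node;
    q->count++;
    return q->current_node;
}
```

This version compares against every render in the current node, so it is quadratic in the worst case. It is meant to show the ordering constraint, not to be an efficient data structure.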
krux


Joined: 07 Aug 2014
Posts: 5
I would love to have an SDL_RenderCopies that takes an array of source rects and an array of target rects, but only one source texture, so that I can build up my own texture atlas. And don't underestimate the performance increase here. On modern hardware there is practically no difference between rendering 200 triangles and 2 triangles when you draw them in one single draw call. So the potential here is at least a factor of 100 if it is done right. And please, no tree-structured buffering. Don't treat the programmer as too stupid to set up batched rendering properly.
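
For illustration, here is roughly what the geometry behind such a call could look like, sketched in plain C. SDL_RenderCopies is only the name proposed in this post, not an existing SDL function, and `build_copies`, `TexVertex` and the parameter list are hypothetical. Each source/target rect pair becomes four vertices with normalized texture coordinates plus six indices, so the whole array can go out in one indexed draw call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int x, y, w, h; } Rect;        /* same fields as SDL_Rect */
typedef struct { float x, y, u, v; } TexVertex; /* position + texture coord */

/* Hypothetical helper, not an SDL function.
   verts needs count * 4 entries, indices needs count * 6 entries.
   Returns the number of indices written. */
static size_t build_copies(const Rect *src, const Rect *dst, size_t count,
                           int tex_w, int tex_h,
                           TexVertex *verts, uint16_t *indices)
{
    assert(tex_w > 0 && tex_h > 0);
    assert(count <= 16384);   /* 4 vertices each must fit 16-bit indices */
    for (size_t i = 0; i < count; i++) {
        float u0 = (float)src[i].x / (float)tex_w;
        float v0 = (float)src[i].y / (float)tex_h;
        float u1 = (float)(src[i].x + src[i].w) / (float)tex_w;
        float v1 = (float)(src[i].y + src[i].h) / (float)tex_h;
        float x0 = (float)dst[i].x;
        float y0 = (float)dst[i].y;
        float x1 = (float)(dst[i].x + dst[i].w);
        float y1 = (float)(dst[i].y + dst[i].h);
        TexVertex *q  = &verts[i * 4];
        uint16_t  *ix = &indices[i * 6];
        uint16_t base = (uint16_t)(i * 4);

        q[0] = (TexVertex){x0, y0, u0, v0};   /* top-left */
        q[1] = (TexVertex){x1, y0, u1, v0};   /* top-right */
        q[2] = (TexVertex){x0, y1, u0, v1};   /* bottom-left */
        q[3] = (TexVertex){x1, y1, u1, v1};   /* bottom-right */

        ix[0] = base;
        ix[1] = (uint16_t)(base + 1);
        ix[2] = (uint16_t)(base + 2);
        ix[3] = (uint16_t)(base + 2);
        ix[4] = (uint16_t)(base + 1);
        ix[5] = (uint16_t)(base + 3);
    }
    return count * 6;
}
```

Using indices keeps the vertex data at four entries per quad instead of six, which matters a little more once texture coordinates are carried along.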
kometbomb


Joined: 17 Mar 2010
Posts: 8
Thanks for the links to the earlier discussions.

IMHO a tree-based approach would be overkill compared to a very simple "batch polys until the texture/blending mode changes or the user wants to read pixel data" style batching, considering the worst case performance would be close to what it is now. SDL_RenderCopies with a single texture would be a nice solution that sits between the two extremes. I think I'll experiment with this and post results.
krux


Joined: 07 Aug 2014
Posts: 5


Nice to have somebody who is willing to spend some time here. I also spent some time thinking about how this could be done right. If a geometry shader is available and the GPU can do integers, then all vertex creation could be done in the geometry shader, passing only the raw arrays of rectangles. But sadly that's not an option for OpenGL ES.
SDL_Renderer: Reducing the number of render calls
Jeffrey Carpenter
Guest

Quote:
On Aug 8, 2014, at 11:21, Sik the hedgehog wrote:

How bad is this, anyway? They barely cause a state change, in contrast
with SDL_RenderCopy, which has a rather heavy state change (changing
the texture has a much more severe penalty). You'll still need a large
amount of blits to actually cause slow down, but even then.

If you use a function that uses SDL_RenderDrawLine for dithered, linear-gradient-filled backgrounds, and then use it for three (roughly) 320x240 sized widgets (roughly 720 draw calls per frame, if I remember right), you can easily start seeing performance issues. A difference of roughly 15 to 30+ fps can be seen.

This doesn't matter much to me on my MacBook (Intel Graphics 3000 with plenty of fps to spare), but certainly matters a great deal on my older single-core AMD64 Windows dev box (GeForce 6200), where I've seen the frame rate drop as low as 4fps and never peak above 10fps.

I certainly have slight concern with the performance of the code under iOS, but unfortunately, that test won't see the light of day for some time.

Admittedly, I haven't tried very hard at optimizing the function, nor are any of these tests done on a release build, so you might have to take my comment with a grain of salt? :-)

P.S. Sorry for my brevity, I'm emailing from my phone.

Cheers!

SDL_Renderer: Reducing the number of render calls
Sam Lantinga


Joined: 10 Sep 2009
Posts: 1765
Don't use SDL_RenderDrawLine for gradient backgrounds. Instead you should have a pre-filled gradient texture. If you want a single color chosen dynamically, you can make the texture greyscale and use texture color mod to tint it. If you want multiple colors chosen dynamically, you could use a render target, render lines into that, and then use that as your gradient. The texture doesn't need to be very big, either.

This will save you a huge amount of performance. The SDL line and point API functions are not designed for massive numbers of calls, just some additional work to fill in gaps between textures or do a little background decoration.
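
To show why a greyscale gradient plus color mod is enough for a single-color gradient, here is a small sketch in plain C of the arithmetic involved. In SDL the tint would be set with SDL_SetTextureColorMod(), which is documented as multiplying each source channel by (color / 255). The helpers below are hypothetical and only reproduce that formula on the CPU; an actual backend may round differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill `out` with a greyscale ramp: 0 at the first entry, 255 at the last.
   This is the pixel data a small pre-filled gradient texture would hold. */
static void make_grey_ramp(uint8_t *out, size_t len)
{
    assert(len >= 2);
    for (size_t i = 0; i < len; i++)
        out[i] = (uint8_t)((i * 255) / (len - 1));
}

/* Color modulation as documented for SDL: src * (mod / 255),
   done here in integer arithmetic for one channel. */
static uint8_t modulate(uint8_t src, uint8_t mod)
{
    return (uint8_t)((src * mod) / 255);
}
```

Tinting the ramp with, say, (200, 100, 0) turns the brightest texel into exactly that color and scales every other texel proportionally, so one small texture can serve gradients of any single hue.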


If you need something like a particle system or vector art, you should probably use OpenGL directly.


Cheers!





SDL_Renderer: Reducing the number of render calls
Jeffrey Carpenter
Guest

On 2014/08/14, at 0:57, Sam Lantinga wrote:

Quote:
Don't use SDL_RenderDrawLine for gradient backgrounds. Instead you should have a pre-filled gradient texture. If you want it dynamically a single color you can make it greyscale and use texture color mod to color it. If you want it multi-colored dynamically you could use a render target and render lines into that and then use that as your gradient. The texture doesn't need to be very big, either.


Thanks a lot for the suggestions -- it helps confirm what I had in mind for optimization once the time rolls around for it (I have the luxury of simply ignoring the issue for the time being). It is nice to know that I was thinking in the right direction :-)

Around the time I discovered the performance issue (a few months ago), I experimented with rendering the gradient fill once to a render target and then rendering from that, and sure enough, the performance improved dramatically for me. At that point, the thought occurred to me that I could probably just use a regular texture for this.

I hadn't thought of using a greyscale texture -- interesting idea, I might just have to give it a shot... texture color modulation works wonders for bitmap fonts....

Quote:
This will save you a huge amount of performance. The SDL line and point API functions are not designed for massive numbers of calls, just some additional work to fill in gaps between textures or do a little background decoration.

Indeed, the API has been wonderful for my other needs! (mostly as a bits and pieces decorator).

Cheers,
Jeffrey Carpenter

