SDL :: View topic - External dependencies in the renderer?

SDL
Simple DirectMedia Layer Forums

External dependencies in the renderer?

Mason Wheeler

Guest

Posted: Mon Apr 15, 2013 6:46 pm

Does anyone (particularly Sam and Ryan) have any objections to pulling an external library into SDL? I have an idea that could significantly improve the performance of SDL's 3D-accelerated rendering, but it would require a multimap. Neither SDL nor the C standard library has a multimap implementation, but I could build one with uthash and utarray, which are both fairly small and BSD-licensed.

Mason

External dependencies in the renderer?

Ryan C. Gordon

Guest

Posted: Tue Apr 16, 2013 1:20 am

On 4/15/13 2:46 PM, Mason Wheeler wrote:

Quote:

Does anyone (particularly Sam and Ryan) have any objections to pulling
an external library into SDL? Because I have an idea that could
significantly improve the performance of SDL's 3d-accelerated rendering,
but it would require a multimap. Neither SDL nor the C standard library
has a multimap implementation, but I could build one with uthash and
utarray <http://troydhanson.github.io/uthash/>, which are both fairly
small and BSD-licensed.

I'd rather we have a simple hashtable implementation in SDL.

What's the plan?

--ryan.

External dependencies in the renderer?

Jonathan Greig

Guest

Posted: Tue Apr 16, 2013 1:44 am

Ryan,
Can you elaborate on why uthash is not attractive to you? Just wondering, since I was recently looking at possibly using it for the Embroidermodder 2 project. I came across it after looking at some hash benchmarks, and the license is appealing. It's a single header, so if the interface isn't to your liking, making a small wrapper around it should be fairly straightforward. Have you or Sam done any work on an SDL hash implementation?
- Swyped from my droid.

External dependencies in the renderer?

Sik

Joined: 26 Nov 2011

Posts: 905

Posted: Tue Apr 16, 2013 1:47 am

I think the problem is the fact it's an extra dependency.

That said, I'm not very fond of its use of macros at all :S (I guess
this is one place where C++ wins by far, templates would make this
trivial) I wonder if that's an issue too.

On Apr 15, 2013 8:20 PM, "Ryan C. Gordon" wrote:

I'd rather we have a simple hashtable implementation in SDL.

What's the plan?

--ryan.

External dependencies in the renderer?

Jonathan Greig

Guest

Posted: Tue Apr 16, 2013 1:59 am

Sik,
I completely understand the extra-dependency issue, although with it being a single header, shipping it with the SDL sources should hardly be a problem. At least that's the way I look at it.
I don't particularly care for macros either so maybe that could be part of it too.
- Swyped from my droid.

External dependencies in the renderer?

John

Guest

Posted: Tue Apr 16, 2013 2:03 am

What is the optimization?

External dependencies in the renderer?

Andreas Schiffler

Guest

Posted: Tue Apr 16, 2013 2:08 am

Same gut feel here - seems reasonable to extend SDL functionality via a copy-and-add of the single uthash.h file as the uthash license allows redistribution in source form. Judging from its test coverage, the code seems reasonably stable so that SDL maintainers would not have to expect a lot of future updates in the SDL source tree of this file either.

In my view it really comes down to what the user benefit would actually be over a custom implementation inside an SDL-based app (if possible).

External dependencies in the renderer?

Mason Wheeler

Guest

Posted: Tue Apr 16, 2013 2:41 am

Here's the basic idea.

The internals of SDL's rendering API are atrocious, to put it bluntly. It does everything in Immediate Mode, which modern versions of OpenGL and Direct3D have moved away from because it's so slow. GLES doesn't even support Immediate Mode, so if you look at SDL's GLES renderer, it does the closest thing it can find to Immediate Mode, sending one call to OpenGL every time someone calls SDL_RenderCopy.

The way to do rendering fast is to keep the number of library calls to a minimum, and pass as much data as possible all at once in an array. Of course, that's not the way people use SDL; they use SDL to draw a bunch of sprites, one at a time. So to be fast, SDL has to keep track of the bookkeeping for them.

The way to do this is with a multimap, mapping textures to lists of drawing coordinates. You turn SDL_RenderCopy into an operation that adds a pair of rects to a texture's mapped list, and SDL_RenderPresent into an operation that iterates over the multimap and for each texture, builds two arrays of vertices (one for screen coordinates and one for texture coordinates) as buffers and passes them to the renderer all at once.
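
A minimal sketch of that bookkeeping in plain C, assuming a hand-rolled, fixed-size table instead of uthash. `BatchMap`, `batchmap_add` and `batchmap_flush` are hypothetical names, not SDL API, and `Rect` stands in for `SDL_Rect`. Queuing replaces the per-copy draw, and the flush hands each texture's copies to a callback in one go; building the two vertex arrays would happen inside that callback.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { int x, y, w, h; } Rect;      /* stand-in for SDL_Rect */
typedef struct { Rect src, dst; } CopyCmd;     /* one queued copy */

typedef struct {
    const void *texture;   /* key; NULL marks an empty slot */
    CopyCmd *cmds;         /* growable list of queued copies */
    size_t count, cap;
} Bucket;

#define TABLE_SIZE 64      /* power of two; fixed size keeps the sketch short */

typedef struct { Bucket slots[TABLE_SIZE]; size_t used; } BatchMap;

typedef void (*DrawBatchFn)(const void *texture, const CopyCmd *cmds,
                            size_t n, void *user);

static void batchmap_init(BatchMap *m)
{
    size_t i;
    for (i = 0; i < TABLE_SIZE; i++) {
        m->slots[i].texture = NULL;
        m->slots[i].cmds = NULL;
        m->slots[i].count = m->slots[i].cap = 0;
    }
    m->used = 0;
}

/* Find or create the bucket for a texture using linear probing.
   One slot is always left empty so the probe loop terminates. */
static Bucket *batchmap_bucket(BatchMap *m, const void *texture)
{
    size_t i = ((uintptr_t)texture >> 4) & (TABLE_SIZE - 1);
    while (m->slots[i].texture != NULL && m->slots[i].texture != texture)
        i = (i + 1) & (TABLE_SIZE - 1);
    if (m->slots[i].texture == NULL) {
        if (m->used == TABLE_SIZE - 1) return NULL;   /* table full */
        m->slots[i].texture = texture;
        m->used++;
    }
    return &m->slots[i];
}

/* Queue one copy instead of drawing it. Returns 0 on success, -1 on failure. */
static int batchmap_add(BatchMap *m, const void *texture, Rect src, Rect dst)
{
    Bucket *b;
    assert(texture != NULL);
    b = batchmap_bucket(m, texture);
    if (b == NULL) return -1;
    if (b->count == b->cap) {
        size_t cap = b->cap ? b->cap * 2 : 8;
        CopyCmd *p = realloc(b->cmds, cap * sizeof *p);
        if (p == NULL) return -1;
        b->cmds = p;
        b->cap = cap;
    }
    b->cmds[b->count].src = src;
    b->cmds[b->count].dst = dst;
    b->count++;
    return 0;
}

/* Hand each texture's copies to `draw` (may be NULL), then empty the map.
   Returns the number of per-texture batches. */
static size_t batchmap_flush(BatchMap *m, DrawBatchFn draw, void *user)
{
    size_t i, batches = 0;
    for (i = 0; i < TABLE_SIZE; i++) {
        Bucket *b = &m->slots[i];
        if (b->texture == NULL) continue;
        if (draw) draw(b->texture, b->cmds, b->count, user);
        batches++;
        free(b->cmds);
        b->texture = NULL;
        b->cmds = NULL;
        b->count = b->cap = 0;
    }
    m->used = 0;
    return batches;
}
```

Buckets come back in slot order, which is effectively arbitrary, so this sketch on its own gives no drawing-order guarantee.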

I've got a Delphi implementation that sped up my rendering significantly, about 3x faster than stock SDL rendering. With a multimap in C, I could port this concept to the SDL internals.

The one tricky thing here, the concept that my renderer has that SDL doesn't, is Z-order. If you're no longer deterministically drawing in the order in which draw calls are received, but instead grouping them by texture, which are in turn sorted by hash order (essentially random), you need a Z-order parameter to make sure the right things draw on top of the right things, and what you end up with is an array of multimaps.
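
One way to picture the Z-order requirement, assuming a flat queue sorted by layer and then by texture instead of a literal array of multimaps. The `DrawCmd` type and its `z` field are hypothetical; SDL's render API has no such parameter. Within a single layer this sketch still lets texture grouping reorder sprites.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int z;                 /* hypothetical layer supplied by the caller */
    const void *texture;   /* texture the sprite samples from */
    unsigned seq;          /* submission index, used as the final tie-break */
} DrawCmd;

/* Order by layer first, then by texture so equal textures sit together,
   then by submission order so the result is deterministic. */
static int cmp_draw(const void *pa, const void *pb)
{
    const DrawCmd *a = pa, *b = pb;
    uintptr_t ta = (uintptr_t)a->texture, tb = (uintptr_t)b->texture;
    if (a->z != b->z) return (a->z < b->z) ? -1 : 1;
    if (ta != tb) return (ta < tb) ? -1 : 1;
    if (a->seq != b->seq) return (a->seq < b->seq) ? -1 : 1;
    return 0;
}

/* Sort the queue and return how many texture binds a flush would need. */
static size_t sort_and_count_batches(DrawCmd *cmds, size_t n)
{
    size_t i, batches = 0;
    qsort(cmds, n, sizeof *cmds, cmp_draw);
    for (i = 0; i < n; i++) {
        if (i == 0 || cmds[i].z != cmds[i - 1].z ||
            cmds[i].texture != cmds[i - 1].texture)
            batches++;
    }
    return batches;
}
```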

I know it probably sounds very complicated, but it's only a few hundred lines of code (plus the implementations of the hash and the dynamic array, because C doesn't have them built in) and it makes rendering *much* faster.

Mason

External dependencies in the renderer?

Scott Percival

Guest

Posted: Tue Apr 16, 2013 2:54 am

Looking at the GLES2 renderer (which is probably the cleanest implementation we've got right now), isn't there an expectation with the design of the API that the RenderCopy operation is carried out immediately? So in theory, couldn't someone call RenderCopy with an SDL_Texture * and two rects, then mess with the contents of the texture, then call RenderPresent?
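
A small sketch of how a deferred queue could guard against that case, assuming the renderer can hook every operation that writes to a texture. `Queue` and `queue_before_texture_write` are hypothetical names, not SDL internals; the idea is simply to submit pending copies before the pixels change.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 256

typedef struct {
    const void *pending[MAX_PENDING];  /* textures referenced by queued copies */
    size_t count;
    size_t flushes;                    /* how many times the queue was submitted */
} Queue;

static void queue_flush(Queue *q)
{
    if (q->count == 0) return;
    /* A real renderer would submit the queued geometry here. */
    q->count = 0;
    q->flushes++;
}

/* Deferred copy: remember the texture, draw nothing yet. */
static void queue_copy(Queue *q, const void *texture)
{
    if (q->count == MAX_PENDING) queue_flush(q);
    q->pending[q->count++] = texture;
}

/* Call before any operation that changes a texture's pixels. If a queued
   copy still refers to that texture, submit the queue first so the copy
   sees the old contents, as it would with immediate drawing. */
static void queue_before_texture_write(Queue *q, const void *texture)
{
    size_t i;
    for (i = 0; i < q->count; i++) {
        if (q->pending[i] == texture) {
            queue_flush(q);
            return;
        }
    }
}
```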

External dependencies in the renderer?

Ryan C. Gordon

Guest

Posted: Tue Apr 16, 2013 2:55 am

Quote:

Can you elaborate on the reason why uthash is not attractive to you?

I haven't even clicked on the link, so I can't say anything about
uthash. As an external piece of code, I'm hesitant to add it to SDL,
since that has caused annoyances in the past, unless there was a really
good reason.

(Doubly-so for a hashtable. I mean, a hashtable? Do we really need to
scour the internet for a hashtable?)

I imagine it's probably a fine piece of code in itself, though.

--ryan.

External dependencies in the renderer?

Sik

Joined: 26 Nov 2011

Posts: 905

Posted: Tue Apr 16, 2013 3:01 am

Um, is a hashtable needed for this idea as opposed to a regular array?
I mean, you're literally just adding entries to a queue, you don't
even need to retrieve them back. As for the Z order, just assign a
unique Z to each entry and be done with it. Sure, you may run out of
range, but at that point you probably have queued up enough primitives
to be worth flushing the batch.

Also yeah, I wonder about the textures too, although I guess you can
always force a flush in that case.
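
The array-plus-unique-Z idea might look roughly like this. It is a sketch under assumptions: a depth range of 0 (near) to 1 (far) with a "less than" depth test, and a queue capacity chosen to match the number of depth steps. `DepthQueue` and its functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 4096     /* also the number of distinct depth values used */

typedef struct { const void *texture; float depth; } Entry;

typedef struct {
    Entry items[QUEUE_CAP];
    size_t count;
    size_t flushes;
} DepthQueue;

static void depthqueue_flush(DepthQueue *q)
{
    if (q->count == 0) return;
    /* A real renderer would submit q->items here and clear the depth buffer. */
    q->count = 0;
    q->flushes++;
}

/* Append a sprite and give it the next depth value. Later sprites get
   values closer to the viewer, so they win the depth test. */
static float depthqueue_push(DepthQueue *q, const void *texture)
{
    float depth;
    if (q->count == QUEUE_CAP) depthqueue_flush(q);   /* out of depth range */
    depth = 1.0f - (float)(q->count + 1) / (float)QUEUE_CAP;
    q->items[q->count].texture = texture;
    q->items[q->count].depth = depth;
    q->count++;
    return depth;
}
```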

External dependencies in the renderer?

Mason Wheeler

Guest

Posted: Tue Apr 16, 2013 3:09 am

A hashtable is needed because this is *not* just a queue. To get good performance out of it, it has to be grouped by texture. The idea is that you select each texture once, and perform all of the drawing for it all at once. What we have now is just a queue, and it's horribly slow. On a complicated scene, it's the difference between a few dozen API calls, or a few tens of thousands of them. (Yes, I have rendered scenes that involved with SDL.)
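
To make the call-count argument concrete, here is a small counting sketch. It assumes textures are identified by small integer ids, which is an illustration device and not how SDL names textures. With one call per copy the call count equals the number of copies; grouped by texture it equals the number of distinct textures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TEXTURES 256   /* illustrative upper bound on distinct textures */

/* Count the draw calls needed when copies are grouped by texture:
   one call per distinct texture id in the queue.
   Every id must lie in the range [0, num_textures). */
static size_t calls_grouped(const int *texture_ids, size_t n, size_t num_textures)
{
    unsigned char seen[MAX_TEXTURES] = {0};
    size_t i, calls = 0;
    assert(num_textures <= MAX_TEXTURES);
    for (i = 0; i < n; i++) {
        assert(texture_ids[i] >= 0 && (size_t)texture_ids[i] < num_textures);
        if (!seen[texture_ids[i]]) {
            seen[texture_ids[i]] = 1;
            calls++;
        }
    }
    return calls;
}
```

For example, a queue of 20000 copies spread over 24 textures (made-up numbers) needs 20000 calls when drawn one copy at a time and 24 when grouped.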

Mason

External dependencies in the renderer?

John

Guest

Posted: Tue Apr 16, 2013 3:36 am

Ok, so the optimization assumes that a rendering bottleneck is the cost of
switching textures, and intends to minimize the number of texture switches by
delaying primitives, then re-ordering them by texture and Z.

I've seen this before. It can be done, but there are caveats. The biggest
challenge is you need to cache the entire GL state for each delayed primitive.
The implementation is effectively an "intermediate mode" layer unto itself. The
layer is a massive `todo` buffer with three phases: queue everything, analyze
(re-order) the queue, then execute the queue as a batch. If you don't choose the
batch size wisely, it's possible to lose any parallelism that you might have had
when GL calls were mixed in with scene graph calls. The second challenge is to
support transparency and other effects that depend on multiple passes in a
specific order, or that play games with the z-buffer (or other tests.)
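
The state-caching caveat can be sketched as a per-command snapshot. This assumes an illustrative subset of state (texture, blend mode, colour modulation, clip rectangle); the real set would depend on the backend, and `StateSnapshot` is a hypothetical type. Two queued commands can share a batch only if every captured field matches.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BLEND_NONE, BLEND_ALPHA, BLEND_ADD } BlendMode;  /* illustrative */

/* Everything a deferred copy needs in order to be replayed later. */
typedef struct {
    const void *texture;
    BlendMode blend;
    unsigned char r, g, b, a;            /* colour and alpha modulation */
    int clip_x, clip_y, clip_w, clip_h;  /* clip rectangle at queue time */
} StateSnapshot;

static int same_state(const StateSnapshot *p, const StateSnapshot *q)
{
    return p->texture == q->texture && p->blend == q->blend &&
           p->r == q->r && p->g == q->g && p->b == q->b && p->a == q->a &&
           p->clip_x == q->clip_x && p->clip_y == q->clip_y &&
           p->clip_w == q->clip_w && p->clip_h == q->clip_h;
}

/* Execute phase: adjacent commands with identical state share one batch. */
static size_t count_batches(const StateSnapshot *cmds, size_t n)
{
    size_t i, batches = 0;
    for (i = 0; i < n; i++) {
        if (i == 0 || !same_state(&cmds[i], &cmds[i - 1]))
            batches++;
    }
    return batches;
}
```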

External dependencies in the renderer?

gabomdq

Joined: 28 Jul 2011

Posts: 495

Location: Argentina

Posted: Tue Apr 16, 2013 3:45 am

I'm beating my own drum by saying this, but the SDL_RenderGeometry function I made may be a better compromise to enhance rendering speed, assuming the task at hand implies rendering multiple parts of the same texture. If you are rendering a low number of quads out of each texture, it'll probably give you the same performance as regular SDL_RenderCopy (it has no need for a hash table though).
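
For readers unfamiliar with the geometry approach, this is roughly what the caller-side work looks like: the application packs many quads from one texture into vertex and index arrays and submits them in a single call. The `Vertex` layout and `push_quad` helper are assumptions made for illustration; they are not the signature of the SDL_RenderGeometry patch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Vec2;
typedef struct { Vec2 pos; Vec2 uv; } Vertex;   /* hypothetical layout */
typedef struct { float x, y, w, h; } RectF;

/* Append one textured quad as 4 vertices and 6 indices (two triangles).
   dst is in screen pixels; src is in normalised texture coordinates.
   Returns the new quad count, or the old one if the arrays are full. */
static size_t push_quad(Vertex *verts, unsigned short *indices,
                        size_t quad_count, size_t max_quads,
                        RectF dst, RectF src)
{
    static const unsigned short corner[6] = { 0, 1, 2, 2, 3, 0 };
    Vertex *v;
    unsigned short *ix;
    unsigned short base;
    int k;

    assert(max_quads <= 16384);   /* 16-bit indices address 65536 vertices */
    if (quad_count >= max_quads) return quad_count;

    v = verts + quad_count * 4;
    ix = indices + quad_count * 6;
    base = (unsigned short)(quad_count * 4);

    /* Corners: top-left, top-right, bottom-right, bottom-left. */
    v[0].pos.x = dst.x;         v[0].pos.y = dst.y;
    v[1].pos.x = dst.x + dst.w; v[1].pos.y = dst.y;
    v[2].pos.x = dst.x + dst.w; v[2].pos.y = dst.y + dst.h;
    v[3].pos.x = dst.x;         v[3].pos.y = dst.y + dst.h;

    v[0].uv.x = src.x;          v[0].uv.y = src.y;
    v[1].uv.x = src.x + src.w;  v[1].uv.y = src.y;
    v[2].uv.x = src.x + src.w;  v[2].uv.y = src.y + src.h;
    v[3].uv.x = src.x;          v[3].uv.y = src.y + src.h;

    for (k = 0; k < 6; k++)
        ix[k] = (unsigned short)(base + corner[k]);
    return quad_count + 1;
}
```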

Anyway, it'll probably come down to the same sort of arguments we saw before, and the "why don't you do it in OpenGL" will eventually pop up :)

--
Gabriel.

External dependencies in the renderer?

Scott Percival

Guest

Posted: Tue Apr 16, 2013 3:46 am

Blimey, forgot about transparency. John's right: if you start including semitransparent objects in your queue, then you can't just throw them in the texture-centric batch and let the depth test sort them out; you'd have to run a separate pass afterwards in sequential painting order.
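
A sketch of that two-pass split, assuming each queued sprite carries a flag saying whether it needs blending (the `Sprite` type and `translucent` flag are hypothetical). Opaque sprites go first and could then be regrouped by texture; translucent ones follow in their original submission order.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const void *texture;
    int translucent;     /* nonzero if the sprite needs blending */
    unsigned seq;        /* submission order */
} Sprite;

/* Stable partition: opaque sprites first, translucent sprites after them,
   each group keeping its submission order. `out` must hold n entries.
   Returns the number of opaque sprites. */
static size_t split_passes(const Sprite *in, size_t n, Sprite *out)
{
    size_t i, k = 0, opaque;
    for (i = 0; i < n; i++)
        if (!in[i].translucent) out[k++] = in[i];
    opaque = k;
    for (i = 0; i < n; i++)
        if (in[i].translucent) out[k++] = in[i];
    return opaque;
}
```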

External dependencies in the renderer?

Sik

Joined: 26 Nov 2011

Posts: 905

Posted: Tue Apr 16, 2013 4:48 am

Another thing is that scenes that complex will most likely have many
textures anyway which is bound to completely negate the advantage. And
yeah, considering the SDL renderer would be most likely used to render
sprites in 2D, proper transparency support is pretty much a must (even
if you don't draw "proper" translucent stuff you may be bound to be
doing it with antialiased borders).

Come to think of it, this also means sprites *must* be rendered in
order, otherwise the depth buffer will completely screw up the
transparency. Given the order of the primitives is completely up to
the GPU, there isn't much that can be done short of multiple calls.
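
If strict submission order has to be kept, batching is limited to merging neighbouring copies that share a texture. The sketch below shows that run-merging, assuming textures are identified by integer ids for illustration; `Run` and `build_runs` are hypothetical names. Each run would become one draw call.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int texture_id; size_t first, count; } Run;

/* Split the queue into maximal runs of consecutive copies that use the same
   texture. Drawing the runs in sequence preserves submission order exactly.
   Returns the number of runs written (at most max_runs). */
static size_t build_runs(const int *texture_ids, size_t n,
                         Run *runs, size_t max_runs)
{
    size_t i, r = 0;
    for (i = 0; i < n; i++) {
        if (r > 0 && runs[r - 1].texture_id == texture_ids[i]) {
            runs[r - 1].count++;
        } else {
            if (r == max_runs) break;
            runs[r].texture_id = texture_ids[i];
            runs[r].first = i;
            runs[r].count = 1;
            r++;
        }
    }
    return r;
}
```

How much this saves depends entirely on how often consecutive sprites share a texture.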

2013/4/16, Scott Percival:

Quote:

Blimey, forgot about transparency. John's right, if you start including
semitransparent objects into your queue, then you can't just throw them in
the texture-centric batch and let the depth test sort them out; you'd have
to run a separate pass afterwards in sequential painting order.

On 16 April 2013 11:36, John wrote:

Quote:

Ok, so the optimization assumes that a rendering bottleneck is the cost
of
switching textures, and intends to minimize the number texture switches
by
delaying primitives, then re-ordering them by texture and Z.

I've seen this before. It can be done, but there are caveats. The biggest
challenge is you need to cache the entire GL state for each delayed
primitive. The implementation is effectively an "intermediate mode" layer
unto itself. The layer is a massive `todo` buffer with three phases:
queue
everything, analyze (re-order) the queue, then execute the queue as a
batch. If you don't choose the batch size wisely, it's possible to lose
any
parallelism that you might have had when GL calls were mixed in with
scene
graph calls. The second challenge is to support transparency and other
effects that depend on multiple passes in a specific order, or that play
games with the z-buffer (or other tests.)

On 04/15/2013 10:41 PM, Mason Wheeler wrote:

Quote:

Here's the basic idea.

The internals of SDL's rendering API are atrocious, to put it bluntly.
It does
everything in Immediate Mode, which modern versions of OpenGL and
Direct3D have
moved away from because it's so slow. GLES doesn't even support
Immediate Mode,
so if you look at SDL's GLES renderer, it does the closest thing it can
find to
Immediate Mode, sending one call to OpenGL every time someone calls
SDL_RenderCopy.

The way to do rendering fast is to keep the number of library calls to a
minimum, and pass as much data as possible all at once in an array. Of
course,
that's not the way people use SDL; they use SDL to draw a bunch of
sprites, one
at a time. So to be fast, SDL has to keep track of the bookkeeping for
them.

The way to do this is with a multimap, mapping textures to lists of
drawing
coordinates. You turn SDL_RenderCopy into an operation that adds a pair
of
rects to a texture's mapped list, and SDL_RenderPresent into an
operation
that
iterates over the multimap and for each texture, builds two arrays of
vertices
(one for screen coordinates and one for texture coordinates) as buffers
and
passes them to the renderer all at once.

I've got a Delphi implementation that sped up my rendering
significantly,
about
3x faster than stock SDL rendering. With a multimap in C, I could port
this
concept to the SDL internals.

The one tricky thing here, the concept that my renderer has that SDL
doesn't, is
Z-order. If you're no longer deterministically drawing in the order in
which
draw calls are received, but instead grouping them by texture, which are
in turn
sorted by hash order (essentially random,) you need a Z-order parameter
to make
sure the right things draw on top of the right things, and what you end
up with
is an array of multimaps.

I know it probably sounds very complicated, but it's only a few hundred lines of
code (plus the implementations of the hash and the dynamic array, because C
doesn't have them built in) and it makes rendering *much* faster.

Mason

--------------------------------------------------------------------------------
*From:* Ryan C. Gordon
*To:* SDL Development List
*Sent:* Monday, April 15, 2013 6:20 PM
*Subject:* Re: [SDL] External dependencies in the renderer?

On 4/15/13 2:46 PM, Mason Wheeler wrote:

Quote:

Does anyone (particularly Sam and Ryan) have any objections to pulling
an external library into SDL? Because I have an idea that could
significantly improve the performance of SDL's 3d-accelerated rendering,
but it would require a multimap. Neither SDL nor the C standard library
has a multimap implementation, but I could build one with uthash and
utarray <http://troydhanson.github.io/uthash/>, which are both fairly
small and BSD-licensed.

I'd rather we have a simple hashtable implementation in SDL.

What's the plan?

--ryan.

_______________________________________________
SDL mailing list

http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

External dependencies in the renderer?

Mason Wheeler

Guest

Posted: Tue Apr 16, 2013 4:58 am

Yeah. That's what the Z order is there for.

From: Sik the hedgehog
To: SDL Development List
Sent: Monday, April 15, 2013 9:48 PM
Subject: Re: [SDL] External dependencies in the renderer?

Another thing is that scenes that complex will most likely have many
textures anyway which is bound to completely negate the advantage. And
yeah, considering the SDL renderer would be most likely used to render
sprites in 2D, proper transparency support is pretty much a must (even
if you don't draw "proper" translucent stuff you may be bound to be
doing it with antialiased borders).

Coming to think on it, this also means sprites *must* be rendered in
order, otherwise the depth buffer will completely screw up the
transparency. Given the order of the primitives is completely up to
the GPU, there isn't much that can be done short of multiple calls.

2013/4/16, Scott Percival:

Quote:

Blimey, forgot about transparency. John's right, if you start including
semitransparent objects into your queue, then you can't just throw them in
the texture-centric batch and let the depth test sort them out; you'd have
to run a separate pass afterwards in sequential painting order.

External dependencies in the renderer?

Mason Wheeler

Guest

Posted: Tue Apr 16, 2013 5:03 am

Not exactly. The optimization assumes that the principal rendering
bottleneck is the overhead involved in sending scene data to the
graphics card, which assumption is borne out by testing data. It
intends to minimize the number of *drawing calls* by delaying primitives
and sending them in batches, ordered by Z and texture.

You avoid having to cache "the entire GL state" by the simple expedient
of flushing the to-do buffer if a call comes in that changes the GL state.
All you need to keep cached is the map of textures to arrays of coordinates.
And transparency works fine as long as you have a Z parameter to order
by. Things get drawn on top of each other in the prescribed order. I've
been using this for a while now. The system works.
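
A rough sketch of the flush-on-state-change rule, under stated assumptions. The `Queue` struct, the `BlendMode` enum, and the function names are hypothetical, and blend mode stands in for any piece of GL state. The point is only that draws queue up until a call arrives that would actually change state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state: one blend mode stands in for "the GL state". */
typedef enum { BLEND_NONE, BLEND_ALPHA, BLEND_ADD } BlendMode;

typedef struct {
    BlendMode blend;     /* state the pending draws were queued under */
    size_t    pending;   /* draws waiting in the to-do buffer */
    size_t    flushes;   /* batches actually sent so far */
} Queue;

/* Send everything queued so far as one batch. */
static void flush(Queue *q)
{
    assert(q != NULL);
    if (q->pending == 0) return;   /* nothing to send */
    /* A real renderer would submit the queued vertex arrays here. */
    q->flushes++;
    q->pending = 0;
}

/* Queue a draw under the current state; nothing is sent yet. */
static void queue_draw(Queue *q)
{
    q->pending++;
}

/* A state-changing call drains the buffer first, so no per-primitive
   state snapshot is ever needed. */
static void set_blend(Queue *q, BlendMode mode)
{
    if (mode == q->blend) return;  /* redundant change: keep batching */
    flush(q);
    q->blend = mode;
}
```

Setting the mode that is already active costs nothing, so only real state changes split a batch.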

From: John
To:
Sent: Monday, April 15, 2013 8:36 PM
Subject: Re: [SDL] External dependencies in the renderer?

Ok, so the optimization assumes that a rendering bottleneck is the cost of
switching textures, and intends to minimize the number of texture switches by
delaying primitives, then re-ordering them by texture and Z.

I've seen this before. It can be done, but there are caveats. The biggest
challenge is you need to cache the entire GL state for each delayed primitive.
The implementation is effectively an "intermediate mode" layer unto itself. The
layer is a massive `todo` buffer with three phases: queue everything, analyze
(re-order) the queue, then execute the queue as a batch. If you don't choose the
batch size wisely, it's possible to lose any parallelism that you might have had
when GL calls were mixed in with scene graph calls. The second challenge is to
support transparency and other effects that depend on multiple passes in a
specific order, or that play games with the z-buffer (or other tests.)

External dependencies in the renderer?

Mason Wheeler

Guest

Posted: Tue Apr 16, 2013 5:09 am

Also, if having many textures "is bound to completely negate the advantage,"
why do I see a 3x performance improvement when rendering big, complex
scenes this way?

Don't assume, don't guess; measure. I've measured it, and this way works.

External dependencies in the renderer?

Sik

Joined: 26 Nov 2011

Posts: 905

Posted: Tue Apr 16, 2013 5:25 am

Is there any guarantee in OpenGL at all that primitives are drawn in
the order they appear in the buffer (which would seem inefficient)?
Otherwise ordering by Z is pretty much eventually going to break in
the future.

External dependencies in the renderer?

Forest Hale

Guest

Posted: Tue Apr 16, 2013 6:58 am

Somehow this turned into a scenegraph discussion, to which I recommend this reading material:
http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Scene%20Graphs%20-%20just%20say%20no]]

On a topic of correctness however, isn't it kind of implicit in a 2D graphics API that your draw order is sacred? You usually want things to overlap in a specific way.

An order-preserving technique has no need of hashes or any such thing, it only needs to skip issuing state calls that are the same, and some things that are not order-dependent can be combined
regardless (like using glBufferSubData to write multiple quads into the vertex buffer before drawing any of them, for a considerable savings in driver overhead).
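
One way to picture the order-preserving approach, as a sketch with hypothetical names (`Run`, `OrderedBatch`, `push_quad`) and no real GL calls. Quads are appended in submission order, and consecutive quads that share a texture are merged into one run. Each run would then cost one texture bind and one draw call over a contiguous range of the vertex buffer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RUNS 256

/* Hypothetical: a run of consecutive quads that share one texture. */
typedef struct { int texture; size_t first, count; } Run;

typedef struct {
    Run    runs[MAX_RUNS];
    size_t nruns;
    size_t nquads;   /* total quads written to the vertex buffer */
} OrderedBatch;

/* Append one quad in submission order.
   Returns 0 on success, -1 if the run table is full. */
static int push_quad(OrderedBatch *b, int texture)
{
    assert(b != NULL);
    if (b->nruns > 0 && b->runs[b->nruns - 1].texture == texture) {
        b->runs[b->nruns - 1].count++;        /* same state: extend the run */
    } else {
        if (b->nruns == MAX_RUNS) return -1;
        Run r = { texture, b->nquads, 1 };    /* state change: start a new run */
        b->runs[b->nruns++] = r;
    }
    b->nquads++;
    return 0;
}
```

No hash is involved and draw order is never disturbed. The saving depends entirely on how often the caller happens to draw the same texture back to back.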

--
LordHavoc
Author of DarkPlaces Quake1 engine - http://icculus.org/twilight/darkplaces
Co-designer of Nexuiz - http://alientrap.org/nexuiz
"War does not prove who is right, it proves who is left." - Unknown
"Any sufficiently advanced technology is indistinguishable from a rigged demo." - James Klass
"A game is a series of interesting choices." - Sid Meier

External dependencies in the renderer?

Sik

Joined: 26 Nov 2011

Posts: 905

Posted: Tue Apr 16, 2013 7:30 am

If we want to be blunt, the real issue here isn't switching to a
scenegraph (besides the complexity it may bring - it's debatable
whether it's worth it or just tell users to use OpenGL directly for
those extreme cases) but bringing in an external dependency to SDL...

External dependencies in the renderer?

Jared Maddox

Guest

Posted: Tue Apr 16, 2013 8:28 am

Quote:

Date: Tue, 16 Apr 2013 00:01:55 -0300
From: Sik the hedgehog
To: SDL Development List
Subject: Re: [SDL] External dependencies in the renderer?

Um, is a hashtable needed for this idea as opposed to a regular array?
I mean, you're literally just adding entries to a queue, you don't
even need to retrieve them back. As for the Z order, just assign a
unique Z to each entry and be done with it. Sure, you may run out of
range, but at that point you probably have queued up enough primitives
to be worth flushing the batch.

As long as you have access to uintptr_t, there should be few to zero
concerns about the range of anything that can be described as an id
number, and it's pretty easy to describe z-order as an id in this
case.

Quote:

Date: Mon, 15 Apr 2013 23:58:24 -0700
From: Forest Hale
Subject: Re: [SDL] External dependencies in the renderer?

Somehow this turned into a scenegraph discussion, to which I recommend this
reading material:
http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Scene%20Graphs%20-%20just%20say%20no]]

On a topic of correctness however, isn't it kind of implicit in a 2D
graphics API that your draw order is sacred? You usually want things to
overlap in a specific way.

An order-preserving technique has no need of hashes or any such thing, it
only needs to skip issuing state calls that are the same, and some things
that are not order-dependent can be combined
regardless (like using glBufferSubData to write multiple quads into the
vertex buffer before drawing any of them, for a considerable savings in
driver overhead).

After double-checking the header files to make certain I was
remembering correctly, it looks like a dirty-rect queue should work
fairly well. Algorithm:
1) Check the current render command against the dirty-rect(s) of the
current queue node. If
1a: they overlap, then create a new queue node and make this command
the first entry in that node; else
1b: add this command to the current node, and expand the node's
dirty-rect(s) to cover the new area.
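
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Rect`, `QueueNode`, `DirtyQueue` and `queue_add` are hypothetical names (not SDL API), each node keeps a single bounding box as its dirty-rect, and the fixed `MAX_NODES` limit is arbitrary. A single bounding box is conservative: it may report an overlap where no queued command actually overlaps, which costs a batch but never breaks ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SDL_Rect so the sketch compiles on its own. */
typedef struct { int x, y, w, h; } Rect;

#define MAX_NODES 64

typedef struct {
    Rect dirty;   /* bounding box of every command queued in this node */
    int  count;   /* number of commands queued in this node */
} QueueNode;

typedef struct {
    QueueNode nodes[MAX_NODES];
    int used;     /* nodes in use; the last one is the "current" node */
} DirtyQueue;

static int rects_overlap(const Rect *a, const Rect *b)
{
    return a->x < b->x + b->w && b->x < a->x + a->w &&
           a->y < b->y + b->h && b->y < a->y + a->h;
}

static Rect rect_union(const Rect *a, const Rect *b)
{
    int x0 = a->x < b->x ? a->x : b->x;
    int y0 = a->y < b->y ? a->y : b->y;
    int x1 = a->x + a->w > b->x + b->w ? a->x + a->w : b->x + b->w;
    int y1 = a->y + a->h > b->y + b->h ? a->y + a->h : b->y + b->h;
    Rect r;
    r.x = x0; r.y = y0; r.w = x1 - x0; r.h = y1 - y0;
    return r;
}

/* Queue one command's destination rect.
   Returns the index of the node it landed in, or -1 if the queue is full. */
static int queue_add(DirtyQueue *q, const Rect *dst)
{
    assert(q != NULL && dst != NULL);
    if (q->used > 0) {
        QueueNode *cur = &q->nodes[q->used - 1];
        if (!rects_overlap(&cur->dirty, dst)) {
            /* Step 1b: no overlap, so join the current node and grow its box. */
            cur->dirty = rect_union(&cur->dirty, dst);
            cur->count++;
            return q->used - 1;
        }
    }
    if (q->used == MAX_NODES)
        return -1;
    /* Step 1a: overlap (or empty queue), so start a new node. */
    q->nodes[q->used].dirty = *dst;
    q->nodes[q->used].count = 1;
    return q->used++;
}
```

Commands inside one node never overlap each other, so a backend would be free to reorder or merge them; the nodes themselves must still be drawn in sequence.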

You can use a new node every time you use a different texture, you can
combine multiple textures in a single node (after all, you know they
won't overlap), you can send point & line data either the same way, or
with custom nodes, you can provide hook functions to shoe-horn your
own rendering system into the queue, etc. This would also be a first
step towards the oft-requested (albeit somewhat bone-headed) feature
of issuing rendering calls from whichever thread you want.

The main issue is how the system would work. I think that what Mason's
suggesting would require that you look through nodes until you find a
dirty-rect collision (even a partial dirty-rect collision would count,
and the only thing that gets looked at is the actual coordinates,
whether e.g. the texture is or isn't the same doesn't matter). At that
point you go back to the most-recently-checked node that used the same
texture and add your command there, or add your command in a new node
if there wasn't a previous node.

That should (I think) provide the correct sequencing, while also
ensuring that you reuse textures as few times as you can get away
with.
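
One possible reading of that texture-aware search, again only as a sketch: walk the nodes from newest to oldest, stop at the first dirty-rect collision, and remember the last same-texture node seen before stopping. The names (`TexNode`, `NodeList`, `place_command`) and the integer texture id are assumptions made for the example, and each node is assumed to hold exactly one texture.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y, w, h; } Rect;   /* stand-in for SDL_Rect */

#define MAX_NODES 64

typedef struct {
    int  texture;  /* hypothetical integer id standing in for SDL_Texture* */
    Rect dirty;    /* bounding box of the commands in this node */
    int  count;
} TexNode;

typedef struct { TexNode nodes[MAX_NODES]; int used; } NodeList;

static int overlap(const Rect *a, const Rect *b)
{
    return a->x < b->x + b->w && b->x < a->x + a->w &&
           a->y < b->y + b->h && b->y < a->y + a->h;
}

static Rect cover(const Rect *a, const Rect *b)
{
    int x0 = a->x < b->x ? a->x : b->x;
    int y0 = a->y < b->y ? a->y : b->y;
    int x1 = a->x + a->w > b->x + b->w ? a->x + a->w : b->x + b->w;
    int y1 = a->y + a->h > b->y + b->h ? a->y + a->h : b->y + b->h;
    Rect r;
    r.x = x0; r.y = y0; r.w = x1 - x0; r.h = y1 - y0;
    return r;
}

/* Returns the index of the node that received the command, or -1 if full. */
static int place_command(NodeList *l, int texture, const Rect *dst)
{
    int i, target = -1;
    assert(l != NULL && dst != NULL);
    for (i = l->used - 1; i >= 0; --i) {
        if (overlap(&l->nodes[i].dirty, dst))
            break;                 /* collision: nothing older is safe */
        if (l->nodes[i].texture == texture)
            target = i;            /* most recently checked match so far */
    }
    if (target >= 0) {
        l->nodes[target].dirty = cover(&l->nodes[target].dirty, dst);
        l->nodes[target].count++;
        return target;
    }
    if (l->used == MAX_NODES)
        return -1;
    l->nodes[l->used].texture = texture;
    l->nodes[l->used].dirty = *dst;
    l->nodes[l->used].count = 1;
    return l->used++;
}
```

Because a command is only ever moved past nodes whose dirty-rects it does not touch, any two overlapping commands still reach the screen in the order they were issued.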

External dependencies in the renderer?

Forest Hale

Guest

Posted: Tue Apr 16, 2013 8:58 am

If SDL Renderer is turning into a beast, it should be punted to its own library, much like SDL_mixer and so on.

As a matter of practicality, however, I think it is fine being in the core, so long as it stays simple and direct.

My understanding of the problem that prompted this discussion is that the renderer abuses draw calls and texture switches, or some such? That has nothing at all to do with draw order, and if the draw order is less important than performance then the app should take care of sorting its draws first; it isn't the duty of SDL to fix an app performance issue.

As for the underlying technical details of the GL and D3D APIs, I would recommend implementing a draw queue (a fully buffered API) that is flushed to real driver calls once enough vertex data has accumulated to make it worthwhile. This also allows multiple consecutive draws to be merged if their state is the same; a lot of optimizations can be done once you have that "lookahead" capability inherent in the flush routine.
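
A minimal sketch of such an order-preserving draw queue, under some loud assumptions: the texture id stands in for the whole render state, `submit_batch` is a placeholder that only counts simulated driver calls, and none of these names exist in SDL.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y, w, h; } Rect;             /* stand-in for SDL_Rect */
typedef struct { int texture; Rect dst; } DrawCmd;   /* texture = "the state" here */

#define QUEUE_CAP 256

typedef struct {
    DrawCmd cmds[QUEUE_CAP];
    size_t  count;
    size_t  driver_calls;   /* counts simulated calls into GL/D3D */
} DrawQueue;

/* Placeholder for a real batched draw (one vertex upload plus one draw call). */
static void submit_batch(DrawQueue *q, const DrawCmd *first, size_t n)
{
    (void)first;
    (void)n;
    q->driver_calls++;
}

/* Walk the queue in order and merge each run of consecutive same-state draws. */
static void queue_flush(DrawQueue *q)
{
    size_t i = 0;
    assert(q != NULL);
    while (i < q->count) {
        size_t run = 1;
        while (i + run < q->count &&
               q->cmds[i + run].texture == q->cmds[i].texture)
            run++;
        submit_batch(q, &q->cmds[i], run);
        i += run;
    }
    q->count = 0;
}

/* Buffer one draw; nothing reaches the driver until the queue is flushed. */
static void queue_draw(DrawQueue *q, int texture, Rect dst)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAP)
        queue_flush(q);
    q->cmds[q->count].texture = texture;
    q->cmds[q->count].dst = dst;
    q->count++;
}
```

Four draws using textures 1, 1, 2, 1 would flush as three simulated driver calls, and nothing is ever reordered.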

On 04/16/2013 12:30 AM, Sik the hedgehog wrote:

Quote:

Somehow this turned into a scenegraph discussion, to which I recommend this
reading material:
http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Scene%20Graphs%20-%20just%20say%20no]]

On a topic of correctness however, isn't it kind of implicit in a 2D
graphics API that your draw order is sacred? You usually want things to
overlap in a specific way.

An order-preserving technique has no need of hashes or any such thing, it
only needs to skip issuing state calls that are the same, and some things
that are not order-dependent can be combined
regardless (like using glBufferSubData to write multiple quads into the
vertex buffer before drawing any of them, for a considerable savings in
driver overhead).


External dependencies in the renderer?

Sik

Joined: 26 Nov 2011

Posts: 905

Posted: Tue Apr 16, 2013 9:08 am

The problem is that SDL issues a separate call for every single thing
you draw, i.e. it does things the naive way rather than the way GPUs
prefer (they really are more optimized for rendering entire complex 3D
scenes than for generic 2D stuff). Mason wants to change the way the
renderer works so it fits GPUs better; the catch is that he wants to
pull in an external dependency (uthash) to do it.

2013/4/16, Forest Hale:

Quote:

If SDL Renderer is turning into a beast, it should be punted to its own
library, much like SDL_mixer and so on.

As a matter of practicality however, I think it is fine being in the core,
so long as it stays simple and direct.

My understanding of the problem that warranted this discussion is that it is
abusive about Draw calls and texture switches or some such? That has
nothing at all to do with draw order, and if the draw
order is less important than performance then the app should take care of
sorting them first, it isn't the duty of SDL to fix an app performance
issue.

As far as some underlying technical details of GL and D3D APIs, I would
recommend implementing a draw queue (fully buffered API) that is flushed to
real calls to the driver after enough vertex data
has accumulated to make it worthwhile, this also allows multiple consecutive
draws to be merged if their state is the same, a lot of optimizations can be
done once you have that "lookahead" capability
inherent in the flush routine.


External dependencies in the renderer?

Forest Hale

Guest

Posted: Tue Apr 16, 2013 9:23 am

If you buffer draws, you get higher performance.

If you additionally sort them, you get even higher performance but break the most basic assumption of a 2D graphics API - that things occur in the order specified.

I see no reason to use uthash here, I do see great reason to buffer things.

Why is uthash still the subject of this discussion? We're not going to reach a conclusion on the broad topic of outside dependencies, it's better to focus on the specific problem at hand.

On 04/16/2013 02:08 AM, Sik the hedgehog wrote:

Quote:

If SDL Renderer is turning into a beast, it should be punted to its own
library, much like SDL_mixer and so on.

As a matter of practicality however, I think it is fine being in the core,
so long as it stays simple and direct.

My understanding of the problem that warranted this discussion is that it is
abusive about Draw calls and texture switches or some such? That has
nothing at all to do with draw order, and if the draw
order is less important than performance then the app should take care of
sorting them first, it isn't the duty of SDL to fix an app performance
issue.

As far as some underlying technical details of GL and D3D APIs, I would
recommend implementing a draw queue (fully buffered API) that is flushed to
real calls to the driver after enough vertex data
has accumulated to make it worthwhile, this also allows multiple consecutive
draws to be merged if their state is the same, a lot of optimizations can be
done once you have that "lookahead" capability
inherent in the flush routine.


External dependencies in the renderer?

Constantin Berhard

Guest

Posted: Tue Apr 16, 2013 1:49 pm

The following idea came to mind:

Have a todo queue, but only for one texture. Flush it when it's full or when a different texture is about to be rendered.
-> If speed is an issue for a specific program, the programmer can sort the calls by texture and get the speedup.
-> We don't need a complicated system or an external dependency in SDL.
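
To make the trade-off concrete, here is a small sketch of that single-texture queue together with the application-side sort. Everything in it is hypothetical (`Sprite`, `Batch`, `batch_add`, `draw_all` are illustration names, and textures are plain integer ids); it only counts how many flushes a given draw sequence would cause. Note that `qsort` is not stable, so sorting this way gives up the original draw order, which is only acceptable when the program knows its sprites don't depend on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { int texture; int x, y; } Sprite;   /* texture: hypothetical id */

#define BATCH_CAP 128

typedef struct {
    int texture;   /* texture of the sprites currently queued */
    int count;     /* sprites waiting in the queue */
    int flushes;   /* simulated driver submissions so far */
} Batch;

static void batch_flush(Batch *b)
{
    if (b->count > 0) {
        b->flushes++;   /* a real renderer would issue one draw call here */
        b->count = 0;
    }
}

/* Queue one sprite, flushing first if the texture changes or the queue is full. */
static void batch_add(Batch *b, const Sprite *s)
{
    assert(b != NULL && s != NULL);
    if (b->count > 0 && (b->texture != s->texture || b->count == BATCH_CAP))
        batch_flush(b);
    b->texture = s->texture;
    b->count++;
}

/* qsort comparator: order sprites by texture id. */
static int by_texture(const void *pa, const void *pb)
{
    const Sprite *a = (const Sprite *)pa;
    const Sprite *b = (const Sprite *)pb;
    return (a->texture > b->texture) - (a->texture < b->texture);
}

/* Draw every sprite through one batch and report how many flushes it took. */
static int draw_all(const Sprite *sprites, size_t n)
{
    Batch b = { -1, 0, 0 };
    size_t i;
    for (i = 0; i < n; ++i)
        batch_add(&b, &sprites[i]);
    batch_flush(&b);
    return b.flushes;
}
```

With four sprites alternating between two textures, drawing them as-is costs four flushes, while calling `qsort(sprites, 4, sizeof sprites[0], by_texture)` first brings that down to two.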


Nathaniel J Fries

Joined: 30 Mar 2010

Posts: 444

Posted: Tue Apr 16, 2013 2:32 pm

Optimization is a task of the programmer, not the library.
That said, SDL's interface is too high-level to enable the programmer to optimize render performance.

I see a couple options here:
1) Add another function for rendering the same texture multiple times:

Code:

int SDL_RenderCopyMulti(SDL_Renderer *, SDL_Texture *, int nTimes, const SDL_Rect *);

2) Add a "sprite batch" API:

Code:

typedef struct SDL_SpriteBatch SDL_SpriteBatch;
SDL_SpriteBatch * SDL_CreateSpriteBatch(SDL_Renderer *);
void SDL_DestroySpriteBatch(SDL_SpriteBatch *);
int SDL_BatchCopy(SDL_SpriteBatch *, SDL_Texture *, const SDL_Rect *);
int SDL_BatchFlush(SDL_SpriteBatch *);

External dependencies in the renderer?

John

Guest

Posted: Tue Apr 16, 2013 2:51 pm

GL likes to generate texture ids incrementing from 1. I don't recall whether
that's standard or reliable. If it is, you wouldn't want a general purpose hash
table to map texture ids.
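
If texture ids do turn out to be small and dense, a directly indexed table is one alternative to a general-purpose hash. The sketch below is an assumption-heavy illustration, not SDL or OpenGL code: `TexTable`, `tex_slot` and `TEX_TABLE_LIMIT` are made-up names, and since the density of ids is exactly the property in doubt, ids beyond the limit return NULL so a caller could fall back to flushing or to another lookup path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int queued; } TexSlot;   /* per-texture bookkeeping */

typedef struct {
    TexSlot *slots;   /* indexed directly by texture id */
    unsigned cap;     /* number of slots allocated */
} TexTable;

#define TEX_TABLE_LIMIT 65536u   /* arbitrary cap on ids handled this way */

/* Returns the slot for `id`, growing the table on demand.
   Returns NULL if the id is too large or allocation fails. */
static TexSlot *tex_slot(TexTable *t, unsigned id)
{
    assert(t != NULL);
    if (id >= TEX_TABLE_LIMIT)
        return NULL;
    if (id >= t->cap) {
        unsigned ncap = t->cap ? t->cap : 16;
        TexSlot *grown;
        while (ncap <= id)
            ncap *= 2;
        grown = realloc(t->slots, ncap * sizeof *grown);
        if (grown == NULL)
            return NULL;
        /* Zero only the newly added slots; existing ones keep their state. */
        memset(grown + t->cap, 0, (ncap - t->cap) * sizeof *grown);
        t->slots = grown;
        t->cap = ncap;
    }
    return &t->slots[id];
}

static void tex_table_free(TexTable *t)
{
    free(t->slots);
    t->slots = NULL;
    t->cap = 0;
}
```

Lookup is a bounds check plus an array index, at the cost of wasted slots if ids are sparse.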

On 04/15/2013 11:09 PM, Mason Wheeler wrote:

Quote:

A hashtable is needed because this is *not* just a queue. To get good
performance out of it, it has to be grouped by texture. The idea is that you
select each texture once, and perform all of the drawing for it all at once.
What we have now is just a queue, and it's horribly slow. On a complicated
scene, it's the difference between a few dozen API calls, or a few tens of
thousands of them. (Yes, I have rendered scenes that involved with SDL.)

Mason

--------------------------------------------------------------------------------
*From:* Sik the hedgehog
*To:* SDL Development List
*Sent:* Monday, April 15, 2013 8:01 PM
*Subject:* Re: [SDL] External dependencies in the renderer?

Um, is a hashtable needed for this idea as opposed to a regular array?
I mean, you're literally just adding entries to a queue, you don't
even need to retrieve them back. As for the Z order, just assign an
unique Z to each entry and be done with it. Sure, you may run out of
range, but at that point you probably have queued up enough primitives
to be worth flushing the batch.

Also yeah, I wonder about the textures too, although I guess you can
always force a flush in that case.

2013/4/15, Ryan C. Gordon:

Quote:

Can you elaborate on the reason why uthash is not attractive to you?

I haven't even clicked on the link, so I can't say anything about
uthash. As an external piece of code, I'm hesitant to add it to SDL,
since that has caused annoyances in the past, unless there was a really
good reason.

(Doubly-so for a hashtable. I mean, a hashtable? Do we really need to
scour the internet for a hashtable?)

I imagine it's probably a fine piece of code in itself, though.

--ryan.


External dependencies in the renderer?

Mason Wheeler

Guest

Posted: Tue Apr 16, 2013 4:41 pm

It's not "ordering by Z and texture" but "grouping by Z and texture". Every render with a Z of 1 will get sent before every render with a Z of 2, and so on. That's why I said you end up with an array of multimaps.
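
A rough sketch of what "an array of multimaps" could look like in C. This is an illustration under assumptions, not the Delphi implementation mentioned in this thread: a linear search over a small fixed array stands in for the uthash lookup, textures are plain integer ids, and `batch_copy` / `batch_present` are hypothetical stand-ins for what SDL_RenderCopy and SDL_RenderPresent would do internally.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y, w, h; } Rect;   /* stand-in for SDL_Rect */

#define MAX_Z     4    /* number of Z layers */
#define MAX_TEX   8    /* distinct textures per layer */
#define MAX_RECTS 16   /* queued copies per texture per layer */

typedef struct {
    int  texture;            /* hypothetical id standing in for SDL_Texture* */
    Rect src[MAX_RECTS];     /* texture coordinates */
    Rect dst[MAX_RECTS];     /* screen coordinates */
    int  count;
} TexGroup;

typedef struct { TexGroup groups[MAX_TEX]; int used; } Layer;

typedef struct {
    Layer layers[MAX_Z];          /* one "multimap" per Z value */
    int   log[MAX_Z * MAX_TEX];   /* textures in the order they were submitted */
    int   submits;
} BatchRenderer;

/* Queue one copy under (z, texture). Returns 0 on success, -1 if out of room. */
static int batch_copy(BatchRenderer *r, int z, int texture, Rect src, Rect dst)
{
    Layer *layer;
    TexGroup *g = NULL;
    int i;
    assert(r != NULL);
    if (z < 0 || z >= MAX_Z)
        return -1;
    layer = &r->layers[z];
    for (i = 0; i < layer->used; ++i) {
        if (layer->groups[i].texture == texture) {
            g = &layer->groups[i];
            break;
        }
    }
    if (g == NULL) {
        if (layer->used == MAX_TEX)
            return -1;
        g = &layer->groups[layer->used++];
        g->texture = texture;
        g->count = 0;
    }
    if (g->count == MAX_RECTS)
        return -1;
    g->src[g->count] = src;
    g->dst[g->count] = dst;
    g->count++;
    return 0;
}

/* Submit everything: Z layers in ascending order, one submission per texture. */
static int batch_present(BatchRenderer *r)
{
    int z, i;
    assert(r != NULL);
    r->submits = 0;
    for (z = 0; z < MAX_Z; ++z) {
        Layer *layer = &r->layers[z];
        for (i = 0; i < layer->used; ++i) {
            /* A real backend would build vertex arrays from src/dst here
               and issue a single draw call for the whole group. */
            r->log[r->submits++] = layer->groups[i].texture;
        }
        layer->used = 0;
    }
    return r->submits;
}
```

Five copies spread over two Z values and two textures collapse into three submissions here; within one Z value the order between different textures is whatever the container yields, which is why overlapping sprites would need distinct Z values.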

Mason

From: Sik the hedgehog
To: Mason Wheeler; SDL Development List
Sent: Monday, April 15, 2013 10:25 PM
Subject: Re: [SDL] External dependencies in the renderer?

Is there any guarantee in OpenGL at all that primitives are drawn in
the order they appear in the buffer (which would seem inefficient)?
Otherwise ordering by Z is pretty much eventually going to break in
the future.

2013/4/16, Mason Wheeler:

Quote:

Not exactly. The optimization assumes that the principal rendering
bottleneck is the overhead involved in sending scene data to the
graphics card, which assumption is borne out by testing data. It
intends to minimize the number of *drawing calls* by delaying primitives
and sending them in batches, ordered by Z and texture.

You avoid having to cache "the entire GL state" by the simple expedient
of flushing the to-do buffer if a call comes in that changes the GL state.
All you need to keep cached is the map of textures to arrays of coordinates.
And transparency works fine as long as you have a Z parameter to order
by. Things get drawn on top of each other in the prescribed order. I've
been using this for a while now. The system works.

________________________________
From: John
To:
Sent: Monday, April 15, 2013 8:36 PM
Subject: Re: [SDL] External dependencies in the renderer?

Ok, so the optimization assumes that a rendering bottleneck is the cost of
switching textures, and intends to minimize the number of texture switches by
delaying primitives, then re-ordering them by texture and Z.

I've seen this before. It can be done, but there are caveats. The biggest
challenge is you need to cache the entire GL state for each delayed primitive.
The implementation is effectively an "intermediate mode" layer unto itself. The
layer is a massive `todo` buffer with three phases: queue everything, analyze
(re-order) the queue, then execute the queue as a batch. If you don't choose the
batch size wisely, it's possible to lose any parallelism that you might have had
when GL calls were mixed in with scene graph calls. The second challenge is to
support transparency and other effects that depend on multiple passes in a
specific order, or that play games with the z-buffer (or other tests).

On 04/15/2013 10:41 PM, Mason Wheeler wrote:

Quote:

Here's the basic idea.

The internals of SDL's rendering API are atrocious, to put it bluntly. It does
everything in Immediate Mode, which modern versions of OpenGL and Direct3D have
moved away from because it's so slow. GLES doesn't even support Immediate Mode,
so if you look at SDL's GLES renderer, it does the closest thing it can find to
Immediate Mode, sending one call to OpenGL every time someone calls
SDL_RenderCopy.

The way to do rendering fast is to keep the number of library calls to a
minimum, and pass as much data as possible all at once in an array. Of course,
that's not the way people use SDL; they use SDL to draw a bunch of sprites, one
at a time. So to be fast, SDL has to keep track of the bookkeeping for them.

The way to do this is with a multimap, mapping textures to lists of drawing
coordinates. You turn SDL_RenderCopy into an operation that adds a pair of
rects to a texture's mapped list, and SDL_RenderPresent into an operation that
iterates over the multimap and for each texture, builds two arrays of vertices
(one for screen coordinates and one for texture coordinates) as buffers and
passes them to the renderer all at once.

I've got a Delphi implementation that sped up my rendering significantly, about
3x faster than stock SDL rendering. With a multimap in C, I could port this
concept to the SDL internals.

The one tricky thing here, the concept that my renderer has that SDL doesn't, is
Z-order. If you're no longer deterministically drawing in the order in which
draw calls are received, but instead grouping them by texture, which are in turn
sorted by hash order (essentially random), you need a Z-order parameter to make
sure the right things draw on top of the right things, and what you end up with
is an array of multimaps.

I know it probably sounds very complicated, but it's only a few hundred lines of
code (plus the implementations of the hash and the dynamic array, because C
doesn't have them built in) and it makes rendering *much* faster.

Mason


External dependencies in the renderer?

John

Guest

Posted: Tue Apr 16, 2013 5:29 pm

That sounds like ordering by Z to me, no?

The GLES device vendors advise against implementing your own depth sorting
because the GPU depth test does it much faster, more efficiently, can correctly
handle overlaps, and runs in parallel with the CPU.

Also, z is floating point in transformed view coordinates which means there may
not be many duplicate z values to group by.

Have you measured the cost of switching the active texture unit? The number of
switches that will be saved by this optimization is easy to calculate: it's
roughly the number of primitives minus the number of textures.
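
The estimate above can be checked with a few lines of C. The sketch below uses hypothetical helper names and treats textures as plain integers; it counts texture switches for a given submission order and for the same draws grouped by texture. The difference reaches "primitives minus textures" only when consecutive draws never share a texture, and it shrinks when the caller already submits runs of the same texture.

```c
#include <assert.h>
#include <stddef.h>

/* Texture switches needed when drawing in submission order:
   one switch whenever the texture differs from the previous draw. */
size_t binds_in_order(const unsigned *tex, size_t n)
{
    size_t binds = 0, i;
    assert(tex != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (i == 0 || tex[i] != tex[i - 1])
            binds++;
    return binds;
}

/* Texture switches needed once draws are grouped by texture:
   simply the number of distinct textures. */
size_t binds_grouped(const unsigned *tex, size_t n)
{
    size_t distinct = 0, i, j;
    assert(tex != NULL || n == 0);
    for (i = 0; i < n; i++) {
        for (j = 0; j < i; j++)
            if (tex[j] == tex[i])
                break;
        if (j == i)             /* no earlier draw used this texture */
            distinct++;
    }
    return distinct;
}
```

For six draws that alternate between two textures, the first function returns 6 and the second returns 2, so four switches are saved. For three draws of one texture followed by three of another, both return 2 and nothing is saved.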


External dependencies in the renderer?

Sik

Joined: 26 Nov 2011

Posts: 905

Posted: Tue Apr 16, 2013 11:37 pm

The problem is that I think the idea is to use a single batch for
everything... Again, I'm not sure at all that this kind of Z ordering
is reliable in that case. The problem is that the safest way is
sending one thing at a time, i.e. one draw call per SDL function,
which is the very thing we're trying to avoid...

Also yeah, the Z range is why I said we could run out of them. On PCs
we have a 24-bit depth buffer, OK (though somebody could still attempt
to set 16-bit, and I guess on 2D this could make sense), but on mobile
I wonder how the Z range is handled (especially on deferred renderers
as opposed to standard rasterizer ones).

And yes, OpenGL numbers textures from 1 onwards (this is true for
all objects, really), but remember you can create gaps by deleting
textures, and OpenGL will attempt to fill those if I recall correctly
(I'm not sure about the details).


External dependencies in the renderer?

Mason Wheeler

Guest

Posted: Wed Apr 17, 2013 12:49 am

OK, since it's apparently not clear from my original proposal, I wasn't
talking about sending Z coordinates to OpenGL or Direct3D in any way.
I was talking about using them on the SDL side. You'd end up with a
certain number of layers (most 2D games draw 4 or 5 distinct layers,
IME), and each layer would have its own Z number.

Each Z layer would have its own texture-to-coordinates multimap.
When it's time to render everything, it looks like this (pseudocode):

for each multimap in layers:
    for each texture in multimap:
        CreateCoordArrays(multimap[texture])
        SelectTexture(texture)
        RenderArrays

It's really that simple, in concept. Everything draws on top of what
it's supposed to draw on top of. There's no need to send Z ordering
to the GPU. There's no atrociously slow one-API-call-per-render.
I've tested it. It works, and it's about 3x faster than the current system
on large, complicated scenes.

There are only two real downsides: 1) it requires a multimap to work
properly, which we need a library for because libc provides neither a
multimap implementation nor the fundamental primitives needed to
build one (a map and a dynamic array).
And 2) SDL_RenderCopy does not currently have a Z parameter on
it, which is needed to make layering work correctly.

Mason
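
The pseudocode loop above could look something like this in C. It is a sketch under assumptions, not the Delphi implementation or SDL code: `flush_layers`, `Group`, `Layer`, `DrawFn` and `tally_quads` are made-up names, and a real backend would select the texture and issue one array draw at the point where `draw` is called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { float x, y, w, h; } RectF;

/* All copies queued for one texture within one Z layer. */
typedef struct {
    unsigned texture;       /* stand-in for a texture handle */
    const RectF *src;       /* texture-space rects */
    const RectF *dst;       /* screen-space rects */
    size_t count;
} Group;

typedef struct { const Group *groups; size_t count; } Layer;

/* Hypothetical backend hook: select the texture, draw `quads` quads at once. */
typedef void (*DrawFn)(unsigned texture, const float *pos, const float *uv,
                       size_t quads, void *user);

/* Expand a rect into four (x, y) corners, 8 floats in total. */
static void rect_to_quad(RectF r, float *out)
{
    out[0] = r.x;       out[1] = r.y;
    out[2] = r.x + r.w; out[3] = r.y;
    out[4] = r.x + r.w; out[5] = r.y + r.h;
    out[6] = r.x;       out[7] = r.y + r.h;
}

/* Walk the layers in Z order and issue one draw per texture per layer.
   Returns the number of draw calls, or -1 on allocation failure. */
int flush_layers(const Layer *layers, size_t nlayers, DrawFn draw, void *user)
{
    size_t l, g, i;
    int calls = 0;
    assert(draw != NULL);
    for (l = 0; l < nlayers; l++) {
        for (g = 0; g < layers[l].count; g++) {
            const Group *grp = &layers[l].groups[g];
            float *pos, *uv;
            if (grp->count == 0)
                continue;
            /* CreateCoordArrays: one screen array and one texture array. */
            pos = malloc(grp->count * 8 * sizeof *pos);
            uv = malloc(grp->count * 8 * sizeof *uv);
            if (pos == NULL || uv == NULL) {
                free(pos);
                free(uv);
                return -1;
            }
            for (i = 0; i < grp->count; i++) {
                rect_to_quad(grp->dst[i], pos + i * 8);
                rect_to_quad(grp->src[i], uv + i * 8);
            }
            /* SelectTexture + RenderArrays, delegated to the backend hook. */
            draw(grp->texture, pos, uv, grp->count, user);
            free(pos);
            free(uv);
            calls++;
        }
    }
    return calls;
}

/* Example backend stub: tallies quads instead of calling a graphics API. */
void tally_quads(unsigned texture, const float *pos, const float *uv,
                 size_t quads, void *user)
{
    (void)texture; (void)pos; (void)uv;
    *(size_t *)user += quads;
}
```

With two layers holding three groups and six queued copies in total, `flush_layers` reports three draw calls where the immediate approach would make six. Texture coordinates are passed through unnormalized here; a real backend would scale them to whatever range it expects.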


External dependencies in the renderer?

Sik

Joined: 26 Nov 2011

Posts: 905

Posted: Wed Apr 17, 2013 1:59 am

Not just SDL_RenderCopy; nothing in the rendering API has a Z parameter.

It looks like what you're doing is basically telling the API that
the order doesn't matter as long as specific groups stay in order, if I'm
understanding correctly (which isn't how the SDL API works). That's
a valid optimization, but not one that would work with the
current API if that's the case. Is that correct?
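
To make the ordering concern concrete, here is a small C check. The function name is hypothetical, textures are plain integers, and it assumes groups are emitted in order of first appearance (a hash-ordered multimap would give even weaker guarantees). It reports whether grouping by texture would replay a sequence of draws in the order they were submitted.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if grouping draws by texture (groups emitted in order of first
   appearance) replays them exactly in submission order, 0 if some draw
   would be moved past a draw that uses a different texture. */
int grouping_keeps_order(const unsigned *tex, size_t n)
{
    size_t i, j;
    assert(tex != NULL || n == 0);
    for (i = 1; i < n; i++) {
        if (tex[i] == tex[i - 1])
            continue;               /* still inside the same run */
        for (j = 0; j + 1 < i; j++)
            if (tex[j] == tex[i])
                return 0;           /* texture was used earlier, then interrupted */
    }
    return 1;
}
```

A sequence such as texture 7, 7, 9 survives grouping unchanged, while 7, 9, 7 does not: the third draw would be pulled ahead of the second, which is exactly the case where an explicit layer or Z value is needed to say what goes on top.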

2013/4/16, Mason Wheeler:

Quote:

OK, since it's apparently not clear from my original proposal, I wasn't
talking about sending Z coordinates to OpenGL or Direct3D in any way.
I was talking about using them on the SDL side.Â You'd end up with a
certain number of layers, (most 2D games draw 4 or 5 distinct layers
IME,) and each layer would have its own Z number.

Each Z layer would have its own texture-to-coordinates multimap.
When it's time to render everything, it looks like this (pseudocode):

for each multimap in layers:
Â Â for each texture in multimap:
Â Â Â Â Â CreateCoordArrays(multimap[texture])
Â Â Â Â Â SelectTexture(texture)
Â Â Â Â Â RenderArrays

It's really that simple, in concept.Â Everything draws on top of what
it's supposed to draw on top of.Â There's no need to send Z ordering
to the GPU.Â There's no atrociously slow one-API-render-per-call.
I've tested it.Â It works, and it's about 3x faster than the current system
on large, complicated scenes.

There are only two real downsides: 1) it requires a multimap to work
properly, which we need a library for because libc provides neither a
multimap implementation nor the fundamental primitives needed to
build one(a map and a dynamic array).
And 2) SDL_RenderCopy does not currently have a Z parameter on
it, whichis needed to make layering work correctly.

Mason

________________________________
From: Sik the hedgehog
To: SDL Development List
Sent: Tuesday, April 16, 2013 4:37 PM
Subject: Re: [SDL] External dependencies in the renderer?

The problem is that I think the idea is to use a single batch for
everything... Again, I'm not sure at all that this kind of Z ordering
is reliable in that case. The problem is that the safest way is
sending one thing at a time, i.e. one draw call per SDL function,
which is the very thing we're trying to avoid...

Also yeah, the Z range is why I said we could run out of them. On PCs
we have 24-bit depth buffer, OK (though somebody could still attempt
to set 16-bit, and I guess on 2D this could make sense), but on mobile
I wonder how the Z range is handled (especially on referred renderers
as opposed to standard rasterizer ones).

And yes, OpenGL numerates textures from 1 onwards (this is true for
all objects, really), but remember you can create gaps by deleting
textures, and OpenGL will attempt to fill those if I recall correctly
(I'm not sure about the details).

2013/4/16, John:

Quote:

That sounds like ordering by Z to me, no?

The GLES device vendors advise against implementing your own depth sorting
because the GPU depth test does it much faster, more efficiently, can
correctly
handle overlaps, and runs in parallel with the CPU.

Also, z is floating point in transformed view coordinates which means
there
may
not be many duplicate z values to group by.

Have you measured the cost of switching the active texture unit? The
number
of
switches that will be saved by this optimization is easy to calculate,
it's

roughly the number of primitives minus the number of textures.

On 04/16/2013 12:41 PM, Mason Wheeler wrote:

Quote:

It's not "ordering by Z and texture" but "grouping by Z and texture".
Every
render with a Z of 1 will get sent before every render with a Z of 2, and
so
on.Â That's why I said you end up with an array of multimaps.

Mason

--------------------------------------------------------------------------------
*From:* Sik the hedgehog
*To:* Mason Wheeler; SDL Development List

*Sent:* Monday, April 15, 2013 10:25 PM
*Subject:* Re: [SDL] External dependencies in the renderer?

Is there any guarantee in OpenGL at all that primitives are drawn in
the order they appear in the buffer (which would seem inefficient)?
Otherwise ordering by Z is pretty much eventually going to break in
the future.

2013/4/16, Mason Wheeler:
> Not exactly. The optimization assumes that the principal rendering
> bottleneck is the overhead involved in sending scene data to the
> graphics card, which assumption is borne out by testing data. It
> intends to minimize the number of *drawing calls* by delaying primitives
> and sending them in batches, ordered by Z and texture.
>
> You avoid having to cache "the entire GL state" by the simple expedient
> of flushing the to-do buffer if a call comes in that changes the GL state.
> All you need to keep cached is the map of textures to arrays of
> coordinates. And transparency works fine as long as you have a Z
> parameter to order by. Things get drawn on top of each other in the
> prescribed order. I've been using this for a while now. The system works.
>
> ________________________________
> From: John
> Sent: Monday, April 15, 2013 8:36 PM
> Subject: Re: [SDL] External dependencies in the renderer?
>
> Ok, so the optimization assumes that a rendering bottleneck is the cost
> of switching textures, and intends to minimize the number of texture
> switches by delaying primitives, then re-ordering them by texture and Z.
>
> I've seen this before. It can be done, but there are caveats. The biggest
> challenge is you need to cache the entire GL state for each delayed
> primitive. The implementation is effectively an "intermediate mode" layer
> unto itself. The layer is a massive `todo` buffer with three phases:
> queue everything, analyze (re-order) the queue, then execute the queue as
> a batch. If you don't choose the batch size wisely, it's possible to lose
> any parallelism that you might have had when GL calls were mixed in with
> scene graph calls. The second challenge is to support transparency and
> other effects that depend on multiple passes in a specific order, or that
> play games with the z-buffer (or other tests).
>
> On 04/15/2013 10:41 PM, Mason Wheeler wrote:
>> Here's the basic idea.
>>
>> The internals of SDL's rendering API are atrocious, to put it bluntly.
>> It does everything in Immediate Mode, which modern versions of OpenGL
>> and Direct3D have moved away from because it's so slow. GLES doesn't
>> even support Immediate Mode, so if you look at SDL's GLES renderer, it
>> does the closest thing it can find to Immediate Mode, sending one call
>> to OpenGL every time someone calls SDL_RenderCopy.
>>
>> The way to do rendering fast is to keep the number of library calls to
>> a minimum, and pass as much data as possible all at once in an array.
>> Of course, that's not the way people use SDL; they use SDL to draw a
>> bunch of sprites, one at a time. So to be fast, SDL has to keep track of
>> the bookkeeping for them.
>>
>> The way to do this is with a multimap, mapping textures to lists of
>> drawing coordinates. You turn SDL_RenderCopy into an operation that adds
>> a pair of rects to a texture's mapped list, and SDL_RenderPresent into
>> an operation that iterates over the multimap and for each texture,
>> builds two arrays of vertices (one for screen coordinates and one for
>> texture coordinates) as buffers and passes them to the renderer all at
>> once.
>>
>> I've got a Delphi implementation that sped up my rendering
>> significantly, about 3x faster than stock SDL rendering. With a multimap
>> in C, I could port this concept to the SDL internals.
>>
>> The one tricky thing here, the concept that my renderer has that SDL
>> doesn't, is Z-order. If you're no longer deterministically drawing in
>> the order in which draw calls are received, but instead grouping them by
>> texture, which are in turn sorted by hash order (essentially random,)
>> you need a Z-order parameter to make sure the right things draw on top
>> of the right things, and what you end up with is an array of multimaps.
>>
>> I know it probably sounds very complicated, but it's only a few hundred
>> lines of code (plus the implementations of the hash and the dynamic
>> array, because C doesn't have them built in) and it makes rendering
>> *much* faster.
>>
>> Mason
>>
>> --------------------------------------------------------------------------------
>> *From:* Ryan C. Gordon
>> *To:* SDL Development List
>> *Sent:* Monday, April 15, 2013 6:20 PM
>> *Subject:* Re: [SDL] External dependencies in the renderer?
>>
>> On 4/15/13 2:46 PM, Mason Wheeler wrote:
>> > Does anyone (particularly Sam and Ryan) have any objections to pulling
>> > an external library into SDL? Because I have an idea that could
>> > significantly improve the performance of SDL's 3d-accelerated
>> > rendering, but it would require a multimap. Neither SDL nor the C
>> > standard library has a multimap implementation, but I could build one
>> > with uthash and utarray <http://troydhanson.github.io/uthash/>, which
>> > are both fairly small and BSD-licensed.
>>
>> I'd rather we have a simple hashtable implementation in SDL.
>>
>> What's the plan?
>>
>> --ryan.
_______________________________________________
SDL mailing list

http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


External dependencies in the renderer?

Jared Maddox

Guest

Posted: Wed Apr 17, 2013 2:51 am

Can you guys please trim your replies? The quotes are getting way too long.

Quote:

Date: Tue, 16 Apr 2013 02:23:02 -0700
From: Forest Hale
Subject: Re: [SDL] External dependencies in the renderer?

If you buffer draws, you get higher performance.

If you additionally sort them, you get even higher performance but break the
most basic assumption of a 2D graphics API - that things occur in the order
specified.

I see no reason to use uthash here, I do see great reason to buffer things.

Why is uthash still the subject of this discussion? We're not going to
reach a conclusion on the broad topic of outside dependencies, it's better
to focus on the specific problem at hand.

Because Mason wants to SORT things, instead of just buffering them.

Quote:

Date: Tue, 16 Apr 2013 07:32:38 -0700
From: "Nathaniel J Fries"
Subject: Re: [SDL] External dependencies in the renderer?

Optimization is a task of the programmer, not the library.
That said, SDL's interface is too high-level to enable the programmer to
optimize render performance.

I see a couple options here:
1) Add another function for rendering the same texture multiple times:

Code:

int SDL_RenderCopyMulti(SDL_Renderer *, SDL_Texture *, int nTimes, const
SDL_Rect *);

I think that's a good idea.

Quote:

2) Add a "sprite batch" API:

Code:

typedef struct SDL_SpriteBatch SDL_SpriteBatch;
SDL_SpriteBatch * SDL_CreateSpriteBatch(SDL_Renderer *);
void SDL_DestroySpriteBatch(SDL_SpriteBatch *);
int SDL_BatchCopy(SDL_SpriteBatch *, SDL_Texture *, const SDL_Rect *);
int SDL_BatchFlush(SDL_SpriteBatch *);

Eh, I think this should really go in an external library.

Quote:

Date: Tue, 16 Apr 2013 09:41:29 -0700 (PDT)
From: Mason Wheeler
To: Sik the hedgehog, SDL Development List
Subject: Re: [SDL] External dependencies in the renderer?

It's not "ordering by Z and texture" but "grouping by Z and texture". Every
render with a Z of 1 will get sent before every render with a Z of 2, and so
on. That's why I said you end up with an array of multimaps.

Mason

A multimap is a high-level structure, and this would be written in C,
so you're thinking about this incorrectly. I don't support your
particular perspective on this (I prefer Forest's "buffer it"
approach, because it preserves API behavior), but what you're talking
about can be done trivially with any data structure, and if you use
sorting data structures (such as balanced binary search trees) then
you can do it pretty quickly.

However, regardless of that, there's something that's still true: you're
talking about ordering by Z, as well as ordering by texture. If you
don't understand why we keep repeating this then you need to go back
to whatever dictionary you're using and try to find a way to reconcile
"Every render with a Z of 1 will get sent before every render with a Z
of 2" and "Sorting by Z". If you don't understand how those two things
are the same, then you should drop this line of enquiry until you do
understand it.

Quote:

Date: Tue, 16 Apr 2013 17:48:58 -0700 (PDT)
From: Mason Wheeler
To: SDL Development List
Subject: Re: [SDL] External dependencies in the renderer?

Quote:

Each Z layer would have its own texture-to-coordinates multimap.
When it's time to render everything, it looks like this (pseudocode):

for each multimap in layers:
   for each texture in multimap:
      CreateCoordArrays(multimap[texture])
      SelectTexture(texture)
      RenderArrays

This is, in fact, ordering by Z and texture, which you said was not
being done. You need to recheck your terminology.

Quote:

It's really that simple, in concept. Everything draws on top of what
it's supposed to draw on top of. There's no need to send Z ordering
to the GPU. There's no atrociously slow one-API-render-per-call.
I've tested it. It works, and it's about 3x faster than the current system
on large, complicated scenes.

The current SDL2 api is a 2d api. As a result, call 2 draws on top of
call 1, meaning that render calls do actually matter. So this isn't
how you do things? That's fine, but don't try to force your system
down everyone else's throats. I think that the api should be improved
to reduce the number of calls, but preserving draw order is required.

In case you've forgotten, the api is currently "locked", and since
this has the potential to break api behavior for existing games, the
change is not acceptable. Buffering (such as that provided by the
queue method that I suggested earlier) is fine as long as it preserves
draw order, but what you're suggesting is not reliably acceptable.

Quote:

There are only two real downsides: 1) it requires a multimap to work
properly, which we need a library for because libc provides neither a
multimap implementation nor the fundamental primitives needed to
build one (a map and a dynamic array).

Actually, all that you need is a searchable data structure. This
covers everything from arrays, to linked lists, to trees, and doesn't
even have to be sortable. Even then, C does provide some array sorting
functions (e.g. qsort) which can be used to implement this. Thus, C
provides a route to a concept demo.

If you want decent speeds then you want a sorted tree, so that rules
out C's standard library, but at the end of the day a customizable
tree (where you can have multiple customizations) is all that's
needed. After all, a map is just the association of one value with a
data slot, and any searchable data structure does that fine, including
trees. A dynamic array is a general enough term that it can cover any
extensible data structure, and since tree insertions are quicker than
copying a large block of memory when you need to perform an extension,
you might as well use a tree there too.

So, no need for a "map" nor for a "dynamic array", all that's needed
for YOUR preference is a balanced tree.

Quote:

And 2) SDL_RenderCopy does not currently have a Z parameter on
it, which is needed to make layering work correctly.

Adding Z would be a backwards-compatibility break, which is now forbidden.

Just in case it got lost in the conversation, here's my suggestion
again, which unlike Mason's should presumably maintain compatibility
with the current version:

1) Search through the queue, from most recent node to oldest node,
looking for collisions between the current call's bounding box and the
bounding box of the queue nodes.
2) If a collision is found, or the oldest node is reached without a
collision, add the current command to the node that was most recently
encountered, which also used the same texture as the command, and
expand that node's bounding box.
3) If no node has been found that uses the same texture, add the
command in a new node.
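
One possible reading of the three steps above, as a minimal C sketch. Everything here is an assumption for illustration: `Rect` stands in for `SDL_Rect`, textures are plain integer ids, and `Queue`, `Node`, and `queue_add` are hypothetical names, not SDL API. "The node that was most recently encountered" is interpreted as the oldest same-texture node seen while walking from newest to oldest before the walk stops.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 32
#define MAX_CMDS  64

typedef struct { int x, y, w, h; } Rect;   /* stand-in for SDL_Rect */

typedef struct {
    int  texture;          /* hypothetical texture id */
    Rect bounds;           /* union of every rect stored in this node */
    Rect cmds[MAX_CMDS];   /* destination rects, kept in call order */
    int  count;
} Node;

typedef struct {
    Node nodes[MAX_NODES]; /* index 0 is the oldest node */
    int  count;
} Queue;

static int overlaps(const Rect *a, const Rect *b)
{
    return a->x < b->x + b->w && b->x < a->x + a->w &&
           a->y < b->y + b->h && b->y < a->y + a->h;
}

/* Expand `bounds` so that it also covers `r`. */
static void grow(Rect *bounds, const Rect *r)
{
    int x1 = bounds->x + bounds->w, y1 = bounds->y + bounds->h;
    if (r->x + r->w > x1) x1 = r->x + r->w;
    if (r->y + r->h > y1) y1 = r->y + r->h;
    if (r->x < bounds->x) bounds->x = r->x;
    if (r->y < bounds->y) bounds->y = r->y;
    bounds->w = x1 - bounds->x;
    bounds->h = y1 - bounds->y;
}

/* Returns the index of the node that received the command,
 * or -1 if the queue is full (the caller would flush and retry). */
static int queue_add(Queue *q, int texture, Rect dst)
{
    int target = -1;
    assert(q != NULL);

    /* Step 1: walk from the newest node towards the oldest. */
    for (int i = q->count - 1; i >= 0; --i) {
        Node *n = &q->nodes[i];
        if (n->texture == texture && n->count < MAX_CMDS)
            target = i;                 /* same-texture candidate */
        if (overlaps(&n->bounds, &dst))
            break;                      /* must not draw before node i */
    }

    /* Step 3: no usable same-texture node, so start a new one. */
    if (target < 0) {
        if (q->count == MAX_NODES)
            return -1;
        target = q->count++;
        q->nodes[target].texture = texture;
        q->nodes[target].bounds  = dst;
        q->nodes[target].count   = 0;
    }

    /* Step 2: append to the chosen node and expand its bounding box. */
    grow(&q->nodes[target].bounds, &dst);
    q->nodes[target].cmds[q->nodes[target].count++] = dst;
    return target;
}
```

Because a command only moves back past nodes whose bounding boxes it does not touch, overlapping draws keep their original relative order. The check is conservative, since a node's bounding box can cover more area than its individual rects.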

Point, line, and rectangle render commands would go into the same
queue. The main issue would be where the queue should be flushed, I
figure that belongs in SDL_RenderPresent. That one is required on all
platforms, right? It looks like (with the possible exception of the
software renderer) all of the platforms need that to reliably render.

External dependencies in the renderer?

Sik

Joined: 26 Nov 2011

Posts: 905

Posted: Wed Apr 17, 2013 3:02 am

2013/4/16, Jared Maddox:

Quote:

Point, line, and rectangle render commands would go into the same
queue. The main issue would be where the queue should be flushed, I
figure that belongs in SDL_RenderPresent. That one is required on all
platforms, right? It looks like (with the possible exception of the
software renderer) all of the platforms need that to reliably render.

I believe the software renderer still needs SDL_RenderPresent
(otherwise how does SDL know that it can safely draw the surface on
the window?).

As for flushing, there'd be two points where this should happen for
correct behavior:

1) In SDL_RenderPresent, right before it does its job.
2) When the buffer becomes so big that going any bigger would nullify
the benefits.
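
A minimal C sketch of those two flush points, assuming a fixed-capacity buffer. The names `Batch`, `batch_copy`, and `batch_present` are hypothetical stand-ins for buffered versions of `SDL_RenderCopy` and `SDL_RenderPresent`, and the capacity is deliberately tiny; a real limit would have to come from measurement.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_CAPACITY 4   /* tiny on purpose; a real value needs profiling */

typedef struct { int x, y, w, h; } Rect;   /* stand-in for SDL_Rect */

typedef struct {
    Rect pending[BATCH_CAPACITY];
    int  count;       /* rects waiting to be drawn */
    int  flushes;     /* how many batched draw calls were issued */
    int  submitted;   /* how many rects have been handed to the backend */
} Batch;

static void batch_flush(Batch *b)
{
    assert(b != NULL);
    if (b->count == 0)
        return;
    /* A real renderer would issue one draw call for pending[0..count). */
    b->submitted += b->count;
    b->flushes++;
    b->count = 0;
}

/* Buffered draw: flush point 2, the buffer is about to overflow. */
static void batch_copy(Batch *b, Rect dst)
{
    if (b->count == BATCH_CAPACITY)
        batch_flush(b);
    b->pending[b->count++] = dst;
}

/* Present: flush point 1, everything pending must reach the screen. */
static void batch_present(Batch *b)
{
    batch_flush(b);
    /* the buffer swap would happen here */
}
```

With a capacity of 4, ten calls to `batch_copy` followed by `batch_present` produce three batched draws instead of ten individual ones.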

External dependencies in the renderer?

Mason Wheeler

Guest

Posted: Wed Apr 17, 2013 3:55 am

I don't actually think point 2 is valid. The bigger and more complicated the
scene, the more this scheme benefits it. If you're only drawing 20 sprites
per frame, it doesn't matter how inefficient your drawing techniques are; on
modern hardware you'll get good performance anyway. But if you're drawing
20,000 or 200,000, that's when you'll really see the benefit of something
like this.

The only real "hard limit" you'd see for the whole thing getting too big is
when *the whole thing* gets too big, when you start to run into system-level
limitations. And at that point, you've got bigger problems to worry about.

The second point at which you would want to flush the buffer is when the
rendering state changes. To keep complexity down, the buffer operates
under the assumption that everything draws in the same way. If you change
the transparency settings, for example, or (even more obviously) change the
active rendering target, you need to execute all existing buffered draw
commands first and then start over with a clean slate.

Mason
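
The flush-on-state-change rule described above could look roughly like this in C. This is only a sketch under assumptions: `BlendMode`, `RenderState`, and the `set_*` functions are hypothetical, and the pending draws are reduced to a counter to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BLEND_NONE, BLEND_ALPHA, BLEND_ADD } BlendMode; /* hypothetical */

typedef struct {
    BlendMode blend;
    int       target;   /* hypothetical render-target id */
} RenderState;

typedef struct {
    RenderState state;
    int pending;        /* buffered draw commands */
    int flushes;        /* batches actually sent */
} Buffered;

static void flush(Buffered *r)
{
    assert(r != NULL);
    if (r->pending > 0) {
        /* a real renderer would submit the buffered commands here */
        r->flushes++;
        r->pending = 0;
    }
}

static void draw(Buffered *r)
{
    r->pending++;
}

static void set_blend(Buffered *r, BlendMode mode)
{
    if (r->state.blend == mode)
        return;          /* nothing changed, keep batching */
    flush(r);            /* old commands must use the old state */
    r->state.blend = mode;
}

static void set_target(Buffered *r, int target)
{
    if (r->state.target == target)
        return;
    flush(r);
    r->state.target = target;
}
```

Comparing against the cached state first means a redundant call, such as setting the blend mode that is already active, does not break up the batch.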


External dependencies in the renderer?

Sik

Joined: 26 Nov 2011

Posts: 905

Posted: Wed Apr 17, 2013 4:37 am

Point 2 does matter in real hardware, sadly. If you try to send too
big of a batch, it'll end up being slower as you'll overwhelm the GPU
trying to transfer all that data into its own memory (in particular,
memory latency will become a massive issue here). I recall the general
suggestion is to not use values larger than 16-bit for indices (i.e.
that makes for 64K entries max in a buffer object), to give an idea.

(there's a debate about whether anybody will ever reach that point -
but I guess that 10,000~15,000 entries probably make a good place to
break up, if you consider most of them will be quads and thereby eat
up four vertices each, although I guess you can optimize this to reuse
primitives and use transformations to work around it instead, but even
then that just doubles the acceptable limit)
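
To make the arithmetic concrete: 16-bit indices can address 65536 vertices, and at four vertices per quad that is 65536 / 4 = 16384 quads per buffer, which is in line with the 10,000~15,000 figure. A small C sketch of building such an index array follows; the vertex order (corners 0, 1, 2, 3 going around the quad) and the function name `build_quad_indices` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VERTS_PER_QUAD   4
#define INDICES_PER_QUAD 6
/* 16-bit indices address 65536 vertices, i.e. 16384 quads. */
#define MAX_QUADS (65536 / VERTS_PER_QUAD)

/* Fills `out` (room for 6 * quads entries) with two triangles per quad.
 * Returns the number of indices written, or 0 if `quads` would need
 * indices that do not fit in 16 bits. */
static size_t build_quad_indices(uint16_t *out, size_t quads)
{
    assert(out != NULL);
    if (quads > MAX_QUADS)
        return 0;

    for (size_t q = 0; q < quads; ++q) {
        uint16_t  base = (uint16_t)(q * VERTS_PER_QUAD);
        uint16_t *i    = out + q * INDICES_PER_QUAD;
        /* corners 0,1,2,3 around the quad: triangles (0,1,2) and (2,3,0) */
        i[0] = base;
        i[1] = (uint16_t)(base + 1);
        i[2] = (uint16_t)(base + 2);
        i[3] = (uint16_t)(base + 2);
        i[4] = (uint16_t)(base + 3);
        i[5] = base;
    }
    return quads * INDICES_PER_QUAD;
}
```

A batch that reaches `MAX_QUADS` would be flushed and restarted rather than switching to wider indices.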

I don't think translucency parameters affect the state though. You
could just feed those in the buffer itself and let the shader handle
it (if you're doing this method you definitely are going the shader
route anyway). In this sense textures really should be the only state
change, unless I'm missing something. (oh, and yes, changing the
shader is bad too as it can't be parallelized at all)


External dependencies in the renderer?

Driedfruit

Guest

Posted: Wed Apr 17, 2013 8:21 am

Quote:

I see a couple options here:
1) Add another function for rendering the same texture multiple times:

Code:

int SDL_RenderCopyMulti(SDL_Renderer *, SDL_Texture *, int nTimes,
const SDL_Rect *);

Just my 2 cents, but this would be *very* lovely to have in any case.

--
driedfruit

External dependencies in the renderer?

Jonny D

Joined: 12 Sep 2009

Posts: 932

Posted: Wed Apr 17, 2013 12:51 pm

This optimization is a good thing, but the benefit does not scale directly with the number of sprites drawn. When you get into the thousands of sprites, flushing the buffer every few thousand sprites is of negligible cost. This approach helps avoid the incremental cost of several OpenGL calls and a blocking "flush" for every single sprite (the way SDL currently does it).

Sik's point 2 is relevant and applies even more clearly when your buffer is a simple fixed-size array. At some point, you have to decide how much memory you want to allocate for this buffer, and it must be flushed before it overflows. As far as that goes, I'd rather not be allocating memory as I assume the map does. Also, Mason is right that you have to flush before every state change that could change the rendering.

As was said before, we need to guarantee rendering order because the OpenGL depth test is not enough to make alpha blending work in the right order. Z layers is an okay concept, but not terribly widespread in practice. It would be strange to make the SDL API embrace such a high level concept that doesn't apply to most applications.

Jonny D


Re: External dependencies in the renderer?

Nathaniel J Fries

Joined: 30 Mar 2010

Posts: 444

Posted: Wed Apr 17, 2013 10:30 pm

Driedfruit wrote:

Quote:

I see a couple options here:
1) Add another function for rendering the same texture multiple times:

Code:

int SDL_RenderCopyMulti(SDL_Renderer *, SDL_Texture *, int nTimes,
const SDL_Rect *);

Just my 2 cents, but this would be *very* lovely to have in any case.

Aye, this would be helpful for an old idea of mine as well.
And I just realized I botched that function definition: as written, the call is not terribly useful, since it takes only a destination rect (ditto for the other one).

Code:

int SDL_RenderCopyMulti(SDL_Renderer *, SDL_Texture *, int, const SDL_Rect *, const SDL_Rect *);

External dependencies in the renderer?

Forest Hale

Guest

Posted: Thu Apr 18, 2013 5:24 am

Z buffering does not solve problems for blended transparency, only for alpha test. Blended transparency still requires sorting back to front, which totally wrecks any texture batching optimizations.

On 04/16/2013 05:48 PM, Mason Wheeler wrote:

Quote:

OK, since it's apparently not clear from my original proposal, I wasn't
talking about sending Z coordinates to OpenGL or Direct3D in any way.
I was talking about using them on the SDL side. You'd end up with a
certain number of layers, (most 2D games draw 4 or 5 distinct layers
IME,) and each layer would have its own Z number.

Each Z layer would have its own texture-to-coordinates multimap.
When it's time to render everything, it looks like this (pseudocode):

for each multimap in layers:
   for each texture in multimap:
      CreateCoordArrays(multimap[texture])
      SelectTexture(texture)
      RenderArrays

It's really that simple, in concept. Everything draws on top of what
it's supposed to draw on top of. There's no need to send Z ordering
to the GPU. There's no atrociously slow one-API-render-per-call.
I've tested it. It works, and it's about 3x faster than the current system
on large, complicated scenes.

There are only two real downsides: 1) it requires a multimap to work
properly, which we need a library for because libc provides neither a
multimap implementation nor the fundamental primitives needed to
build one (a map and a dynamic array).
And 2) SDL_RenderCopy does not currently have a Z parameter on
it, which is needed to make layering work correctly.

Mason
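
For readers who want to see the shape of the layered scheme in C, here is a minimal sketch. It is not the Delphi implementation mentioned in the thread: the hash-based multimap is swapped for fixed-size arrays with a linear search per layer, textures are integer ids, and `Scene`, `scene_add`, and `scene_present` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LAYERS   4
#define MAX_TEXTURES 8
#define MAX_RECTS    64

typedef struct { int x, y, w, h; } Rect;   /* stand-in for SDL_Rect */

typedef struct {
    int  texture;            /* hypothetical texture id */
    Rect dst[MAX_RECTS];     /* every rect queued for this texture */
    int  count;
} Bucket;

typedef struct {
    Bucket buckets[MAX_TEXTURES];   /* plays the role of the multimap */
    int    count;
} Layer;

typedef struct {
    Layer layers[MAX_LAYERS];       /* index is the Z value */
} Scene;

/* Queue one sprite. Returns 0 on success, -1 if a fixed limit is hit. */
static int scene_add(Scene *s, int z, int texture, Rect dst)
{
    Layer  *l;
    Bucket *b = NULL;

    assert(s != NULL && z >= 0 && z < MAX_LAYERS);
    l = &s->layers[z];

    for (int i = 0; i < l->count; ++i) {
        if (l->buckets[i].texture == texture) {
            b = &l->buckets[i];
            break;
        }
    }
    if (b == NULL) {
        if (l->count == MAX_TEXTURES)
            return -1;
        b = &l->buckets[l->count++];
        b->texture = texture;
        b->count = 0;
    }
    if (b->count == MAX_RECTS)
        return -1;
    b->dst[b->count++] = dst;
    return 0;
}

/* Walk layers in Z order; each bucket becomes one batched draw.
 * Returns the number of batched draws and empties the scene. */
static int scene_present(Scene *s)
{
    int calls = 0;
    for (int z = 0; z < MAX_LAYERS; ++z) {
        Layer *l = &s->layers[z];
        for (int i = 0; i < l->count; ++i) {
            /* a real renderer would select buckets[i].texture, build the
             * coordinate arrays from dst[], and submit them in one call */
            calls++;
        }
        l->count = 0;
    }
    return calls;
}
```

Six sprites spread over two layers and two textures come out as three batched draws here. Within a layer, the order between textures is whatever the container yields, which is the ordering concern raised elsewhere in the thread.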


External dependencies in the renderer?

Sik

Joined: 26 Nov 2011

Posts: 905

Posted: Thu Apr 18, 2013 5:34 am

But he insists it's still a huge speed up.

But yeah, I was thinking, and it's very likely a lot of programmers
will just leave blending turned on, and unless you keep track of all
the pixels in the texture or something you'll have to assume
translucency conflicts can happen. The only "easy" workaround would be
that dirty rectangles-like suggestion from earlier.

Of course there's also the question about how much overlap is between
each draw (i.e. how much you draw on top of what's already drawn).

2013/4/18, Forest Hale:

Quote:

Z buffering does not solve problems for blended transparency, only alpha
test, blended transparency still requires sorting back to front, which
totally wrecks any texture batching optimizations.

External dependencies in the renderer?

Mason Wheeler

Guest

Posted: Thu Apr 18, 2013 4:50 pm

*facepalm*

Did you not read what I just wrote?

Did you *seriously* not read it at all? I just got through explaining that my
implementation DOES NOT USE Z-BUFFERING; that it does work by
sorting back to front; that I've been using it and timed it, and that it speeds
things up by a factor of 3 on large scenes?

And now you reply and say "no, you can't use Z-buffering; you need to
sort back to front to avoid screwing up blended transparency, and you
can't do *that* or it will wreck the performance gains"?!?

Seriously?

Mason

From: Forest Hale
To:
Sent: Wednesday, April 17, 2013 10:24 PM
Subject: Re: [SDL] External dependencies in the renderer?

Z buffering does not solve problems for blended transparency, only for alpha test; blended transparency still requires sorting back to front, which totally wrecks any texture batching optimizations.

Nathaniel J Fries

Joined: 30 Mar 2010

Posts: 444

Posted: Thu Apr 18, 2013 10:12 pm

The terminology disagreement is due to the same term ("z") being seen in two different contexts.

Z layering (or Z ordering) is a technique in 2D rendering used to separate layers. Any item with the same layer ("z") can be rendered in any order, and the resulting graphical output would be the same (or at least close enough for the programmer's needs). Most 2D games use this technique (either explicitly or simply by a proper ordering of draw operations) in order to prevent a ground tile from being rendered on top of the player and other such issues.

Z buffering is a technique in 3D rendering (better termed depth buffering) that allows the programmer to define the depth of objects, which is often used by hardware to cull the scene.

Translucency is an issue for depth buffering, since an opaque texture drawn behind a translucent texture will be culled.
It is not usually an issue for Z layering (unless this is implemented using the hardware's depth buffer), since culling is not the purpose (render order is).

What Mason is suggesting is Z layering, and not Z buffering, which means that nothing is culled.
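As a rough illustration of the Z layering described above, here is a minimal sketch in plain C. The `DrawCmd` struct and `sort_by_layer` function are hypothetical names made up for this example and are not part of SDL. The sort is stable and looks only at the layer, so commands within a layer keep their submission order and nothing is culled.

```c
#include <stddef.h>

/* Hypothetical draw command; names are illustrative, not SDL API. */
typedef struct {
    int z;        /* layer: lower values are drawn first (further back) */
    int texture;  /* stand-in for a texture handle */
} DrawCmd;

/* Stable insertion sort by layer only. Commands that share a layer keep
 * their submission order, and every command is still drawn: the sort only
 * decides what is drawn before what. */
static void sort_by_layer(DrawCmd *cmds, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        DrawCmd key = cmds[i];
        size_t j = i;
        /* Strict '>' keeps equal layers in their original order. */
        while (j > 0 && cmds[j - 1].z > key.z) {
            cmds[j] = cmds[j - 1];
            j--;
        }
        cmds[j] = key;
    }
}
```

Because items in the same layer may be drawn in any order, a renderer would be free to regroup them by texture inside each layer; this sketch deliberately leaves them alone.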

External dependencies in the renderer?

Mason Wheeler

Guest

Posted: Thu Apr 18, 2013 10:22 pm

Yes, that's correct.

Mason

Nathaniel J Fries

Joined: 30 Mar 2010

Posts: 444

Posted: Fri Apr 19, 2013 12:55 am

Despite understanding what you're referring to, I don't like the idea of having z-ordering in the SDL core library.

However, the problem you identified is fairly important. Although SDL is a generic solution, and generic solutions can never be as optimal as hand-tailored ones, by not providing a means of optimizing for the case of order-independent rendering of the same texture (which is quite common in 2D games, especially if spritesheets are used), SDL unnecessarily reduces framerates in OpenGL and Direct3D.

I would again refer to the notion of a simple function to perform this, along the lines of SDL_RenderCopyMulti (or SDL_RenderNCopy, etc).

External dependencies in the renderer?

Mason Wheeler

Guest

Posted: Fri Apr 19, 2013 2:20 am

The problem with that is that it forces the developer to do essentially the same thing I'm proposing, just on their end.

If you have a scene with a bunch of sprites in it, they're most likely not ordered by texture, and certainly not *grouped* by texture. That's not a natural way to set it up, and not something someone's going to do unless they're specifically trying to do what I'm trying to do here. Which means that at draw time, at some point, someone somewhere has to translate the list of what's being drawn into some sort of structure that's grouped by texture--such as a multimap.

As long as "group by texture" has to be done one way or another in order to get the performance benefits we're talking about here, why force it to be outside of the API and require every developer to reinvent the wheel? That's what libraries are *for*, isn't it?
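To make the "group by texture" step concrete, here is a minimal sketch of a tiny multimap in plain C. It is not the uthash/utarray implementation mentioned earlier in the thread: the `TextureGroups` and `Bucket` types, the fixed table sizes, and the linear key lookup are all assumptions chosen to keep the example short.

```c
#include <stddef.h>

#define MAX_TEXTURES 8      /* illustrative limits, not taken from SDL */
#define MAX_PER_TEXTURE 16

typedef struct {
    int texture;                   /* stand-in for a texture handle */
    int sprites[MAX_PER_TEXTURE];  /* sprite ids queued for this texture */
    size_t count;
} Bucket;

typedef struct {
    Bucket buckets[MAX_TEXTURES];
    size_t used;
} TextureGroups;

/* Adds one sprite under its texture key, creating the bucket on first use.
 * Returns 0 on success, -1 if either fixed-size table is full. */
static int groups_add(TextureGroups *g, int texture, int sprite)
{
    Bucket *b = NULL;
    for (size_t i = 0; i < g->used; i++) {
        if (g->buckets[i].texture == texture) {
            b = &g->buckets[i];
            break;
        }
    }
    if (b == NULL) {
        if (g->used == MAX_TEXTURES)
            return -1;
        b = &g->buckets[g->used++];
        b->texture = texture;
        b->count = 0;
    }
    if (b->count == MAX_PER_TEXTURE)
        return -1;
    b->sprites[b->count++] = sprite;
    return 0;
}
```

At draw time, each bucket could be submitted as one batch. A real version would presumably replace the linear scan with a hash lookup and would keep one set of buckets per layer so that layer order is preserved.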

Mason

External dependencies in the renderer?

Sik

Joined: 26 Nov 2011

Posts: 905

Posted: Fri Apr 19, 2013 3:17 am

First of all, before we continue: does *anywhere* in the OpenGL or
Direct3D specs say that primitives are guaranteed to be rendered in
the *same* order as they're sent? Because otherwise I really doubt
that ordering is going to work reliably. It may work on some systems
but break miserably on others.

(and that reminds me: we'll need patches for all hardware-accelerated
renderers if we want to accept this method :P)

External dependencies in the renderer?

Mason Wheeler

Guest

Posted: Fri Apr 19, 2013 5:53 am

Back to this again?

Do you understand the difference between ordering and grouping?

Mason

External dependencies in the renderer?

John

Guest

Posted: Fri Apr 19, 2013 2:57 pm

On 04/18/2013 11:17 PM, Sik the hedgehog wrote:

Quote:

First of all, before we continue: does *anywhere* in the OpenGL or
Direct3D specs say that primitives are guaranteed to be rendered in
the *same* order as they're sent?

Yes. Otherwise it'd be impossible to composite anything reliably without
flushing after every primitive.

External dependencies in the renderer?

John

Guest

Posted: Fri Apr 19, 2013 3:10 pm

We all understand the difference. You have proposed to re-order primitives
according to their texture, and that is why we are discussing "ordering".

On 04/19/2013 01:53 AM, Mason Wheeler wrote:

Quote:

Back to this again?

Do you understand the difference between ordering and grouping?

Mason

Re: External dependencies in the renderer?

Nathaniel J Fries

Joined: 30 Mar 2010

Posts: 444

Posted: Fri Apr 19, 2013 3:59 pm

Mason Wheeler wrote:

The programmer would need to implement ordering on top of the library anyway in order to feed the graphics to SDL in the right order. And whatever APIs SDL could expose to access the underlying ordering structure, many programmers would probably find less preferable to the way they've always done it. So you have a situation of redundant ordering.

Grouping is what is actually needed, but grouping without considering order effectively negates order, so SDL must either group and order or do neither.

I proposed earlier a spritebatch mechanism for SDL which would do all this internally, but it was suggested that this be an extension library. However, even to implement that in a non-hackish manner, SDL would still need to provide an interface for rendering the same texture multiple times.

External dependencies in the renderer?

Forest Hale

Guest

Posted: Fri Apr 19, 2013 10:44 pm

I did read what you said, but my interpretation was that you wanted to blast all primitives out sequentially by texture without regard to Z layer, and then have the Z buffer hardware sort it out.
This interpretation was incorrect; I apologize.

Is there a reason to use a multimap rather than a radix sort? Presumably a sort key consisting of several bytes of state (Z layer, blendfunc, texture) would achieve the desired results.
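A minimal sketch of the sort-key idea, in plain C. The field widths (8 bits of Z layer, 8 bits of blend function, 16 bits of texture id) and the names `make_key` and `radix_sort` are assumptions for illustration only. Placing the Z layer in the most significant byte means layers sort first, with blend mode and texture grouped inside each layer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack state into one 32-bit key; the most significant field sorts first.
 * The 8/8/16-bit field widths are an assumption for this sketch. */
static uint32_t make_key(uint8_t z, uint8_t blend, uint16_t texture)
{
    return ((uint32_t)z << 24) | ((uint32_t)blend << 16) | texture;
}

/* LSD radix sort, one byte per pass. Each pass is a stable counting sort,
 * so four passes fully order the 32-bit keys. scratch must hold count keys. */
static void radix_sort(uint32_t *keys, uint32_t *scratch, size_t count)
{
    for (int shift = 0; shift < 32; shift += 8) {
        size_t offsets[257] = {0};

        /* Histogram of the current byte, stored one slot to the right. */
        for (size_t i = 0; i < count; i++)
            offsets[((keys[i] >> shift) & 0xFF) + 1]++;

        /* Prefix sum: offsets[v] becomes the first output slot for byte v. */
        for (int b = 0; b < 256; b++)
            offsets[b + 1] += offsets[b];

        /* Scatter in input order, which keeps each pass stable. */
        for (size_t i = 0; i < count; i++)
            scratch[offsets[(keys[i] >> shift) & 0xFF]++] = keys[i];

        memcpy(keys, scratch, count * sizeof *keys);
    }
}
```

In a real renderer the key would travel with an index or pointer to the draw command; this sketch sorts bare keys to keep the mechanism visible.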

On 04/18/2013 09:44 AM, Mason Wheeler wrote:

Quote:

*facepalm*

Did you not read what I just wrote?

Did you *seriously* not read it at all? I just got through explaining that my
implementation DOES NOT USE Z-BUFFERING; that it does work by
sorting back to front; that I've been using it and timed it, and that it speeds
things up by a factor of 3 on large scenes?

And now you reply and say "no, you can't use Z-buffering; you need to
sort back to front to avoid screwing up blended transparency, and you
can't do *that* or it will wreck the performance gains"?!?

Seriously?

Mason

External dependencies in the renderer?

Jared Maddox

Guest

Posted: Sat Apr 20, 2013 2:24 am

Quote:

Date: Thu, 18 Apr 2013 19:20:36 -0700 (PDT)
From: Mason Wheeler
Subject: Re: [SDL] External dependencies in the renderer?

The problem with that is that it forces the developer to do essentially the
same thing I'm proposing, just on their end.

It's still perfectly fine. Any language that supports C's qsort, C++'s
std::map, or any half-way similar functionality, will by definition
provide all of the primitives needed to actually implement a solution
to this. It's not a big deal once they recognize that they need a
sorted data structure. If SDL provided a generic C-language tree
implementation then it would certainly be more convenient to everyone,
but that's a minor thing.
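For what it's worth, the application-side approach with C's `qsort` might look roughly like the sketch below. The `Sprite` struct and its fields are hypothetical. Since `qsort` is not guaranteed to be stable, the sketch adds a submission index as the last tie-breaker so the result is deterministic.

```c
#include <stdlib.h>

/* Hypothetical sprite record; none of these names come from SDL. */
typedef struct {
    int z;        /* layer, drawn back to front */
    int texture;  /* stand-in for a texture handle */
    int seq;      /* submission index, used as a final tie-breaker */
} Sprite;

/* Order by layer first, then by texture so equal textures end up adjacent,
 * then by submission index because qsort makes no stability promise. */
static int compare_sprites(const void *a, const void *b)
{
    const Sprite *x = a;
    const Sprite *y = b;

    if (x->z != y->z)
        return (x->z > y->z) - (x->z < y->z);
    if (x->texture != y->texture)
        return (x->texture > y->texture) - (x->texture < y->texture);
    return (x->seq > y->seq) - (x->seq < y->seq);
}

static void sort_sprites(Sprite *sprites, size_t count)
{
    qsort(sprites, count, sizeof *sprites, compare_sprites);
}
```

After the sort, runs of equal texture inside a layer are contiguous, so the caller can walk the array and submit each run together.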

Quote:

If you have a scene with a bunch of sprites in it, they're most likely not
ordered by texture, and certainly not *grouped* by texture. That's not a
natural way to set it up, and not something someone's going to do unless
they're specifically trying to do what I'm trying to do here. Which means
that at draw time, at some point, someone somewhere has to translate the
list of what's being drawn into some sort of structure that's grouped by
texture--such as a multimap.

I've written 3d code that does the job. Depending on the complexity of
your modelling, you can do this in C++'s standard containers in as
little as ~50 lines of code (and that's a very-ballpark estimate, I
use a lot of whitespace in my code).

Quote:

As long as "group by texture" has to be done one way or another in order to
get the performance benefits we're talking about here, why force it to be
outside of the API and require every developer to reinvent the wheel?
That's what libraries are *for*, isn't it?

You know, perhaps I'm confusing you with someone else, but I seem to
remember you wanting to rip OUT portions of SDL. Now you're trying to
add in parts that the rest of us consider only partially appropriate,
DESPITE already having been told that it requires a forbidden API
change?

Quote:

Date: Fri, 19 Apr 2013 09:00:09 -0700
From: "Nathaniel J Fries"
Subject: Re: [SDL] External dependencies in the renderer?

Quote:

Grouping is what is actually needed, but grouping without considering order
effectively negates order, so SDL must either group and order or do
neither.

I proposed earlier a spritebatch mechanism for SDL which would do all this
internally, but it was suggested that this be an extension library; however
to even implement that in a non-hackish manner, SDL would still need to
provide an interface for rendering the same texture multiple times.

For that matter, if the "multi-render" function were added then that
would be enough for my "buffering-render" suggestion to be implemented
with an external library. It's a really straightforward optimization,
and doesn't need to break the API.

Re: External dependencies in the renderer?

Nathaniel J Fries

Joined: 30 Mar 2010

Posts: 444

Posted: Sat Apr 20, 2013 2:32 pm

Jared Maddox wrote:

Quote:

I'm not honestly a fan of the spritebatch mechanism, I'd rather group them myself and call a multi-rendering function.
However, the spritebatch mechanism already has real-world uses (in fact, it is the simplest method for rendering 2D graphics in Microsoft's XNA), whereas a simple multi-copy function does not AFAIK.

External dependencies in the renderer?

Pallav Nawani

Joined: 19 May 2011

Posts: 122

Location: Dehradun, India

Posted: Mon Apr 22, 2013 7:36 am

If Mason wants to implement a portable, performance-improving
optimization in the SDL renderer pipeline,
I don't see a problem with it at all.

Some may not want an external hash table implementation. Okay, but I
don't see why SDL shouldn't be rendering stuff faster than it already does.
I don't understand the opposition at all.

If it doesn't work - well, that's what source control is for.

--
*Pallav Nawani*
*Game Designer/CEO*
http://www.ironcode.com
Twitter: http://twitter.com/Ironcode_Gaming
Facebook: http://www.facebook.com/Ironcode.Gaming

External dependencies in the renderer?

Scott Percival

Guest

Posted: Mon Apr 22, 2013 10:25 am

There's no opposition against the idea of faster draw calls. What you're seeing here is interest in finding the best way to implement the optimisation, both for the developers who'll use it and the SDL maintainers.

The renderer API was designed to use the unbatched painter's algorithm approach to blitting. As discussed, it's non-trivial to cache a bunch of these draw calls when there's zero guarantee that the state will remain the same between each one. The worst case is that you'll end up breaking a pile of software which relies on the expectation that blits will happen immediately after calling SDL_RenderCopy. Hence the discussion about adding a new batch rendering method alongside the old one.
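One conservative way to cache draw calls without reordering anything is to queue consecutive copies that share state and flush as soon as the state changes. The sketch below is only an illustration of that idea in plain C: the `Batcher` type and its functions are hypothetical, texture is the only state it tracks, and the "draw" is simulated with counters rather than real GPU calls.

```c
#include <stddef.h>

#define BATCH_CAPACITY 64   /* illustrative limit, not taken from SDL */

typedef struct {
    int texture;          /* texture shared by everything queued so far */
    size_t queued;        /* number of copies waiting in the batch */
    size_t flushes;       /* how many batched submissions were issued */
    size_t copies_drawn;  /* how many copies those submissions covered */
} Batcher;

/* Stand-in for one submission to the GPU; a real renderer would issue a
 * single draw call covering all queued quads here. */
static void batcher_flush(Batcher *b)
{
    if (b->queued == 0)
        return;
    b->flushes++;
    b->copies_drawn += b->queued;
    b->queued = 0;
}

/* Queue one copy. A different texture (or a full batch) forces a flush
 * first, so the order in which copies reach the GPU never changes. */
static void batcher_copy(Batcher *b, int texture)
{
    if (b->queued > 0 &&
        (texture != b->texture || b->queued == BATCH_CAPACITY))
        batcher_flush(b);
    b->texture = texture;
    b->queued++;
}
```

This only helps when copies that share a texture already arrive back to back, and the caller still has to flush before anything that expects the blits to have happened, which is the compatibility concern raised above.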

The SDL 2.0 API has been frozen, and there is released software using this API; now is exactly the wrong time to be cavalier about breaking things.

The SDL forums have moved to discourse.libsdl.org. This is just a read-only archive of the previous forums, to keep old links working.
