External dependencies in the renderer?

Mason Wheeler

Does anyone (particularly Sam and Ryan) have any objections to pulling an external library into SDL? I have an idea that could significantly improve the performance of SDL's 3D-accelerated rendering, but it would require a multimap. Neither SDL nor the C standard library has a multimap implementation, but I could build one with uthash and utarray, which are both fairly small and BSD-licensed.

Mason

Ryan C. Gordon

(In reply to Mason Wheeler, 4/15/13 2:46 PM.)

I'd rather we have a simple hashtable implementation in SDL. What's the plan?

--ryan.

Jonathan Greig

Ryan,

Can you elaborate on why uthash is not attractive to you? I'm just wondering, since I was recently looking at possibly using it for the Embroidermodder 2 project. I came across it after looking at some hash benchmarks, and the license is appealing. It's a single header, so if the interface isn't to your liking, making a small wrapper around it should be fairly straightforward. Have you or Sam done any work on an SDL hash implementation?

Sik

(In reply to Jonathan Greig, 2013/4/15.)

I think the problem is the fact that it's an extra dependency.

That said, I'm not very fond of its use of macros at all :S (I guess this is one place where C++ wins by far; templates would make this trivial.) I wonder if that's an issue too.

Jonathan Greig

Sik,

I completely understand the extra dependency issue, although with it being a single header, shipping it with the SDL sources should hardly be a problem. At least that's the way I look at it. I don't particularly care for macros either, so maybe that could be part of it too.

John

(In reply to Mason Wheeler, 04/15/2013 02:46 PM.)

What is the optimization?

Andreas Schiffler

(In reply to Jonathan Greig, 4/15/2013 6:59 PM.)

Same gut feel here: it seems reasonable to extend SDL functionality via a copy-and-add of the single uthash.h file, as the uthash license allows redistribution in source form. Judging from its test coverage, the code seems reasonably stable, so SDL maintainers would not have to expect a lot of future updates to this file in the SDL source tree either.

In my view it really comes down to what the user benefit would actually be over a custom implementation inside an SDL-based app (if possible).

Mason Wheeler

(In reply to Ryan C. Gordon, Monday, April 15, 2013 6:20 PM.)

Here's the basic idea.

The internals of SDL's rendering API are atrocious, to put it bluntly. It does everything in Immediate Mode, which modern versions of OpenGL and Direct3D have moved away from because it's so slow. GLES doesn't even support Immediate Mode, so if you look at SDL's GLES renderer, it does the closest thing it can find to Immediate Mode, sending one call to OpenGL every time someone calls SDL_RenderCopy.

The way to do rendering fast is to keep the number of library calls to a minimum, and pass as much data as possible all at once in an array. Of course, that's not the way people use SDL; they use SDL to draw a bunch of sprites, one at a time. So to be fast, SDL has to do the bookkeeping for them.

The way to do this is with a multimap, mapping textures to lists of drawing coordinates. You turn SDL_RenderCopy into an operation that adds a pair of rects to a texture's mapped list, and SDL_RenderPresent into an operation that iterates over the multimap and, for each texture, builds two arrays of vertices (one for screen coordinates and one for texture coordinates) as buffers and passes them to the renderer all at once. I've got a Delphi implementation that sped up my rendering significantly, about 3x faster than stock SDL rendering. With a multimap in C, I could port this concept to the SDL internals.

The one tricky thing here, the concept that my renderer has that SDL doesn't, is Z-order. If you're no longer deterministically drawing in the order in which draw calls are received, but instead grouping them by texture, which are in turn sorted by hash order (essentially random), you need a Z-order parameter to make sure the right things draw on top of the right things, and what you end up with is an array of multimaps.

I know it probably sounds very complicated, but it's only a few hundred lines of code (plus the implementations of the hash and the dynamic array, because C doesn't have them built in) and it makes rendering *much* faster.

Mason
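
A minimal C sketch of the array-building step described in the post above, offered as an illustration rather than as Mason's actual code. The types `RectF` and `CopyCmd` and the function `build_vertex_arrays` are hypothetical names, not SDL internals. It assumes each queued copy is drawn as two triangles and leaves texture coordinates in pixel units (a real GL backend would normalize them by the texture size).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; these are not SDL's internal structures. */
typedef struct { float x, y, w, h; } RectF;
typedef struct { RectF src, dst; } CopyCmd;   /* one queued SDL_RenderCopy call */

enum { FLOATS_PER_QUAD = 12 };  /* 2 triangles * 3 vertices * 2 floats */

/* Writes the six vertices (two triangles) covering r into out. */
static void rect_to_triangles(const RectF *r, float *out)
{
    const float x0 = r->x, y0 = r->y, x1 = r->x + r->w, y1 = r->y + r->h;
    const float v[FLOATS_PER_QUAD] = {
        x0, y0,  x1, y0,  x0, y1,
        x1, y0,  x1, y1,  x0, y1
    };
    for (int i = 0; i < FLOATS_PER_QUAD; ++i)
        out[i] = v[i];
}

/* Expands n queued copies of ONE texture into a screen-coordinate array and
   a texture-coordinate array. Each output array must have room for
   n * FLOATS_PER_QUAD floats. Returns the number of vertices written, which
   a backend could then submit with a single draw call. */
size_t build_vertex_arrays(const CopyCmd *cmds, size_t n,
                           float *positions, float *texcoords)
{
    assert(cmds != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        rect_to_triangles(&cmds[i].dst, positions + i * FLOATS_PER_QUAD);
        rect_to_triangles(&cmds[i].src, texcoords + i * FLOATS_PER_QUAD);
    }
    return n * 6;
}
```

With this shape, the number of driver calls depends on the number of textures used in a frame rather than on the number of sprites.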

Scott Percival

(In reply to Mason Wheeler, 16 April 2013 10:41.)

Looking at the GLES2 renderer (which is probably the cleanest implementation we've got right now), isn't there an expectation with the design of the API that the RenderCopy operation is carried out immediately? So in theory, couldn't someone call RenderCopy with an SDL_Texture * and two rects, then mess with the contents of the texture, then call RenderPresent?

Ryan C. Gordon

I haven't even clicked on the link, so I can't say anything about uthash. As an external piece of code, I'm hesitant to add it to SDL, since that has caused annoyances in the past, unless there was a really good reason. (Doubly-so for a hashtable. I mean, a hashtable? Do we really need to scour the internet for a hashtable?) I imagine it's probably a fine piece of code in itself, though.

--ryan.

Sik

(In reply to Ryan C. Gordon, 2013/4/15.)

Um, is a hashtable needed for this idea as opposed to a regular array? I mean, you're literally just adding entries to a queue; you don't even need to retrieve them back.

As for the Z order, just assign a unique Z to each entry and be done with it. Sure, you may run out of range, but at that point you probably have queued up enough primitives to be worth flushing the batch.

Also yeah, I wonder about the textures too, although I guess you can always force a flush in that case.

Mason Wheeler

(In reply to Sik the hedgehog, Monday, April 15, 2013 8:01 PM.)

A hashtable is needed because this is *not* just a queue. To get good performance out of it, it has to be grouped by texture. The idea is that you select each texture once, and perform all of the drawing for it all at once. What we have now is just a queue, and it's horribly slow. On a complicated scene, it's the difference between a few dozen API calls, or a few tens of thousands of them. (Yes, I have rendered scenes that involved with SDL.)

Mason

John

(In reply to Mason Wheeler, 04/15/2013 10:41 PM.)

Ok, so the optimization assumes that a rendering bottleneck is the cost of switching textures, and intends to minimize the number of texture switches by delaying primitives, then re-ordering them by texture and Z.

I've seen this before. It can be done, but there are caveats. The biggest challenge is that you need to cache the entire GL state for each delayed primitive. The implementation is effectively an "intermediate mode" layer unto itself. The layer is a massive `todo` buffer with three phases: queue everything, analyze (re-order) the queue, then execute the queue as a batch. If you don't choose the batch size wisely, it's possible to lose any parallelism that you might have had when GL calls were mixed in with scene graph calls.

The second challenge is to support transparency and other effects that depend on multiple passes in a specific order, or that play games with the z-buffer (or other tests).

gabomdq

(In reply to John, 2013/4/16.)

I'm beating my own drum by saying this, but the SDL_RenderGeometry function I made may be a better compromise to enhance rendering speed, assuming the task at hand implies rendering multiple parts of the same texture. If you are rendering a low number of quads out of each texture, it'll probably give you the same performance as regular SDL_RenderCopy (it has no need for a hash table though).

Anyway, it'll probably come down to the same sort of arguments we saw before, and the "why don't you do it in OpenGL" will eventually pop up.

-- Gabriel.

Scott Percival

(In reply to John, 16 April 2013 11:36.)

Blimey, forgot about transparency. John's right: if you start including semitransparent objects in your queue, then you can't just throw them in the texture-centric batch and let the depth test sort them out; you'd have to run a separate pass afterwards in sequential painting order.

Sik

(In reply to Scott Percival, 2013/4/16.)

Another thing is that scenes that complex will most likely have many textures anyway, which is bound to completely negate the advantage.

And yeah, considering the SDL renderer would most likely be used to render sprites in 2D, proper transparency support is pretty much a must (even if you don't draw "proper" translucent stuff, you may be bound to be doing it with antialiased borders). Come to think of it, this also means sprites *must* be rendered in order, otherwise the depth buffer will completely screw up the transparency. Given the order of the primitives is completely up to the GPU, there isn't much that can be done short of multiple calls.

Mason Wheeler

(In reply to Sik the hedgehog, Monday, April 15, 2013 9:48 PM.)

Yeah. That's what the Z order is there for.

Mason Wheeler

(In reply to John, Monday, April 15, 2013 8:36 PM.)

Not exactly. The optimization assumes that the principal rendering bottleneck is the overhead involved in sending scene data to the graphics card, an assumption that is borne out by testing data. It intends to minimize the number of *drawing calls* by delaying primitives and sending them in batches, ordered by Z and texture.

You avoid having to cache "the entire GL state" by the simple expedient of flushing the to-do buffer if a call comes in that changes the GL state. All you need to keep cached is the map of textures to arrays of coordinates.

And transparency works fine as long as you have a Z parameter to order by. Things get drawn on top of each other in the prescribed order.

I've been using this for a while now. The system works.

Sik

(In reply to Mason Wheeler, 2013/4/16.)

Is there any guarantee in OpenGL at all that primitives are drawn in the order they appear in the buffer (which would seem inefficient)? Otherwise ordering by Z is pretty much bound to break eventually.

Forest Hale

(In reply to Sik the hedgehog, 04/15/2013 10:25 PM.)

Somehow this turned into a scenegraph discussion, to which I recommend this reading material:
http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Scene%20Graphs%20-%20just%20say%20no]]

On a topic of correctness however, isn't it kind of implicit in a 2D graphics API that your draw order is sacred? You usually want things to overlap in a specific way.

An order-preserving technique has no need of hashes or any such thing. It only needs to skip issuing state calls that are the same, and some things that are not order-dependent can be combined regardless (like using glBufferSubData to write multiple quads into the vertex buffer before drawing any of them, for a considerable savings in driver overhead).

-- LordHavoc
Author of DarkPlaces Quake1 engine - http://icculus.org/twilight/darkplaces
Co-designer of Nexuiz - http://alientrap.org/nexuiz
"War does not prove who is right, it proves who is left." - Unknown
"Any sufficiently advanced technology is indistinguishable from a rigged demo." - James Klass
"A game is a series of interesting choices." - Sid Meier

Sik

(In reply to Forest Hale, 2013/4/16.)

If we want to be blunt, the real issue here isn't switching to a scenegraph (besides the complexity it may bring; it's debatable whether it's worth it or whether we should just tell users to use OpenGL directly for those extreme cases) but bringing an external dependency into SDL...

Jared Maddox

As long as you have access to uintptr_t, there should be few to zero concerns about the range of anything that can be described as an id number, and it's pretty easy to describe z-order as an id in this case.

After double checking the header files to make certain I was remembering correctly, it looks like a dirty-rect queue should work fairly well.

Algorithm:
1) Check the current render command against the dirty-rect(s) of the current queue node.
1a) If they overlap, create a new queue node and make this command the first entry in that node.
1b) Otherwise, add this command to the current node, and expand the node's dirty-rect(s) to cover the new area.

You can use a new node every time you use a different texture, you can combine multiple textures in a single node (after all, you know they won't overlap), you can send point & line data either the same way or with custom nodes, you can provide hook functions to shoe-horn your own rendering system into the queue, etc. This would also be a first step towards the oft-requested (albeit somewhat bone-headed) feature of issuing rendering calls from whichever thread you want.

The main issue is how the system would work. I think that what Mason's suggesting would require that you look through nodes until you find a dirty-rect collision (even a partial dirty-rect collision would count, and the only thing that gets looked at is the actual coordinates; whether e.g. the texture is or isn't the same doesn't matter). At that point you go back to the most-recently-checked node that used the same texture and add your command there, or add your command in a new node if there wasn't a previous node. That should (I think) provide the correct sequencing, while also ensuring that you reuse textures as few times as you can get away with.
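
The first algorithm in the post above (steps 1, 1a, 1b) could be sketched in C roughly as follows. This is a hedged illustration, not code from SDL or from the poster: `Rect`, `Node`, `Queue`, `queue_add` and the fixed `MAX_NODES` capacity are all assumptions, each node tracks a single dirty rect, and the per-node command storage is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y, w, h; } Rect;

/* Hypothetical queue node: commands in one node never overlap each other. */
typedef struct {
    Rect dirty;     /* bounding box of every command in this node */
    size_t count;   /* number of commands in this node */
} Node;

enum { MAX_NODES = 64 };

typedef struct {
    Node nodes[MAX_NODES];
    size_t used;
} Queue;

static int rects_overlap(const Rect *a, const Rect *b)
{
    return a->x < b->x + b->w && b->x < a->x + a->w &&
           a->y < b->y + b->h && b->y < a->y + a->h;
}

static Rect rect_union(const Rect *a, const Rect *b)
{
    const int x0 = a->x < b->x ? a->x : b->x;
    const int y0 = a->y < b->y ? a->y : b->y;
    const int ax1 = a->x + a->w, bx1 = b->x + b->w;
    const int ay1 = a->y + a->h, by1 = b->y + b->h;
    const Rect r = { x0, y0,
                     (ax1 > bx1 ? ax1 : bx1) - x0,
                     (ay1 > by1 ? ay1 : by1) - y0 };
    return r;
}

/* Adds a command covering dst. Returns the index of the node that received
   it, or -1 if the queue is full (a caller would flush and retry). */
int queue_add(Queue *q, const Rect *dst)
{
    assert(q != NULL && dst != NULL);
    if (q->used > 0) {
        Node *cur = &q->nodes[q->used - 1];
        if (!rects_overlap(&cur->dirty, dst)) {      /* step 1b */
            cur->dirty = rect_union(&cur->dirty, dst);
            cur->count++;
            return (int)(q->used - 1);
        }
    }
    if (q->used == MAX_NODES)
        return -1;
    q->nodes[q->used].dirty = *dst;                   /* step 1a */
    q->nodes[q->used].count = 1;
    q->used++;
    return (int)(q->used - 1);
}
```

Note that a single bounding box is conservative: two commands far apart can make the node's dirty rect cover space neither of them touches, which causes extra node splits but never incorrect ordering.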

Forest Hale

(In reply to Sik the hedgehog, 04/16/2013 12:30 AM.)

If SDL Renderer is turning into a beast, it should be punted to its own library, much like SDL_mixer and so on.

As a matter of practicality however, I think it is fine being in the core, so long as it stays simple and direct.

My understanding of the problem that warranted this discussion is that it is abusive about draw calls and texture switches or some such? That has nothing at all to do with draw order, and if the draw order is less important than performance then the app should take care of sorting them first. It isn't the duty of SDL to fix an app performance issue.

As far as some underlying technical details of the GL and D3D APIs go, I would recommend implementing a draw queue (fully buffered API) that is flushed to real calls to the driver after enough vertex data has accumulated to make it worthwhile. This also allows multiple consecutive draws to be merged if their state is the same. A lot of optimizations can be done once you have that "lookahead" capability inherent in the flush routine.

Sik

(In reply to Forest Hale, 2013/4/16.)

The problem is that SDL issues calls for every thing you draw, i.e. it does things the naive way rather than being 100% optimized for GPUs (which really are more optimized towards rendering entire complex 3D scenes than rendering generic 2D stuff). He wants to change the way the renderer works so it fits that better; the problem is that he wants to pull in an external dependency (uthash).

Forest Hale

(In reply to Sik the hedgehog, 04/16/2013 02:08 AM.)

If you buffer draws, you get higher performance.

If you additionally sort them, you get even higher performance but break the most basic assumption of a 2D graphics API - that things occur in the order specified.

I see no reason to use uthash here; I do see great reason to buffer things.

Why is uthash still the subject of this discussion? We're not going to reach a conclusion on the broad topic of outside dependencies; it's better to focus on the specific problem at hand.

Constantin Berhard

The following idea came to my mind: have a todo queue, but only for one texture. Flush it when it's full or when another texture should be rendered.
-> If speed is an issue for a specific program, the programmer can sort the calls by texture and get the speedup.
-> We don't need a complicated system or an external dependency in SDL.
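
The single-texture queue suggested above might look something like this in C. It is only a sketch under assumptions: `Batch`, `batch_copy`, `batch_flush`, the `FlushFn` callback type and the `BATCH_CAP` capacity are hypothetical, and `count_flush` is an example callback that counts flushes where a real backend would issue one draw call.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y, w, h; } Rect;

/* Called once per flush with every rect queued for one texture. */
typedef void (*FlushFn)(unsigned texture, const Rect *dst, size_t n, void *user);

enum { BATCH_CAP = 256 };

/* Hypothetical single-texture todo queue. */
typedef struct {
    unsigned texture;     /* texture shared by all queued rects */
    Rect dst[BATCH_CAP];
    size_t count;
    FlushFn flush;
    void *user;
} Batch;

/* Submits everything queued so far and empties the queue. */
void batch_flush(Batch *b)
{
    assert(b != NULL && b->flush != NULL);
    if (b->count > 0) {
        b->flush(b->texture, b->dst, b->count, b->user);
        b->count = 0;
    }
}

/* Queues one copy; flushes first if the texture changes or the queue is full. */
void batch_copy(Batch *b, unsigned texture, const Rect *dst)
{
    assert(b != NULL && dst != NULL);
    if (b->count > 0 && (texture != b->texture || b->count == BATCH_CAP))
        batch_flush(b);
    b->texture = texture;
    b->dst[b->count++] = *dst;
}

/* Example callback: counts flushes instead of drawing anything. */
void count_flush(unsigned texture, const Rect *dst, size_t n, void *user)
{
    (void)texture; (void)dst; (void)n;
    ++*(size_t *)user;
}
```

Draw order is preserved exactly, so an application that already submits its sprites sorted by texture gets the batching benefit, while one that interleaves textures behaves as before.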

Nathaniel J Fries

Optimization is a task of the programmer, not the library.

That said, SDL's interface is too high-level to enable the programmer to optimize render performance. I see a couple of options here:
1) Add another function for rendering the same texture multiple times:
2) Add a "sprite batch" API:

John

(In reply to Mason Wheeler, 04/15/2013 11:09 PM.)

GL likes to generate texture ids incrementing from 1. I don't recall whether that's standard or reliable. If it is, you wouldn't want a general-purpose hash table to map texture ids.

Mason Wheeler

(In reply to Sik the hedgehog, Monday, April 15, 2013 10:25 PM.)

It's not "ordering by Z and texture" but "grouping by Z and texture". Every render with a Z of 1 will get sent before every render with a Z of 2, and so on. That's why I said you end up with an array of multimaps.

Mason

John

(In reply to Mason Wheeler, 04/16/2013 12:41 PM.)

That sounds like ordering by Z to me, no?

The GLES device vendors advise against implementing your own depth sorting because the GPU depth test does it much faster and more efficiently, can correctly handle overlaps, and runs in parallel with the CPU. Also, z is floating point in transformed view coordinates, which means there may not be many duplicate z values to group by.

Have you measured the cost of switching the active texture unit? The number of switches that will be saved by this optimization is easy to calculate: it's roughly the number of primitives minus the number of textures.
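
To make the "primitives minus textures" estimate above concrete, here is a small C helper that counts how many texture binds a given submission order needs. `binds_in_order` is a hypothetical name, and the sketch assumes a bind happens for the first primitive and again whenever the texture differs from the previous primitive's.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the texture binds needed when primitives are submitted in array
   order: one for the first primitive plus one per change of texture. */
size_t binds_in_order(const unsigned *textures, size_t n)
{
    assert(textures != NULL || n == 0);
    size_t binds = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i == 0 || textures[i] != textures[i - 1])
            ++binds;
    }
    return binds;
}
```

For six primitives that alternate between two textures (1, 2, 1, 2, 1, 2) this gives 6 binds; grouped as (1, 1, 1, 2, 2, 2) it gives 2. The saving of 4 equals primitives minus textures, which is the best case: a scene whose draws already arrive mostly grouped saves less.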

Sik

(In reply to John, 2013/4/16.)

The problem is that I think the idea is to use a single batch for everything... Again, I'm not sure at all that this kind of Z ordering is reliable in that case. The problem is that the safest way is sending one thing at a time, i.e. one draw call per SDL function, which is the very thing we're trying to avoid...

Also yeah, the Z range is why I said we could run out of them. On PCs we have a 24-bit depth buffer, OK (though somebody could still attempt to set 16-bit, and I guess in 2D this could make sense), but on mobile I wonder how the Z range is handled (especially on deferred renderers as opposed to standard rasterizer ones).

And yes, OpenGL numbers textures from 1 onwards (this is true for all objects, really), but remember you can create gaps by deleting textures, and OpenGL will attempt to fill those if I recall correctly (I'm not sure about the details).

Mason Wheeler

(In reply to Sik the hedgehog, Tuesday, April 16, 2013 4:37 PM.)

OK, since it's apparently not clear from my original proposal, I wasn't talking about sending Z coordinates to OpenGL or Direct3D in any way. I was talking about using them on the SDL side. You'd end up with a certain number of layers (most 2D games draw 4 or 5 distinct layers IME), and each layer would have its own Z number. Each Z layer would have its own texture-to-coordinates multimap. When it's time to render everything, it looks like this (pseudocode):

    for each multimap in layers:
        for each texture in multimap:
            CreateCoordArrays(multimap[texture])
            SelectTexture(texture)
            RenderArrays

It's really that simple, in concept. Everything draws on top of what it's supposed to draw on top of. There's no need to send Z ordering to the GPU. There's no atrociously slow one-API-call-per-render. I've tested it. It works, and it's about 3x faster than the current system on large, complicated scenes.

There are only two real downsides:
1) It requires a multimap to work properly, which we need a library for because libc provides neither a multimap implementation nor the fundamental primitives needed to build one (a map and a dynamic array).
2) SDL_RenderCopy does not currently have a Z parameter on it, which is needed to make layering work correctly.

Mason
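
The layered grouping in the pseudocode above can be approximated in plain C without a multimap by sorting the queued commands, which is a swapped-in technique and not what the post proposes. In this sketch `Cmd` and `group_by_layer_and_texture` are hypothetical names; commands are sorted with `qsort` by layer, then texture, with the submission sequence number as a tie-breaker so the result is deterministic, and each run of equal (layer, texture) is counted as one batch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical queued draw: layer (Z), texture id, and submission order. */
typedef struct { int z; unsigned texture; size_t seq; } Cmd;

/* Orders by layer, then texture, then original submission order. */
static int cmp_cmd(const void *pa, const void *pb)
{
    const Cmd *a = pa, *b = pb;
    if (a->z != b->z)             return a->z < b->z ? -1 : 1;
    if (a->texture != b->texture) return a->texture < b->texture ? -1 : 1;
    if (a->seq != b->seq)         return a->seq < b->seq ? -1 : 1;
    return 0;
}

/* Sorts cmds in place and returns the number of batches, i.e. the number of
   SelectTexture + RenderArrays pairs the pseudocode would perform. */
size_t group_by_layer_and_texture(Cmd *cmds, size_t n)
{
    assert(cmds != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(cmds, n, sizeof *cmds, cmp_cmd);
    size_t batches = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i == 0 || cmds[i].z != cmds[i - 1].z ||
            cmds[i].texture != cmds[i - 1].texture)
            ++batches;
    }
    return batches;
}
```

Within one layer this still reorders draws that use different textures, so it only matches the current API's behaviour if sprites on the same layer never need a particular stacking order among themselves.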

Sik

(In reply to Mason Wheeler, 2013/4/16.)

Not just SDL_RenderCopy; nothing in the rendering API has one.

If I'm understanding correctly, what you're doing is basically telling the API that the order doesn't matter within specific groups as long as the groups themselves are drawn in order (which isn't how the SDL API works). That's indeed a valid optimization, but not one that would work with the current API, if that's the case. Is that correct?
|||||||||||||||||
|
External dependencies in the renderer? |
Jared Maddox
Guest
|
Can you guys please trim your replies? The quotes are getting way too long.
Because Mason wants to SORT things, instead of just buffering them.
I think that's a good idea.
Eh, I think this should really go in an external library.
A multimap is a high-level structure, and this would be written in C, so you're thinking about this incorrectly. I don't support your particular perspective on this (I prefer Forest's "buffer it" approach, because it preserves API behavior), but what you're talking about can be done trivially with any data structure, and if you use sorting data structures (such as balanced binary search trees) then you can do it pretty quickly. However, regardless of that, there's something that still true: you're talking about ordering by Z, as well as ordering by texture. If you don't understand why we keep repeating this then you need to go back to whatever dictionary you're using and try to find a way to reconcile "Every render with a Z of 1 will get sent before every render with a Z of 2" and "Sorting by Z". If you don't understand how those two things are the same, then you should drop this line of enquiry until you do understand it.
This is, in fact, ordering by Z and texture, which you said was not being done. You need to recheck your terminology.
The current SDL2 api is a 2d api. As a result, call 2 draws on top of call 1, meaning that render calls do actually matter. So this isn't how you do things? That's fine, but don't try to force your system down everyone else's throats. I think that the api should be improved to reduce the number of calls, but preserving draw order is required. In case you've forgotten, the api is currently "locked", and since this has the potential to break api behavior for existing games, the change is not acceptable. Buffering (such as that provided by the queue method that I suggested earlier) is fine as long as it preserves draw order, but what you're suggesting is not reliably acceptable.
Actually, all that you need is a searchable data structure. This covers everything from arrays, to linked lists, to trees, and doesn't even have to be sortable. Even then, C does provide some array sorting functions (e.g. qsort) which can be used to implement this. Thus, C provides a route to a concept demo. If you want decent speeds then you want a sorted tree, so that rules out C's standard library, but at the end of the day a customizable tree (where you can have multiple customizations) is all that's needed. After all, a map is just the association of one value with a data slot, and any searchable data structure does that fine, including trees. A dynamic array is a general enough term that it can cover any extensible data structure, and since tree insertions are quicker than copying a large block of memory when you need to perform an extension, you might as well use a tree there too. So, no need for a "map" nor for a "dynamic array", all that's needed for YOUR preference is a balanced tree.
Adding Z would be a backwards-compatibility break, which is now forbidden. Just in case it got lost in the conversation, here's my suggestion again, which unlike Mason's should presumably maintain compatibility with the current version:
1) Search through the queue, from most recent node to oldest node, looking for collisions between the current call's bounding box and the bounding box of the queue nodes.
2) If a collision is found, or the oldest node is reached without a collision, add the current command to the node that was most recently encountered, which also used the same texture as the command, and expand that node's bounding box.
3) If no node has been found that uses the same texture, add the command in a new node.
Point, line, and rectangle render commands would go into the same queue. The main issue would be where the queue should be flushed, I figure that belongs in SDL_RenderPresent. That one is required on all platforms, right? It looks like (with the possible exception of the software renderer) all of the platforms need that to reliably render. |
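The three steps above can be sketched roughly as follows. This is only one reading of the description, not code from SDL: the `Queue`, `Node` and `queue_add` names, the integer texture ids and the fixed array sizes are all made up for illustration, and a node that collides with the new command is itself allowed to receive it when the texture matches.

```c
#include <assert.h>

#define MAX_NODES 32   /* illustrative limits, not SDL values */
#define MAX_CMDS  64

typedef struct { int x, y, w, h; } Rect;

typedef struct {
    int  texture;          /* hypothetical texture id */
    Rect bounds;           /* union of every rect queued in this node */
    Rect cmds[MAX_CMDS];   /* destination rects, in submission order */
    int  count;
} Node;

typedef struct {
    Node nodes[MAX_NODES]; /* index 0 is the oldest node */
    int  count;
} Queue;

static int rects_overlap(Rect a, Rect b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

static Rect rect_union(Rect a, Rect b)
{
    int x0 = a.x < b.x ? a.x : b.x;
    int y0 = a.y < b.y ? a.y : b.y;
    int x1 = a.x + a.w > b.x + b.w ? a.x + a.w : b.x + b.w;
    int y1 = a.y + a.h > b.y + b.h ? a.y + a.h : b.y + b.h;
    Rect r = { x0, y0, x1 - x0, y1 - y0 };
    return r;
}

/* Returns the index of the node that received the command,
   or -1 if the queue is full and should be flushed first. */
static int queue_add(Queue *q, int texture, Rect dst)
{
    int target = -1;
    assert(q->count >= 0 && q->count <= MAX_NODES);

    /* Step 1: walk from the newest node to the oldest. */
    for (int i = q->count - 1; i >= 0; --i) {
        const Node *n = &q->nodes[i];
        /* Step 2: remember the last same-texture node seen so far. */
        if (n->texture == texture && n->count < MAX_CMDS)
            target = i;
        /* An overlapping node must stay underneath, so stop here. */
        if (rects_overlap(n->bounds, dst))
            break;
    }

    /* Step 3: no usable node was found, so open a new one at the end. */
    if (target < 0) {
        if (q->count == MAX_NODES)
            return -1;
        target = q->count++;
        q->nodes[target].texture = texture;
        q->nodes[target].bounds = dst;
        q->nodes[target].count = 0;
    }

    Node *node = &q->nodes[target];
    node->bounds = rect_union(node->bounds, dst);
    node->cmds[node->count++] = dst;
    return target;
}
```

Flushing would then mean drawing the nodes from index 0 upward, each node with a single texture bind. The bounding boxes are conservative, so this sketch may open more nodes than strictly necessary, but it should never reorder two draws that overlap.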
|||||||||||||||||||||||||||||
|
External dependencies in the renderer? |
Sik
|
2013/4/16, Jared Maddox:
I believe the software renderer still needs SDL_RenderPresent (otherwise how does SDL know that it can safely draw the surface on the window?). As for flushing, there'd be two points where this should happen for correct behavior:
1) In SDL_RenderPresent, right before it does its job.
2) When the buffer becomes so big that going any bigger would nullify the benefits. |
|||||||||||||
|
External dependencies in the renderer? |
Mason Wheeler
Guest
|
I don't actually think point 2 is valid. The bigger and more complicated the scene, the more this scheme benefits it. If you're only drawing 20 sprites per frame, it doesn't matter how inefficient your drawing techniques are; on modern hardware you'll get good performance anyway. But if you're drawing 20,000 or 200,000, that's when you'll really see the benefit of something like this. The only real "hard limit" you'd see for the whole thing getting too big is when *the whole thing* gets too big, when you start to run into system-level limitations. And at that point, you've got bigger problems to worry about.
The second point at which you would want to flush the buffer is when the rendering state changes. To keep complexity down, the buffer operates under the assumption that everything draws in the same way. If you change the transparency settings, for example, or (even more obviously) change the active rendering target, you need to execute all existing buffered draw commands first and then start over with a clean slate. Mason From: Sik the hedgehog To: SDL Development List Sent: Tuesday, April 16, 2013 8:02 PM Subject: Re: [SDL] External dependencies in the renderer? |
|||||||||||||
|
External dependencies in the renderer? |
Sik
|
Point 2 does matter in real hardware, sadly. If you try to send too big of a batch, it'll end up being slower as you'll overwhelm the GPU trying to transfer all that data into its own memory (in particular, memory latency will become a massive issue here). I recall the general suggestion is to not use values larger than 16-bit for indices (i.e. that makes for 64K entries max in a buffer object), to give an idea. (There's a debate about whether anybody will ever reach that point, but I guess that 10,000~15,000 entries probably make a good place to break up, if you consider most of them will be quads and thereby eat up four vertices each, although I guess you can optimize this to reuse primitives and use transformations to work around it instead, but even then that just doubles the acceptable limit.)
I don't think translucency parameters affect the state though. You could just feed those in the buffer itself and let the shader handle it (if you're doing this method you definitely are going the shader route anyway). In this sense textures really should be the only state change, unless I'm missing something. (Oh, and yes, changing the shader is bad too as it can't be parallelized at all.) 2013/4/17, Mason Wheeler: |
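To put a number on the 16-bit index limit mentioned above: 16-bit indices can address 65536 vertices, and at four vertices per quad that works out to 16384 quads per buffer, a little above the 10,000~15,000 break point suggested. The sketch below only does that arithmetic and fills a plain index array; the constant and function names are invented for illustration and no graphics API is called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    MAX_VERTICES     = UINT16_MAX + 1,  /* 65536 vertices addressable by 16-bit indices */
    VERTS_PER_QUAD   = 4,
    INDICES_PER_QUAD = 6,               /* two triangles per quad */
    MAX_QUADS        = MAX_VERTICES / VERTS_PER_QUAD
};

/* Writes indices for `quads` quads as triangles (0,1,2) and (2,3,0),
   offset by each quad's base vertex. Returns the number of indices written. */
static size_t fill_quad_indices(uint16_t *indices, size_t quads)
{
    assert(quads <= (size_t)MAX_QUADS);
    for (size_t q = 0; q < quads; ++q) {
        uint16_t base = (uint16_t)(q * VERTS_PER_QUAD);
        uint16_t *out = indices + q * INDICES_PER_QUAD;
        out[0] = base;
        out[1] = (uint16_t)(base + 1);
        out[2] = (uint16_t)(base + 2);
        out[3] = (uint16_t)(base + 2);
        out[4] = (uint16_t)(base + 3);
        out[5] = base;
    }
    return quads * INDICES_PER_QUAD;
}
```

A batcher built on this would flush once it has queued `MAX_QUADS` quads, or earlier if a smaller batch turns out to perform better on the target hardware.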
|||||||||||||||
|
External dependencies in the renderer? |
Driedfruit
Guest
|
Just my 2 cents, but this would be *very* lovely to have in any case. -- driedfruit |
|||||||||||||
|
External dependencies in the renderer? |
Jonny D
|
This optimization is a good thing, but the benefit does not scale directly with the number of sprites drawn. When you get into the thousands of sprites, flushing the buffer every few thousand sprites is of negligible cost. This approach helps avoid the incremental cost of several OpenGL calls and a blocking "flush" for every single sprite (the way SDL currently does it).
Sik's point 2 is relevant and applies even more clearly when your buffer is a simple fixed-size array. At some point, you have to decide how much memory you want to allocate for this buffer, and it must be flushed before it overflows. As far as that goes, I'd rather not be allocating memory as I assume the map does. Also, Mason is right that you have to flush before every state change that could change the rendering.
As was said before, we need to guarantee rendering order because the OpenGL depth test is not enough to make alpha blending work in the right order. Z layers are an okay concept, but not terribly widespread in practice. It would be strange to make the SDL API embrace such a high-level concept that doesn't apply to most applications.
Jonny D
On Mon, Apr 15, 2013 at 8:04 PM, Driedfruit wrote:
|
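A fixed-size buffer that is flushed before it overflows and before every state change, as described in the post above, might look something like this. It is a sketch under assumptions: `Batch`, `RenderState`, `batch_push` and the capacity of 1024 are invented names and values, and the flush callback here only counts what it is given, where a real backend would issue the actual draw call. Because it only merges consecutive draws that share the same state, submission order is preserved.

```c
#include <stddef.h>

#define BATCH_CAPACITY 1024   /* quads per batch; arbitrary illustrative value */

typedef struct { float x, y, w, h; } Quad;

typedef struct {
    int texture;      /* hypothetical ids standing in for real renderer state */
    int blend_mode;
} RenderState;

typedef void (*FlushFn)(const RenderState *state, const Quad *quads,
                        size_t count, void *user);

typedef struct {
    RenderState state;
    Quad        quads[BATCH_CAPACITY];
    size_t      count;
    FlushFn     flush;
    void       *user;
} Batch;

static void batch_init(Batch *b, FlushFn flush, void *user)
{
    b->state.texture = 0;
    b->state.blend_mode = 0;
    b->count = 0;
    b->flush = flush;
    b->user = user;
}

/* Hands every buffered quad to the callback, then empties the buffer. */
static void batch_flush(Batch *b)
{
    if (b->count > 0) {
        b->flush(&b->state, b->quads, b->count, b->user);
        b->count = 0;
    }
}

/* Queues one quad, flushing first if the state changed or the buffer is full. */
static void batch_push(Batch *b, RenderState state, Quad quad)
{
    int state_changed = state.texture != b->state.texture ||
                        state.blend_mode != b->state.blend_mode;
    if (state_changed || b->count == BATCH_CAPACITY)
        batch_flush(b);
    b->state = state;
    b->quads[b->count++] = quad;
}

/* Stand-in flush callback that only records how much work it was given. */
typedef struct { size_t flushes, quads; } FlushStats;

static void counting_flush(const RenderState *state, const Quad *quads,
                           size_t count, void *user)
{
    FlushStats *stats = user;
    (void)state;
    (void)quads;
    stats->flushes++;
    stats->quads += count;
}
```

The caller would also invoke `batch_flush` once at present time, which matches the flush point in SDL_RenderPresent discussed earlier in the thread.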
|||||||||||||||
|
Re: External dependencies in the renderer? |
Nathaniel J Fries
|
Aye, this would be helpful for an old idea of mine as well. And I just realized I botched that function definition: that function call is not terribly useful, since it takes only a destination rect (ditto for the other one).
|
|||||||||||||||||
|
External dependencies in the renderer? |
Forest Hale
Guest
|
Z buffering does not solve problems for blended transparency, only for alpha test; blended transparency still requires sorting back to front, which totally wrecks any texture batching optimizations.
On 04/16/2013 05:48 PM, Mason Wheeler wrote:
-- LordHavoc Author of DarkPlaces Quake1 engine - http://icculus.org/twilight/darkplaces Co-designer of Nexuiz - http://alientrap.org/nexuiz "War does not prove who is right, it proves who is left." - Unknown "Any sufficiently advanced technology is indistinguishable from a rigged demo." - James Klass "A game is a series of interesting choices." - Sid Meier |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
External dependencies in the renderer? |
Sik
|
But he insists it's still a huge speed up.
But yeah, I was thinking, and it's very likely a lot of programmers will just leave blending turned on, and unless you keep track of all the pixels in the texture or something you'll have to assume translucency conflicts can happen. The only "easy" workaround would be that dirty rectangles-like suggestion from earlier. Of course there's also the question about how much overlap is between each draw (i.e. how much you draw on top of what's already drawn). 2013/4/18, Forest Hale:
|
|||||||||||||
|
External dependencies in the renderer? |
Mason Wheeler
Guest
|
*facepalm*
Did you not read what I just wrote? Did you *seriously* not read it at all? I just got through explaining that my implementation DOES NOT USE Z-BUFFERING; that it does work by sorting back to front; that I've been using it and timed it, and that it speeds things up by a factor of 3 on large scenes. And now you reply and say "no, you can't use Z-buffering; you need to sort back to front to avoid screwing up blended transparency, and you can't do *that* or it will wreck the performance gains"?!? Seriously? Mason From: Forest Hale To: Sent: Wednesday, April 17, 2013 10:24 PM Subject: Re: [SDL] External dependencies in the renderer?
|
|||||||||||||
|
Nathaniel J Fries
|
The terminology disagreement is due to the same term ("z") being seen in two different contexts.
Z layering (or Z ordering) is a technique in 2D rendering used to separate layers. Any item with the same layer ("z") can be rendered in any order, and the resulting graphical output would be the same (or at least close enough for the programmer's needs). Most 2D games use this technique (either explicitly or simply by a proper ordering of draw operations) in order to prevent a ground tile from being rendered on top of the player and other such issues.
Z buffering is a technique in 3D rendering (better termed depth buffering) that allows the programmer to define the depth of objects, which is often used by hardware to cull the scene. Translucency is an issue for depth buffering, since an opaque texture drawn behind a translucent texture will be culled. It is not usually an issue for Z layering (unless this is implemented using the hardware's depth buffer), since culling is not the purpose (render order is).
What Mason is suggesting is Z layering, and not Z buffering, which means that nothing is culled. |
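As a rough illustration of Z layering combined with grouping by texture, the sketch below sorts draw commands by layer first and by texture within a layer. It relies on the assumption stated above, that items in the same layer may be drawn in any order, which other posters in this thread dispute for overlapping blended sprites. `DrawCmd` and its fields are hypothetical; `qsort` is not stable, so the submission index is included as a final tie-breaker to keep the result deterministic.

```c
#include <stdlib.h>

typedef struct {
    int      z;        /* layer: lower layers are drawn first */
    int      texture;  /* hypothetical texture id */
    unsigned seq;      /* submission order, used only as a tie-breaker */
} DrawCmd;

static int cmp_draw(const void *pa, const void *pb)
{
    const DrawCmd *a = pa;
    const DrawCmd *b = pb;
    if (a->z != b->z)             return a->z < b->z ? -1 : 1;
    if (a->texture != b->texture) return a->texture < b->texture ? -1 : 1;
    if (a->seq != b->seq)         return a->seq < b->seq ? -1 : 1;
    return 0;
}

/* Orders commands by layer, then groups them by texture inside each layer. */
static void sort_draws(DrawCmd *cmds, size_t n)
{
    qsort(cmds, n, sizeof *cmds, cmp_draw);
}

/* Counts how often the bound texture would change when drawing in array order. */
static size_t count_texture_switches(const DrawCmd *cmds, size_t n)
{
    size_t switches = 0;
    for (size_t i = 1; i < n; ++i)
        if (cmds[i].texture != cmds[i - 1].texture)
            ++switches;
    return switches;
}
```

Nothing here touches a depth buffer: the layer value only decides the order in which commands are sent, so nothing is culled.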
|||||||||||
|
External dependencies in the renderer? |
Mason Wheeler
Guest
|
Yes, that's correct.
Mason From: Nathaniel J Fries To: Sent: Thursday, April 18, 2013 3:12 PM Subject: Re: [SDL] External dependencies in the renderer? |
|||||||||||
|
Nathaniel J Fries
|
Despite understanding what you're referring to, I don't like the idea of having z-ordering in the SDL core library.
However, the problem you identified is fairly important. Although SDL is a generic solution, and generic solutions can never be as optimal as hand-tailored ones, by not providing a means of optimizing for the case of order-independent rendering of the same texture (which is quite common in 2D games, especially if spritesheets are used), SDL unnecessarily reduces framerates in OpenGL and Direct3D. I would again refer to the notion of a simple function to perform this, along the lines of SDL_RenderCopyMulti (or SDL_RenderNCopy, etc). |
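For a sense of what a multi-copy call along those lines could look like, here is a sketch. No such function exists in SDL 2.0; the name `render_copy_multi`, the stand-in `Rect` and `Texture` types and the `CopyFn` callback are all hypothetical. The fallback below simply loops over a single-copy routine, which is what a backend without batching support would do; an accelerated backend could instead bind the texture once and submit every rectangle in one draw call.

```c
#include <stddef.h>

typedef struct { int x, y, w, h; } Rect;   /* stand-in rectangle type */
typedef struct Texture Texture;            /* opaque stand-in texture type */

/* Signature of a single-copy routine; returns 0 on success. */
typedef int (*CopyFn)(void *ctx, Texture *tex, const Rect *src, const Rect *dst);

/* Copies `count` regions of one texture. `src` may be NULL to mean
   "whole texture" for every copy; `dst` must hold `count` rects. */
static int render_copy_multi(void *ctx, CopyFn copy, Texture *tex,
                             const Rect *src, const Rect *dst, int count)
{
    for (int i = 0; i < count; ++i) {
        int rc = copy(ctx, tex, src ? &src[i] : NULL, &dst[i]);
        if (rc != 0)
            return rc;   /* stop at the first failure */
    }
    return 0;
}

/* Example callback that only counts calls; ctx points at an int counter. */
static int counting_copy(void *ctx, Texture *tex, const Rect *src, const Rect *dst)
{
    (void)tex;
    (void)src;
    (void)dst;
    ++*(int *)ctx;
    return 0;
}
```

Since the rectangles are submitted in array order, the caller keeps full control over draw order within the call.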
|||||||||||
|
External dependencies in the renderer? |
Mason Wheeler
Guest
|
The problem with that is that it forces the developer to do essentially the same thing I'm proposing, just on their end.
If you have a scene with a bunch of sprites in it, they're most likely not ordered by texture, and certainly not *grouped* by texture. That's not a natural way to set it up, and not something someone's going to do unless they're specifically trying to do what I'm trying to do here. Which means that at draw time, at some point, someone somewhere has to translate the list of what's being drawn into some sort of structure that's grouped by texture--such as a multimap. As long as "group by texture" has to be done one way or another in order to get the performance benefits we're talking about here, why force it to be outside of the API and require every developer to reinvent the wheel? That's what libraries are *for*, isn't it? Mason From: Nathaniel J Fries To: Sent: Thursday, April 18, 2013 5:56 PM Subject: Re: [SDL] External dependencies in the renderer? |
|||||||||||
|
External dependencies in the renderer? |
Sik
|
First of all, before we continue: does *anywhere* in the OpenGL or Direct3D specs say that primitives are guaranteed to be rendered in the *same* order as they're sent? Because otherwise I really doubt that ordering is going to work reliably. It may work on some systems but break miserably on others. (And that reminds me: we'll need patches for all hardware-accelerated renderers if we want to accept this method.) 2013/4/18, Mason Wheeler: |
|||||||||||||
|
External dependencies in the renderer? |
Mason Wheeler
Guest
|
Back to this again?
Do you understand the difference between ordering and grouping? Mason From: Sik the hedgehog To: Mason Wheeler; SDL Development List Sent: Thursday, April 18, 2013 8:17 PM Subject: Re: [SDL] External dependencies in the renderer?
|
|||||||||||||
|
External dependencies in the renderer? |
John
Guest
|
On 04/18/2013 11:17 PM, Sik the hedgehog wrote:
Yes. Otherwise it'd be impossible to composite anything reliably without flushing after every primitive. |
|||||||||||||
|
External dependencies in the renderer? |
John
Guest
|
We all understand the difference. You have proposed to re-order primitives according to their texture, and that is why we are discussing "ordering". On 04/19/2013 01:53 AM, Mason Wheeler wrote: |
|||||||||||||||
|
Re: External dependencies in the renderer? |
Nathaniel J Fries
|
The programmer would need to implement ordering on top of the library anyway in order to feed the graphics to SDL in the right order. And whatever APIs SDL could expose to access the underlying ordering structure, many programmers would probably find less preferable to the way they've always done it. So you have a situation of redundant ordering. Grouping is what is actually needed, but grouping without considering order effectively negates order, so SDL must either group and order or do neither. I proposed earlier a spritebatch mechanism for SDL which would do all this internally, but it was suggested that this be an extension library; however to even implement that in a non-hackish manner, SDL would still need to provide an interface for rendering the same texture multiple times. |
|||||||||||||
|
External dependencies in the renderer? |
Forest Hale
Guest
|
I did read what you said, but my interpretation was that you wanted to blast all primitives out sequentially by texture without regard to Z layer, and then have the Z buffer hardware sort it out.
This interpretation was incorrect, I apologize. Is there a reason to use a multimap rather than a radix sort? Presumably a sort key consisting of several bytes of state (Z layer, blendfunc, texture) would achieve the desired results. On 04/18/2013 09:44 AM, Mason Wheeler wrote:
-- LordHavoc |
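The radix sort idea raised above can be sketched like this. The key layout (8 bits of Z layer, 8 bits of blend function, 16 bits of texture id) is an assumption chosen to fit a 32-bit key, and `SortItem`, `make_key` and `radix_sort` are illustrative names. A least-significant-byte-first radix sort is stable, so commands with identical keys keep their submission order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t key;   /* packed sort key, most significant field first */
    uint32_t cmd;   /* index of the draw command this key belongs to */
} SortItem;

/* Packs Z layer, blend function and texture id into one key
   (the field widths are an assumption for this sketch). */
static uint32_t make_key(uint8_t z, uint8_t blend, uint16_t texture)
{
    return ((uint32_t)z << 24) | ((uint32_t)blend << 16) | texture;
}

/* LSD radix sort, one byte per pass; tmp must have room for n items. */
static void radix_sort(SortItem *items, SortItem *tmp, size_t n)
{
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[256] = {0};

        /* Histogram of the current byte. */
        for (size_t i = 0; i < n; ++i)
            count[(items[i].key >> shift) & 0xFF]++;

        /* Turn counts into starting offsets. */
        size_t pos = 0;
        for (int b = 0; b < 256; ++b) {
            size_t c = count[b];
            count[b] = pos;
            pos += c;
        }

        /* Scatter in input order, which keeps each pass stable. */
        for (size_t i = 0; i < n; ++i)
            tmp[count[(items[i].key >> shift) & 0xFF]++] = items[i];

        memcpy(items, tmp, n * sizeof *items);
    }
}
```

After sorting, walking the array in order yields runs of commands that share layer, blend function and texture, and each run could be submitted as one batch.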
|||||||||||||||
|
External dependencies in the renderer? |
Jared Maddox
Guest
|
It's still perfectly fine. Any language that supports C's qsort, C++'s std::map, or any half-way similar functionality, will by definition provide all of the primitives needed to actually implement a solution to this. It's not a big deal once they recognize that they need a sorted data structure. If SDL provided a generic C-language tree implementation then it would certainly be more convenient to everyone, but that's a minor thing.
I've written 3d code that does the job. Depending on the complexity of your modelling, you can do this in C++'s standard containers in as little as ~50 lines of code (and that's a very-ballpark estimate, I use a lot of whitespace in my code).
You know, perhaps I'm confusing you with someone else, but I seem to remember you wanting to rip OUT portions of SDL. Now you're trying to add in parts that the rest of us consider only partially appropriate, DESPITE already having been told that it requires a forbidden API change?
For that matter, if the "multi-render" function were added then that would be enough for my "buffering-render" suggestion to be implemented with an external library. It's a really straightforward optimization, and doesn't need to break the API. |
|||||||||||||||||||||
|
Re: External dependencies in the renderer? |
Nathaniel J Fries
|
I'm not honestly a fan of the spritebatch mechanism, I'd rather group them myself and call a multi-rendering function. However, the spritebatch mechanism already has real-world uses (in fact, it is the simplest method for rendering 2D graphics in Microsoft's XNA), whereas a simple multi-copy function does not AFAIK. |
|||||||||||||||
|
External dependencies in the renderer? |
Pallav Nawani
|
If Mason wants to implement a portable, performance-improving optimization in the SDL renderer pipeline, I totally don't see a problem with it. Some may not want an external Hash Table implementation - okay, but I don't see why SDL shouldn't be rendering stuff faster than it already is. I don't understand the opposition - at all. If it doesn't work - well, that's what source control is for. On 4/20/2013 7:54 AM, Jared Maddox wrote:
-- *Pallav Nawani* *Game Designer/CEO* http://www.ironcode.com Twitter: http://twitter.com/Ironcode_Gaming Facebook: http://www.facebook.com/Ironcode.Gaming |
|||||||||||||||||||||
|
External dependencies in the renderer? |
Scott Percival
Guest
|
There's no opposition against the idea of faster draw calls. What you're seeing here is interest in finding the best way to implement the optimisation, both for the developers who'll use it and the SDL maintainers.
The renderer API was designed to use the unbatched painter's algorithm approach to blitting. As discussed, it's non-trivial to cache a bunch of these draw calls when there's zero guarantee that the state will remain the same between each one. The worst case is that you'll end up breaking a pile of software which relies on the expectation that blits will happen immediately after calling SDL_RenderCopy. Hence the discussion about adding a new batch rendering method alongside the old one. The SDL 2.0 API has been frozen, and there is released software using this API; now is exactly the wrong time to be cavalier about breaking things. On 22 April 2013 15:36, Pallav Nawani wrote:
|
|||||||||||||||||||||||
|