Lack of batching for RenderCopy/RenderCopyEx
.3lite
I mean I know that batching won't change anything within a software renderer, but it would be nice to have it nonetheless - after all, why not use available Direct3D and OpenGL back-ends when possible?
Sik
2015-08-01 13:09 GMT-03:00, .3lite:
Mostly that people attempted to automatically batch the calls to the existing functions and somehow expected that to improve all programs (despite said programs most likely changing textures constantly). A new function would probably be the best and easiest option (actually, an equivalent for the scaling/rotation variant would be neat too). It can even be made to just fall back to the non-batched functions in backends where batching isn't implemented yet (worst case it takes about the same amount of time, best case it improves by a lot).
_______________________________________________
SDL mailing list
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org
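A minimal sketch of the fallback idea described above, under stated assumptions: `Backend`, `Rect`, `render_copy_batch`, and the two toy callbacks are hypothetical names invented for illustration, not SDL API. The batched entry point uses a backend's batch path when one exists and otherwise loops over the single-quad path, so the worst case costs about the same as today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these names are SDL API. */
typedef struct { int x, y, w, h; } Rect;

typedef struct Backend Backend;
struct Backend {
    /* Draws one quad; assumed to exist in every backend. */
    void (*copy)(Backend *b, int texture, const Rect *src, const Rect *dst);
    /* Draws many quads in one submission; NULL if the backend lacks it. */
    void (*copy_batch)(Backend *b, int texture,
                       const Rect *src, const Rect *dst, int count);
    int submissions; /* draw submissions issued so far */
};

/* Uses the backend's batched path when present, else loops over copy(). */
void render_copy_batch(Backend *b, int texture,
                       const Rect *src, const Rect *dst, int count)
{
    assert(b != NULL && b->copy != NULL && count >= 0);
    if (b->copy_batch != NULL) {
        b->copy_batch(b, texture, src, dst, count);
        return;
    }
    for (int i = 0; i < count; ++i) {
        b->copy(b, texture, &src[i], &dst[i]);
    }
}

/* Toy backend callbacks that only count submissions. */
void toy_copy(Backend *b, int texture, const Rect *src, const Rect *dst)
{
    (void)texture; (void)src; (void)dst;
    b->submissions += 1;
}

void toy_copy_batch(Backend *b, int texture,
                    const Rect *src, const Rect *dst, int count)
{
    (void)texture; (void)src; (void)dst; (void)count;
    b->submissions += 1;
}
```

With three rectangles, a backend that only sets `copy` ends up with three submissions, while one that also sets `copy_batch` ends up with one.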
Ethan Lee
Guest
Odds are it'd look a lot like the XNA SpriteBatch when using SpriteSortMode.Deferred:
https://msdn.microsoft.com/en-us/library/microsoft.xna.framework.graphics.spritebatch.aspx
https://msdn.microsoft.com/en-us/library/microsoft.xna.framework.graphics.spritesortmode.aspx
https://github.com/flibitijibibo/FNA/blob/master/src/Graphics/SpriteBatch.cs
You could either require that all copies in a single batch use one texture, like RenderCopyBatched(SDL_Texture*, SDL_Rect**), or do RenderBatchBegin/RenderBatchEnd hints that try to generate batches of RenderCopy calls on SDL's end, at the cost of having things split up more than you might expect.
-Ethan
On 8/1/15 3:06 PM, Sik the hedgehog wrote:
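To illustrate the second option and why batches can split more than expected, here is a rough sketch of a deferred queue that flushes whenever the texture changes. Everything in it (`Batcher`, `batch_begin`, `batch_copy`, `batch_end`, `MAX_QUEUED`) is a hypothetical name made up for this example, and the flush only counts submissions instead of talking to a real backend.

```c
#include <assert.h>

#define MAX_QUEUED 256 /* assumed queue capacity, illustrative only */

typedef struct { int x, y, w, h; } Rect; /* hypothetical, not SDL_Rect */

typedef struct {
    int  texture;          /* texture id shared by all queued quads */
    int  queued;           /* quads waiting in the current batch */
    int  flushes;          /* batches submitted so far */
    Rect dst[MAX_QUEUED];  /* destination rectangles of queued quads */
} Batcher;

void batch_begin(Batcher *b)
{
    b->texture = -1;       /* no texture bound yet */
    b->queued = 0;
    b->flushes = 0;
}

void batch_flush(Batcher *b)
{
    if (b->queued > 0) {
        /* A real backend would submit b->queued quads in one draw here. */
        b->flushes += 1;
        b->queued = 0;
    }
}

/* Queues one copy; a texture change or a full queue ends the current batch. */
void batch_copy(Batcher *b, int texture, Rect dst)
{
    assert(texture >= 0);
    if (texture != b->texture || b->queued == MAX_QUEUED) {
        batch_flush(b);
        b->texture = texture;
    }
    b->dst[b->queued++] = dst;
}

void batch_end(Batcher *b)
{
    batch_flush(b);
}
```

Drawing with textures in the order 0, 0, 1, 0 produces three flushes in this sketch, even though only two textures are involved, which is the kind of splitting the post warns about.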
.3lite
To be honest guys, I'm expecting something much simpler. Let's take the following functions as an example:
The only difference between those functions and their SDL_RenderCopy and SDL_RenderCopyEx equivalents is that they take an array of source and destination rectangles. The rest can be taken care of by the programmer. That way the implementation of these functions should be straightforward and should take no more than 10 minutes for both OpenGL and Direct3D. I can make it myself, yes, but I would like to stay up to date with SDL2 itself and I do not like modifying external libraries I'm relying on. D3D_RenderCopy uses Direct3D's DrawPrimitiveUP, which is almost ready for batch drawing: just add the vertices of the remaining rectangles. Desktop OpenGL uses the old immediate mode (draw arrays would be much better), but it's easy to implement there as well. All available renderers within SDL2 are pretty much ready for batching of the same texture, and they require only minor changes to SDL_RenderCopy and SDL_RenderCopyEx to make new functions out of them.
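As a rough illustration of "just add the vertices of the remaining rectangles", the sketch below fills one vertex array for any number of source/destination pairs that share a texture. The `Rect` and `Vertex` types and `build_quads` are hypothetical, and the two-triangles-per-quad layout is an assumption made for the example, not a description of what SDL's Direct3D or OpenGL backends do.

```c
#include <assert.h>

/* Hypothetical types for illustration; not SDL or Direct3D definitions. */
typedef struct { int x, y, w, h; } Rect;
typedef struct { float x, y, u, v; } Vertex;

/* Writes 6 vertices (two triangles) per src/dst pair into out[], which must
   hold at least 6 * count entries. Returns the number of vertices written. */
int build_quads(const Rect *src, const Rect *dst, int count,
                int tex_w, int tex_h, Vertex *out)
{
    assert(count >= 0 && tex_w > 0 && tex_h > 0);
    int n = 0;
    for (int i = 0; i < count; ++i) {
        /* Destination corners in pixels. */
        float x0 = (float)dst[i].x;
        float y0 = (float)dst[i].y;
        float x1 = (float)(dst[i].x + dst[i].w);
        float y1 = (float)(dst[i].y + dst[i].h);
        /* Texture coordinates normalized to 0..1. */
        float u0 = (float)src[i].x / (float)tex_w;
        float v0 = (float)src[i].y / (float)tex_h;
        float u1 = (float)(src[i].x + src[i].w) / (float)tex_w;
        float v1 = (float)(src[i].y + src[i].h) / (float)tex_h;

        Vertex tl = { x0, y0, u0, v0 };
        Vertex tr = { x1, y0, u1, v0 };
        Vertex bl = { x0, y1, u0, v1 };
        Vertex br = { x1, y1, u1, v1 };

        out[n++] = tl; out[n++] = tr; out[n++] = bl; /* first triangle */
        out[n++] = tr; out[n++] = br; out[n++] = bl; /* second triangle */
    }
    return n;
}
```

The resulting array could then be handed to a single draw call, so the per-rectangle cost is filling vertices rather than issuing a submission.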
Eric Wing
Guest
On 8/1/15, .3lite wrote:
I agree that it should be simpler than XNA and I personally like this line of thinking. I think the goal of the batching API should be stated explicitly so everybody is on the same page. In my opinion, the goal should be to allow performance optimizations and that's it. (XNA conflates multiple things: performance plus read-my-mind-do-everything-I-want, which ultimately makes things more complicated.) Convenience wrappers can always be written on the outside, but you can't wrap around API/performance bottlenecks and expect them to go faster. Additionally, I suspect that a good SIMD backend could make the software renderer go a lot faster too. (Watch Handmade Hero for a great demonstration of how he made a chunky software renderer go to 60fps at 1080p using SSE2.) I would suggest constraining the API as much as possible for speed. To make SIMD or any vectorization go fast, you generally want predictable data layouts and no branches. So for example, with
I might suggest that individual array elements for src/dst rect must always have values and can't be NULL; that way the code that is trying to shuffle things into registers doesn't need to check for NULL all the time. Along that line of thought, we may want explicit array sizes for srcrect/dstrect as additional parameters. An algorithm may want to compute up front how it is going to deal with odd cases where the number of objects doesn't divide evenly into the wide registers. This may have an additional convenience when the user already has a large array of rects but only needs a subset, since it avoids having to make a new copy. The srcrect or dstrect arrays themselves being NULL/empty could probably be handled efficiently by separating into specialized versions early, before entering the inner loops. I'm a little ambivalent about flip. It seems like for performance, the user should have pre-oriented the texture. On the other hand, since it is already in the core SDL API, consistency is nice, and I don't think this needs to incur a noticeable cost as it can also be separated out into specialized versions early.
-Eric
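The constraints suggested above could look roughly like this in code. It is only a sketch: `LANES`, `Rect`, `Quad`, and the `fill_*` functions are hypothetical names, the lane width of 4 is an assumption, and the inner loop is plain C that a compiler may or may not vectorize. It shows three things: the choice between "src array given" and "src array is NULL" is made once, outside the loops; the remainder that doesn't fill a whole group is computed up front; and an explicit count lets a caller pass a subset of a larger array.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* assumed number of rectangles handled per wide step */

typedef struct { int x, y, w, h; } Rect;   /* hypothetical, not SDL_Rect */
typedef struct { Rect src, dst; } Quad;

/* Specialization for a non-NULL src array; no per-element NULL checks. */
static void fill_with_src(const Rect *src, const Rect *dst, int count, Quad *out)
{
    int full = count - count % LANES; /* remainder decided once, up front */
    int i = 0;
    for (; i < full; i += LANES) {    /* fixed-width body, no branches */
        for (int k = 0; k < LANES; ++k) {
            out[i + k].src = src[i + k];
            out[i + k].dst = dst[i + k];
        }
    }
    for (; i < count; ++i) {          /* scalar tail for the leftovers */
        out[i].src = src[i];
        out[i].dst = dst[i];
    }
}

/* Specialization for src == NULL: every quad samples the whole texture. */
static void fill_whole_texture(Rect whole, const Rect *dst, int count, Quad *out)
{
    for (int i = 0; i < count; ++i) {
        out[i].src = whole;
        out[i].dst = dst[i];
    }
}

/* Picks a specialization once, before any inner loop runs. */
void fill_quads(const Rect *src, const Rect *dst, int count,
                Rect whole_texture, Quad *out)
{
    assert(dst != NULL && out != NULL && count >= 0);
    if (src == NULL) {
        fill_whole_texture(whole_texture, dst, count, out);
    } else {
        fill_with_src(src, dst, count, out);
    }
}
```

Calling `fill_quads(src + 2, dst + 2, 3, whole, out)` processes three rectangles from the middle of larger arrays without copying them first.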
.3lite
I agree. In fact I made a small mistake in my example. I meant an array of objects, not an array of pointers, together with the number of elements in the array. That is:
That way, instead of providing an array of pointers, you provide the address of an array of objects.
I believe that these two small functions will take SDL2's renderer to a new level. Who knows how many games were forced to abandon SDL2 in favor of a pure Direct3D or OpenGL implementation due to performance issues from the lack of batching.
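For readers unsure about the difference between the two parameter styles, here is a small contrast. Both functions and the `Rect` type are hypothetical, and the area sum merely stands in for whatever per-rectangle work a renderer would do; the point is only how each signature is called and walked.

```c
#include <assert.h>

typedef struct { int x, y, w, h; } Rect; /* hypothetical, not SDL_Rect */

/* Array of pointers: one extra dereference per element, and the
   rectangles themselves may be scattered in memory. */
long total_area_ptrs(const Rect *const *rects, int count)
{
    assert(count >= 0);
    long area = 0;
    for (int i = 0; i < count; ++i) {
        area += (long)rects[i]->w * rects[i]->h;
    }
    return area;
}

/* Array of objects plus a count: one contiguous block, walked linearly. */
long total_area_objs(const Rect *rects, int count)
{
    assert(count >= 0);
    long area = 0;
    for (int i = 0; i < count; ++i) {
        area += (long)rects[i].w * rects[i].h;
    }
    return area;
}
```

With the array-of-objects form, a caller that already keeps its rectangles in one array can pass it directly, or pass `tiles + 1` with a smaller count to use a subset.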
Jared Maddox
Guest
Was center meant to point to an array, or just a single SDL_Point? C has supported pass-by-value of structures for long enough that I haven't easily been able to find out when: I think it was during the K&R/C89 switch-over. I wouldn't use pass-by-value for *_Renderer or *_Texture, but mostly because they might have hidden data.
<snip: see my reply above>
I don't think these would be too bad, since for both the source and destination you're looking at a mandatory memory read, mandatory comparison, optional single-instruction jump, and optional constant write to the same destination as the mandatory memory read. As long as you prep your src/dest null-replacement SDL_Rect instances to the texture & renderer beforehand, you should be fine (after all, srcrect and destrect are likely to point to non-contiguous SDL_Rect, since gather/store can potentially cut down on the total data transfers, and the cache thrashing was likely to happen during copy operations as well).
Agreed, especially since the only other way to find the end of the array is to go hunting for a null-pointer.
Yeah, when I write a function with pointer args it usually starts off like this:

    int func( type *arg )
    {
        if( arg )
        {
        }
        return( -1 );
    }

A quick optimization for even the least optimizing compilers is easy to figure out.
.3lite
Actually it is meant to be a single SDL_Point. All rendering engines within SDL2 use transformation matrices, and changing the transformation matrix always requires a state change that flushes the current batch. The batch is meant to be simple and efficient, and it should be up to the programmer to implement any kind of batching they like on top of it.
rbanke
I would love for something along these lines to be added. For the game I'm working on atm, I only use two textures for all my assets, and between drawing glyphs for text and sprites I'm seeing performance take dives while testing drawing just our tilemap, GUI, and text. Being able to draw all of that with two calls instead of several thousand sounds much nicer.
Samote
That would be awesome!