Feasibility/correctness of calling GL in another thread
Stefanos A.
Guest
This should work, provided your GPU drivers can do context sharing without going belly up. (Known offenders include first-gen Atoms with PowerVR IGPs and some Core / Core2 mobile IGPs with old drivers.)
MonoGame does the exact same thing and it appears to be working fine. That said, why do you need two OpenGL contexts? 2014/1/14 godlike
Feasibility/correctness of calling GL in another thread
Jonas Kulla
Guest
2014/1/14 godlike
I'm doing almost exactly the same thing as you described in my engine: polling/processing of SDL events and setting window state in the main thread, and doing the rendering in another dedicated thread. The only difference is that I create the window in the main thread, pass that pointer into the rendering thread, and create the GL context there (I also use only one thread). Haven't had any problems with this setup on Mac/Linux (Windows untested, but should be fine).
Feasibility/correctness of calling GL in another thread
Jonas Kulla
Guest
2014/1/16 Jonas Kulla
Whoops, meant to say "I also only use one GL context".
Re: Feasibility/correctness of calling GL in another thread
slimshader
I had no problems with 2 contexts on Windows and Mac, but I got crashes on iOS. I used the 2nd GL context to upload textures in the background while the main thread was doing the rendering. In the end I disabled background uploading (and the 2nd context) on iOS; I didn't have enough time to investigate.
Feasibility/correctness of calling GL in another thread
Forest Hale
Guest
Short version: never use shared contexts in performance-conscious code; they cost far more than the failed (more on that later) overlap of the texture uploads.

Long version: during early development of a major product (Steam Big Picture Mode) that used multiple contexts for background uploading of OpenGL textures, we were told by multiple desktop GPU vendors that the drivers flatly mutex every OpenGL call when shared contexts exist. This can cause a major (~20%) fps loss even if you never touch the other context at all, it gets worse if you do, and in particular the texture upload does NOT happen in parallel with rendering because of that mutexing. So my advice is: never do this. We changed the product to not do this before launch because it was completely unperformant; we had been struggling to hold 60fps until we did, and afterwards it easily exceeded 200fps with that one change.

The hitching of texture uploads is pretty much unavoidable in OpenGL ES (iOS, Android, etc.). On desktop OpenGL you can somewhat hide it with GL_ARB_pixel_buffer_object: glMapBuffer on the main thread, write the pixels from another thread, glUnmapBuffer on the main thread when done, and then issue the glTexImage2D with the pixel buffer object bound so that it sources its pixels from that object rather than blocking on a client memory copy. I'm sure this isn't free and I have not tried it in practice; it also requires that you more or less queue your uploads for the main thread to prepare in stages, so that's some lovely ping-pong there. While I too would greatly appreciate the addition of some background object upload functionality in OpenGL, or even an entire deferred command buffer system (I proposed this in a hardware-agnostic way but it didn't gain traction), the reality today is that OpenGL contexts and threading are completely non-viable.
I should note that Doom 3 BFG Edition seems to glMapBuffer each of 3 buffer objects (vertex, index, uniforms) at the beginning of the frame, queue jobs for all of the processing it wants to do so that threads write into those mapped buffers, and then at end of frame glUnmapBuffer and walk its own command list to issue all the real GL calls that depend on that data. This works very well, but is out of the scope of most OpenGL threading discussions. On 01/16/2014 06:29 AM, slimshader wrote:
-- LordHavoc Author of DarkPlaces Quake1 engine - http://icculus.org/twilight/darkplaces Co-designer of Nexuiz - http://alientrap.org/nexuiz "War does not prove who is right, it proves who is left." - Unknown "Any sufficiently advanced technology is indistinguishable from a rigged demo." - James Klass "A game is a series of interesting choices." - Sid Meier _______________________________________________ SDL mailing list http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org |
Re: Feasibility/correctness of calling GL in another thread
slimshader
Very interesting stuff, thanks a lot for sharing. Is there anything more you could provide on the topic (links, possibly)?
That said, I do not intend to use it for performance-critical stuff but rather for loading screens. The main thread renders a loading animation while a background thread uploads the whole level along with textures. In fact I did notice that this takes slightly longer than doing everything in the main thread, but the user experience is much better with the main thread still operational, showing animations and gameplay tips.
Feasibility/correctness of calling GL in another thread
Forest Hale
Guest
The problem is that as long as shared contexts exist, you incur the massive performance penalty - even if all calls are made from one thread.
Hence: don't use them, even if this means you have to queue texture uploads and vertex/index buffer creation for the main thread (the one showing the loading screen) to handle at its leisure. People won't care about microstutter/hitching on a loading screen, and it will still be pretty smooth because all your file I/O and other heavy operations remain on the other thread. On 01/21/2014 01:31 AM, slimshader wrote:
Re: Feasibility/correctness of calling GL in another thread
slimshader
What you are saying is really scary. You mean that even after I've loaded a level using a 2nd shared context and it is no longer used, the mere fact that it exists causes the main thread's context to go through some kind of locking mechanism? What if I then destroy the 2nd context - does the lock go away too? Is it specific to a driver or a platform? I deal with Win and iOS. Is it specific to the GL version used? I am still limiting myself to GL 1.0, as there are too many driver issues on Win with anything above and I am doing 2D games.
Feasibility/correctness of calling GL in another thread
Jared Maddox
Guest
There is simply no way for the driver to know that you won't be using that context while it's still around, so how could it do otherwise? Graphics card vendors don't normally ship software intended to optimize your NON-graphics code, and knowing that you won't use the context again falls into basically the same category.
That will depend on the driver. Thus, you should assume "no".
I believe that Forest (or was it someone else? it was a few days ago) already said he was told by people involved in producing video cards that it happens with everything. Indeed, it would surely be extremely difficult, and maybe impossible, for it to be otherwise.
It's possible that it could happen in DirectX as well. I don't know whether they have any "lockless" APIs, but even if they do, that doesn't mean everyone implements it without locking.
Feasibility/correctness of calling GL in another thread
Forest Hale
Guest
For Direct3D, the HAL always locks (like OpenGL's shared contexts), but the locks are on resources rather than API entry points, so there is a performance loss inherent in that API design choice compared to OpenGL (which goes "full throttle" in the single-threaded case). This gives some scalability with threading, but performance gains fall off sharply with additional threads - so one additional thread may be justified but not more, unless you like wasting electricity on spin locks; and that second thread just brings you up to OpenGL performance!

Multiple PC driver vendors directly told me that their OpenGL drivers lock on every call in the case of shared contexts; they make no attempt at overlapping operations like this. It is considered exotic behavior in the context of OpenGL API usage, something that games and other consumer apps do not do. It could be accelerated somewhat in their CAD-specific drivers (such as the NVIDIA Quadro and AMD FirePro series), but I do not have data on those.

I would be quite wary of shared contexts on mobile operating systems such as iOS and Android, as the driver vendors there have been known to have countless bugs throughout their APIs even in single-threaded usage. I don't know how they handle shared contexts and it might vary by make and model. Or it could be the one unicorn feature in their driver that always works despite everything else being randomly broken; I'm not placing bets. On 01/21/2014 10:24 AM, Jared Maddox wrote:
Re: Feasibility/correctness of calling GL in another thread
slimshader
Great stuff guys, thanks. I removed my 2nd context and now do all texture operations on the main thread (the 2nd thread pushes texture data to a queue and waits for the main thread to upload it). It was actually (surprisingly) easy to do with std::promise/future.
The level loads a bit faster now, and I have the additional benefit of things working the same way on Win and iOS. Good to know that I should do the same for the D3D implementation.
Nathaniel J Fries
You can do this with drivers that support context sharing, sure. But it would make for simpler code, and be more portable, to do the opposite: render in the main thread and process events in a secondary thread.
SDL_PumpEvents will still need to be called from the main thread on most OSes. But except for a couple of user-initiated loops on Windows, this should have no effect on framerate (I've benchmarked the equivalent of SDL_PumpEvents and it usually takes about 5 microseconds to run on a Pentium 4; 60fps allows a loop time of under 16ms, roughly 3000x that).
Nathaniel J Fries
You should also question whether you need a second thread at all. In the days when processor frequencies averaged in the lower megahertz range, it made sense to do non-graphical processing on another thread, which could simulate concurrency with the graphical thread (gaming systems were single-processor back then). But that was the '90s and this is the 2010s: processor frequencies on gaming rigs can be as much as 20x higher than in the nineties, and while multi-core processors have made the use of threads even less costly, they have done nothing to alleviate the design issues associated with threading or the limitations in graphics drivers.

Which is not to say that a second thread makes no sense, depending on your needs. But unless this engine you're developing is strictly in-house, those needs are for the programmer using the engine to decide, not the engine itself; the aim of the engine ought merely to be to provide an easier means of meeting such needs. If you aren't doing this to leverage multicore execution (which would most likely be premature optimization, the root of all programming evil), but for concurrency, there are better options. Consider a task queue: it can easily be made multi-threaded if the programmer using the engine does find a need to leverage multicore execution, without requiring it; it carries lower execution overhead than the context switching that multithreading requires on a uniprocessor or overburdened multiprocessor system; and it may even cost less memory once you count all the locks, thread-local variables, and the memory for the thread context itself. It's also extremely simple to implement: in C or C++, it can be nothing more than a singly-linked list of function pointers (keeping a tail pointer makes appends even simpler and faster).
Feasibility/correctness of calling GL in another thread
Jonathan Greig
Guest
Having a tail pointer in a singly linked list is generally a good idea when optimizing for performance. It makes appending to the end of the list constant time, O(1), instead of a full traversal, and if you are accessing the last element frequently, that's icing on the cake. (Note that removing from the end is still O(n) in a singly linked list, since you need the predecessor of the tail.) On Jan 22, 2014 11:29 PM, "Nathaniel J Fries" wrote:
slimshader
I hear you, but I am in the niche of older machines; in fact I am having trouble even getting users' OpenGL 1.4 to work correctly (for VBOs - in fact I think it is about time I implemented a D3D renderer), and mobile devices are not what you'd consider 2010s "gaming rigs". That being said, the secondary thread is (as I said before) not used to speed up level loading but rather to keep the main (event processing, rendering) thread responsive.
I do use task queues (double-buffered), but they are per-thread. Since cross-thread tasks are very seldom needed, I don't want them taking locks all the time. BTW, since we are on the threading/GL topic: do you guys render from the main thread? What are your update-vs-render step strategies? If you do them on separate threads, how do you sync later (condition variables seem the obvious choice)?
Re: Feasibility/correctness of calling GL in another thread
slimshader
I might be missing something here, but how do you even implement a list without a tail pointer? You always keep at least one end, otherwise it would be inaccessible. In any case, node-based lists suck.
Feasibility/correctness of calling GL in another thread
Stefanos A.
Guest
2014/1/23 slimshader
After trying several threading strategies, my current preference is to keep rendering and window management on the main thread, but handle input on a secondary thread. So far, this has proven the best method to maintain responsiveness without impacting compatibility.

Regarding D3D... I prefer to use ANGLE to get OpenGL ES 2.0 on systems without proper OpenGL support. This way, I only need to maintain two renderers: OpenGL everywhere, and OpenGL ES for smartphones and (Windows & ~(Nvidia | AMD)). This way, I can also use shaders across the board. ANGLE works all the way down to the GMA 950 (and probably the GMA 500/Poulsbo, although I haven't tested that), so there's very little reason to use the fixed-function pipeline. Microsoft recently announced they will be working with Google to port ANGLE to WinPhones and Metro, so D3D will be strictly unnecessary going forward - as an indie developer, this suits me perfectly.
Re: Feasibility/correctness of calling GL in another thread
slimshader
But how does that help? If the main thread is blocked, then you don't refresh the screen to show the impact of the processed events. In my experience, event handling is a tiny fraction of a frame. Do you mean that with 2nd-thread event handling you avoid the "busy" system cursor / the window appearing to hang?
I had ANGLE on my radar, but now you've really got me interested in this. I am only really interested in 2 platforms, Win and iOS, so this means I would only need to maintain a GL ES renderer. That would be great. Definitely going to look into it.
Feasibility/correctness of calling GL in another thread
Stefanos A.
Guest
2014/1/23 slimshader
The point is not to improve performance but to minimize the latency between the user pressing a button and the world reacting to that press. If you handle input in your rendering thread, then any dip in the framerate will increase input latency, which can be jarring (especially on slower systems that cannot maintain a stable framerate). By spawning a separate thread for input, the OS scheduler will "smooth out" input latency even when your framerate dips below 10 fps.

Of course, this only helps if your world update rate is decoupled from your framerate. In my case, I will skip up to 12 frames in order to guarantee a pseudo-fixed update rate. In other words, I prioritize world updates (60 updates/sec no matter what) and only render frames as best-effort. This way, if the player presses the "fire" trigger then she will shoot the enemy immediately even if she is running at 5 fps. If the input was handled in the same thread, then the "fire" button would take up to 200ms to register - or it would be skipped completely, if the player lifted her finger before the 200ms mark. This would place the player at a severe disadvantage (hi, Diablo 3!)
Re: Feasibility/correctness of calling GL in another thread
slimshader
Clean now. A question: I just tried to build a minimal GL ES2 app under Win, but I am getting unresolved externals for glClear, glClearColor and 2 more in the sdl_main function, so it clearly wants to use full GL. I assume you use SDL2 with ANGLE?
Feasibility/correctness of calling GL in another thread
Stefanos A.
Guest
I am using ANGLE both with and without SDL2, but I'm using C#/OpenTK, which loads both libraries dynamically - so I can't really help you with these errors, sorry. ("Dynamically" in this case means using LoadLibrary + GetProcAddress("eglGetProcAddress") and then using eglGetProcAddress to load the rest of the entry points.)
IIRC, I had to compile SDL2 from hg in order to get ANGLE working, but it was otherwise straightforward. 2014/1/23 slimshader
Feasibility/correctness of calling GL in another thread
Jonathan Greig
Guest
slimshader: while I normally don't reference Wikipedia for programming matters, look at the tradeoffs section ( http://en.wikipedia.org/wiki/Linked_list#Tradeoffs ), and particularly the chart for the case where the last element is known. I was mostly clarifying Nathan's comment that keeping track of the last element _will be faster_ rather than _may be faster_, in case others wish to take his approach. When he mentioned a tail pointer, he meant keeping track of the last element. Ultimately, the best choice of container is the one suited to the particular problem, and as Nathan mentioned, a singly-linked list is easy to implement. In some cases you may not have the luxury of using a standard library, and rolling your own is the only choice.
Nathaniel J Fries
It may still make sense to avoid the second thread by making tasks shorter. Use non-blocking or even asynchronous I/O, or memory mapping (all three options require writing system-specific code), and you will find that resource loading no longer impedes the render and event processing tasks (since upload to the GPU already blocks other API calls on most implementations anyway). Blocking file reads are probably the bottleneck pushing you in the direction you're going.
This is true 99% of the time, but not 100%. Maintaining a tail pointer is occasionally not worth the added complexity, and I have seen situations in which removing it was actually an optimization. O(1) and O(n) tail insertion are equal when there is only one element, but the O(n) code may actually perform better. This is why I disagree with professors and professionals alike treating big-O notation as the silver bullet of algorithms. It is a useful tool, but there is never a silver bullet.