The SDL forums have moved to discourse.libsdl.org.
This is just a read-only archive of the previous forums, to keep old links working.


Feasibility/correctness of calling GL in another thread
godlike


Joined: 21 Feb 2010
Posts: 5
Hi all,

In the game engine that I am working on, I am designing a rendering thread that essentially executes all OpenGL calls (including SDL_GL_SwapWindow) instead of the main thread. The problem is that I am not quite sure whether the scenario that I have in mind is safe or whether it will lead to undefined behavior.

The idea is that the main thread will:
Code:
SDL_Init(SDL_INIT_VIDEO | SDL_INIT_JOYSTICK | SDL_INIT_EVENTS | ...);
SDL_GL_SetAttribute(SDL_GL_SHARE_WITH_CURRENT_CONTEXT, 1); // Enable context sharing
window = SDL_CreateWindow(...);
context_A = SDL_GL_CreateContext(window); // Create context A
context_B = SDL_GL_CreateContext(window); // Create context B
SDL_GL_MakeCurrent(window, context_A); // Make context A current
start_rendering_thread()
// Game loop begins
while(true) {
    poll_input_and_joystick_events_using_SDL()
    do_other_things()
}


In the rendering thread:
Code:
SDL_GL_MakeCurrent(window, context_B); // Make context B current
while(true) {
    execute_GL_calls()
    SDL_GL_SwapWindow(window);
}


The thing that bugs me is that I am calling SDL_GL_SwapWindow(window) in another thread and that I am doing some SDL stuff in the main thread and others in the rendering thread (polling events).

What are your thoughts? Will this work or not?
Feasibility/correctness of calling GL in another thread
Stefanos A.
Guest

This should work, provided your GPU drivers can do context sharing without going belly up. (Drivers known to go belly up include first-gen Atoms with PowerVR IGPs and some Core / Core2 mobile IGPs with old drivers.)

MonoGame does the exact same thing and it appears to be working fine.


That said, why do you need two OpenGL contexts?


2014/1/14 godlike
Quote:
In the game engine that I am working on, I am designing a rendering thread that essentially executes all OpenGL calls (including SDL_GL_SwapWindow) instead of the main thread. [...] Will this work or not?

Panagiotis Christopoulos Charitos
AnKi 3D Engine


_______________________________________________
SDL mailing list

http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Feasibility/correctness of calling GL in another thread
Jonas Kulla
Guest

2014/1/14 godlike
Quote:
In the game engine that I am working on, I am designing a rendering thread that essentially executes all OpenGL calls (including SDL_GL_SwapWindow) instead of the main thread. [...] Will this work or not?

I'm doing almost exactly the same thing as you described in my engine: do polling/processing of
SDL events and setting state of the window in the main thread, and doing the rendering in another
dedicated thread. The only difference is that I create the window in the main thread, pass that
pointer into the rendering thread, and create the GL context there (I also use only one thread).


Haven't had any problems with this setup on Mac/Linux (Windows untested, but should be fine).
Feasibility/correctness of calling GL in another thread
Jonas Kulla
Guest

2014/1/16 Jonas Kulla
Quote:
I'm doing almost exactly the same thing as you described in my engine [...]

Whoops, meant to say "I also only use one GL context".
Re: Feasibility/correctness of calling GL in another thread
slimshader


Joined: 26 Apr 2013
Posts: 39
Stefanos A. wrote:
This should work, provided your GPU drivers can do context sharing without going belly up. (Drivers known to go belly up include first-gen Atoms with PowerVR IGPs and some Core / Core2 mobile IGPs with old drivers.)

MonoGame does the exact same thing and it appears to be working fine.


That said, why do you need two OpenGL contexts?


2014/1/14 godlike
Quote:
In the game engine that I am working on, I am designing a rendering thread that essentially executes all OpenGL calls (including SDL_GL_SwapWindow) instead of the main thread. [...] Will this work or not?


I had no problems with 2 contexts on Windows and Mac, but I got crashes on iOS. I used the 2nd GL context to upload textures in the background while the main thread was doing the rendering. In the end I disabled background uploading (and the 2nd ctx) on iOS; I didn't have enough time to investigate.
Feasibility/correctness of calling GL in another thread
Forest Hale
Guest

Short version:
Never use shared contexts in performance-conscious code; they cost far more than the (failed - more on that later) overlap of the texture uploads gains you.

Long version:
During early development of a major product (Steam Big Picture Mode) that used multiple contexts for background uploading of OpenGL textures, we were told by multiple desktop GPU vendors that the drivers flatly mutex every OpenGL call when you have shared contexts. This can result in a major (~20%) fps loss even if you don't use the other context at all, it gets worse if you do, and in particular the texture upload does NOT happen in parallel with rendering because of that mutexing.

So my advice is: never do this. We changed the product to not do this before launch because it was completely not performant; we had been struggling to keep up 60fps until we did, and then it easily exceeded 200fps with that one change.

The hitching of texture uploads is pretty much unavoidable in OpenGL ES (iOS, Android, etc). On desktop OpenGL you can somewhat hide it with GL_ARB_pixel_buffer_object: you glMapBuffer on the main thread, write the pixels from another thread, and when done you glUnmapBuffer on the main thread and issue the glTexImage2D with the pixel buffer object bound, so that it sources its pixels from that object rather than blocking on a client memory copy. I'm sure this isn't free and I have not tried it in practice; it also requires that you more or less queue your uploads for the main thread to prepare in stages, so that's some lovely ping-pong there.
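
For illustration, the PBO flow described above might look roughly like this (a pseudocode sketch, not tested code; error handling, texture binding, and the cross-thread signalling are omitted, and fill_pixels is a made-up placeholder):
Code:
// Main thread: allocate and map a pixel buffer object.
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, size, NULL, GL_STREAM_DRAW);
void *ptr = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);

// Worker thread: decode/write the pixels into ptr (no GL calls here),
// then signal the main thread that the data is ready.
fill_pixels(ptr);

// Main thread, after the signal: unmap, then upload sourcing from the PBO.
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
// With a PBO bound, the last argument is an offset into the PBO, not a
// client-memory pointer, so the upload need not block on a client copy.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, 0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);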

While I too would greatly appreciate the addition of some background object upload functionality in OpenGL, or even an entire deferred command buffer system (I proposed this in a hardware-agnostic way
but it didn't gain traction), the reality today is that OpenGL contexts and threading are completely non-viable.

I should note that Doom 3 BFG Edition seems to glMapBuffer each of 3 buffer objects (vertex, index, uniforms) at the beginning of the frame and queue jobs for all of the processing it wants to do, so that threads write into those mapped buffers; at the end of the frame it does the glUnmapBuffer and walks its own command list to issue all the real GL calls that depend on that data. This works very well, but is out of the scope of most OpenGL threading discussions.
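
As a rough sketch of that per-frame pattern (pseudocode only; buffer creation, fencing, and the helper names are made up for illustration):
Code:
// Start of frame, main (GL) thread: map the three buffers.
void *verts    = map_buffer(vertex_buf);   // glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY)
void *indices  = map_buffer(index_buf);
void *uniforms = map_buffer(uniform_buf);
queue_jobs_writing_into(verts, indices, uniforms);  // worker threads, no GL calls

// End of frame, main thread: unmap, then replay the recorded commands.
unmap_all_buffers();
walk_command_list_and_issue_real_GL_calls();
SDL_GL_SwapWindow(window);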

On 01/16/2014 06:29 AM, slimshader wrote:
Quote:
I had no problems with 2 contexts on Windows and Mac but I got crashes on iOS. I used the 2nd GL context to upload textures in the background, while the main thread was doing the rendering. [...]



--
LordHavoc
Author of DarkPlaces Quake1 engine - http://icculus.org/twilight/darkplaces
Co-designer of Nexuiz - http://alientrap.org/nexuiz
"War does not prove who is right, it proves who is left." - Unknown
"Any sufficiently advanced technology is indistinguishable from a rigged demo." - James Klass
"A game is a series of interesting choices." - Sid Meier

Re: Feasibility/correctness of calling GL in another thread
slimshader


Joined: 26 Apr 2013
Posts: 39
Very interesting stuff, thanks a lot for sharing. Is there anything more you could provide on the topic (links, possibly)?

That said, I do not intend to use it for performance-critical stuff but rather for a loading screen. The main thread renders a loading animation while a background thread uploads the whole level along with textures. In fact I did notice that this takes slightly longer than doing everything in the main thread, but the user experience is much better with the main thread still operational, showing anims and gameplay tips.

Forest Hale wrote:
Short version:
Never use shared contexts for performance-conscious code [...]
Feasibility/correctness of calling GL in another thread
Forest Hale
Guest

The problem is that as long as there are shared contexts, you incur the massive performance penalty - even if all calls are from one thread.

Hence don't use them - even if this means you have to queue texture uploads and vertex/index buffer creation and such for the main thread (showing the loading screen) to handle at its leisure. People won't care about microstutter/hitching on a loading screen; it will still be pretty smooth because you're still running all your file I/O and other heavy operations on the other thread.

On 01/21/2014 01:31 AM, slimshader wrote:
Quote:
Very interesting stuff, thanks a lot for sharing. Is there anything more you could provide on the topic (links possibly)? [...]



Re: Feasibility/correctness of calling GL in another thread
slimshader


Joined: 26 Apr 2013
Posts: 39
Forest Hale wrote:
The problem is that as long as there are shared contexts, you incur the massive performance penalty - even if all calls are from one thread. [...]


What you are saying is really scary. You mean that even after I have loaded a level using the 2nd shared ctx and it is not used anymore, the mere fact that it exists causes the main-thread context to go through some kind of locking mechanism? What if I then destroy the 2nd ctx? Does the lock go too?

Is it specific to a driver or a platform? I deal with Win and iOS. Is it specific to the GL version used? I am still limiting myself to GL 1.0 as there are too many driver issues on Win with anything above, and I am doing 2D games.
Feasibility/correctness of calling GL in another thread
Jared Maddox
Guest

Quote:
Forest Hale wrote:
Quote:
The problem is that as long as there are shared contexts, you incur the massive performance penalty - even if all calls are from one thread. [...]

What you are saying is really scary. You mean that even after I loaded a level using the 2nd shared ctx and it is not used anymore, the mere fact that it exists causes the main-thread context to go through some kind of locking mechanism?

There is simply no way for the driver to know that you won't be using
that context if it's still around, so how can it do otherwise?
Graphics card vendors don't normally sell programs intended to
optimize your NON-graphics code, and knowing that you won't be using
the context again basically falls into the same category of things as
that.


Quote:
What if I then destroy 2nd ctx? Does the lock go too?


That will depend on the driver. Thus, you should assume "No".


Quote:
Is it specific to a driver or a platform?

I believe that Forest (or was it someone else? it was a few days ago)
already said that he was told by someone who's involved in the
production of video cards that it happens with everything. Indeed, it
would surely be extremely difficult, and maybe impossible, for it to
be otherwise.


Quote:
I deal with Win and iOS. Is it
specific to GL version used?

It's possible that it could happen in DirectX as well. I don't know if
they have any "lockless" APIs, but even if they do it doesn't mean
that everyone implements it without locking.
Feasibility/correctness of calling GL in another thread
Forest Hale
Guest

For Direct3D the HAL always locks (like OpenGL's shared contexts), but the locks are on resources rather than API entry points, so there is a performance loss inherent in that API design choice compared to OpenGL (which goes "full throttle" in the single-threaded case). This gives some scalability with threading, but performance gains fall off sharply with additional threads - so one additional thread may be justified, but not more, unless you like wasting electricity on spin locks - and that second thread just brings you up to OpenGL performance!

Multiple vendors for PC drivers directly told me that their OpenGL drivers lock on every call in case of shared contexts, they make no attempt at overlapping operations like this, it is considered
exotic behavior in the context of OpenGL API usage, something that games and other consumer apps do not do, it could be accelerated somewhat on their CAD-specific drivers (such as NVIDIA Quadro series
and AMD FirePro series) but I do not have data on those.

I would be quite wary of shared contexts on mobile operating systems such as iOS and Android as the driver vendors have been known to have countless bugs throughout their API even in single-threaded
usage, I don't know how they handle shared contexts and it might vary by make and model. Or it could be the unicorn feature in their driver that always works despite everything else being randomly
broken; I'm not placing bets.

On 01/21/2014 10:24 AM, Jared Maddox wrote:
Quote:
There is simply no way for the driver to know that you won't be using that context if it's still around, so how can it do otherwise? [...]
Re: Feasibility/correctness of calling GL in another thread
slimshader


Joined: 26 Apr 2013
Posts: 39
Great stuff guys, thanks. I removed my 2nd ctx and now do all texture operations on the main thread (the 2nd thread pushes texture data to a queue and waits for the main thread to upload it). It was actually (surprisingly) easy to do with std::promise/future.

The level loads a bit faster now, and I have the additional benefit of things working the same way on Win and iOS. Good to know that I should do the same for the D3D implementation.

Forest Hale wrote:
For Direct3D the HAL always locks (like OpenGL's shared contexts) but the locks are on resources rather than API entry points, so there is a performance loss inherent in that API design choice
compared to OpenGL (which goes "full throttle" in the single threaded case), this gives some scalability with threading but performance gains fall off sharply with additional threads (so one
additional thread may be justified but not more, unless you like wasting electricity on spin locks - and that second thread just brings you up to OpenGL performance!).

Multiple vendors for PC drivers directly told me that their OpenGL drivers lock on every call in case of shared contexts, they make no attempt at overlapping operations like this, it is considered
exotic behavior in the context of OpenGL API usage, something that games and other consumer apps do not do, it could be accelerated somewhat on their CAD-specific drivers (such as NVIDIA Quadro series
and AMD FirePro series) but I do not have data on those.

I would be quite wary of shared contexts on mobile operating systems such as iOS and Android as the driver vendors have been known to have countless bugs throughout their API even in single-threaded
usage, I don't know how they handle shared contexts and it might vary by make and model. Or it could be the unicorn feature in their driver that always works despite everything else being randomly
broken; I'm not placing bets.

On 01/21/2014 10:24 AM, Jared Maddox wrote:
Quote:
Quote:
Date: Tue, 21 Jan 2014 11:35:28 +0000
From: "slimshader"
To:
Subject: Re: [SDL] Feasibility/correctness of calling GL in another
thread
Message-ID:
Content-Type: text/plain; charset="iso-8859-1"


Forest Hale wrote:
Quote:
The problem is that as long as there are shared contexts, you incur the
massive performance penalty - even if all calls are from one thread.

Hence don't use them - even if this means you have to queue texture
uploads and vertex/index buffer creation and such for the main thread
(showing the loading screen) to handle at its leisure, people
won't care about microstutter/hitching on a loading screen, it will still
be pretty smooth because you're still running all your file I/O and other
heavy operations on the other thread.



What you are saying is really scary. You mean that even after I loaded a
level using 2nd shared ctx and it is not used anymore, mere fact that it
exists causes main thread context to go through some kind of locking
mechanism?

There is simply no way for the driver to know that you won't be using
that context if it's still around, so how can it do otherwise?
Graphics card vendors don't normally sell programs intended to
optimize your NON-graphics code, and knowing that you won't be using
the context again basically falls into the same category of things as
that.


Quote:
What if I then destroy 2nd ctx? Does the lock go too?


That will depend on the driver. Thus, you should assume "No".


Quote:
Is it specific to a driver or a platform?

I believe that Forest (or was it someone else? it was a few days ago)
already said that he was told by someone who's involved in the
production of video cards that it happens with everything. Indeed, it
would surely be extremely difficult, and maybe impossible, for it to
be otherwise.


Quote:
I deal with Win and iOS. Is it
specific to the GL version used?

It's possible that it could happen in DirectX as well. I don't know if
they have any "lockless" APIs, but even if they do it doesn't mean
that everyone implements it without locking.
_______________________________________________
SDL mailing list

http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org



--
LordHavoc
Author of DarkPlaces Quake1 engine - http://icculus.org/twilight/darkplaces
Co-designer of Nexuiz - http://alientrap.org/nexuiz
"War does not prove who is right, it proves who is left." - Unknown
"Any sufficiently advanced technology is indistinguishable from a rigged demo." - James Klass
"A game is a series of interesting choices." - Sid Meier

Nathaniel J Fries


Joined: 30 Mar 2010
Posts: 444
You can do this with drivers that support context sharing, sure. But it would make for simpler code and be more portable to do the opposite: render in the main thread, process events in a secondary thread.

SDL_PumpEvents will still need to be called from the main thread on most OSes. But except for a couple of user-initiated modal loops on Windows, this should have no effect on framerate (I've benchmarked the equivalent of SDL_PumpEvents and it usually takes about 5 microseconds on a Pentium 4; 60 fps allows roughly 16 ms per loop, about 3000x that).
Nathaniel J Fries


Joined: 30 Mar 2010
Posts: 444
You should also question whether you need a second thread at all. In the days when processor frequencies averaged in the lower megahertz range, it made sense to do non-graphical processing on another thread to simulate concurrency with the graphical thread (gaming systems were single-processor back then). But that was the '90s, and this is the 2010s: processor frequencies on gaming rigs can be as much as 20x higher than in the nineties, and while multi-core processors have made threads even less costly, they have done nothing to alleviate the design issues associated with them or the limitations in graphics drivers.

Which is not to say that it makes no sense to have another thread, depending on your needs. But unless the engine you're developing is strictly in-house, needs are for the programmer using the engine to decide, not the engine itself - the aim of the engine ought merely to be to provide an easier means of meeting such needs.

If you aren't doing this to leverage multicore execution (which would most likely be a premature optimization, the root of all programming evils) but for concurrency, there are better options. Consider a task queue: it can easily be made multi-threaded later if the programmer using the engine does need to leverage multicore execution, without requiring it; it carries lower execution overhead than context switching, which multithreading on a uniprocessor or overburdened multiprocessor system requires; and it may even cost less memory once you account for the locks, thread-local variables, and the thread contexts themselves. It's also extremely simple to implement: in C or C++, it can be nothing more than a singly-linked list of function pointers (keeping a tail pointer makes appends even simpler and faster).
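For concreteness, the singly-linked task queue described above can be sketched in a few lines of C. This is an illustrative sketch, not code from anyone in this thread; the names (`task_queue`, `queue_push`, `queue_drain`) are invented here:

```c
#include <assert.h>
#include <stdlib.h>

/* A task is just a function pointer plus an argument. */
typedef void (*task_fn)(void *arg);

typedef struct task {
    task_fn fn;
    void *arg;
    struct task *next;
} task;

typedef struct {
    task *head;
    task *tail;             /* makes appending O(1) */
} task_queue;

/* Append a task at the tail in constant time. */
void queue_push(task_queue *q, task_fn fn, void *arg) {
    task *t = malloc(sizeof *t);
    t->fn = fn;
    t->arg = arg;
    t->next = NULL;
    if (q->tail)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
}

/* Run and free every queued task; returns how many were executed. */
int queue_drain(task_queue *q) {
    int n = 0;
    while (q->head) {
        task *t = q->head;
        q->head = t->next;
        t->fn(t->arg);
        free(t);
        n++;
    }
    q->tail = NULL;
    return n;
}
```

Making this multi-threaded later only requires guarding `queue_push` and the head/tail handoff with a mutex, which is exactly the "can be made multi-threaded without necessitating it" property described above.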
Feasibility/correctness of calling GL in another thread
Jonathan Greig
Guest

Having a tail pointer in a singly linked list is always a good idea when optimizing for performance. It makes appending an item to the end of the list constant time, O(1), and if you are accessing the last element frequently, that's icing on the cake Smile On Jan 22, 2014 11:29 PM, "Nathaniel J Fries" wrote:
Quote:
You should also question whether you need a second thread at all. In the days when processor frequencies averaged in the lower megahertz range, it made sense to do non-graphical processing on another thread to simulate concurrency with the graphical thread (gaming systems were single-processor back then). But that was the '90s, and this is the 2010s: processor frequencies on gaming rigs can be as much as 20x higher than in the nineties, and while multi-core processors have made threads even less costly, they have done nothing to alleviate the design issues associated with them or the limitations in graphics drivers.

Which is not to say that it makes no sense to have another thread, depending on your needs. But unless the engine you're developing is strictly in-house, needs are for the programmer using the engine to decide, not the engine itself - the aim of the engine ought merely to be to provide an easier means of meeting such needs.

If you aren't doing this to leverage multicore execution (which would most likely be a premature optimization, the root of all programming evils) but for concurrency, there are better options. Consider a task queue: it can easily be made multi-threaded later if the programmer using the engine does need to leverage multicore execution, without requiring it; it carries lower execution overhead than context switching, which multithreading on a uniprocessor or overburdened multiprocessor system requires; and it may even cost less memory once you account for the locks, thread-local variables, and the thread contexts themselves. It's also extremely simple to implement: in C or C++, it can be nothing more than a singly-linked list of function pointers (keeping a tail pointer makes appends even simpler and faster).



Nate Fries



slimshader


Joined: 26 Apr 2013
Posts: 39
I hear you, but I am in the niche of older machines; in fact I am having trouble even getting users' OpenGL 1.4 to work correctly (for VBOs; I think it is about time I implemented a D3D renderer), and mobile devices are not what you'd consider 2010s "gaming rigs". That being said, the secondary thread is (as I said before) not used to speed up level loading but rather to keep the main (event processing, rendering) thread responsive.

I do use task queues (double-buffered), but they are per-thread. Since cross-thread tasks are very rare, I don't want the queues taking locks all the time.

BTW, since we are on the threading / GL topic: do you render from the main thread? What are your update vs. render step strategies? If you run them on separate threads, how do you sync them afterwards (condition variables seem the obvious choice)?
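For readers unfamiliar with the double-buffered queue mentioned above, here is a minimal single-threaded sketch of the idea (all names invented here). In a real threaded version, each push and the buffer swap at the top of the drain would be guarded by a short lock or an atomic exchange - that swap is the only synchronization point, which is why the pattern avoids "having locks all the time":

```c
#include <assert.h>

#define DQ_CAPACITY 64

typedef struct {
    void (*fns[DQ_CAPACITY])(void);
    int count;
} dq_buffer;

typedef struct {
    dq_buffer bufs[2];
    int write_idx;      /* producers push into bufs[write_idx] */
} double_queue;

/* Push a task into the current write buffer; fails when full. */
int dq_push(double_queue *q, void (*fn)(void)) {
    dq_buffer *b = &q->bufs[q->write_idx];
    if (b->count == DQ_CAPACITY)
        return 0;
    b->fns[b->count++] = fn;
    return 1;
}

/* Swap buffers (the only step needing synchronization in a threaded
 * version), then drain the now-private buffer with no lock held. */
int dq_drain(double_queue *q) {
    dq_buffer *rb = &q->bufs[q->write_idx];
    q->write_idx ^= 1;              /* producers now fill the other one */
    int n = rb->count;
    for (int i = 0; i < n; i++)
        rb->fns[i]();
    rb->count = 0;
    return n;
}
```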

Nathaniel J Fries wrote:
You should also question whether you need a second thread at all. In the days when processor frequencies averaged in the lower megahertz range, it made sense to do non-graphical processing on another thread to simulate concurrency with the graphical thread (gaming systems were single-processor back then). But that was the '90s, and this is the 2010s: processor frequencies on gaming rigs can be as much as 20x higher than in the nineties, and while multi-core processors have made threads even less costly, they have done nothing to alleviate the design issues associated with them or the limitations in graphics drivers.

Which is not to say that it makes no sense to have another thread, depending on your needs. But unless the engine you're developing is strictly in-house, needs are for the programmer using the engine to decide, not the engine itself - the aim of the engine ought merely to be to provide an easier means of meeting such needs.

If you aren't doing this to leverage multicore execution (which would most likely be a premature optimization, the root of all programming evils) but for concurrency, there are better options. Consider a task queue: it can easily be made multi-threaded later if the programmer using the engine does need to leverage multicore execution, without requiring it; it carries lower execution overhead than context switching, which multithreading on a uniprocessor or overburdened multiprocessor system requires; and it may even cost less memory once you account for the locks, thread-local variables, and the thread contexts themselves. It's also extremely simple to implement: in C or C++, it can be nothing more than a singly-linked list of function pointers (keeping a tail pointer makes appends even simpler and faster).
Re: Feasibility/correctness of calling GL in another thread
slimshader


Joined: 26 Apr 2013
Posts: 39
I might be missing something here, but how do you even implement a list without a tail pointer? You always keep at least one end, otherwise the list would be inaccessible. In any case, node-based lists suck Razz

Jonathan Greig wrote:
Having a tail pointer in a singly linked list is always a good idea when optimizing for performance. It makes appending an item to the end of the list constant time, O(1), and if you are accessing the last element frequently, that's icing on the cake Smile On Jan 22, 2014 11:29 PM, "Nathaniel J Fries" wrote:
Quote:
You should also question whether you need a second thread at all. In the days when processor frequencies averaged in the lower megahertz range, it made sense to do non-graphical processing on another thread to simulate concurrency with the graphical thread (gaming systems were single-processor back then). But that was the '90s, and this is the 2010s: processor frequencies on gaming rigs can be as much as 20x higher than in the nineties, and while multi-core processors have made threads even less costly, they have done nothing to alleviate the design issues associated with them or the limitations in graphics drivers.

Which is not to say that it makes no sense to have another thread, depending on your needs. But unless the engine you're developing is strictly in-house, needs are for the programmer using the engine to decide, not the engine itself - the aim of the engine ought merely to be to provide an easier means of meeting such needs.

If you aren't doing this to leverage multicore execution (which would most likely be a premature optimization, the root of all programming evils) but for concurrency, there are better options. Consider a task queue: it can easily be made multi-threaded later if the programmer using the engine does need to leverage multicore execution, without requiring it; it carries lower execution overhead than context switching, which multithreading on a uniprocessor or overburdened multiprocessor system requires; and it may even cost less memory once you account for the locks, thread-local variables, and the thread contexts themselves. It's also extremely simple to implement: in C or C++, it can be nothing more than a singly-linked list of function pointers (keeping a tail pointer makes appends even simpler and faster).



Nate Fries



Feasibility/correctness of calling GL in another thread
Stefanos A.
Guest

2014/1/23 slimshader
Quote:
I hear you, but I am in the niche of older machines; in fact I am having trouble even getting users' OpenGL 1.4 to work correctly (for VBOs; I think it is about time I implemented a D3D renderer), and mobile devices are not what you'd consider 2010s "gaming rigs". That being said, the secondary thread is (as I said before) not used to speed up level loading but rather to keep the main (event processing, rendering) thread responsive.

I do use task queues (double-buffered), but they are per-thread. Since cross-thread tasks are very rare, I don't want the queues taking locks all the time.

BTW, since we are on the threading / GL topic: do you render from the main thread? What are your update vs. render step strategies? If you run them on separate threads, how do you sync them afterwards (condition variables seem the obvious choice)?




After trying several threading strategies, my current preference is to keep rendering and window management on the main thread, but handle input on a secondary thread. So far, this has proven the best method to maintain responsiveness without impacting compatibility.


Regarding D3D... I prefer to use ANGLE to get OpenGL ES 2.0 on systems without proper OpenGL support. This way, I only need to maintain two renderers: OpenGL everywhere and OpenGL ES for smartphones and (Windows & ~(Nvidia | AMD)).


This way, I can also use shaders across the board. ANGLE works all the way down to the GMA 950 (and probably the GMA 500/Poulsbo, although I haven't tested that), so there's very little reason to use the fixed-function pipeline. Microsoft recently announced they will be working with Google to port ANGLE to Windows Phone and Metro, so D3D will be strictly unnecessary going forward - as an indie developer, this suits me perfectly.
 
Quote:



Nathaniel J Fries wrote:

You should also question whether you need a second thread at all. In the days when processor frequencies averaged in the lower megahertz range, it made sense to do non-graphical processing on another thread to simulate concurrency with the graphical thread (gaming systems were single-processor back then). But that was the '90s, and this is the 2010s: processor frequencies on gaming rigs can be as much as 20x higher than in the nineties, and while multi-core processors have made threads even less costly, they have done nothing to alleviate the design issues associated with them or the limitations in graphics drivers.

Which is not to say that it makes no sense to have another thread, depending on your needs. But unless the engine you're developing is strictly in-house, needs are for the programmer using the engine to decide, not the engine itself - the aim of the engine ought merely to be to provide an easier means of meeting such needs.

If you aren't doing this to leverage multicore execution (which would most likely be a premature optimization, the root of all programming evils) but for concurrency, there are better options. Consider a task queue: it can easily be made multi-threaded later if the programmer using the engine does need to leverage multicore execution, without requiring it; it carries lower execution overhead than context switching, which multithreading on a uniprocessor or overburdened multiprocessor system requires; and it may even cost less memory once you account for the locks, thread-local variables, and the thread contexts themselves. It's also extremely simple to implement: in C or C++, it can be nothing more than a singly-linked list of function pointers (keeping a tail pointer makes appends even simpler and faster).






Re: Feasibility/correctness of calling GL in another thread
slimshader


Joined: 26 Apr 2013
Posts: 39
Stefanos A. wrote:


After trying several threading strategies, my current preference is to keep rendering and window management on the main thread, but handle input on a secondary thread. So far, this has proven the best method to maintain responsiveness without impacting compatibility.



But how does that help? If the main thread is blocked, you don't refresh the screen to show the effect of the processed events. In my experience, event handling is a tiny fraction of a frame. Do you mean that with second-thread event handling you avoid the "busy" system cursor / the window appearing to hang?

Quote:


Regarding D3D... I prefer to use ANGLE to get OpenGL ES 2.0 on systems without proper OpenGL support. This way, I only need to maintain two renderers: OpenGL everywhere and OpenGL ES for smartphones and (Windows & ~(Nvidia | AMD)).


This way, I can also use shaders across the board. ANGLE works all the way down to the GMA 950 (and probably the GMA 500/Poulsbo, although I haven't tested that), so there's very little reason to use the fixed-function pipeline. Microsoft recently announced they will be working with Google to port ANGLE to Windows Phone and Metro, so D3D will be strictly unnecessary going forward - as an indie developer, this suits me perfectly.
 


I had ANGLE on my radar, but now you have really got me interested in it. I am only really interested in two platforms, Win and iOS, which means I would only need to maintain the GL ES renderer. That would be great. Definitely going to look into it.
Feasibility/correctness of calling GL in another thread
Stefanos A.
Guest

2014/1/23 slimshader
Quote:



Stefanos A. wrote:



After trying several threading strategies, my current preference is to keep rendering and window management on the main thread, but handle input on a secondary thread. So far, this has proven the best method to maintain responsiveness without impacting compatibility.






But how does that help? If the main thread is blocked, you don't refresh the screen to show the effect of the processed events. In my experience, event handling is a tiny fraction of a frame. Do you mean that with second-thread event handling you avoid the "busy" system cursor / the window appearing to hang?



The point is not to improve performance but to minimize latency between the user pressing a button and the world reacting to that button press.


If you handle input in your rendering thread, then any dip in the framerate will increase input latency, which can be jarring (especially on slower systems that cannot maintain a stable framerate). By spawning a separate thread for input, the OS scheduler will smooth out input latency even when your framerate dips below 10 fps.


Of course, this only helps if your world update rate is decoupled from your framerate. In my case, I will skip up to 12 frames in order to guarantee a pseudo-fixed update rate. In other words, I prioritize world updates (60 updates/sec no matter what) and only render frames as a best-effort.


This way, if the player presses the "fire" trigger then she will shoot the enemy immediately even if she is running at 5 fps.


If the input was handled in the same thread, then the "fire" button would take up to 200ms to register - or it would be skipped completely, if the player lifted her finger before the 200ms mark. This would place the player at a severe disadvantage (hi, Diablo 3!)
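The decoupled scheme described above (fixed-rate world updates, best-effort rendering, bounded frame skipping) is the classic fixed-timestep loop. Here is a hedged plain-C sketch of one loop iteration, not Stefanos's actual code; the constants and names are invented, and 16 ms is used as a round stand-in for 1/60 s:

```c
#include <assert.h>

#define UPDATE_DT_MS 16        /* ~60 world updates per second */
#define MAX_UPDATES_PER_FRAME 12

/* One iteration of the game loop: run as many fixed-size world updates
 * as the elapsed wall time demands (capped so a long stall can't spiral
 * into ever more catch-up work), then render once with whatever state
 * we have. Returns the number of updates run this iteration. */
int loop_step(long *accumulator_ms, long frame_time_ms,
              int *updates, int *renders) {
    int ran = 0;
    *accumulator_ms += frame_time_ms;
    while (*accumulator_ms >= UPDATE_DT_MS && ran < MAX_UPDATES_PER_FRAME) {
        *accumulator_ms -= UPDATE_DT_MS;
        (*updates)++;          /* world advances by exactly UPDATE_DT_MS */
        ran++;
    }
    (*renders)++;              /* rendering is best-effort */
    return ran;
}
```

The key property: a slow 80 ms frame still produces five 16 ms world updates and one render, so input registered by the update step is never delayed by the renderer.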
Re: Feasibility/correctness of calling GL in another thread
slimshader


Joined: 26 Apr 2013
Posts: 39
Stefanos A. wrote:
2014/1/23 slimshader
Quote:



Stefanos A. wrote:



After trying several threading strategies, my current preference is to keep rendering and window management on the main thread, but handle input on a secondary thread. So far, this has proven the best method to maintain responsiveness without impacting compatibility.






But how does that help? If the main thread is blocked, you don't refresh the screen to show the effect of the processed events. In my experience, event handling is a tiny fraction of a frame. Do you mean that with second-thread event handling you avoid the "busy" system cursor / the window appearing to hang?



The point is not to improve performance but to minimize latency between the user pressing a button and the world reacting to that button press.


If you handle input in your rendering thread, then any dip in the framerate will increase input latency, which can be jarring (especially on slower systems that cannot maintain a stable framerate). By spawning a separate thread for input, the OS scheduler will smooth out input latency even when your framerate dips below 10 fps.


Of course, this only helps if your world update rate is decoupled from your framerate. In my case, I will skip up to 12 frames in order to guarantee a pseudo-fixed update rate. In other words, I prioritize world updates (60 updates/sec no matter what) and only render frames as a best-effort.


This way, if the player presses the "fire" trigger then she will shoot the enemy immediately even if she is running at 5 fps.


If the input was handled in the same thread, then the "fire" button would take up to 200ms to register - or it would be skipped completely, if the player lifted her finger before the 200ms mark. This would place the player at a severe disadvantage (hi, Diablo 3!)


Clean now Smile

A question: I just tried to build a minimal GL ES2 app under Win, but I am getting unresolved externals for glClear, glClearColor and two more in the sdl_main function. So it clearly wants to link against full GL. I assume you use SDL2 with ANGLE?
Feasibility/correctness of calling GL in another thread
Stefanos A.
Guest

I am using ANGLE both with and without SDL2, but I'm using C#/OpenTK, which loads both libraries dynamically, so I cannot really help you with these errors, sorry. ("Dynamically" in this case means using LoadLibrary + GetProcAddress("eglGetProcAddress") and then using eglGetProcAddress to load the rest of the entry points.)

IIRC, I had to compile SDL2 from hg in order to get ANGLE working, but it was otherwise straightforward.



2014/1/23 slimshader
Quote:



Stefanos A. wrote:

2014/1/23 slimshader <>



Quote:




Stefanos A. wrote:



After trying several threading strategies, my current preference is to keep rendering and window management on the main thread, but handle input on a secondary thread. So far, this has proven the best method to maintain responsiveness without impacting compatibility.






But how does that help? If the main thread is blocked, you don't refresh the screen to show the effect of the processed events. In my experience, event handling is a tiny fraction of a frame. Do you mean that with second-thread event handling you avoid the "busy" system cursor / the window appearing to hang?






The point is not to improve performance but to minimize latency between the user pressing a button and the world reacting to that button press.


If you handle input in your rendering thread, then any dip in the framerate will increase input latency, which can be jarring (especially on slower systems that cannot maintain a stable framerate). By spawning a separate thread for input, the OS scheduler will smooth out input latency even when your framerate dips below 10 fps.


Of course, this only helps if your world update rate is decoupled from your framerate. In my case, I will skip up to 12 frames in order to guarantee a pseudo-fixed update rate. In other words, I prioritize world updates (60 updates/sec no matter what) and only render frames as a best-effort.


This way, if the player presses the "fire" trigger then she will shoot the enemy immediately even if she is running at 5 fps.


If the input was handled in the same thread, then the "fire" button would take up to 200ms to register - or it would be skipped completely, if the player lifted her finger before the 200ms mark. This would place the player at a severe disadvantage (hi, Diablo 3!)






Clean now

A question: I just tried to build a minimal GL ES2 app under Win, but I am getting unresolved externals for glClear, glClearColor and two more in the sdl_main function. So it clearly wants to link against full GL. I assume you use SDL2 with ANGLE?



Feasibility/correctness of calling GL in another thread
Jonathan Greig
Guest

Quote:
I might be missing something here, but how do you even implement a list without a tail pointer? You always keep at least one end, otherwise the list would be inaccessible. In any case, node-based lists suck


slimshader,
While I normally don't reference Wikipedia for programming matters, look at the tradeoff section ( http://en.wikipedia.org/wiki/Linked_list#Tradeoffs ), and particularly the chart for the case where the last element is known. I was mostly clarifying Nathan's comment that keeping track of the last element _will be faster_ rather than _may be faster_, in case others wish to take his approach. When he mentioned a tail pointer, he meant keeping track of the last element. Ultimately, the best choice of container depends on the particular problem, and as Nathan mentioned, a singly-linked list is easy to implement. In some cases you may not have the luxury of a standard library, and rolling your own is the only choice.
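To make the tradeoff being discussed concrete, here are the two append strategies side by side in a small illustrative C sketch (names invented here): walking from the head on every append is O(n), while maintaining a tail pointer makes each append O(1).

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    int value;
    struct node *next;
} node;

/* Append without a tail pointer: walk the whole list first, O(n). */
void append_walk(node **head, node *n) {
    node **p = head;
    while (*p)
        p = &(*p)->next;
    n->next = NULL;
    *p = n;
}

/* Append with a tail pointer: constant time, O(1). */
void append_tail(node **head, node **tail, node *n) {
    n->next = NULL;
    if (*tail)
        (*tail)->next = n;
    else
        *head = n;
    *tail = n;
}
```

Both produce the same list; the tail-pointer version just trades one extra pointer of bookkeeping for not re-walking the list on every append.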
Nathaniel J Fries


Joined: 30 Mar 2010
Posts: 444
slimshader wrote:
I hear you, but I am in the niche of older machines; in fact I am having trouble even getting users' OpenGL 1.4 to work correctly (for VBOs; I think it is about time I implemented a D3D renderer), and mobile devices are not what you'd consider 2010s "gaming rigs". That being said, the secondary thread is (as I said before) not used to speed up level loading but rather to keep the main (event processing, rendering) thread responsive.

I do use task queues (double-buffered), but they are per-thread. Since cross-thread tasks are very rare, I don't want the queues taking locks all the time.

BTW, since we are on the threading / GL topic: do you render from the main thread? What are your update vs. render step strategies? If you run them on separate threads, how do you sync them afterwards (condition variables seem the obvious choice)?

It may still make sense to avoid the second thread by making tasks shorter. Use non-blocking or even asynchronous I/O, or memory mapping (all three options require writing system-specific code), and you will find that resource-loading tasks no longer impede the render and event-processing tasks (uploading to the GPU already blocks other API calls on most implementations anyway). Blocking on file reads is probably the bottleneck pushing you in the direction you're going.
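Short of true asynchronous I/O, the same goal - bounding the work done per frame - can be approximated portably by reading a file in fixed-size chunks, one chunk per loop iteration, so a large load is amortized over many frames instead of stalling one. A minimal sketch (the function name and chunk size are illustrative, not from the thread):

```c
#include <assert.h>
#include <stdio.h>

#define LOAD_CHUNK 4096   /* bytes read per frame; tune to the frame budget */

/* Read at most LOAD_CHUNK bytes per call. Returns bytes read this
 * step; 0 means done (end of file, or no room left in dst). */
size_t load_step(FILE *f, unsigned char *dst, size_t capacity, size_t *used) {
    size_t room = capacity - *used;
    size_t want = room < LOAD_CHUNK ? room : LOAD_CHUNK;
    size_t got = fread(dst + *used, 1, want, f);
    *used += got;
    return got;
}
```

In a game loop you would call `load_step` once per frame until it returns 0; combined with non-blocking I/O or memory mapping, even the per-chunk read cost largely disappears.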

Jonathan Greig wrote:
I was mostly clarifying Nathan's comment that keeping track of the last element _will be faster_ rather than _may be faster_ in case others wish to take his approach.

This is true 99% of the time, but not 100% of the time. Maintaining a tail pointer is occasionally not worth the added complexity, and I have seen situations in which removing it was actually an optimization. O(1) and O(n) tail insertion do the same amount of work when there is only one element, and the O(n) code may actually perform better in that case. This is why I disagree with professors and professionals alike treating big-O notation as the silver bullet of algorithms. It is a useful tool, but there is never a silver bullet.