The SDL forums have moved to discourse.libsdl.org.
This is just a read-only archive of the previous forums, to keep old links working.


Using SDL_atomic
mbabuskov


Joined: 08 Feb 2015
Posts: 29
Hi,

I'm trying to use SDL_Atomic, but I don't understand it fully. My code seems to work in different variants I tried, but it also works if I use simple integer without atomic stuff, so I'm not sure it will guard against rare race conditions. Here's what it needs to do:

I have a game where saving takes a lot of time (it uses JSON format and has 1000+ objects, so it's slow) and I decided to use a secondary thread. I'm using a single bool flag to mark that world is "saving", so that main code knows not to touch some important objects and it also shows to the user when it's done.

MAIN THREAD runs this code:

bool isSaving = false;
while (true)
{
... poll and handle sdl events (some stuff that would modify world is ignored if isSaving)
... render stuff
}

SAVE THREAD runs this code:

void save()
{
isSaving = true;
... do stuff
isSaving = false;
}

The thread is detached and gets destroyed when this function is done.

How should I convert this code to use SDL_atomic functions?

It seems to me that using SDL_atomic_t instead of bool would be straightforward:

int SDL_AtomicGet(SDL_atomic_t *a);
int SDL_AtomicSet(SDL_atomic_t *a, int v);


But, SDL_atomic.h says:

* IMPORTANT:
* If you are not an expert in concurrent lockless programming, you should
* only be using the atomic lock and reference counting functions in this
* file. In all other cases you should be protecting your data structures
* with full mutexes.
*
* The list of "safe" functions to use are:
* SDL_AtomicLock()
* SDL_AtomicUnlock()
* SDL_AtomicIncRef()
* SDL_AtomicDecRef()


Can I still use SDL_AtomicGet/Set and is there something important to know?

Thanks.
Using SDL_atomic
Sik


Joined: 26 Nov 2011
Posts: 905
To be fair an atomic here may be enough if you just want to pass
around a flag. Basically:

1) Make the flag of type SDL_atomic_t
2) Do SDL_AtomicSet(&flag, 0) and then spawn the secondary thread
3a) In the main thread, do SDL_AtomicGet(&flag) until it doesn't return 0
3b) In the secondary thread, do SDL_AtomicSet(&flag, 1) when it's done saving
4) Let the secondary thread die :P

(swap 0 and 1 if you think that makes more sense, the point stands)

Atomics are enough when you're doing stuff like signaling the status
of something through a flag. If it's data that's constantly being
shared among threads then a mutex is better. (a good example of this
is the audio callback in SDL, changing anything that the callback uses
requires using a mutex)
_______________________________________________
SDL mailing list

http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org
Using SDL_atomic
Neil White
Guest

http://wiki.libsdl.org/SDL_AtomicSet needs code as well


On Sat, Feb 28, 2015 at 5:10 AM, Sik the hedgehog wrote:
Quote:
<snip>
Using SDL_atomic
eric.w


Joined: 12 Feb 2014
Posts: 38
On Tue, Feb 24, 2015 at 2:46 PM, mbabuskov wrote:
Quote:
<snip>

Hi,

I think your proposal will have a race condition like this:

// main thread
if (!saving) // atomic read of saving here
{
    // Problem: the saving thread might set 'saving' to true right now.
    // The main thread then modifies the world while the saving thread is
    // reading it => corrupted save
}


To make it safe, you should use a lock/mutex. Just from skimming the
SDL_Lock documentation, and assuming you want the main thread to skip
the world update during saving, I'd do it like this:

// main thread is going to update world
// if the saving thread is currently holding the lock, don't wait for it,
// just skip updating the world
if (SDL_AtomicTryLock(lock))
{
    // update world here
    SDL_AtomicUnlock(lock);
}

// saving thread
// If the main thread is currently updating world, this will (correctly)
// wait until the lock is released.
SDL_AtomicLock(lock);
// read world state and save it
SDL_AtomicUnlock(lock);

The thing to watch out for here is that you have to be 100% sure the
SDL_AtomicUnlock calls are made, e.g. watch out for exceptions / gotos /
break statements / etc. If you miss unlocking for some reason, you'll
get a deadlock (or else the world will stop updating).
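
The try-lock idea above can be sketched roughly as follows. This is only an illustration, not SDL code: it models the spinlock with a C11 atomic_flag so that it builds without SDL, and the names spin_lock, try_update_world, begin_save and end_save are hypothetical. In SDL the corresponding calls would be SDL_AtomicTryLock, SDL_AtomicLock and SDL_AtomicUnlock on an SDL_SpinLock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for SDL_SpinLock, built on a C11 atomic_flag. */
typedef atomic_flag spin_lock;

static spin_lock world_lock = ATOMIC_FLAG_INIT;
static int world_updates;   /* stand-in for the game world */

/* test_and_set returns the previous state: false means we just took the lock. */
static bool spin_try_lock(spin_lock *l) {
    return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void spin_lock_wait(spin_lock *l) {
    while (!spin_try_lock(l)) { /* busy-wait until the holder unlocks */ }
}

static void spin_unlock(spin_lock *l) {
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Main thread: update the world only if the saver does not hold the lock. */
bool try_update_world(void) {
    if (!spin_try_lock(&world_lock))
        return false;           /* a save is in progress, skip this frame */
    world_updates++;            /* "update world here" */
    spin_unlock(&world_lock);   /* must run on every path out of the block */
    return true;
}

/* Saving thread: hold the lock for the whole duration of the save. */
void begin_save(void) { spin_lock_wait(&world_lock); }
void end_save(void)   { spin_unlock(&world_lock); }
```

Calling try_update_world() between begin_save() and end_save() returns false without blocking, which is the "skip the update" behaviour described above.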

Hope this helps.
Eric
Using SDL_atomic
Neil White
Guest

im gonna attempt to update all the wiki when someone gives me a login and a laptop ( laptop tomorrow ) a lot that is missing from the page is in the code tests just need ctrl c & v , unless someone has written an elaborate script


On Sat, Feb 28, 2015 at 1:52 PM, Eric Wasylishen wrote:
Quote:
<snip>
Using SDL_atomic
john skaller
Guest

On 01/03/2015, at 12:10 AM, Sik the hedgehog wrote:

Quote:
To be fair an atomic here may be enough if you just want to pass
around a flag.

Not really. Atomic operations ensure that reads and writes
of the data return a coherent value. That's all.

So if you increment a variable atomically in thread 1,
thread 2 will either read the old value, or it will read
the new value. It won't read a messed up value even if the
variable is multibyte.

A mutex is just a user defined atomic operation.
So in principle it also cannot be used to "pass" a value.

The best way to synchronise threads is to use channels.
[Library like 0MQ provides this, as does my Felix system]

Unfortunately these primitives are not available on most
processors and not available in most programming languages
either.

So you will have to use the next best thing, which is a Posix
semaphore. Semaphores synchronise threads by allowing
one thread to set some data then signal another thread
it has done so. The other thread can wait for the signal.

General semaphores are of course not useful for
passing data though.

Posix therefore had to enhance the concept to include
fencing, that is, to ensure data to be communicated at
the synchronisation time are shared.

Unfortunately, there's no way to know what is to be shared
so EVERYTHING has to be synchronised.

This is true of a Posix mutex too.

Anyhow, semaphores are the way to go with Posix.
Windows doesn't have them, but there are emulations
available. No idea about iOS or Android.

--
john skaller

http://felix-lang.org



Using SDL_atomic
Sik


Joined: 26 Nov 2011
Posts: 905
2015-02-28 20:02 GMT-03:00, john skaller:
Quote:
Not really. Atomic operations ensure that reads and writes
of the data return a coherent value. That's all.

So if you increment a variable atomically in thread 1,
thread 2 will either read the old value, or it will read
the new value. It won't read a messed up value even if the
variable is multibyte.

Flags are booleans though (;゚ω゚)
Using SDL_atomic
john skaller
Guest

On 01/03/2015, at 1:35 PM, Sik the hedgehog wrote:

Quote:
2015-02-28 20:02 GMT-03:00, john skaller:
Quote:
Not really. Atomic operations ensure that reads and writes
of the data return a coherent value. That's all.

So if you increment a variable atomically in thread 1,
thread 2 will either read the old value, or it will read
the new value. It won't read a messed up value even if the
variable is multibyte.

Flags are booleans though (;゚ω゚)

That's true but the encoding could be 64 bit 0
or (int64_t)(-1) = 0xFFFFFFFFFFFFFFFF.

Without an atomic operation one might set the second value
but read

0xFFFFFFFF00000000

and who knows what the effect would be on the application?
[Assuming the data bus is only 32 bit :]
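
To make the torn read concrete, here is a small single-threaded model of it. It is only a sketch under the assumption stated above (a 64-bit store carried out as two 32-bit half-stores); the type split64 and the function torn_read_demo are hypothetical, and the "reader" is simply placed between the two half-stores by hand rather than running in a real second thread.

```c
#include <stdint.h>

/* Hypothetical model: a 64-bit flag stored as two independent 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} split64;

/* Reassemble the value the way a non-atomic 64-bit read would see it. */
static uint64_t read64(const split64 *s) {
    return ((uint64_t)s->hi << 32) | s->lo;
}

/* Returns what a reader observes if it runs between the two half-stores. */
uint64_t torn_read_demo(void) {
    split64 flag = { 0u, 0u };            /* encoding of "false" */

    flag.hi = 0xFFFFFFFFu;                /* first half of the store lands */
    uint64_t observed = read64(&flag);    /* reader sneaks in here */
    flag.lo = 0xFFFFFFFFu;                /* second half lands too late */

    return observed;                      /* neither 0 nor all-ones */
}
```

The observed value is neither of the two legal encodings, which is exactly the kind of result an atomic read or write rules out.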

Using SDL_atomic
Sik


Joined: 26 Nov 2011
Posts: 905
2015-03-01 0:41 GMT-03:00, john skaller:
Quote:
That's true but the encoding could be 64 bit 0
or (int64_t)(-1) = 0xFFFFFFFFFFFFFFFF.

Without an atomic operation one might set the second value
but read

0xFFFFFFFF00000000

and who knows what the effect would be on the application?
[Assuming the data bus is only 32 bit :]

Yeah, but I was talking about doing it with SDL_AtomicSet/Get which
*are* atomic (also IIRC the underlying type is 32-bit signed integer
anyway). For a case like this a full-blown mutex is overkill, mutexes
are designed for sharing around larger amounts of data rather than
just a flag.
Re: Using SDL_atomic
mbabuskov


Joined: 08 Feb 2015
Posts: 29
eric.w wrote:
I think your proposal will have a race condition like this:

// main thread
if (!saving) // atomic read of saving here
{
// Problem: saving thread might set 'saving' to true right now
// main thread modifies world while saving thread is reading it =>
// corrupted save


I fail to see how this can create problems. The saving thread sets "saving" to false just before quitting. It won't try to read world anymore. The next time save is done, a new saving thread will be started.

But, I see other potential problems:

1. it's possible that the changed value is not propagated to the main thread quickly (or ever). As far as I researched this on the Internet, it seems that this should never happen on common modern platforms (Intel CPUs where this game will run), but it could in theory. AFAICT, using SDL_AtomicSet would fix this.

2. The saving thread code is:

void save()
{
... save stuff
saveFlag = false;
}

As I understand, some compiler/CPU might "optimize" the code and since saveFlag is not checked or used in the whole save() function it could reorder the code to make that statement run *before* the "save stuff" or somewhere in the middle of it.

I changed the code to use SDL_AtomicSet.

void save()
{
... save stuff
SDL_AtomicSet(&saveFlag, 0);
}

Would that make sure that code is not reordered? Is there some chance that a compiler or CPU could reorder my SDL_AtomicSet call to run before "save stuff"?
mbabuskov


Joined: 08 Feb 2015
Posts: 29
Actually, to clarify. My code looks like this now:

MAIN THREAD:

while (!gameOver)
{
...
int isSaving = SDL_AtomicGet(&saveFlag);
if (needsSaving && !isSaving)
{
needsSaving = false;
SDL_AtomicSet(&saveFlag, 1);
start detached save thread
}

if (!isSaving)
updateDangerousStuff();

updateOtherStuff();
}

SAVE THREAD:

void save()
{
...save stuff to file

SDL_AtomicSet(&saveFlag, 0);

...thread gets destroyed
}

The saving thread never sets the flag to true, only to false.
Using SDL_atomic
john skaller
Guest

On 01/03/2015, at 9:16 PM, mbabuskov wrote:
Quote:

1. it's possible that the changed value is not propagated to the main thread quickly (or ever). As far as I researched this on the Internet, it seems that this should never happen on common modern platforms (Intel CPUs where this game will run), but it could in theory. AFAICT, using SDL_AtomicSet would fix this.

How?

Atomic write and read ensures that the individual writes of
the write will not interleave with individual reads of the read.

Nothing more. It makes no assurances about the ordering
of any OTHER reads and writes by any threads.

You need a fence for that. So consider:

var saving = false;

proc game() {
while run do
if needs_saving do
saving = true;
spawn_pthread saving_thread;
done
if not saving do dangerous_stuff; done
safe_stuff;
done
}

proc saving_thread () {
save_data;
saving = false;
}

Assuming the assignments are atomic, is this code safe?
[BTW: this is real Felix code]

The answer, unfortunately is NO. The reason is, some writes
*scheduled* in dangerous_stuff might not have occurred when
saving thread is launched. In fact they might propagate in pieces
during the save operation.

To force the writes to occur, you need a fence before saving is set
to true. This ensures the writes are completed before saving commences.
Of course they will appear completed to the main thread without a fence.

Generally synchronisation primitives include a fence, or the primitives
would be useless. However if you check Posix pthread_mutex_lock
specification, for example, there's no mention of it. Nor any of the SDL_Atomic
functions.

Re: Using SDL_atomic
mbabuskov


Joined: 08 Feb 2015
Posts: 29
john skaller wrote:
How?

Whoops, you are right. Actually, I missed one line from my code. Here's what it really looks like:

MAIN THREAD:

while (!gameOver)
{
...
int isSaving = SDL_AtomicGet(&saveFlag);
if (needsSaving && !isSaving)
{
needsSaving = false;
isSaving = 1; // the line missing from the last message
SDL_AtomicSet(&saveFlag, 1);
start detached save thread
}

if (!isSaving)
updateDangerousStuff();

updateOtherStuff();
}

SAVE THREAD:

void save()
{
...save stuff to file

SDL_AtomicSet(&saveFlag, 0);

...thread gets destroyed
}
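
For reference, the loop above can be turned into something that actually compiles along these lines. This is a sketch, not the poster's real code: it assumes C11 <stdatomic.h> and POSIX threads in place of SDL (atomic_load/atomic_store standing in for SDL_AtomicGet/SDL_AtomicSet, pthread_create/pthread_detach for the SDL thread calls), and world, snapshot and run_save_demo are made-up names. The default C11 operations used here are sequentially consistent, which is a stronger guarantee than the thread has established for SDL_AtomicGet.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define WORLD_SIZE 4

static atomic_int save_flag;        /* 1 while a save is in progress */
static int world[WORLD_SIZE];       /* stand-in for the game state */
static int snapshot[WORLD_SIZE];    /* stand-in for the save file */

/* SAVE THREAD: copy the world, then clear the flag as the very last step. */
static void *save_thread(void *unused) {
    (void)unused;
    for (int i = 0; i < WORLD_SIZE; i++)
        snapshot[i] = world[i];     /* "save stuff to file" */
    /* Sequentially consistent store: the copies above are ordered before it. */
    atomic_store(&save_flag, 0);
    return NULL;
}

/* MAIN THREAD: returns 1 if the snapshot matches the world, 0 otherwise. */
int run_save_demo(void) {
    bool needs_saving = false;
    bool save_started = false;

    for (int frame = 0; ; frame++) {
        int is_saving = atomic_load(&save_flag);

        if (save_started && !is_saving)
            break;                  /* the save thread has finished */

        if (frame == 3)
            needs_saving = true;    /* pretend the player asked for a save */

        if (needs_saving && !is_saving) {
            pthread_t thread;
            needs_saving = false;
            is_saving = 1;          /* keep the local copy in sync */
            atomic_store(&save_flag, 1);
            if (pthread_create(&thread, NULL, save_thread, NULL) != 0)
                return 0;
            pthread_detach(thread);
            save_started = true;
        }

        if (!is_saving) {
            for (int i = 0; i < WORLD_SIZE; i++)
                world[i] += frame + i;   /* updateDangerousStuff() */
        }

        sched_yield();              /* updateOtherStuff() would go here */
    }

    /* Safe to read snapshot: the flag was observed as 0 after the copy. */
    for (int i = 0; i < WORLD_SIZE; i++)
        if (snapshot[i] != world[i])
            return 0;
    return 1;
}
```

The main thread is the only one that sets the flag to 1 and the save thread is the only one that sets it back to 0, so the world is never written while the copy is running.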
Using SDL_atomic
Jared Maddox
Guest

Quote:
Date: Mon, 2 Mar 2015 01:00:33 +1100
From: john skaller


<snip>

Quote:
Atomic write and read ensures that the individual writes of
the write will not interleave with individual reads of the read.

Nothing more. It makes no assurances about the ordering
of any OTHER reads and writes by any threads.

You need a fence for that. So consider:


<snip>

Quote:
Assuming the assignments are atomic, is this code safe?
[BTW: this is real Felix code]

The answer, unfortunately is NO. The reason is, some writes
*scheduled* in dangerous_stuff might not have occurred when
saving thread is launched. In fact they might propagate in pieces
during the save operation.

To force the writes to occur, you need a fence before saving is set
to true. This ensures the writes are completed before saving commences.
Of course they will appear completed to the main thread without a fence.

Generally synchronisation primitives include a fence, or the primitives
would be useless. However if you check Posix pthread_mutex_lock
specification, for example, there's no mention of it. Nor any of the SDL_Atomic
functions.


John, look at this documentation page: https://wiki.libsdl.org/CategoryAtomic

Right above the Atomic Locks header it has this line: "All of the
atomic operations that modify memory are full memory barriers."

This probably isn't the highest-performance way to do things (there
might be something that does fences according to cache block: you'd
get better performance while inside a protected region if you used a
fence as rarely as possible while inside that block), but it's the
best that can be done both outside of the compiler, AND without the
programmer digging into the hairy details of their target processor.
Using SDL_atomic
john skaller
Guest

On 02/03/2015, at 6:57 AM, Jared Maddox wrote:

Quote:
Quote:

Generally synchronisation primitives include a fence, or the primitives
would be useless. However if you check Posix pthread_mutex_lock
specification, for example, there's no mention of it. Nor any of the SDL_Atomic
functions.


John, look at this documentation page: https://wiki.libsdl.org/CategoryAtomic

Right above the Atomic Locks header it has this line: "All of the
atomic operations that modify memory are full memory barriers."

Ah, thanks. This comment should be repeated in the documentation
for each individual function.

I will fix it!


Using SDL_atomic
john skaller
Guest

On 02/03/2015, at 10:46 AM, john skaller wrote:

Quote:

<snip>

[I have started doing this but my bandwidth is so bad I can't load web pages
in a reasonable time at the moment .. will have to finish it later when
bandwidth improves. Sorry!]

Using SDL_atomic
Eirik Byrkjeflot Anonsen
Guest

john skaller writes:

Quote:
On 01/03/2015, at 9:16 PM, mbabuskov wrote:
Quote:

1. it's possible that the changed value is not propagated to the main thread quickly (or ever). As far as I researched this on the Internet, it seems that this should never happen on common modern platforms (Intel CPUs where this game will run), but it could in theory. AFAICT, using SDL_AtomicSet would fix this.

How?

Atomic write and read ensures that the individual writes of
the write will not interleave with individual reads of the read.

Nothing more. It makes no assurances about the ordering
of any OTHER reads and writes by any threads.
[...]
Quote:
To force the writes to occur, you need a fence before saving is set
to true. This ensures the writes are completed before saving commences.
Of course they will appear completed to the main thread without a fence.

Generally synchronisation primitives include a fence, or the primitives
would be useless. However if you check Posix pthread_mutex_lock
specification, for example, there's no mention of it. Nor any of the SDL_Atomic
functions.

To quote the documentation: "All of the atomic operations that modify
memory are full memory barriers."

Though, now that I look at it, I realize that SDL_AtomicGet() does not,
in fact, "modify" memory. So there might be a problem there.

And for pthread_mutex_lock():
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11

"functions that synchronize thread execution and also synchronize memory
with respect to other threads" ... includes the pthread_mutex functions.

Though of course, we have the little issue that "threads cannot be
implemented as a library":
http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf

In practice, this doesn't matter so much, since all current compilers
will do the right thing. For obvious reasons :)

eirik
Using SDL_atomic
Sik


Joined: 26 Nov 2011
Posts: 905
2015-03-02 14:14 GMT-03:00, Eirik Byrkjeflot Anonsen:
Quote:
Though, now that I look at it, I realize that SDL_AtomicGet() does not,
in fact, "modify" memory. So there might be a problem there.

But SDL_AtomicSet still would, so the worst that would happen is that
SDL_AtomicGet would just see the update later than expected if I
understand correctly (but will eventually see it, since it's actively
asking for the data in RAM). When SDL_AtomicGet sees the change it's
safe to assume that whatever happened before SDL_AtomicSet made its
way into RAM already.

The problem would be if you keep making changes after SDL_AtomicSet,
but in that case you should ask yourself how do you expect the code to
work in the first place :P
Using SDL_atomic
Eirik Byrkjeflot Anonsen
Guest

Sik the hedgehog writes:

Quote:
2015-03-02 14:14 GMT-03:00, Eirik Byrkjeflot Anonsen:
Quote:
Though, now that I look at it, I realize that SDL_AtomicGet() does not,
in fact, "modify" memory. So there might be a problem there.

But SDL_AtomicSet still would, so the worst that would happen is that
SDL_AtomicGet would just see the update later than expected if I
understand correctly (but will eventually see it, since it's actively
asking for the data in RAM). When SDL_AtomicGet sees the change it's
safe to assume that whatever happened before SDL_AtomicSet made its
way into RAM already.

You would think so, however:

void protected_dangerous() {
if (SDL_AtomicGet(atomic) == 0) {
dangerous(shared_data);
}
}

Looks safe, right? However, if SDL_AtomicGet() is not a proper memory
barrier, the compiler is allowed to rewrite this as:

void protected_dangerous() {
suitable_type tmp_shared_data = shared_data;
if (SDL_AtomicGet(atomic) == 0) {
dangerous(tmp_shared_data);
}
}

Note that in this case 'shared_data' is read before 'atomic' is tested.
Thus it might end up sending a stale value to dangerous(). When the SDL
documentation says "Seriously, here be dragons!", it really means it :)

eirik
Using SDL_atomic
john skaller
Guest

On 03/03/2015, at 7:13 AM, Eirik Byrkjeflot Anonsen wrote:

Quote:

You would think so, however:

void protected_dangerous() {
if (SDL_AtomicGet(atomic) == 0) {
dangerous(shared_data);
}
}

Looks safe, right? However, if SDL_AtomicGet() is not a proper memory
barrier, the compiler is allowed to rewrite this as:

void protected_dangerous() {
suitable_type tmp_shared_data = shared_data;
if (SDL_AtomicGet(atomic) == 0) {
dangerous(tmp_shared_data);
}
}

Note that in this case 'shared_data' is read before 'atomic' is tested.
Thus it might end up sending a stale value to dangerous(). When the SDL
documentation says "Seriously, here be dragons!", it really means it :)

SDL_AtomicGet takes a pointer to a
struct SDL_atomic_t which contains an integer field named "value".
Here is the current definition:

typedef struct { int value; } SDL_atomic_t;

I think this should read:

typedef struct {int volatile value; } SDL_atomic_t;

That's line 189

https://hg.libsdl.org/SDL/file/bc00287b414f/include/SDL_atomic.h

Then the "volatile" should, normally, prevent any compiler optimisations.

[It's not assured by the C or C++ Standards but a compiler breaking normally
expected volatile semantics would be a poor implementation .. not that that's
saying much, given how recent gcc's break a lot of previously working code
in the name of strict adherence to a Standard semantics for a language
which is sloppy and weak .. bad idea GNU!]

I have to say, however, that these notes in the same file would be much
simpler to use with explicit fences instead of crudding about with
undocumented and uncertain features:

/**
* Memory barriers are designed to prevent reads and writes from being
* reordered by the compiler and being seen out of order on multi-core CPUs.
*
* A typical pattern would be for thread A to write some data and a flag,
* and for thread B to read the flag and get the data. In this case you
* would insert a release barrier between writing the data and the flag,
* guaranteeing that the data write completes no later than the flag is
* written, and you would insert an acquire barrier between reading the
* flag and reading the data, to ensure that all the reads associated
* with the flag have completed.
*
* In this pattern you should always see a release barrier paired with
* an acquire barrier and you should gate the data reads/writes with a
* single flag variable.
*
* For more information on these semantics, take a look at the blog post:
* http://preshing.com/20120913/acquire-and-release-semantics
*/
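
The release/acquire pairing described in that header comment might look like this in practice. This is a sketch using C11 fences and POSIX threads instead of SDL, an assumption made so that it builds without SDL; SDL_atomic.h provides SDL_MemoryBarrierRelease() and SDL_MemoryBarrierAcquire() for the same job. The names payload, ready, producer and consume_payload are invented for the example.

```c
#include <pthread.h>
#include <stdatomic.h>

static int payload;         /* plain data written by thread A */
static atomic_int ready;    /* the single flag that gates the data */

/* Thread A: write the data, release barrier, then write the flag. */
static void *producer(void *unused) {
    (void)unused;
    payload = 42;
    atomic_thread_fence(memory_order_release);   /* data before flag */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

/* Thread B: read the flag, acquire barrier, then read the data. */
int consume_payload(void) {
    pthread_t thread;
    if (pthread_create(&thread, NULL, producer, NULL) != 0)
        return -1;

    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0) {
        /* spin until the flag is seen */
    }
    atomic_thread_fence(memory_order_acquire);   /* flag before data */
    int seen = payload;

    pthread_join(thread, NULL);
    return seen;
}
```

Without the two fences the relaxed flag accesses alone would not order the access to payload, which is the reordering problem discussed earlier in this thread.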

Using SDL_atomic
Sik


Joined: 26 Nov 2011
Posts: 905
2015-03-02 17:13 GMT-03:00, Eirik Byrkjeflot Anonsen:
Quote:
Note that in this case 'shared_data' is read before 'atomic' is tested.
Thus it might end up sending a stale value to dangerous(). When the SDL
documentation says "Seriously, here be dragons!", it really means it :)

Wouldn't this be an issue with mutexes as well, actually? I mean, if
the compiler can reorder around SDL_AtomicGet like that, it surely can
reorder around the function that calls the mutex lock as well for the
same reason. It's not like the compiler knows that the shared data is
not safe to cache :P

2015-03-02 18:16 GMT-03:00, john skaller:
Quote:
Then the "volatile" should, normally, prevent any compiler optimisations.

Except it won't. The only thing it does is guarantee that it'll write
to memory and that consecutive volatile accesses will be done in said
order (and now not even that thanks to processor-level reordering, it
needs to be cache-through as well for that to work). There's a very
good reason why volatile doesn't work at all for multithreading.

The only purpose of volatile is to access hardware ports. Anything
else won't work as expected.
Using SDL_atomic
Bob


Joined: 19 Sep 2009
Posts: 185
Just a couple things, since I pushed the atomics through acceptance and wrote the first several versions before they were completely rewritten... :-)


Just because you start another thread is no reason to believe that the thread is running. In fact, since the total number of threads running on a system is pretty much always larger than the number of cores, you can be sure that sometimes one thread will be running and one thread will not be running. You have to use locks to make sure that a thread can actually block and force another thread to run. The code as presented may never let the save thread run at all, because until it runs the flag will not be set, and unless you force the other thread to block once in a while it may never stop running and let the flag be set.


Probably the best test of when to use an atomic versus a mutex is how long the flag will stay in the locked state. If you are going to keep the flag set for more than a few hundred cycles then use a mutex. Yes, I said CYCLES. The time it takes to run at most a few hundred instructions in a single core.


Oh, yeah, you should never use simple assignment to communicate between threads. Like it was pointed out above, between the machine scheduling instructions out of order and the compiler moving code all over the place you can never be sure when, or if, an assignment will actually take place. In fact if you set a value like flag = true; and then later say flag = false, and do not check the value in between the two statements, the compiler may just decide to eliminate flag = true because it has no effect. If the flag is initialized to false then both statements can be removed completely, unless you tell the compiler not to do that using the volatile type modifier. A good dead code eliminator pass in the compiler can do amazing things if you let it and are not aware that it exists.




Bob Pendleton


On Mon, Mar 2, 2015 at 4:15 PM, Sik the hedgehog wrote:
Quote:
2015-03-02 17:13 GMT-03:00, Eirik Byrkjeflot Anonsen:
Quote:
Note that in this case 'shared_data' is read before 'atomic' is tested.
Thus it might end up sending a stale value to dangerous(). When the SDL
documentation says "Seriously, here be dragons!", it really means it Smile

Wouldn't this be an issue with mutexes as well, actually? I mean, if
the compiler can reorder around SDL_AtomicGet like that, it surely can
reorder around the function that calls the mutex lock as well for the
same reason. It's not like the compiler knows that the shared data is
not safe to cache Razz

2015-03-02 18:16 GMT-03:00, john skaller:
Quote:
Then the "volatile" should, normally, prevent any compiler optimisations.

Except it won't. The only thing it does is guarantee that it'll write
to memory and that consecutive volatile accesses will be done in said
order (and now not even that thanks to processor-level reordering, it
needs to be cache-through as well for that to work). There's a very
good reason why volatile doesn't work at all for multithreading.

The only purpose of volatile is to access hardware ports. Anything
else won't work as expected.

--
+-----------------------------------------------------------
+ Bob Pendleton: writer and programmer
+ email:
+ blog: www.TheGrumpyProgrammer.com
Using SDL_atomic
john skaller
Guest

On 03/03/2015, at 9:15 AM, Sik the hedgehog wrote:

Quote:
2015-03-02 17:13 GMT-03:00, Eirik Byrkjeflot Anonsen:
Quote:
Note that in this case 'shared_data' is read before 'atomic' is tested.
Thus it might end up sending a stale value to dangerous(). When the SDL
documentation says "Seriously, here be dragons!", it really means it Smile

Wouldn't this be an issue with mutexes as well, actually? I mean, if
the compiler can reorder around SDL_AtomicGet like that, it surely can
reorder around the function that calls the mutex lock as well for the
same reason. It's not like the compiler knows that the shared data is
not safe to cache Razz

It's not really compiler reordering that's the problem, rather
it's the CPU and cache management.

In theory, mutexes just can't work. I mean, they are specified
to ensure mutual exclusion, and they will do that, but in theory
that is of no value whatsoever, since it doesn't lead to any ability
to share.

--
john skaller

http://felix-lang.org



Using SDL_atomic
john skaller
Guest

On 03/03/2015, at 10:01 AM, Bob Pendleton wrote:

Quote:
Just a couple things, since I pushed the atomics through acceptance and wrote the first several versions before they were completely rewritten... :-)

Thanks!

Quote:
Just because you start another thread is no reason to believe that the thread is running.

Quote:
Probably the best test of when to use an atomic versus a mutex is how long the flag will stay in the locked state.

Which is of course no use for the problem you mentioned above :-)

If you want to force a thread to wait for another thread you have to use
a condition variable or semaphore .. that won't force the other thread
to run but it will force the current one to wait UNTIL it runs (up
to a particular point).
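A minimal sketch of that pattern, waiting until the other thread has reached a particular point, might look like the following. It is written against POSIX threads so it is self-contained; SDL offers equivalents (SDL_CondWait(), SDL_CondSignal(), SDL_SemWait()). The names `reached`, `result`, and `wait_for_worker` are invented for the example.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
static bool reached = false;   /* predicate, protected by mtx */
static int  result  = 0;       /* data produced before the "particular point" */

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    result  = 42;              /* hypothetical work */
    reached = true;
    pthread_cond_signal(&cv);  /* wake the waiter, if it is already waiting */
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Starts the worker and blocks until it has reached the point above. */
int wait_for_worker(void)
{
    pthread_t t;
    int seen;

    pthread_mutex_lock(&mtx);
    reached = false;
    pthread_mutex_unlock(&mtx);

    pthread_create(&t, NULL, worker, NULL);

    pthread_mutex_lock(&mtx);
    while (!reached)           /* the loop guards against spurious wakeups */
        pthread_cond_wait(&cv, &mtx);
    seen = result;
    pthread_mutex_unlock(&mtx);

    pthread_join(t, NULL);
    return seen;
}
```

Nothing here makes the worker run sooner; the calling thread simply sleeps instead of spinning until the worker gets its turn.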

In fact for SDL the correct control structure to provide is probably a
thing called a monitor. [Monitors are provided in Felix represented
as pchannels]

The only other way to "force" threads to run is to use a RTOS (Real time
operating system).

--
john skaller

http://felix-lang.org



Using SDL_atomic
Sik


Joined: 26 Nov 2011
Posts: 905
2015-03-02 22:33 GMT-03:00, john skaller:
Quote:
The only other way to "force" threads to run is to use a RTOS (Real time
operating system).

Or to force a yield, which tells the OS that it's a good place to make
the thread start waiting (i.e. tell the scheduler that it's OK to
switch threads ahead of time). I know SDL_Sleep(0) on Windows manages
to do this, but for some reason I can't get this to work on Linux (my
build is configured to use the wrong API, maybe?).

I know that using SDL_Sleep may be seen as a problem by some people
due to the unpredictability of sleeping (it waits at least the amount
of time specified but can wait longer, even *seconds* if it wishes), but I
was messing with it and in practice it doesn't really cause problems,
at least on modern Windows.
Using SDL_atomic
john skaller
Guest

On 03/03/2015, at 6:07 PM, Sik the hedgehog wrote:

Quote:
2015-03-02 22:33 GMT-03:00, john skaller:
Quote:
The only other way to "force" threads to run is to use a RTOS (Real time
operating system).

Or to force a yield, which tells the OS that it's a good place to make
the thread start waiting

That's useful but it still doesn't force "the other" thread to run.



--
john skaller

http://felix-lang.org



Using SDL_atomic
john skaller
Guest

On 03/03/2015, at 6:07 PM, Sik the hedgehog wrote:

Quote:
I know SDL_Sleep(0) on Windows manages
to do this, but for some reason I can't get this to work on Linux (my
build is configured to use the wrong API, maybe?).

Try setting it to 1 instead of 0 (you mean SDL_Delay I assume)
[The argument should be floating point but that's another issue .. :]

If you really want another thread to run, you need to give it some time.

--
john skaller

http://felix-lang.org



Using SDL_atomic
Sik


Joined: 26 Nov 2011
Posts: 905
2015-03-03 9:33 GMT-03:00, john skaller:
Quote:
That's useful but it still doesn't force "the other" thread to run.

Well, a RTOS wouldn't help here either, you'd need a single tasking
system and have full control over each core :P (you simply have no way
to tell the scheduler to move onto the other thread, it can decide to
move to a different thread, or even to a different process)

To put it bluntly, you should always assume the other thread may
respond much later in the future if performance is a serious issue.
Incidentally this is also why the concept of critical sections exists,
they tell the scheduler that it's the worst moment to switch away
since other threads are waiting for a resource to be unlocked.

2015-03-03 9:41 GMT-03:00, john skaller:
Quote:
Try setting it to 1 instead of 0 (you mean SDL_Delay I assume)

Er yeah.

But if I recall correctly 0 on Windows does a yield anyway (it moves
onto other threads and returns as soon as the scheduler says so) and
SDL doesn't seem to be filtering out the value. On Linux it's a whole
different issue since first of all it depends on the underlying API
(i.e. one of the two APIs it supports just does a busy loop, so even a
huge delay will result in CPU hogging by the thread if SDL is using
that)
Using SDL_atomic
Eirik Byrkjeflot Anonsen
Guest

john skaller writes:

Quote:
On 03/03/2015, at 9:15 AM, Sik the hedgehog wrote:

Quote:
2015-03-02 17:13 GMT-03:00, Eirik Byrkjeflot Anonsen:
Quote:
Note that in this case 'shared_data' is read before 'atomic' is tested.
Thus it might end up sending a stale value to dangerous(). When the SDL
documentation says "Seriously, here be dragons!", it really means it Smile

Wouldn't this be an issue with mutexes as well, actually? I mean, if
the compiler can reorder around SDL_AtomicGet like that, it surely can
reorder around the function that calls the mutex lock as well for the
same reason. It's not like the compiler knows that the shared data is
not safe to cache Razz

Correct, however, see below...

Quote:
It's not really compiler reordering that's the problem, rather
its the CPU and cache management.

Similar problems, but compiler optimizations are more likely to cause
problems because they can do so much more. CPU and cache management will
not eliminate "unnecessary" code or execute code speculatively. A good
optimizing compiler can and will do both. And more.

Quote:
In theory, mutexes just can't work. i mean, they are specified
to ensure mutual exclusion, and they will do that, but in theory
that is of no value whatsoever, since it doesn't lead to any ability
to share.

A correct implementation of posix mutexes does work, since they are
specified to do all the right things (mutual exclusion and full memory
barriers). However, as I referred to earlier in this thread: "Threads
Cannot be Implemented as a Library"
(http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf).

In practice, that just means that compilers have to recognize the
threading code and disable any optimizations around that code that would
break it. And I expect that's exactly what they do. But it does make it
more likely for bugs to show up in this area, I'm sure.

This is unlike the situation with "volatile", which has been documented
for a long time as not suitable for multi-threading. So I would not
trust any serious optimizing compiler to avoid dangerous optimizations
around those.

Of course, the situation is much better with C++11, which does provide
the necessary primitives (and memory model) to write multi-threaded
code.

eirik
Using SDL_atomic
john skaller
Guest

On 03/03/2015, at 11:57 PM, Sik the hedgehog wrote:

Quote:
2015-03-03 9:33 GMT-03:00, john skaller:
Quote:
That's useful but it still doesn't force "the other" thread to run.

Well, a RTOS wouldn't help here either,

Sure it would. I've written one. RTOS can make hard guarantees
that threads run, and run in a particular time as well.

Next time you fly over a nuclear power plant .. you'd better hope
both the plant and plane are controlled with critical components
running on a RTOS.

--
john skaller

http://felix-lang.org



Using SDL_atomic
john skaller
Guest

On 04/03/2015, at 2:44 AM, Eirik Byrkjeflot Anonsen wrote:

Quote:

Similar problems, but compiler optimizations are more likely to cause
problems because they can do so much more. CPU and cache management will
not eliminate "unnecessary" code or execute code speculatively.

Oh, but they do (execute code speculatively). In fact all modern
Intel CPUs do this.

It's unlikely a compiler will do it because compilers can really
only schedule a single thread of control. All they do is try
to help the CPU do it.

"Modern" compilers are still quite stupid, at least in part
because they're compiling a language which appears to have
been designed to defeat optimisation (namely, C).

[People doing high performance numerical work still use Fortran ..]

Quote:
A correct implementation of posix mutexes does work, since they are
specified to do all the right things (mutual exclusion and full memory
barriers).

I actually checked the specs and saw no mention of a memory barrier,
do you happen to have a link?

Quote:
In practice, that just means that compilers have to recognize the
threading code and disable any optimizations around that code that would
break it. And I expect that's exactly what they do.

Certainly not for languages like C. They don't "recognise" anything.
C compilers are extremely dumb. They can barely optimise
basic primitives like memset .. and when they do they break
all sorts of security code (clearing passwords out of memory ...)

Quote:
Of course, the situation is much better with C++11, which does provide
the necessary primitives (and memory model) to write multi-threaded
code.

The situation is better with C++ because it provides much higher
level constructs and a stronger type system, as well as specifically
supporting threads. In addition the design is deliberate and modern
(although it still has to work in a framework which is poorly structured).

--
john skaller

http://felix-lang.org



Using SDL_atomic
Bob


Joined: 19 Sep 2009
Posts: 185
Look, sharing works, threads work, because the hardware is designed to make them work. Yes, you have to use operations that are recognized by the hardware as memory barriers, and in some cases the compiler also has to recognize them so it does the right thing with instruction scheduling. But the machines do it right and the compilers do it right, and if you do all the right things it works and it works very well. I first did multi-threaded code on a Univac 1108 in the early 70s (in Fortran and COBOL) and the basic rules have not changed. (I've also implemented threads on the 8080 and later machines :-)


And, yes, you can implement a thread package as a library, but it needs to have support from hardware and the OS.


But, learning to write multithreaded code is not easy. The natural assumptions built into the human mind about how multiple threads "should" act are completely different from the reality of how they DO act.



BTW, the main reason I wanted atomics in SDL was to implement atomic reference counting. Reference counting has its problems, but it has very nice properties for use in interactive programs. But, threads and reference counts do not mix well unless you have atomic increment and decrement.
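For readers who have not seen the pattern, atomic reference counting can be sketched roughly as below. This uses C11 `<stdatomic.h>` as a stand-in for SDL_AtomicIncRef() and SDL_AtomicDecRef() so the example is self-contained; the `shared_obj` type and the `obj_*` functions are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
typedef struct {
    atomic_int refs;
    int payload;
} shared_obj;

/* Creates an object holding one reference, or returns NULL on failure. */
shared_obj *obj_create(int payload)
{
    shared_obj *o = malloc(sizeof *o);
    if (o) {
        atomic_init(&o->refs, 1);
        o->payload = payload;
    }
    return o;
}

/* Takes an additional reference. */
void obj_retain(shared_obj *o)
{
    atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
}

/* Drops a reference. Returns 1 if this call freed the object, 0 otherwise. */
int obj_release(shared_obj *o)
{
    /* acq_rel: every earlier use of the object, in any thread,
       happens before the free() performed by the last releaser. */
    if (atomic_fetch_sub_explicit(&o->refs, 1, memory_order_acq_rel) == 1) {
        free(o);
        return 1;
    }
    return 0;
}
```

With a plain `int` counter, two threads releasing at the same moment could both read the same old value, so the object might be freed twice or never; the atomic read-modify-write is what rules that out.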


Oh, yeah, I should not have said anything about "forcing" another thread to run. You can't do that. You can only stop your thread from running until the other thread has run.



Bob Pendleton







Using SDL_atomic
Eirik Byrkjeflot Anonsen
Guest

john skaller writes:

Quote:
On 04/03/2015, at 2:44 AM, Eirik Byrkjeflot Anonsen wrote:

Quote:

Similar problems, but compiler optimizations are more likely to cause
problems because they can do so much more. CPU and cache management will
not eliminate "unnecessary" code or execute code speculatively.

Oh, but they do (execute code speculatively). In fact all modern
intel CPU's do this.

Yes, unfortunate choice of words there :)

The guarantees provided by Intel CPUs when they reorder your code are far
stronger than the guarantees of the C/C++ standards, though.

Quote:
It's unlikely a compiler will do it because compilers can really
only schedule a single thread of control. All they do is try
to help the CPU do it.

It is not only likely, it is absolutely guaranteed that modern compilers
will both eliminate unnecessary code and execute code speculatively.
That is, a compiler will calculate values just in case they may be
needed. And that means they will reorder the code in such a way that
code that should not be reachable will still be executed. So code that
is written like:

    if (a == 0)
        call_a_function(b);

can well be rewritten as:

    b_type tmp = b;
    if (a == 0)
        call_a_function(tmp);

if the compiler's analysis shows that this is likely to typically be
faster. (And it doesn't break the guarantees of the language, of
course.)

Quote:
"Modern" compilers are still quite stupid, at least in part
because they're compiling a language which appears to have
been designed to defeat optimisation (namely, C).

C does have features that make certain classes of optimization harder
(pointers, in particular). Some of those problems can be mitigated (e.g.
by using "restrict" in the places where the compiler needs that
guarantee.)

Quote:
[People doing high performance numerical work still use Fortran ..]

True, though maybe as much from tradition as from actual advantages :)
But yes, classic fortran has some restrictions that are useful for these
cases.

Quote:
Quote:
A correct implementation of posix mutexes does work, since they are
specified to do all the right things (mutual exclusion and full memory
barriers).

I actually checked the specs and saw no mention of a memory barrier,
do you happen to have a link?

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11

"synchronize thread execution and also synchronize memory with respect
to other threads."

Quote:
Quote:
In practice, that just means that compilers have to recognize the
threading code and disable any optimizations around that code that would
break it. And I expect that's exactly what they do.

Certainly not for languages like C. They don't "recognise" anything.
C compilers are extremely dumb. They can barely optimise
basic primitives like memset .. and when they do they break
all sorts of security code (clearing passwords out of memory ...)

Yes, even not-very-modern C compilers recognize common patterns and
optimize them specifically. Because that is really effective.

For important constructs that are known to be broken unless the compiler
helps out, you can be sure the compiler authors will detect those cases
and protect them where necessary.

memset is also a library function and not a basic primitive. And as you
say, compilers do recognize them and optimize them. Or how about this
little surprise: What do you think this code compiles to (using current
gcc):

printf("Hello world\n");

Turns out the final binary doesn't call printf at all. It is instead
turned into:

puts("Hello world");

I discovered this when my breakpoint on printf never triggered :)

Though, on second thoughts, I expect what actually happens with the
threading functions is that compiler-specific barriers have been added
to the code to ensure the tested compilers are unable to optimize the
code to breaking.


Quote:
Quote:
Of course, the situation is much better with C++11, which does provide
the necessary primitives (and memory model) to write multi-threaded
code.

The situation is better with C++ because it provides much higher
level constructs and a stronger type system, as well as specifically
supporting threads. In addition the design is deliberate and modern
(although it still has to work in a framework which is poorly structured).

C++11 in particular. Older versions do not have language support for
threads, and so need to deal with the problems of libraries supporting
threads.

eirik
Using SDL_atomic
Eirik Byrkjeflot Anonsen
Guest

Bob Pendleton writes:

Quote:
Look, sharing works, threads work, because the hardware is designed to make
them work. Yes, you have to use operations that are recognized by the
hardware as memory barriers and in some cases the compiler also has to
recognize them so it does the right thing with instruction scheduling.

Yes, and this is the problem. The C89 and C++03 specifications have very
limited provisions for memory barriers. This is intentional, because it
allows some seriously effective optimizations. Thus there is no way for
a library to implement proper threading primitives while only referring
to the language specifications. That's essentially what Hans Boehm says
in the article.

Of course, if you are making a library for a particular version of a
particular compiler, you can usually figure out ways to force it not to
break your code :)

Quote:
But,
the machines do it right and the compilers do it right and if you do all
the right things it works and it works very well.

Also true. In practice, compiler vendors will work with threading
library vendors to ensure that those threading libraries will fulfill
their promises. Because anything else would be truly stupid.

However, if you write your own threading primitives, you run into the
problem that you need those compiler-level memory barriers. Some
compilers actually provide that, but pure C89 and C++03 do not.
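To show what a compiler-level barrier looks like in source, here is a small sketch. It uses C11's `atomic_signal_fence()`, which GCC and Clang treat in practice as a compiler-only barrier that emits no instructions; SDL exposes a similar idea as SDL_CompilerBarrier(). The names `publish` and `read_published` are invented. Note the large caveat: this constrains only the compiler, not the CPU, so on its own it is not a correct way to pass data between threads.

```c
#include <stdatomic.h>

static int shared_data;   /* hypothetical payload */
static int ready;         /* hypothetical "data is valid" marker */

/* Stores the payload, then the marker. The fence keeps the COMPILER from
   moving either plain store across it; it does not order anything in hardware. */
void publish(int value)
{
    shared_data = value;
    atomic_signal_fence(memory_order_seq_cst);   /* compiler barrier only */
    ready = 1;
}

/* Returns the payload, or -1 if nothing has been published yet. */
int read_published(void)
{
    if (!ready)
        return -1;
    atomic_signal_fence(memory_order_seq_cst);   /* keep the payload read after the test */
    return shared_data;
}
```

For real cross-thread use, the marker would need to be an atomic with acquire/release ordering, or the whole exchange would need to sit behind a mutex.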

eirik
Using SDL_atomic
Bob


Joined: 19 Sep 2009
Posts: 185
Ah, yes, the standards do not provide the mechanism. That is very true. The standards can not provide the mechanism because the mechanism is always machine and sometimes OS specific. The standards define the basic language that you are supposed to be able to count on from system to system and machine to machine. The standards can not define things that must be done differently for each cpu architecture or OS. The standard does not even specify how many bits are in an int, it only specifies the minimum number of bits in an int. I've worked on machines with 18 and 36 bit ints. I saw C on a lisp machine with arbitrary length (as large as will fit in virtual memory) ints. Thousand plus digit ints are cool. Just like SDL must live with the lowest common denominator so must standards live with defining what can be defined.


But, C compilers provide extensions that make it possible to implement all sorts of things, such as thread libraries, even though the standards do not and can not contain those features.


Glad we got that straightened out. There is a huge difference between the language specified in the standard and the language that actually gets implemented.


An aside, I seriously dislike most every threading package I have ever encountered because of just one thing. They do not implement threads that work as people expect them to work. They implement them the ways the OS scheduler works. Not at all the same thing. This confuses people fiercely.


Bob Pendleton


Using SDL_atomic
john skaller
Guest

On 05/03/2015, at 2:36 AM, Eirik Byrkjeflot Anonsen wrote:
Quote:

Also true. In practice, compiler vendors will work with threading
library vendors to ensure that those threading libraries will fulfill
their promises. Because anything else would be truly stupid.

The truly stupid is typically exceedingly common.

--
john skaller

http://felix-lang.org



Using SDL_atomic
Eirik Byrkjeflot Anonsen
Guest

Bob Pendleton writes:

Quote:
Ah, yes, the *standards* do not provide the mechanism. That is very true.
The standards *can not* provide the mechanism because the mechanism is
always machine and sometimes OS specific. The standards define the basic
language that you are supposed to be able to count on from system to system
and machine to machine. The standards can not define things that must be
done differently for each cpu architecture or OS.

True when these differences are visible to the code. However, for memory
coherence, all the code cares about is that it gets guarantees about
some specific level of memory coherence at some specific points of
source-level execution. So the compiler can hide away the details of
exactly how that is accomplished.

And in fact, some standards do provide such mechanisms. C++11 being the
most relevant example I know of (I don't know whether C99 or C11 does).
I think the reason C89 and C++03 did not provide such mechanisms was
that they weren't considered important at the time. And they thought
that it could be done as libraries :)
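On the open question about C: C11 does define a memory model and atomics in `<stdatomic.h>`, much like C++11. As a rough sketch of what that buys, the "stale shared_data" problem discussed earlier in the thread can be avoided with a release store paired with an acquire load. POSIX threads are used here for the threads themselves, and `producer` and `consume` are made-up names.

```c
#include <pthread.h>
#include <stdatomic.h>

static int shared_data;    /* plain, non-atomic payload */
static atomic_int ready;   /* publication flag, zero-initialized */

static void *producer(void *arg)
{
    (void)arg;
    shared_data = 123;
    /* Release: the write to shared_data above cannot move below this store. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Starts the producer, waits for the flag, then reads the payload. */
int consume(void)
{
    pthread_t t;
    int value;

    atomic_store(&ready, 0);
    pthread_create(&t, NULL, producer, NULL);

    /* Acquire: the read of shared_data below cannot move above this load. */
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        /* spin; a real program would block or do other work */
    }
    value = shared_data;   /* sees the value written before the release store */

    pthread_join(t, NULL);
    return value;
}
```

The ordering guarantee comes from the language standard here, so neither the compiler nor the CPU may reorder the payload access past the flag access.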

Quote:
The standard does not
even specify how many bits are in an int, it only specifies the minimum
number of bits in an int.

C++03 actually specifies a char to be 1 byte. Not that it helps any, as
it goes on to say that it is unspecified how many bits are in a byte :)
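In C the same holds: `sizeof(char)` is 1 by definition, and the number of bits in a byte is implementation-defined but reported by `CHAR_BIT` in `<limits.h>` (the standard requires at least 8). A tiny sketch, with `bits_in_int` as a hypothetical helper name:

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits of storage an int occupies on this implementation
   (this counts any padding bits, so it is an upper bound on the value bits). */
size_t bits_in_int(void)
{
    return sizeof(int) * CHAR_BIT;
}
```

On most current desktop platforms this yields 32, but code that depends on an exact width is safer using the fixed-width types from `<stdint.h>`.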

[...]
Quote:
Glad we got that straightened out. There is a huge difference between the
language specified in the standard and the language that actually gets
implemented.

Important distinction :)

And in the end, the main point I take away from Hans Boehm's article is
that C compilers will do extremely weird things to your code. Which will
work as expected in single-threaded code because any temporary weird
state will be cleaned up before it is observed. But in multi-threaded
code, you really need explicit compiler-level memory barriers around all
access to shared-memory data.

Quote:
An aside, I seriously dislike most every threading package I have ever
encountered because of just one thing. They do not implement threads that
work as people expect them to work. They implement them the ways the OS
scheduler works. Not at all the same thing. This confuses people fiercely.

I'm interested. In which ways do you think these thread libraries work
contrary to people's expectations?

eirik
Using SDL_atomic
Bob


Joined: 19 Sep 2009
Posts: 185
Quote:
An aside, I seriously dislike most every threading package I have ever
encountered because of just one thing. They do not implement threads that
work as people expect them to work. They implement them the ways the OS
scheduler works. Not at all the same thing. This confuses people fiercely.

I'm interested. In which ways do you think these thread libraries work
contrary to people's expectations?


Ok, that is really the subject for a long blog post.


I'll try to be fairly quick here. People expect threads to run in parallel. They expect that if they have 20 threads, all 20 threads will be running at once. People think of threads as being like workers in a factory, each doing their jobs in parallel with all the other workers in the factory. In reality you have N cores. That means you have at most N workers running around doing the jobs of all the workers. These workers do not just automatically stop one job and switch to another job. They only switch when they can not keep doing the job they are doing.


The key thing is that no matter how many threads you have only a few of them will be running at one time. But, people expect all threads to be running all the time. Even people who know better tend to expect that if they have N cores they should have N active threads in their code. When in fact they may well have zero cores active or N - (any number <= N).
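To see how small N usually is compared with the number of threads a program creates, the core count can be queried at run time. The sketch below uses `sysconf(_SC_NPROCESSORS_ONLN)`, which is a widely supported extension on Linux, macOS and the BSDs rather than strict POSIX (that availability is an assumption about your platform); SDL provides SDL_GetCPUCount() for the same purpose. `online_cores` is a made-up name.

```c
#include <unistd.h>

/* Returns the number of cores currently online, or 1 if it cannot be determined. */
long online_cores(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

Even this number is only an upper bound on how many of your threads can run at once, since the cores are shared with every other process on the machine.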


When you get to situations with multiple machines connected together it gets even harder for people to understand. I once spent hours... really days, trying to explain to an EE why, when our systems were connected by a high speed parallel bus, the complete system ran slower than when connected by a low speed serial line. The difference was that the bus was polled and the serial line was interrupt driven. He never did understand why fast was slow and slow was fast... but he finally gave me an interrupt on the parallel bus. The wrong interrupt, but an interrupt which let it run almost as fast as the serial line. He never did understand that the interrupt let me queue data so that both machines ran nearly full speed all the time, while polling forced the machines to run in lockstep.


Oh well, documentation and education do not make enough of a distinction between software threads, the things that thread packages deal with, and hardware threads, the real things that do the work. The lack of a one-to-one correspondence between them is very surprising to people.


The most intuitive thread package I ever used was one I wrote under DOS on a 286, lo these many years ago. It switched threads whenever a thread blocked, but it also used a timer interrupt to force switching after about a thousand instructions had been run. That kind of fine-grained scheduling "wastes" a lot of CPU time, but it made it look like every thread was always running. I did have to lock out task switching around all I/O calls though...



I base my observations on my own learning curve, my experience trying to teach the subject, and on decades of helping people on mailing lists.


Bob Pendleton






On Thu, Mar 5, 2015 at 9:55 AM, Eirik Byrkjeflot Anonsen wrote:
Quote:
Bob Pendleton writes:

Quote:
Ah, yes, the *standards* do not provide the mechanism. That is very true.
The standards *can not* provide the mechanism because the mechanism is
always machine and sometimes OS specific. The standards define the basic
language that you are supposed to be able to count on from system to system
and machine to machine. The standards can not define things that must be
done differently for each cpu architecture or OS.

True when these differences are visible to the code. However, for memory
coherence, all the code cares about is that it gets guarantees about
some specific level of memory coherence at some specific points of
source-level execution. So the compiler can hide away the details of
exactly how that is accomplished.

And in fact, some standards do provide such mechanisms. C++11 being the
most relevant example I know of (I don't know whether C99 or C11 does).
I think the reason C89 and C++03 did not provide such mechanisms was
that they weren't considered important at the time. And they thought
that it could be done as libraries :)

Quote:
The standard does not
even specify how many bits are in an int, it only specifies the minimum
number of bits in an int.

C++03 actually specifies a char to be 1 byte. Not that it helps any, as
it goes on to say that it is unspecified how many bits are in a byte :)

[...]
Quote:
Glad we got that straightened out. There is a huge difference between the
language specified in the standard and the language that actually gets
implemented.

Important distinction :)

And in the end, the main point I take away from Hans Boehm's article is
that C compilers will do extremely weird things to your code. Which will
work as expected in single-threaded code because any temporary weird
state will be cleaned up before it is observed. But in multi-threaded
code, you really need explicit compiler-level memory barriers around all
access to shared-memory data.
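
For reference, C11 did add such a mechanism in <stdatomic.h>. Below is a minimal sketch of the "barriers around shared data" idea using it. The names payload, ready, publish and try_consume are made up for the illustration, and it assumes exactly one writer thread and one reader thread. It is not how SDL implements its atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;          /* ordinary, non-atomic shared data */
static atomic_bool ready;    /* static storage, so it starts out false */

/* Writer side: fill in the data first, then publish it. The release
 * store keeps the write to payload from being moved after the flag. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: the acquire load keeps the read of payload from being
 * moved before the flag check. Returns false if nothing is published. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

The writer thread would call publish() once its data is complete, and the reader would poll try_consume(). With a plain bool in place of the atomic flag, the compiler and the CPU would both be free to reorder the two accesses.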

Quote:
An aside, I seriously dislike most every threading package I have ever
encountered because of just one thing. They do not implement threads that
work as people expect them to work. They implement them the ways the OS
scheduler works. Not at all the same thing. This confuses people fiercely.

I'm interested. In which ways do you think these thread libraries work
contrary to people's expectations?

eirik



--
+-----------------------------------------------------------
+ Bob Pendleton: writer and programmer
+ email:
+ blog: www.TheGrumpyProgrammer.com
Using SDL_atomic
Sik


Joined: 26 Nov 2011
Posts: 905
2015-03-05 12:55 GMT-03:00, Eirik Byrkjeflot Anonsen:
Quote:
C++03 actually specifies a char to be 1 byte. Not that it helps any, as
it goes on to say that it is unspecified how many bits are in a byte :)

I believe C99 explicitly states char to be exactly 8 bits (the rest of
the sizes is still up to the implementation though, minimum size
aside).
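
This is easy to check on a given platform rather than argue about. The C standard's <limits.h> defines CHAR_BIT, and C99 only guarantees that it is at least 8; POSIX is what pins it to exactly 8. A small sketch follows, with helper names made up for the example:

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */

/* Both of these hold on every conforming implementation. */
static_assert(CHAR_BIT >= 8, "C guarantees at least 8 bits per char");
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

/* Number of bits in a char on this platform. */
int bits_per_char(void)
{
    return CHAR_BIT;
}

/* Storage bits occupied by an int on this platform. */
int bits_per_int(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```

On typical desktop and mobile targets bits_per_char() comes out as 8. Code that depends on that can say so with static_assert(CHAR_BIT == 8, "...") instead of assuming it.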