A question about thread-architecture.

Discussion of chess software programming and technical issues.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A question about thread-architecture.

Post by bob »

diep wrote:
Kempelen wrote: Hello, a question about thread-architecture:

I programmed Rodin with the main thread handling the WB protocol and a thread, created at startup, that does the engine's thinking. I have seen engines that use only one thread and call a kind of "bioskey" function to ask whether new user/GUI data is available. Is the second approach more efficient than the first for a non-SMP engine? I ask because I have read that Windows usually needs CPU time to handle more than one thread properly. Maybe that time would be useful for the engine.....

Regards
Fermin
The original DOS-derived Diep simply polled for input during the search every n nodes.

Since 2002, Diep has used a special I/O thread that handles all user input and commands the search to do everything. The reason back then was the somewhat studentish assumption that some specific way of reading the keyboard would block this process from searching.

Even if it were only 1 out of 500 CPUs, that could stall all the other 499 CPUs.

That was the reason to build this special I/O thread.

In theory it is better, but it does have disadvantages.

a) you need to prove things correct
b) here is the most important problem: you don't want the I/O thread to eat a full core, so basically you need to call Sleep there.

I have it sleep, for example, 10 milliseconds each time. That seems fine, except when you want to play super-superbullet games with your engine.

The sleep kicks in regularly and makes every command sent to the engine respond slowly.

For example, I ran into trouble when I wanted Diep to produce 2 million evaluations, with White and Black reversed, over a bunch of positions and then check for determinism. In Diep this is a 0-ply search, as the search initializes all sorts of data structures for evaluation.

Of course that should take seconds of CPU time, with the I/O accounting for a big part of it, but in reality it takes hours, because the I/O thread sleeps and there is latency between it giving the engine the command to start the search and the command to stop it. So instead of taking microseconds per position, it takes tens of milliseconds, just because of the communication between 2 threads.
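A rough check of why this adds up to hours rather than seconds, under our own assumption (not diep's measurement) that each position pays about one 10 ms sleep period of handoff latency:

$$2\times 10^{6}\ \text{positions} \times 10\ \text{ms} = 2\times 10^{4}\ \text{s} \approx 5.6\ \text{h}$$

If the start and the stop command each wait out a full sleep period, it becomes about twice that.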

I have yet to fix that, and I certainly must fix it ASAP.

In short, it is true that you offload work to a different core, but realize that there is no fast SOFTWARE mechanism for communicating between a thread that is basically always idle and a searching thread. From the search's point of view you want everything to happen in real time, yet the controlling thread basically has to idle.

The minimum time to wake up a thread is the time it takes Linux to schedule it. Even in the realtime kernel that is pretty slow: in principle it is simply 10+ milliseconds.

An active thread, by contrast, has an average response time of 70 microseconds in the realtime kernel, much better than in the normal kernel, where it can be up to 0.5 milliseconds for an active thread.

Reactivating a sleeping thread, though, is costly.
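To see how costly waking a blocked thread is on your own machine, here is a minimal POSIX sketch (ours, not code from Diep) that signals a thread blocked in pthread_cond_wait() and reports how long it took before that thread ran. The names measure_wakeup_us and wake_probe and the 20 ms head start are our assumptions. A single sample is noisy, and results depend heavily on kernel, configuration and load, so no particular number should be expected.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Shared state between the blocked thread and the thread that wakes it. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
    struct timespec sent;   /* when the waker signaled */
    struct timespec woke;   /* when the sleeper got to run again */
} wake_probe;

static double ns_between(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) * 1e9 + (double)(b.tv_nsec - a.tv_nsec);
}

static void *sleeper(void *arg) {
    wake_probe *p = arg;
    pthread_mutex_lock(&p->lock);
    while (!p->signaled)                       /* loop guards against spurious wakeups */
        pthread_cond_wait(&p->cond, &p->lock);
    clock_gettime(CLOCK_MONOTONIC, &p->woke);
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Returns signal-to-run latency in microseconds (including the mutex handoff),
   or -1.0 if the thread could not be created. */
double measure_wakeup_us(void) {
    wake_probe p = { .signaled = false };
    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.cond, NULL);
    pthread_t t;
    if (pthread_create(&t, NULL, sleeper, &p) != 0) {
        pthread_cond_destroy(&p.cond);
        pthread_mutex_destroy(&p.lock);
        return -1.0;
    }
    /* Arbitrary head start so the sleeper is most likely blocked by now. */
    struct timespec head_start = { 0, 20L * 1000 * 1000 };
    nanosleep(&head_start, NULL);

    pthread_mutex_lock(&p.lock);
    clock_gettime(CLOCK_MONOTONIC, &p.sent);
    p.signaled = true;
    pthread_cond_signal(&p.cond);
    pthread_mutex_unlock(&p.lock);

    pthread_join(t, NULL);
    pthread_cond_destroy(&p.cond);
    pthread_mutex_destroy(&p.lock);
    return ns_between(p.sent, p.woke) / 1000.0;
}
```

Calling it in a loop and looking at the distribution says far more than one call does.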

So realize what you're getting into if you create a special thread for this: it becomes very tough to have your program produce a search, an evaluation, or anything like that very quickly.

BECAUSE the kernel basically fires at 100 Hz.

You and I can't change this; databases depend upon it, so it's the same in all OSes. A stupid limitation.

It's better on GPUs, where everything is real time, but that's a different discussion :)

Actually, today's Diep runs 3 threads by default when it runs on a single processor.

1 thread is very dumb and just reads the input (keyboard/socket/TCP connection). It is blocking. 1 thread is what I call the I/O thread, though a better name would be the commanding thread: it gets the keyboard input and parses it, gives the searching processes/threads the commands to start and stop searching (in Diep there is no difference between local searching threads and remote searching processes on shared-memory machines), and checks how long it has been thinking, commanding the search to abort based on that.

Then there is the search thread. It never idles; it just reads a volatile variable to see whether it needs to start searching, and from other variables it sees whether it must initiate a search or simply split somewhere.

Because the command thread is normally idle, it needs to be woken up before it can do anything. And we don't want it to eat much system time, so I give it a bunch of milliseconds to idle each time.

In your case you could get away with just a blocking 'read the user's input' type of thread.

I would really advise you to make such a blocking thread that reads stdin. As for the command thread, you might hold off on that until you're really SMP :)
I don't follow. In Cray Blitz, we just did a "read()" that blocked that thread until the user typed something. So what is the need for a sleep()??? I have it on my list to fix this in Crafty one day as I do not like the ugly code needed to poll for input. But it is a significant rewrite, since my "ponder code" would have to be completely re-done...
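For comparison, the polling approach discussed here (checking for input from inside the search every n nodes) can be sketched on POSIX systems with poll() and a zero timeout. The names input_available and should_check_input and the 4096-node interval are hypothetical. One caveat: if the engine reads stdin through stdio (fgets and friends), lines already sitting in the stdio buffer are invisible to poll(), which is part of why this code tends to get ugly; Windows needs a different mechanism altogether (PeekNamedPipe is commonly used for pipes there).

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdbool.h>

/* Hypothetical interval: pay for a system call only every 4096 nodes. */
#define POLL_INTERVAL_NODES 4096UL

/* True if a read() on fd would return immediately (data pending or hang-up/EOF).
   A negative fd is ignored by poll() and simply reports "nothing pending". */
bool input_available(int fd) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    /* A timeout of 0 makes poll() return at once instead of blocking. */
    if (poll(&pfd, 1, 0) <= 0)
        return false;
    return (pfd.revents & (POLLIN | POLLHUP)) != 0;
}

/* Called by the search at every node with the running node count. */
bool should_check_input(unsigned long nodes, int fd) {
    if (nodes % POLL_INTERVAL_NODES != 0)
        return false;               /* cheap path taken on almost every node */
    return input_available(fd);
}
```

In an engine, fd would be STDIN_FILENO (0), and a true result would make the search stop or handle the pending command before continuing.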
bob

Re: A question about thread-architecture.

Post by bob »

Sven Schüle wrote:
mar wrote:
Hi Vincent,

I wonder why you sleep in the I/O thread? I have a plain while ( fgets() ) loop and haven't noticed the I/O thread eat any CPU time. Perhaps you read the input in a different way?
I thought that fgets on stdin should always be blocking...

Martin
Actually I wondered about exactly the same thing. A blocking read does not eat CPU time. Sleep() is not appropriate here IMO.

Sven
I asked the same question above. Perhaps he does other things, like a check-the-time loop, so that the I/O thread controls time usage as well... Otherwise sleep() doesn't make any sense to me. Actually, sleep() doesn't make any sense at all under any circumstance. Maybe usleep(), since I doubt one wants to sleep for a whole second except in really long games...
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: A question about thread-architecture.

Post by mcostalba »

bob wrote: I asked the same question above. Perhaps he does other things like a check the time loop so that the I/O thread controls time usage as well...
That may very well be the case. In Stockfish (since Glaurung times) there was a single poll() function, called after a fixed number of nodes, that actually performed two tasks: checking for input and checking the remaining search time.

When I moved to a separate I/O thread waiting for input on getline(), I needed to add a timer/alarm facility to be able to finally remove the old poll() code. What I did was to provide yet another thread (called timer) that is normally sleeping. This thread wakes up at a given interval to check the available time and then goes back to sleep. The timer is more or less a wrapper around WaitForSingleObject() (on Windows) and pthread_cond_timedwait() (on Linux). While the I/O thread was not so difficult to set up, the timer was tricky, and it took some effort to make it work, especially at super-fast blitz time controls.
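Stockfish's actual code is in its thread.cpp; what follows is only a minimal POSIX sketch of the timer idea described above, with hypothetical names (search_timer, timer_start, timer_stop) and an atomic flag that the search is assumed to poll. It uses CLOCK_REALTIME because that is the default clock of pthread_cond_timedwait(); a monotonic clock selected via pthread_condattr_setclock() would be immune to wall-clock adjustments.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical timer state; all names are illustrative. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            quit;         /* set by timer_stop() to end the thread early */
    long            interval_ms;  /* how often the timer wakes to look at the clock */
    struct timespec deadline;     /* absolute CLOCK_REALTIME time to stop searching */
    atomic_bool    *stop_search;  /* flag the search is assumed to poll */
} search_timer;

static struct timespec add_ms(struct timespec t, long ms) {
    t.tv_sec  += ms / 1000;
    t.tv_nsec += (ms % 1000) * 1000000L;
    if (t.tv_nsec >= 1000000000L) {
        t.tv_sec  += 1;
        t.tv_nsec -= 1000000000L;
    }
    return t;
}

static bool reached(struct timespec now, struct timespec deadline) {
    return now.tv_sec > deadline.tv_sec ||
           (now.tv_sec == deadline.tv_sec && now.tv_nsec >= deadline.tv_nsec);
}

static void *timer_thread(void *arg) {
    search_timer *t = arg;
    pthread_mutex_lock(&t->lock);
    while (!t->quit) {
        struct timespec now;
        clock_gettime(CLOCK_REALTIME, &now);
        if (reached(now, t->deadline)) {
            atomic_store(t->stop_search, true);   /* time is up: tell the search */
            break;
        }
        /* Sleep until the next check (never past the deadline); returns early
           if timer_stop() signals the condition variable. */
        struct timespec wake = add_ms(now, t->interval_ms);
        if (reached(wake, t->deadline))
            wake = t->deadline;
        pthread_cond_timedwait(&t->cond, &t->lock, &wake);
    }
    pthread_mutex_unlock(&t->lock);
    return NULL;
}

/* Starts a timer that sets *stop_search after budget_ms. Returns 0 on success. */
int timer_start(search_timer *t, pthread_t *thread, long budget_ms,
                long interval_ms, atomic_bool *stop_search) {
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    t->quit = false;
    t->interval_ms = interval_ms;
    t->stop_search = stop_search;
    struct timespec now;
    clock_gettime(CLOCK_REALTIME, &now);
    t->deadline = add_ms(now, budget_ms);
    return pthread_create(thread, NULL, timer_thread, t);
}

/* Ends the timer without waiting for the next interval, e.g. when the
   search finishes on its own. */
void timer_stop(search_timer *t, pthread_t thread) {
    pthread_mutex_lock(&t->lock);
    t->quit = true;
    pthread_cond_signal(&t->cond);
    pthread_mutex_unlock(&t->lock);
    pthread_join(thread, NULL);
    pthread_cond_destroy(&t->cond);
    pthread_mutex_destroy(&t->lock);
}
```

Waiting on a condition variable rather than calling sleep() is what lets timer_stop() end the timer immediately instead of after the current interval.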

Anyhow, I can now say that this new approach is much better than the old one from a software-design point of view, and I would never go back.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: A question about thread-architecture.

Post by diep »

mar wrote:
Hi Vincent,

I wonder why you sleep in the I/O thread? I have a plain while ( fgets() ) loop and haven't noticed the I/O thread eat any CPU time. Perhaps you read the input in a different way?
I thought that fgets on stdin should always be blocking...

Martin
It also handles time allocation and gives the command to start a search, etc.; it just doesn't search itself.

On top of that, a third thread does a blocking read of stdin or of the incoming TCP/IP connection, whichever is appropriate.

You don't want to do any of that in a polling model of course.
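One way to avoid the fixed Sleep() latency diep describes, sketched here as our own suggestion rather than how Diep works, is to let the command thread block on a condition variable that the reader thread signals the moment a line arrives, so neither thread burns CPU and neither waits out a sleep period. The names line_queue, reader_thread and next_command and the sizes MAX_LINE/QUEUE_LEN are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE  256   /* longer input lines are split into several entries */
#define QUEUE_LEN 16

/* Bounded queue of input lines between the blocking reader and the command thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
    char            lines[QUEUE_LEN][MAX_LINE];
    int             head, count;
    bool            eof;
    FILE           *in;          /* stdin, or a socket wrapped with fdopen() */
} line_queue;

void queue_init(line_queue *q, FILE *in) {
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
    q->head = q->count = 0;
    q->eof = false;
    q->in = in;
}

/* Reader thread: blocks inside fgets(), so it costs no CPU while waiting. */
void *reader_thread(void *arg) {
    line_queue *q = arg;
    char buf[MAX_LINE];
    while (fgets(buf, sizeof buf, q->in)) {
        buf[strcspn(buf, "\r\n")] = '\0';          /* strip the line ending */
        pthread_mutex_lock(&q->lock);
        while (q->count == QUEUE_LEN)
            pthread_cond_wait(&q->not_full, &q->lock);
        strcpy(q->lines[(q->head + q->count) % QUEUE_LEN], buf);
        q->count++;
        pthread_cond_signal(&q->not_empty);        /* wake the command thread now */
        pthread_mutex_unlock(&q->lock);
    }
    pthread_mutex_lock(&q->lock);
    q->eof = true;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Command-thread side: blocks until a line arrives; returns false once the
   input has ended and the queue is drained. */
bool next_command(line_queue *q, char out[MAX_LINE]) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->eof)
        pthread_cond_wait(&q->not_empty, &q->lock);
    bool got = q->count > 0;
    if (got) {
        strcpy(out, q->lines[q->head]);
        q->head = (q->head + 1) % QUEUE_LEN;
        q->count--;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->lock);
    return got;
}
```

A command thread that also manages time, as diep's does, could use pthread_cond_timedwait() instead of pthread_cond_wait() in next_command(), so that it wakes up either when input arrives or when the next time check is due.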
diep

Re: A question about thread-architecture.

Post by diep »

Sven Schüle wrote:
Actually I wondered about exactly the same thing. A blocking read does not eat CPU time. Sleep() is not appropriate here IMO.

Sven
Of course, you cannot do a blocking read in the thread that commands the search. In Diep, a third thread is used for that, with its own DEP protocol.
diep

Re: A question about thread-architecture.

Post by diep »

bob wrote:
I don't follow. In Cray Blitz, we just did a "read()" that blocked that thread until the user typed something. So what is the need for a sleep()??? I have it on my list to fix this in Crafty one day as I do not like the ugly code needed to poll for input. But it is a significant rewrite, since my "ponder code" would have to be completely re-done...
Obviously, if you don't poll, you need a thread that commands the search. That's what I built back in 2002. Getting it bug-free and proving it 100% correct in all cases didn't happen until January 2010 :)

If you have a thread that commands the search, it's a good idea to build a thread that does a blocking read of stdin or the TCP connections.
diep

Re: A question about thread-architecture.

Post by diep »

mcostalba wrote: Anyhow I can now say that this new approach is much better than the old one from a software design point of view and I would never come back.
Your 'new approach' is indeed what I've been doing for ten years or so now.

However, I disagree with the exact formulation you wrote down here:

"better from a software design point of view"

a) you need to put in a massive amount of time to figure it all out
b) even then you still need knowledge that is generally not easy to find with Google; in my case, contacts with a few kernel team members

Without that, the average experienced, even senior, software engineer will totally mess it up, to put it very politely.

The superior, simple design is a polling design: you create 1 blocking receiver thread for standard input, which, for example, sets a volatile variable somewhere, and the only thing the search has to do is check that volatile variable.

Using kernel functions is asking for trouble unless you really know what you're doing, and even then the man pages and MSDN aren't enough; you need more information than that.

Only under those conditions is it interesting to use this 'new old approach'; in all other cases it's like using the space shuttle to deliver a six-pack of Heineken.
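The simple design recommended above, one blocking receiver thread plus a flag that the search checks, could look roughly like this. It is a sketch: the command strings handled, the names stop_search, quit_requested, input_thread and toy_search, and the toy search itself are all illustrative, and C11 atomics are substituted for the volatile variable mentioned in the post, since volatile alone gives no inter-thread guarantees in standard C.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Flags written by the input thread and read by the search. */
static atomic_bool stop_search = false;
static atomic_bool quit_requested = false;

/* Blocking receiver thread: sleeps inside fgets() and only reacts when a
   line arrives. Intended to be started once with pthread_create(). */
void *input_thread(void *arg) {
    FILE *in = arg;
    char line[256];
    while (fgets(line, sizeof line, in)) {
        line[strcspn(line, "\r\n")] = '\0';
        if (strcmp(line, "quit") == 0)
            break;
        /* "stop" (UCI) and "?" (xboard move-now) abort the current search. */
        if (strcmp(line, "stop") == 0 || strcmp(line, "?") == 0)
            atomic_store(&stop_search, true);
        /* Other commands would be handed to the main loop here (omitted). */
    }
    /* "quit" or end of input: stop searching and let the engine exit. */
    atomic_store(&quit_requested, true);
    atomic_store(&stop_search, true);
    return NULL;
}

/* Toy search standing in for the real one: the only input-related cost is
   one relaxed atomic load per node. Returns the number of nodes visited. */
unsigned long toy_search(int depth) {
    if (atomic_load_explicit(&stop_search, memory_order_relaxed))
        return 0;
    if (depth == 0)
        return 1;
    unsigned long nodes = 1;
    for (int i = 0; i < 4; i++)          /* pretend every node has 4 moves */
        nodes += toy_search(depth - 1);
    return nodes;
}
```

In an engine the thread would be started with something like pthread_create(&tid, NULL, input_thread, stdin); the search never touches the input stream itself.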
mcostalba

Re: A question about thread-architecture.

Post by mcostalba »

diep wrote: it's like using the space shuttle to deliver sixpack Heineken.
There is no space shuttle in thread.cpp; in case you are bored today, I'd suggest giving it a look:

https://github.com/mcostalba/Stockfish/ ... thread.cpp

But if you, like Bob, have already reached developer Nirvana, where you don't look at other people's code anymore, then I apologize in advance :-)
bob

Re: A question about thread-architecture.

Post by bob »

mcostalba wrote:
diep wrote: it's like using the space shuttle to deliver sixpack Heineken.
There is not a space shuttle in thread.cpp, in case you are bored today I'd suggest to give it a look:

https://github.com/mcostalba/Stockfish/ ... thread.cpp

But if you, like Bob, have already reached the developer Nirvana, where you don't look at others people code anymore, than I apologize in advance :-)
Do I make a mistake by ASSUMING that when you explain something you are explaining exactly what you did? Or should I assume when you explain something it might be just random noise and I need to look carefully at your code???

The idea you explained is not new, it is not complicated, and I have personally used it twice, first in Cray Blitz and then in older versions of Crafty. It has some merit, and it has a few issues. If the gain is minimal and the complexity increases, I might have to think about whether to do it or not. Had I bitten the bullet and gone to an iterative search rather than a recursive implementation, I would probably still be doing it.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A question about thread-architecture.

Post by bob »

diep wrote:
Obviously, if you don't poll, you need a thread that commands the search. That's what I built back in 2002. Getting it bug-free and proving it 100% correct in all cases didn't happen until January 2010 :)

If you have a thread that commands the search, it's a good idea to build a thread that does a blocking read of stdin or the TCP connections.
In CB I had an I/O thread and a search thread or threads. The search threads had global flags to deal with, such as "abort_search". I still let the search code poll for time, but it knew whether it was pondering and did not poll while pondering. In a normal game, it was set up to check the time about once per second or so, minimizing overhead.
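The Cray Blitz scheme described above (a global abort flag, time polled from inside the search about once per second, no time checks while pondering) might be sketched as follows. All names are hypothetical, and turning "once per second" into a node interval via an estimate of nodes per second is our assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical globals in the spirit of the description above. */
static atomic_bool abort_search = false;  /* the I/O thread may set this too */
static atomic_bool pondering = false;     /* no clock checks while pondering */
static double time_limit_s;               /* budget for the current move */
static struct timespec search_start;
static unsigned long check_interval = 1;  /* nodes between clock reads */
static unsigned long next_check;

static double elapsed_s(void) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - search_start.tv_sec) +
           (double)(now.tv_nsec - search_start.tv_nsec) / 1e9;
}

/* Set nodes_between_checks to roughly the engine's nodes per second to get
   about one clock read per second (an assumed tuning rule). */
void start_clock(double limit_s, unsigned long nodes_between_checks, bool ponder) {
    clock_gettime(CLOCK_MONOTONIC, &search_start);
    time_limit_s = limit_s;
    check_interval = nodes_between_checks ? nodes_between_checks : 1;
    next_check = 0;
    atomic_store(&pondering, ponder);
    atomic_store(&abort_search, false);
}

/* Called by the I/O thread when the opponent plays the predicted move. */
void ponder_hit(void) {
    atomic_store(&pondering, false);
}

/* Called by the search at every node; returns true when it should unwind. */
bool time_check(unsigned long nodes) {
    if (!atomic_load_explicit(&pondering, memory_order_relaxed) && nodes >= next_check) {
        next_check = nodes + check_interval;
        if (elapsed_s() >= time_limit_s)
            atomic_store(&abort_search, true);
    }
    return atomic_load_explicit(&abort_search, memory_order_relaxed);
}
```

Only the search thread calls time_check(), so next_check needs no synchronization; the flags shared with the I/O thread are atomic.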

Others have done it differently, starting/terminating the search thread(s) when they start to ponder or to search, or when time runs out. I didn't want the overhead. Back in the 80's, starting a process was pretty expensive, as Unix did not then have "copy on write". Not so much today.