lazy smp questions

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: lazy smp questions

Post by cdani »

lucasart wrote: I've got some lazy SMP working, and it gives me this kind of repeated and shuffled output (e.g. with 8 threads):
mar wrote: I didn't bother with this. I simply don't output anything for helpers and (if a helper finished first) I echo score/pv again at the end of the iteration.
cdani wrote: As I ignore threads > 0, I only show the info from thread 0. I don't even collect the PV for the other threads, as it is not necessary.
lucasart wrote: I suppose you can do that. It's not very correct though, as it means you may show completed iterations later than they actually happen (what if thread #2 finished an iteration 10 sec before thread 0?).
In Andscacs, when a helper thread (any thread other than #0) finishes an iteration before #0, it stops itself and a CPU sits idle. The search finishes when thread #0 finishes, and the other threads contribute nothing apart from helping through the different hash tables.

I have yet to try more complicated things.
lucasart wrote: Here's my search code, if you're interested:
https://github.com/lucasart/Demolito/bl ... /search.cc
Very interesting; it is surely better than what I do :-)
mbootsector
Posts: 6
Joined: Thu Sep 24, 2015 10:16 am

Re: lazy smp questions

Post by mbootsector »

I think I have the laziest of lazy SMPs for Stockfish (https://github.com/mbootsector/Stockfish/tree/lazy_smp).
N threads are started with a different target depth for each thread. They run on their own until there is a signal to abort the search. Main thread is responsible for time control, and sending moves to the GUI. That's all. There aren't any tests for "incorrectness", but it performs well anyway. On fishtest, a first test was only about 13 elo worse than the "normal smp" in Stockfish. That was with 3 threads, the performance will likely change with more threads and longer time controls.
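
A minimal sketch of the scheme described above, written in Python rather than taken from the linked Stockfish branch: several threads run the same iterative-deepening search from different starting depths, share one hash table, and stop on an abort signal, while the caller handles time control. The toy take-1-to-3 stone game, the names `negamax`, `worker` and `lazy_smp`, and the per-thread depth offset are all assumptions made for illustration.

```python
import threading


def negamax(pile, depth, tt, stop):
    """Depth-limited negamax for a toy game: take 1-3 stones, last stone wins."""
    if pile == 0:
        return -1                      # side to move has nothing left: it lost
    if depth == 0 or stop.is_set():
        return 0                       # horizon reached or search aborted
    key = (pile, depth)
    cached = tt.get(key)               # shared table: any thread may have filled it
    if cached is not None:
        return cached
    best = -1
    for take in (1, 2, 3):
        if take <= pile:
            best = max(best, -negamax(pile - take, depth - 1, tt, stop))
    if not stop.is_set():              # never store results of an aborted search
        tt[key] = best
    return best


def worker(thread_id, pile, max_depth, tt, stop, results):
    """Iterative deepening; each thread starts at a different depth (assumed offset)."""
    depth = 1 + thread_id
    while depth <= max_depth and not stop.is_set():
        score = negamax(pile, depth, tt, stop)
        if stop.is_set():
            break                      # iteration was cut short, discard it
        results[thread_id] = (depth, score)
        depth += 1


def lazy_smp(pile, n_threads=4, max_depth=12, time_limit=1.0):
    """Start the threads, do time control here, report thread 0's last iteration."""
    tt = {}                            # relies on CPython dict get/set being atomic
    stop = threading.Event()
    results = [None] * n_threads
    threads = [
        threading.Thread(target=worker, args=(i, pile, max_depth, tt, stop, results))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    threads[0].join(timeout=time_limit)  # wait for thread 0 or for the clock
    stop.set()                           # abort signal for everyone else
    for t in threads:
        t.join()
    return results[0]                    # (depth, score) of the last finished iteration


if __name__ == "__main__":
    print(lazy_smp(5))
```

Because of the Python GIL this gives no real speedup; it only shows the control flow. A real engine would use a lock-less hash table and native threads.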
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: lazy smp questions

Post by cdani »

Nice! Do you have two comparable exe files for Windows to do some tests? One with the standard distribution and one with yours?

Thanks.
mbootsector
Posts: 6
Joined: Thu Sep 24, 2015 10:16 am

Re: lazy smp questions

Post by mbootsector »

Sorry, I don't have those. Can't compile SF for Windows atm.
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: lazy smp questions

Post by lucasart »

Very nice work. It's remarkably close to master. Only 6 elo behind on 7 threads. This is nothing: under such test conditions, 7 threads vs. 1 thread would easily score +200 elo.

One quick win you should try, to squeeze a bit more elo out of this:
* Currently you return bestmove and the ponder move from mth->pv[].
* You have an arbitrary notion of a main thread, probably for practical reasons, but all search threads are equal: it may not be the main thread that has completed the deepest iteration so far. You should have a global structure with pv+depth that any thread can update once it finishes an iteration strictly deeper than the last one recorded. That way you always have the best PV so far to return.
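
The global pv+depth structure suggested above could look roughly like this. It is a sketch only: the class name `SharedBestLine` and its methods are hypothetical, and a lock is assumed to be cheap enough because an update happens only once per completed iteration.

```python
import threading


class SharedBestLine:
    """Best completed iteration so far, shared by all search threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.depth = 0
        self.score = None
        self.pv = []

    def update(self, depth, score, pv):
        """Accept a finished iteration only if it is strictly deeper.

        Returns True if the line was recorded, False if it was ignored.
        """
        with self._lock:
            if depth <= self.depth:
                return False
            self.depth, self.score, self.pv = depth, score, list(pv)
            return True

    def snapshot(self):
        """Consistent copy of (depth, score, pv), e.g. for sending bestmove."""
        with self._lock:
            return self.depth, self.score, list(self.pv)


if __name__ == "__main__":
    best = SharedBestLine()
    best.update(10, 25, ["e2e4", "e7e5"])   # a helper finishes depth 10 first
    best.update(10, 30, ["d2d4", "d7d5"])   # same depth from another thread: ignored
    best.update(11, 18, ["e2e4", "c7c5"])   # strictly deeper: replaces the line
    print(best.snapshot())
```

Each thread would call `update` at the end of an iteration, and whoever answers the GUI would read `snapshot()` instead of the main thread's own PV.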
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
mbootsector
Posts: 6
Joined: Thu Sep 24, 2015 10:16 am

Re: lazy smp questions

Post by mbootsector »

I tried selecting the best move from the deepest thread instead of main, but there was no difference in strength. It may even have been weaker, but I tested only at 2+0.05. Also, currently only the main thread does time management, which includes checking for fail lows at the root. What if the move in the deepest thread has just failed low? I did not have any tests for that.

Also, maybe the threads are not equal. The deepest thread probably gets less help from the TT and could be pruning in the wrong place: it finishes the iteration faster but prunes the wrong nodes, which results in a worse move. It seems that the more I think about this, the more complicated it gets. Why was it called lazy again? :)

I think there are a few more Elo to squeeze out of this, but the big question is: is it really better than YBWC? Almost all tests and tunings on Fishtest are done with one thread, and lazy SMP searches in the same way as in those one-thread tests, so we could have a case where a less efficient search method catches up with a better one just because the parameters are optimized for it.
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: lazy smp questions

Post by cdani »

I have run some games of standard Stockfish vs. Stockfish with lazy SMP:
www.andscacs.com/stockfish/stockfish_lazy_mp_games.zip

You have four files:

rst_master_lzmpv2_20.03_6t_I75820K.pgn
1   st_master          +10  +44/=225/-35  51.48%  156.5/304
2   st_lsmpv2          -10  +35/=225/-44  48.52%  147.5/304

rst_master_lzmpv2_50.05_6t_I75820K.pgn
1   st_lsmpv2           +3  +25/=175/-23  50.45%  112.5/223
2   st_master           -3  +23/=175/-25  49.55%  110.5/223

rst_normal_lzmpv2_25.03_16t_E52680v2_amazon_c3.4xlarge.pgn
1   st_modern_lzmpv2   +10  +13/=80/-10   51.46%   53.0/103
2   st_modern          -10  +10/=80/-13   48.54%   50.0/103

rst_normal_lzmpv2_50.05_16t_E52670.pgn
1   st_modern_lzmpv2   +18  +41/=223/-26  52.59%  152.5/290
2   st_modern          -18  +26/=223/-41  47.41%  137.5/290

20.03 means 20 seconds + 0.03 seconds added per move. Each file name also includes the number of threads and the processor name.
There are not many games, but it seems probable that with more time or more threads lazy SMP is better than standard Stockfish.

As a side note, the test on Amazon produced some games lost on time, most of them by the lazy SMP version. Of course the Amazon server does not really have 16 full cores; probably half of them are hyperthreading ones.
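
For reference, the Elo column in the tables above follows from the score percentage through the usual logistic formula. The short sketch below recomputes it from the win/draw/loss counts; the function names are illustrative, and the match labels are shortened from the file names.

```python
import math


def match_score(wins, draws, losses):
    """Fraction of points scored: a win counts 1, a draw 0.5."""
    return (wins + 0.5 * draws) / (wins + draws + losses)


def elo_diff(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# Win/draw/loss counts of the first-listed engine in each table above.
MATCHES = {
    "20.03, 6 threads": (44, 225, 35),
    "50.05, 6 threads": (25, 175, 23),
    "25.03, 16 threads (Amazon)": (13, 80, 10),
    "50.05, 16 threads": (41, 223, 26),
}

if __name__ == "__main__":
    for name, wdl in MATCHES.items():
        s = match_score(*wdl)
        print(f"{name}: {100 * s:.2f}% -> {elo_diff(s):+.1f} Elo")
```

Rounded to whole numbers, these reproduce the +10, +3, +10 and +18 shown in the results.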
mbootsector
Posts: 6
Joined: Thu Sep 24, 2015 10:16 am

Re: lazy smp questions

Post by mbootsector »

Thanks for running!
It's too early to say which is better because there are so few games. Also, the trend seems to be that when you use only real cores, master is better. When using HT, lazy smp seems better.
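
To put a number on "so few games", here is a rough sketch of a 95% interval for the Elo difference in each match, using a normal approximation on the per-game score. The function names and the choice of z = 1.96 are assumptions, and the approximation ignores things like correlated game pairs from shared openings.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins, draws, losses, z=1.96):
    """Return (low, mid, high) Elo for the first engine, normal approximation."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    se = math.sqrt(var / n)            # standard error of the mean score
    return elo_diff(mean - z * se), elo_diff(mean), elo_diff(mean + z * se)


# Win/draw/loss counts of the first-listed engine in each posted result.
MATCHES = [(44, 225, 35), (25, 175, 23), (13, 80, 10), (41, 223, 26)]

if __name__ == "__main__":
    for wdl in MATCHES:
        low, mid, high = elo_interval(*wdl)
        print(f"+{wdl[0]}/={wdl[1]}/-{wdl[2]}: {mid:+.1f} Elo, "
              f"95% interval [{low:+.1f}, {high:+.1f}]")
```

Under these assumptions every one of the four intervals still contains zero, which is consistent with not drawing a conclusion yet.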
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: lazy smp questions

Post by cdani »

The test rst_normal_lzmpv2_50.05_16t_E52670.pgn was run on a server with 16 physical cores, not on the Amazon server. It has hyperthreading, but it's not used: it appears in Task Manager as 32 cores, but I only used 16. So it is already a valid result of the kind you are waiting for.

Also, the test rst_master_lzmpv2_50.05_6t_I75820K.pgn was run on 6 physical cores.
mbootsector
Posts: 6
Joined: Thu Sep 24, 2015 10:16 am

Re: lazy smp questions

Post by mbootsector »

Ok, then the results are more interesting. :)
Does the machine have two E5-2670 CPUs? Intel's pages show that the E5-2670 has only 8 cores.