mar wrote:
mathmoi wrote:
Isn't lazy SMP the technique that uses the TT as a communication device between threads that search the same tree?
Basically yes, but it has been improved since the old days (also we're using larger TTs today):
resync on each ID (iterative deepening) iteration, early termination when one of the helpers finishes, and running every other thread at depth+1 (as proposed by Dan Homan).
My understanding was that it gave little speedup past 2 CPUs. Am I missing something?
Where did you read that? There's evidence that it works, and in fact it works very well.
If I claim the sun is blue, will you believe it? What if I keep repeating it over and over again? Or will you trust what you see?
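The lazy SMP refinements quoted above can be illustrated with a minimal Python sketch. It is only a sketch under stated assumptions: a toy game tree (`children`, `evaluate`) stands in for chess, a plain dict stands in for the TT, and every name here (`negamax`, `search_root`, `worker`, `lazy_smp`) is hypothetical, not taken from Cheng, Texel or any other engine. Odd-numbered threads search at depth+1, which is one reading of "every other thread"; the first thread to finish an iteration stops the others, and all threads restart together on the next iteration. Because of CPython's GIL the threads give no real speedup here; the point is the control flow.

```python
import threading

BRANCH = 4  # hypothetical toy game: every position has exactly 4 moves
INF = 10**9
EXACT, LOWER, UPPER = 0, 1, 2


class SearchAborted(Exception):
    """Raised inside a thread once another thread has finished the iteration."""


def children(node):
    # Hypothetical move generator. Heap-style numbering makes ids unique per
    # path, so this toy tree has no transpositions.
    return [node * BRANCH + k + 1 for k in range(BRANCH)]


def evaluate(node):
    # Hypothetical evaluation: deterministic pseudo-random score in [-500, 500].
    return (node * 2654435761) % 1001 - 500


def negamax(node, depth, alpha, beta, tt, stop):
    if stop.is_set():
        raise SearchAborted  # early termination: another thread already finished
    entry = tt.get(node)  # the shared TT is the only data channel between threads
    if entry is not None and entry[0] >= depth:
        _, flag, value = entry
        if flag == EXACT or (flag == LOWER and value >= beta) or (flag == UPPER and value <= alpha):
            return value
    if depth == 0:
        return evaluate(node)
    alpha0, best = alpha, -INF
    for child in children(node):
        best = max(best, -negamax(child, depth - 1, -beta, -alpha, tt, stop))
        alpha = max(alpha, best)
        if alpha >= beta:
            break
    flag = UPPER if best <= alpha0 else LOWER if best >= beta else EXACT
    tt[node] = (depth, flag, best)  # a single dict store is atomic in CPython
    return best


def search_root(root, depth, tt, stop):
    best_score, best_move = -INF, None
    for move, child in enumerate(children(root)):
        score = -negamax(child, depth - 1, -INF, -best_score, tt, stop)
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move


def worker(i, depth, root, tt, stop, lock, finished):
    d = depth + (i % 2)  # assumption: odd-numbered threads search one ply deeper
    try:
        score, move = search_root(root, d, tt, stop)
    except SearchAborted:
        return
    with lock:
        if not finished:  # only the first thread to finish reports a result
            finished.append((d, score, move))
            stop.set()  # tell the remaining threads to abort this iteration


def lazy_smp(root, max_depth, num_threads):
    tt = {}  # one transposition table shared by all threads
    depth, result = 1, None
    while depth <= max_depth:
        stop, lock, finished = threading.Event(), threading.Lock(), []
        threads = [threading.Thread(target=worker,
                                    args=(i, depth, root, tt, stop, lock, finished))
                   for i in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:  # resync: all threads start the next iteration together
            t.join()
        result = finished[0]
        depth = result[0] + 1  # continue after the deepest completed iteration
    return result  # (completed depth, score, index of best root move)


if __name__ == "__main__":
    print(lazy_smp(0, 6, 4))
```

The only explicit coordination is the stop event and the join at the end of each iteration; whatever one thread learns that could help another reaches it through the shared table. Because a depth+1 helper may finish first, the reported depth can exceed `max_depth` by one.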
I have tested this algorithm using Cheng 0.38 on a 16-core Dell PowerEdge T620 computer. Both hyperthreading and turbo boost are enabled. The computer runs Fedora 19. Here are the results, computed by bayeselo:
Cheng 4c vs Cheng 1c, 1+0.08:
Rank Name             Elo   +   -  games  score  oppo.  draws
1    cheng4_038_4c     68   8   8   1457    70%    -68    35%
2    cheng4_038       -68   8   8   1457    30%     68    35%

Cheng 4c vs Cheng 1c, 8+0.64:
Rank Name             Elo   +   -  games  score  oppo.  draws
1    cheng4_038_4c     66  10  10   1011    71%    -66    42%
2    cheng4_038       -66  10  10   1011    29%     66    42%

Cheng 8c vs Cheng 4c, 1+0.08:
Rank Name             Elo   +   -  games  score  oppo.  draws
1    cheng4_038_8c     28   8   8   1475    59%    -28    44%
2    cheng4_038_4c    -28   8   8   1475    41%     28    44%

Cheng 16c vs Cheng 8c, 1+0.08:
Rank Name             Elo   +   -  games  score  oppo.  draws
1    cheng4_038_16c     4   8   8   1400    51%     -4    46%
2    cheng4_038_8c     -4   8   8   1400    49%      4    46%

Cheng 16c vs Cheng 8c, 2+0.16:
Rank Name             Elo   +   -  games  score  oppo.  draws
1    cheng4_038_16c    13   8   8   1433    54%    -13    50%
2    cheng4_038_8c    -13   8   8   1433    46%     13    50%

Cheng 16c vs Cheng 8c, 4+0.32:
Rank Name             Elo   +   -  games  score  oppo.  draws
1    cheng4_038_16c     6   8   8   1151    52%     -6    53%
2    cheng4_038_8c     -6   8   8   1151    48%      6    53%

Cheng 16c vs Cheng 8c, 180+1:
Rank Name             Elo   +   -  games  score  oppo.  draws
1    cheng4_038_16c    17  20  20    184    56%    -17    53%
2    cheng4_038_8c    -17  20  20    184    44%     17    53%
Even at hyper-bullet speed (1s+0.08s/move) it scales well up to 8 cores: roughly +130 elo from 1 to 4 cores, and +56 elo from 4 to 8 cores. At 16 cores, hyper-bullet speed seems too fast for the algorithm to be effective, but it still seems to work well at the 3m+1s/move time control (+34 elo), although 184 games are too few to be really sure.
Feeding all the 16-core vs 8-core games into bayeselo gives +16 elo and LOS (likelihood of superiority) = 99.9%, so it is pretty clear that the algorithm also works for 16 cores, even if it is unclear exactly how well.
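The Elo gains quoted above are simply the gap between the two bayeselo ratings of a match (for example 68 - (-68) = 136 for 4 vs 1 cores at 1+0.08). For a quick sanity check without bayeselo, the textbook logistic score-to-Elo conversion and the normal-approximation LOS can be sketched as below. This is not bayeselo's model (bayeselo fits ratings with its own Bayesian model that also accounts for draws), so the numbers will not match its output exactly. The helper names are hypothetical, and `wins_losses` only recovers approximate counts from the rounded score and draw percentages in the tables.

```python
import math


def elo_from_score(score):
    """Naive logistic Elo difference for a score fraction 0 < score < 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def likelihood_of_superiority(wins, losses):
    """Normal-approximation LOS; draws do not enter the formula."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))


def wins_losses(games, score, draw_rate):
    """Approximate win/loss counts from rounded score and draw fractions."""
    wins = games * (score - draw_rate / 2.0)
    losses = games * (1.0 - score - draw_rate / 2.0)
    return wins, losses


if __name__ == "__main__":
    # Cheng 4c vs 1c at 1+0.08: 70% score, 35% draws over 1457 games.
    print(round(elo_from_score(0.70)))  # → 147 (bayeselo printed 68 - (-68) = 136)
    w, l = wins_losses(1457, 0.70, 0.35)
    print(likelihood_of_superiority(w, l))
```

The gap between 147 and 136 is expected: the naive formula ignores how draws are modelled, and the 70% input is itself rounded.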
Some other results for comparison, although the low number of games and the different 1-core ratings make it hard to draw any definite conclusions:
Texel 16c vs Texel 8c, 180+1:
Rank Name             Elo   +   -  games  score  oppo.  draws
1    Texel16c          21  19  18    190    58%    -21    69%
2    Texel8c          -21  18  19    190    42%     21    69%

Texel 16c vs Texel 16c_nonuma, 180+1:
Rank Name             Elo   +   -  games  score  oppo.  draws
1    Texel16c           7  19  19    175    53%     -7    71%
2    Texel16cnn        -7  19  19    175    47%      7    71%

Komodo 8 16c vs Komodo 8 8c, 180+1:
Rank Name             Elo   +   -  games  score  oppo.  draws
1    komodo8_16c       12  18  18    196    55%    -12    78%
2    komodo8_8c       -12  18  18    196    45%     12    78%