A new chess engine : m8 (comming not so soon)

mar · Post by **mar** » Sun Feb 01, 2015 6:00 pm

mathmoi wrote:Isn't lazy SMP the technique that use the TT as a communication device between threads that search the same tree?

Basically yes, but it has been improved since the old days (also we're using larger TTs today):
resync on each ID iteration, early termination when one of the helpers finishes and run each other thread on depth+1 (as proposed by Dan Homan).

My understanding was that it gave little speedup passed 2 CPU. Am I missing something?

Where did you read that?

There's evidence that it works and in fact it works very well.
If claim sun in blue will you believe it? If I keep repeating it over and over again? Or will you trust what you see?

petero2 · Post by **petero2** » Sun Feb 01, 2015 8:38 pm

mar wrote:
mathmoi wrote:Isn't lazy SMP the technique that use the TT as a communication device between threads that search the same tree?
Basically yes, but it has been improved since the old days (also we're using larger TTs today):
resync on each ID iteration, early termination when one of the helpers finishes and run each other thread on depth+1 (as proposed by Dan Homan).

My understanding was that it gave little speedup passed 2 CPU. Am I missing something?
Where did you read that? :shock: There's evidence that it works and in fact it works very well.
If claim sun in blue will you believe it? If I keep repeating it over and over again? Or will you trust what you see?

I have tested this algorithm using Cheng 0.38 on a 16 core Dell PowerEdge T620 computer. Both hyperthreading and turbo boost are enabled. The computer runs Fedora 19. Here are the results, computed by bayeselo:

Code: Select all

Cheng 4c vs Cheng 1c, 1+0.08&#58;
Rank Name             Elo    +    - games score oppo. draws 
   1 cheng4_038_4c     68    8    8  1457   70%   -68   35% 
   2 cheng4_038       -68    8    8  1457   30%    68   35% 

Cheng 4c vs Cheng 1c, 8+0.64&#58;
Rank Name             Elo    +    - games score oppo. draws 
   1 cheng4_038_4c     66   10   10  1011   71%   -66   42% 
   2 cheng4_038       -66   10   10  1011   29%    66   42% 

Cheng 8c vs Cheng 4c, 1+0.08&#58;
Rank Name             Elo    +    - games score oppo. draws 
   1 cheng4_038_8c     28    8    8  1475   59%   -28   44% 
   2 cheng4_038_4c    -28    8    8  1475   41%    28   44% 

Cheng 16c vs Cheng 8c, 1+0.08&#58;
Rank Name             Elo    +    - games score oppo. draws 
   1 cheng4_038_16c     4    8    8  1400   51%    -4   46% 
   2 cheng4_038_8c     -4    8    8  1400   49%     4   46% 

Cheng 16c vs Cheng 8c, 2+0.16&#58;
Rank Name             Elo    +    - games score oppo. draws 
   1 cheng4_038_16c    13    8    8  1433   54%   -13   50% 
   2 cheng4_038_8c    -13    8    8  1433   46%    13   50% 

Cheng 16c vs Cheng 8c, 4+0.32&#58;
Rank Name             Elo    +    - games score oppo. draws 
   1 cheng4_038_16c     6    8    8  1151   52%    -6   53% 
   2 cheng4_038_8c     -6    8    8  1151   48%     6   53% 

Cheng 16c vs Cheng 8c, 180+1&#58;
Rank Name             Elo    +    - games score oppo. draws 
   1 cheng4_038_16c    17   20   20   184   56%   -17   53% 
   2 cheng4_038_8c    -17   20   20   184   44%    17   53%

Even at hyper bullet speed (1s+0.08s/move) it scales well up to 8 cores. +130 elo from 1 to 4 cores, and +56 elo from 4 to 8 cores. At 16 cores it seems hyper bullet speed is too fast for the algorithm to be effective, but it still seems to work well, +34 elo, at 3m+1s/move time control, even though the number of games (184) is too low to be really sure.

Feeding all 16 vs 8 cores games into bayeselo gives +16 elo and LOS=99.9%, so it is pretty clear that the algorithm works also for 16 cores even if it is unclear exactly how well it works.

Some other results for comparison, even though the low number of games and different 1 core ratings make it hard to draw any definite conclusions:

Code: Select all

Texel 16c vs Texel 8c, 180+1&#58;
Rank Name          Elo    +    - games score oppo. draws 
   1 Texel16c       21   19   18   190   58%   -21   69% 
   2 Texel8c       -21   18   19   190   42%    21   69% 

Texel 16c vs Texel 16c_nonuma, 180+1&#58;
Rank Name          Elo    +    - games score oppo. draws 
   1 Texel16c        7   19   19   175   53%    -7   71% 
   2 Texel16cnn     -7   19   19   175   47%     7   71% 

Komodo 8 16c vs Komodo 8 8c, 180+1&#58;
Rank Name          Elo    +    - games score oppo. draws 
   1 komodo8_16c    12   18   18   196   55%   -12   78% 
   2 komodo8_8c    -12   18   18   196   45%    12   78%

mar · Post by **mar** » Sun Feb 01, 2015 9:14 pm

Thanks for the data Peter

Not bad for something that's lousy beyond 2 cores I guess.

mathmoi · Post by **mathmoi** » Sun Feb 01, 2015 10:39 pm

mar wrote:
mathmoi wrote:Isn't lazy SMP the technique that use the TT as a communication device between threads that search the same tree?
Basically yes, but it has been improved since the old days (also we're using larger TTs today):
resync on each ID iteration, early termination when one of the helpers finishes and run each other thread on depth+1 (as proposed by Dan Homan).

My understanding was that it gave little speedup passed 2 CPU. Am I missing something?
Where did you read that? There's evidence that it works and in fact it works very well.
If claim sun in blue will you believe it? If I keep repeating it over and over again? Or will you trust what you see?

Hi Martin,

I'm not sure where I read that. I always though lazy SMP did not gave a good speedup. Obviously before I implement an SMP algorithm I will have to educate myself on what algorithms are available and how they perform.

I just looked on CPW and did not find much information about lazy SMP (or even YBWC). Is there a good resource online or maybe on CCC?

mp

Volker Annuss · Post by **Volker Annuss** » Sun Feb 01, 2015 11:36 pm

m8 (read mate)

Nice name. The german word for 8 is acht.
The german word Macht means power/force.

AlvaroBegue · Post by **AlvaroBegue** » Mon Feb 02, 2015 12:03 am

Volker Annuss wrote:
m8 (read mate)
Nice name. The german word for 8 is acht.
The german word Macht means power/force.

In Spanish it doesn't work so well: "mocho" is an adjective used for animals and it means that their horns are rounded or blunt.

Evert · Post by **Evert** » Mon Feb 02, 2015 12:23 am

mathmoi wrote:
Evert wrote:
Ferdy wrote:Perhaps you may try chess variant, say capablanca, 8 rows x 10 columns. Two pieces each are added in each side, one piece is a combination of knight and rook the other piece is a combination of knight and bishop. It is fun and interesting to explore the possibilities of these two new pieces over the expanded board.
I can only add my support for this suggestion. If you want to stick with 8x8, consider Seirawan Chess: it adds the same two pieces as Capablanca chess, and regular chess is just a subset Seirawan Chess anyway.

EDIT: think of it this way: another chess engine is just that, they're a dime a dozen. Another chess variant engine is something special. There may be fewer people interested in chess variant engines, but because there are so few of them out there, people who are interested will be interested in yours.
Hi Evert,

I'll look into this. As I said, I'm interested in implementing search variant, but I am probably limited to 8 x 8 variants since I want to use bitboards.

SjaakII uses bitboards and it plays on boards larger than 8x8 (the limitation on board size is no more than 128 squares and no more than 16 ranks/files), so it can be done. It's not necessarily simple though (but it is fun because it means you have to figure out how to do some things for yourself) and I don't think you would want to use something like magic generation.

Note that I'm not saying it's necessarily optimal, just that it's possible.

mar · Post by **mar** » Mon Feb 02, 2015 9:41 am

mathmoi wrote:Is there a good resource online or maybe on CCC?

Of course, you can start here for example (original thread by Dan Homan): http://www.talkchess.com/forum/viewtopic.php?t=46858

lucasart · Post by **lucasart** » Mon Feb 02, 2015 11:16 am

mar wrote:Thanks for the data Peter
Not bad for something that's lousy beyond 2 cores I guess.

This is remarkable for an algorithm that is so simple. Considering that most people have 4 cores or less, it seems like an ideal tradeoff. Even 4 to 8 scaling is not too bad.

One technicality I don't understand: 1 thread runs depth d, while n-1 run d+1. seems easy enough, but… how do you manage aspiration windows? or does running depth d include the aspiration loop?

lucasart · Post by **lucasart** » Mon Feb 02, 2015 11:39 am

It's quite possible that this lazy SMP can be generalized to perform better with 16 threads. Something to try (though requires a 16 core machine which I don't posess):
* current version: 1 thread at depth d and n-1= 15 threads at depth d+1
* test version: 1 thread at depth d, 3 threads at depth d+1, 12 threads at depth d+2
(or variations along those lines)

A new chess engine : m8 (comming not so soon)

Re: A new chess engine : m8 (comming not so soon)

Re: A new chess engine : m8 (comming not so soon)

Re: A new chess engine : m8 (comming not so soon)

Re: A new chess engine : m8 (comming not so soon)

Re: A new chess engine : m8 (comming not so soon)

Re: A new chess engine : m8 (comming not so soon)

Re: A new chess engine : m8 (comming not so soon)

Re: A new chess engine : m8 (comming not so soon)

Re: A new chess engine : m8 (comming not so soon)

Re: A new chess engine : m8 (comming not so soon)