Some SMP measurements with Rookie v3

Posted: Fri Feb 06, 2015 12:37 am
by mvk
Recently I conducted a scaling test to measure the performance of Rookie 3.x's parallel search.

The algorithm I use works roughly as follows:
1. Main process forks "n" child processes at startup
2. Command communication between main process and children is through pipes
3. Child sends candidate split moves to the main process, and continues searching as normal
4. Child applies YBW (Young Brothers Wait) and minsplitdepth = 6 to select these split moves
5. Main process collects these split moves, and when a child is idle, dispatches a search there
6. The idea is that the main process should always have a stash of moves to send to idle cpus (in reality the stash often runs empty, due to YBW)
7. A child doesn't wait for the result of a split move. If it runs out of work, it picks a move that it has sent to the main process and searches that move itself.
8. The child sends a cancel request for that move to the parent when that happens
9. If the parent hasn't dispatched the move yet, it just removes it from its stash
10. Otherwise it aborts the search at the child to which it has dispatched the move before
11. Abort messages from main process to children are SIGINT signals
12. All communication on the pipes is in human readable command line text
13. Search results of split moves are stored in the global shared memory hash table. No additional communication other than that.
So the main process doesn't search itself, it just acts as a conductor of the orchestra.
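
To make the bookkeeping in the steps above concrete, here is a minimal single-process sketch of what the conductor has to track. It is not Rookie's code: the class and method names are hypothetical, pipes and SIGINT are replaced by a plain message log, and it assumes child 0 holds the root search while the others start idle.

```python
from collections import deque


class Conductor:
    """Hypothetical model of the main process: a stash of split moves plus idle children."""

    def __init__(self, n_children):
        self.stash = deque()                     # (owner, move) pairs not yet dispatched
        self.idle = set(range(1, n_children))    # assumption: child 0 searches the root
        self.running = {}                        # (owner, move) -> child searching it
        self.log = []                            # stands in for pipe commands and SIGINT

    def offer(self, owner, move):
        """A child reports a candidate split move (steps 3 and 5)."""
        self.stash.append((owner, move))
        self._dispatch()

    def child_idle(self, child):
        """A child reports that it has finished the work it was given."""
        for key, worker in list(self.running.items()):
            if worker == child:
                del self.running[key]
        self.idle.add(child)
        self._dispatch()

    def cancel(self, owner, move):
        """The owner searches the move itself after all (steps 7 to 10)."""
        key = (owner, move)
        if key in self.stash:
            self.stash.remove(key)               # step 9: not dispatched yet
        elif key in self.running:
            worker = self.running.pop(key)
            self.log.append(("abort", worker))   # step 10: SIGINT in the real program
            self.idle.add(worker)                # simplification: idle immediately
            self._dispatch()

    def _dispatch(self):
        """Hand stashed moves to idle children for as long as both are available."""
        while self.stash and self.idle:
            key = self.stash.popleft()
            worker = min(self.idle)              # deterministic choice for the sketch
            self.idle.remove(worker)
            self.running[key] = worker
            self.log.append(("search", worker, key))


conductor = Conductor(3)
conductor.offer(0, "e2e4")     # goes straight to idle child 1
conductor.offer(0, "d2d4")     # goes to child 2
conductor.offer(0, "g1f3")     # nobody idle, stays in the stash
conductor.cancel(0, "g1f3")    # removed from the stash
conductor.cancel(0, "e2e4")    # already dispatched, so child 1 is aborted
print(conductor.log)
```

The sketch leaves out the shared hash table from step 13, which is where the actual search results travel.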

This is a horrible algorithm, but note that it was optimised for something. Not for speed or elo. It was optimised for implementation time first, and second to keep the door open for a distributed search (which I never made, but still). If I look back in my log book, I started the implementation on Sep/18 (2010), and it was running reliably two weeks later on Oct/02. Well in time to trust it for the Leiden tournament of November that year, where Rookie v3.0 debuted and won the Programmer's Prize.

I haven't touched the algorithm ever since. Before I would do that, I need to know how well it performs. Recently I started to measure that.

Test conditions:
- AMD 8350 (8 cores) @ 4GHz, 16 GB ram, 128 GB SSD.
- cutechess-cli
- Round-robin tournament between 1cpu, 2cpu, 4cpu and 8cpu version.
- When 8cpu was playing, cutechess-cli concurrency was set to 1. When 4cpu was playing, concurrency was 2. When 2cpu was playing, concurrency was 4.
- Hash table 256MB per player
- 6pc Syzygy on SSD
- No pondering
- Timecontrol: 120s base with 5s increment
- 1000 openings, played from both sides.
- Resign when score below -8 pawns for 3 moves
- The 1cpu version doesn't fork processes. The search in that case is done by the main process. (This shouldn't matter, but still).
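
The concurrency settings listed above look like they follow one rule: never let the games running at the same time ask for more than the 8 physical cores. The post does not state this rule, so the snippet below is only one reading of it, and the function name is made up for illustration.

```python
TOTAL_CORES = 8  # the AMD 8350 from the test conditions


def concurrency(cpus_a, cpus_b, total_cores=TOTAL_CORES):
    """Simultaneous games such that the larger player never oversubscribes the machine.

    Assumes no pondering, so only one of the two players is thinking at any time.
    """
    return total_cores // max(cpus_a, cpus_b)


for pairing in [(8, 1), (4, 2), (4, 1), (2, 1)]:
    print(pairing, "->", concurrency(*pairing))
```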

The tournament runs started Oct/19 and finished Feb/04, or just over 100 days. You can see why I haven't done this before.

Results are bad or not so bad, depending on your point of view:

Rank Name               Elo    +    - games score oppo. draws 
   1 rookie3.8b2-cpu8    84    7    7  6000   58%    39   53% 
   2 rookie3.8b2-cpu4    73    7    7  6000   55%    42   55% 
   3 rookie3.8b2-cpu2    43    7    7  6000   48%    52   55% 
   4 rookie3.8b2          0    7    7  6000   39%    67   51% 
(The LoS table is full of 99% and 100% numbers, as expected.)
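
For readers who want the gain per doubling of cores, the small calculation below takes the ratings from the table. The expected-score function is the standard logistic Elo formula, which is an assumption here: the rating tool behind the table may use a slightly different model (for example one that accounts for draws), so treat the percentages as approximate.

```python
ratings = {1: 0, 2: 43, 4: 73, 8: 84}  # cpus -> Elo, copied from the table


def expected_score(elo_diff):
    """Standard logistic Elo expectation for a rating difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


cpus = sorted(ratings)
# Elo gained by each doubling: 1->2, 2->4, 4->8
gains = [ratings[b] - ratings[a] for a, b in zip(cpus, cpus[1:])]

for (a, b), gain in zip(zip(cpus, cpus[1:]), gains):
    print(f"{a} -> {b} cpus: +{gain} Elo, expected score {expected_score(gain):.3f}")
```

The gains come out as 43, 30 and 11 Elo, so each doubling brings less than the one before.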

Conclusion: not bad for 2 weeks of programming effort. Not good for what should be possible with 8 cores.

I have somewhat lost my interest in chess programming, so I will probably not touch it in the coming years anyway. I just wanted to share a result.

Re: Some SMP measurements with Rookie v3

Posted: Fri Feb 06, 2015 5:53 am
by Ferdy
Thanks for sharing. What about sharing the engine? As you may know, one of the reasons I stopped participating in online tournaments is that private engines enter those tournaments too :(. You can share it only with me, of course.

My interest is also down at present. I believe Stockfish owns this game now, and that is good: it is free and still developed continuously.

Re: Some SMP measurements with Rookie v3

Posted: Fri Feb 06, 2015 7:39 am
by xmas79
mvk wrote:...I have somewhat lost my interest in chess programming...
Why? If I may ask...

Re: Some SMP measurements with Rookie v3

Posted: Fri Feb 06, 2015 8:06 am
by Graham Banks
Ferdy wrote: Thanks for sharing. What about sharing the engine? As you may know, one of the reasons I stopped participating in online tournaments is that private engines enter those tournaments too :(. You can share it only with me, of course.

My interest is also down at present. I believe Stockfish owns this game now, and that is good: it is free and still developed continuously.
Many of us are interested in more than just Stockfish, Komodo and Houdini.
I hope that you can regain your enthusiasm.

Re: Some SMP measurements with Rookie v3

Posted: Fri Feb 06, 2015 8:33 am
by cdani
Graham Banks wrote: Many of us are interested in more than just Stockfish, Komodo and Houdini.
I hope that you can regain your enthusiasm.
Sure!

Re: Some SMP measurements with Rookie v3

Posted: Fri Feb 06, 2015 9:19 am
by mvk
Ferdy wrote: Thanks for sharing. What about sharing the engine? As you may know, one of the reasons I stopped participating in online tournaments is that private engines enter those tournaments too :(. You can share it only with me, of course.

My interest is also down at present. I believe Stockfish owns this game now, and that is good: it is free and still developed continuously.
Maybe one day. There are usability problems with the program. They don't bother me, but most people would find them extremely annoying: a very long startup time, non-compliance with the xboard protocol, hard-coded file paths, a recompile needed to change the hash table size, manual setting of the shared memory limits on your system, no Windows build, a non-standard book format. Until recently I had the bitbases embedded, giving a 2GB executable. I don't want to support a release with such issues. It would cost me too much time to fix them, or to support the release once it is out. That said, a few other programmers do have private versions of Rookie.

Re: Some SMP measurements with Rookie v3

Posted: Fri Feb 06, 2015 9:24 am
by mvk
xmas79 wrote:
mvk wrote:...I have somewhat lost my interest in chess programming...
Why? If I may ask...
Health issues. And sometimes you need a few years' break from it anyway. I should correct myself: the interest is there. Appetite and energy could be better. It will return.

Re: Some SMP measurements with Rookie v3

Posted: Fri Feb 06, 2015 10:13 am
by Ferdy
mvk wrote:
Ferdy wrote: Thanks for sharing. What about sharing the engine? As you may know, one of the reasons I stopped participating in online tournaments is that private engines enter those tournaments too :(. You can share it only with me, of course.

My interest is also down at present. I believe Stockfish owns this game now, and that is good: it is free and still developed continuously.
Maybe one day. There are usability problems with the program. They don't bother me, but most people would find them extremely annoying: a very long startup time, non-compliance with the xboard protocol, hard-coded file paths, a recompile needed to change the hash table size, manual setting of the shared memory limits on your system, no Windows build, a non-standard book format. Until recently I had the bitbases embedded, giving a 2GB executable. I don't want to support a release with such issues. It would cost me too much time to fix them, or to support the release once it is out. That said, a few other programmers do have private versions of Rookie.
Good to know, thanks. Go on and take a break.

Re: Some SMP measurements with Rookie v3

Posted: Fri Feb 06, 2015 1:49 pm
by Modern Times
mvk wrote:Test conditions:
- AMD 8350 (8 cores) @ 4GHz, 16 GB ram, 128 GB SSD.
- cutechess-cli
- Round-robin tournament between 1cpu, 2cpu, 4cpu and 8cpu version.
- When 8cpu was playing, cutechess-cli concurrency was set to 1. When 4cpu was playing, concurrency was 2. When 2cpu was playing, concurrency was 4.
- Hash table 256MB per player
- 6pc Syzygy on SSD
- No pondering
- Timecontrol: 120s base with 5s increment
- 1000 openings, played from both sides.
- Resign when score below -8 pawns for 3 moves
- The 1cpu version doesn't fork processes. The search in that case is done by the main process. (This shouldn't matter, but still).

The tournament runs started Oct/19 and finished Feb/04, or just over 100 days. You can see why I haven't done this before.

Results are bad or not so bad, depending on your point of view:

Rank Name               Elo    +    - games score oppo. draws 
   1 rookie3.8b2-cpu8    84    7    7  6000   58%    39   53% 
   2 rookie3.8b2-cpu4    73    7    7  6000   55%    42   55% 
   3 rookie3.8b2-cpu2    43    7    7  6000   48%    52   55% 
   4 rookie3.8b2          0    7    7  6000   39%    67   51% 
(The LoS table is full of 99% and 100% numbers, as expected.)

Conclusion: not bad for 2 weeks of programming effort. Not good for what should be possible with 8 cores.

The modular design of the AMD Piledriver architecture might be holding back the 8CPU performance a little. I'm not sure of the real-world effect of that.

Re: Some SMP measurements with Rookie v3

Posted: Fri Feb 06, 2015 7:52 pm
by mvk
Modern Times wrote: The modular design of the AMD Piledriver architecture might be holding back the 8CPU performance a little. I'm not sure of the real-world effect of that.
That is a good point. I did raw speedup tests some time ago, running multiple instances of the program in parallel and counting the combined NPS. Results are here (alternative: pdf):

[Image: combined NPS versus number of parallel instances]

You can observe that from CPU 5 and up, the additional NPS per core is only approximately 65% of that of the first 4 cores. The first 4 each give 2.7 to 2.8 Mnps, while the last 4 give 1.7 to 1.8 Mnps each.
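
As a rough cross-check of those numbers, the arithmetic below uses the midpoints of the two quoted ranges (2.75 and 1.75 Mnps). The midpoints are an assumption made for illustration, not measured values.

```python
FIRST_FOUR_MNPS = 2.75   # assumed midpoint of the 2.7 to 2.8 Mnps range
LAST_FOUR_MNPS = 1.75    # assumed midpoint of the 1.7 to 1.8 Mnps range

# Marginal throughput of cores 5 to 8 relative to cores 1 to 4
ratio = LAST_FOUR_MNPS / FIRST_FOUR_MNPS

# Combined throughput with all 8 instances running
total_mnps = 4 * FIRST_FOUR_MNPS + 4 * LAST_FOUR_MNPS

# How many "full speed" cores that combined throughput corresponds to
equivalent_cores = total_mnps / FIRST_FOUR_MNPS

print(f"marginal ratio:   {ratio:.1%}")
print(f"combined NPS:     {total_mnps:.1f} Mnps")
print(f"equivalent cores: {equivalent_cores:.2f}")
```

With these midpoints the 8 instances together deliver about the raw throughput of 6.5 full-speed cores, which caps what any parallel search could gain on this machine before search overhead is even counted.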