OpenBench question

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

OpenBench question

Post by Rebel »

I downloaded the 4moves_noob.pgn opening set; it contains 1.8 million [!] positions. What makes me wonder: when you start a match with this opening suite, are the positions handed out randomly or sequentially?
90% of coding is debugging, the other 10% is writing bugs.
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: OpenBench question

Post by noobpwnftw »

https://github.com/AndyGrant/OpenBench/ ... nt.py#L326

Per the code, opening selection is random.
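
In spirit, the selection amounts to something like this minimal sketch (the file name and layout are assumptions for illustration, not the actual OpenBench client code; the linked client.py is authoritative):

Code:

import random

# Load an EPD-style book once: one opening position per line (assumed layout).
with open("openings.epd") as f:
    openings = [line.strip() for line in f if line.strip()]

def next_opening():
    # Each game draws an independent, uniformly random line, so duplicate
    # openings are expected once the game count approaches the book size.
    return random.choice(openings)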
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: OpenBench question

Post by Rebel »

Random indeed seems to me the best option, as I noticed that sequential chunks come from the same game. OTOH, since matches then differ in their openings, has anyone tried running the same match of (say) 20,000 games between Ethereal and Rubichess 4-5 times to check that the results come out about equal?
90% of coding is debugging, the other 10% is writing bugs.
AndrewGrant
Posts: 1750
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: OpenBench question

Post by AndrewGrant »

Opening selection is random, which does mean lots of duplicate openings get played in the long run.

I have never tested what you proposed. Really you need two sets of tests:
1. First play a set of N games between two engines using the same openings, 5 times.
2. Then play N games with a fresh random selection of openings each run, 5 times.

I would hope that you get similar results. That's sorta the basis for how all testing is done right now. I'm still surprised SPRT does what it does.
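
If someone wants to try it, a sketch of driving both regimes with cutechess-cli from Python (engine paths, book file names, and game counts are assumptions; the -openings order=random|sequential option is standard cutechess-cli):

Code:

import subprocess

BASE = ["cutechess-cli",
        "-engine", "cmd=./ethereal", "proto=uci",
        "-engine", "cmd=./rubichess", "proto=uci",
        "-each", "tc=40/10",
        "-rounds", "5000", "-games", "2", "-repeat"]  # 10,000 games per run

for run in range(5):
    # Regime 1: identical openings every run (sequential from one fixed file).
    subprocess.run(BASE + ["-openings", "file=fixed_openings.epd",
                           "format=epd", "order=sequential"], check=True)

for run in range(5):
    # Regime 2: a fresh random selection from the full book each run.
    subprocess.run(BASE + ["-openings", "file=full_book.epd",
                           "format=epd", "order=random"], check=True)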

--

As an aside, you've reminded me that I need to convert those PGNs to EPDs. The Cutechess version I was using at the time had no EPD support. After getting custom builds to fix a castling bug, EPDs are now supported and I can ensure that Clients can run them. EPDs are smaller, and would have saved me from compressing the opening books to bypass filesize limits on GitHub.
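
For what it's worth, the conversion itself is only a few lines with python-chess (file names assumed; one EPD line per four-move opening stub):

Code:

import chess.pgn

with open("4moves_noob.pgn") as pgn, open("4moves_noob.epd", "w") as out:
    # Each "game" in the book is just the opening moves; play them out
    # and record the resulting position as one EPD line.
    while (game := chess.pgn.read_game(pgn)) is not None:
        board = game.board()
        for move in game.mainline_moves():
            board.push(move)
        out.write(board.epd() + "\n")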
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: OpenBench question

Post by Rebel »

AndrewGrant wrote: Wed Apr 14, 2021 12:16 am Opening selection is random, which does mean lots of duplicate openings get played in the long run.

I have never tested what you proposed. Really you need two sets of tests:
1. First play a set of N games between two engines using the same openings, 5 times.
2. Then play N games with a fresh random selection of openings each run, 5 times.

I would hope that you get similar results. That's sorta the basis for how all testing is done right now. I'm still surprised SPRT does what it does.
Yes.

Are you going to do it on OpenBench? Otherwise I will when my PCs are free.
--

As an aside, you've reminded me that I need to convert those PGNs to EPDs. The Cutechess version I was using at the time had no EPD support. After getting custom builds to fix a castling bug, EPDs are now supported and I can ensure that Clients can run them. EPDs are smaller, and would have saved me from compressing the opening books to bypass filesize limits on GitHub.
I shuffled the 1.8 million positions and split them into 18 parts of 100,000. Download - http://rebel13.nl/dump/noob.7z

Pick a set and test it sequentially. Pick another set, test it sequentially, and compare whether the results match better.
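
The shuffle-and-split step is straightforward; a sketch of how it might be done (file names assumed):

Code:

import random

with open("4moves_noob.epd") as f:
    lines = f.readlines()

random.shuffle(lines)

CHUNK = 100_000
for i in range(0, len(lines), CHUNK):
    # 1.8M lines -> noob.001.epd ... noob.018.epd, 100,000 positions each.
    with open(f"noob.{i // CHUNK + 1:03d}.epd", "w") as out:
        out.writelines(lines[i:i + CHUNK])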
90% of coding is debugging, the other 10% is writing bugs.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: OpenBench question

Post by Rebel »

Rebel wrote: Wed Apr 14, 2021 8:26 am I shuffled the 1.8 million positions and split them into 18 parts of 100,000. Download - http://rebel13.nl/dump/noob.7z

Pick a set and test it sequentially. Pick another set, test it sequentially, and compare whether the results match better.
I did both. First: 10,000 games with random openings, 10 times, divided over 2 PCs. TC=40/10, EPD=noob.epd (1.8 million)

Code:

                            RANDOM                                     
         Intel i7 3.6 Ghz               Intel i7 3.2 Ghz
Round   Ethereal  Rubichess    Round   Ethereal  Rubichess       
  1      *49.3%*   50.7%        1       50.4%     49.6%
  2       49.9%    50.1%        2       50.6%     49.4%
  3       49.8%    50.2%        3       50.1%     49.9%
  4       49.9%    50.1%        4       50.5%     49.5%
  5      *50.3%*   49.7%        5       50.8%     49.2% 
* After 10 runs, two of them (marked) already differ by 1% (= 7 Elo).
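
For reference, the 1% = 7 Elo figure follows from the usual logistic score-to-Elo conversion; a quick check:

Code:

import math

def elo(p):
    # Standard logistic conversion from expected score to Elo difference.
    return 400 * math.log10(p / (1 - p))

print(elo(0.503) - elo(0.493))  # ~6.95: a 1% score swing near 50% is ~7 Elo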

Second: 10,000 games with sequential openings, 10 times, divided over 2 PCs (stopped after 6 runs). TC=40/10, EPD=noob.001.epd (100,000 shuffled)

Code:

                           SEQUENTIAL                                     
         Intel i7 3.6 Ghz               Intel i7 3.2 Ghz
Round   Ethereal  Rubichess    Round   Ethereal  Rubichess       
  1       50.0%    50.0%         1       50.5%    49.5%
  2       50.3%    49.7%         2      *51.1%*   48.9%
  3       49.4%    50.6%         3      *50.1%*   49.9%
  4                              4
  5                              5
* Just as bad as random: after 6 runs, two of them (marked) already differ by 1% (= 7 Elo).

What have I proven? That 10,000 games is not enough for a reliable test, but that is old news.
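
That conclusion also matches a back-of-the-envelope error bar. Assuming (hypothetically, not measured from these runs) a draw rate around 50% at this TC, the per-game score deviation is 0.5 * sqrt(1 - draw_rate), so:

Code:

import math

N, draw_rate = 10_000, 0.50          # draw rate is an assumption
sigma_game = 0.5 * math.sqrt(1 - draw_rate)
se = sigma_game / math.sqrt(N)       # standard error of the match score
elo_slope = 400 / math.log(10) * 4   # dElo/dscore at 50%: ~695 Elo per 1.0
print(f"1-sigma: +/-{se:.2%} score, +/-{se * elo_slope:.1f} Elo")

One run is then only good to roughly +/-2.5 Elo (1 sigma), so two independent 10,000-game runs routinely differ by about 7 Elo at the 2-sigma level.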

BTW, I used the latest versions of Ethereal and Rubi.
90% of coding is debugging, the other 10% is writing bugs.