OpenBench question

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

OpenBench question

Post by Rebel »

I downloaded the 4moves_noob.pgn opening set; it contains 1.8 million [!] positions. What makes me wonder: when you start a match with this opening suite, are the positions handed out randomly or sequentially?
90% of coding is debugging, the other 10% is writing bugs.
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: OpenBench question

Post by noobpwnftw »

https://github.com/AndyGrant/OpenBench/ ... nt.py#L326

Per the code, opening selection is random.
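
In spirit, the selection amounts to something like this minimal sketch (the file name and layout are assumptions for illustration, not the actual OpenBench client code; the linked client.py is authoritative):

Code:

import random

# Load an EPD-style book once: one opening position per line (assumed layout).
with open("openings.epd") as f:
    openings = [line.strip() for line in f if line.strip()]

def next_opening():
    # Each game draws an independent, uniformly random line, so duplicate
    # openings are expected once the game count approaches the book size.
    return random.choice(openings)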
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: OpenBench question

Post by Rebel »

Random indeed seems to me the best option, as I noticed that sequential chunks come from the same game. OTOH, since matches then differ in their openings, has anyone tried running the same match of (say) 20,000 games between Ethereal and Rubichess 4-5 times to check that the results come out about equal?
90% of coding is debugging, the other 10% is writing bugs.
AndrewGrant
Posts: 1750
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: OpenBench question

Post by AndrewGrant »

Opening selection is random, which does mean lots of duplicate openings get played in the long run.

I have never tested what you proposed. Really you need two sets of tests:
1. First play a set of N games between two engines using the same openings, 5 times.
2. Then play N games with a fresh random selection of openings each run, 5 times.

I would hope that you get similar results. That's sorta the basis for how all testing is done right now. I'm still surprised SPRT does what it does.
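
If someone wants to try it, a sketch of driving both regimes with cutechess-cli from Python (engine paths, book file names, and game counts are assumptions; the -openings order=random|sequential option is standard cutechess-cli):

Code:

import subprocess

BASE = ["cutechess-cli",
        "-engine", "cmd=./ethereal", "proto=uci",
        "-engine", "cmd=./rubichess", "proto=uci",
        "-each", "tc=40/10",
        "-rounds", "5000", "-games", "2", "-repeat"]  # 10,000 games per run

for run in range(5):
    # Regime 1: identical openings every run (sequential from one fixed file).
    subprocess.run(BASE + ["-openings", "file=fixed_openings.epd",
                           "format=epd", "order=sequential"], check=True)

for run in range(5):
    # Regime 2: a fresh random selection from the full book each run.
    subprocess.run(BASE + ["-openings", "file=full_book.epd",
                           "format=epd", "order=random"], check=True)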

--

As an aside, you've reminded me that I need to convert those PGNs to EPDs. The Cutechess version I was using at the time had no EPD support. After getting custom builds to fix a castling bug, EPDs are now supported and I can ensure that Clients can run them. EPDs are smaller, and would have saved me from compressing the opening books to bypass filesize limits on GitHub.
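
For what it's worth, the conversion itself is only a few lines with python-chess (file names assumed; one EPD line per four-move opening stub):

Code:

import chess.pgn

with open("4moves_noob.pgn") as pgn, open("4moves_noob.epd", "w") as out:
    # Each "game" in the book is just the opening moves; play them out
    # and record the resulting position as one EPD line.
    while (game := chess.pgn.read_game(pgn)) is not None:
        board = game.board()
        for move in game.mainline_moves():
            board.push(move)
        out.write(board.epd() + "\n")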
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: OpenBench question

Post by Rebel »

AndrewGrant wrote: Wed Apr 14, 2021 12:16 am Opening selection is random, which does mean lots of duplicate openings get played in the long run.

I have never tested what you proposed. Really you need two sets of tests:
1. First play a set of N games between two engines using the same openings, 5 times.
2. Then play N games with a fresh random selection of openings each run, 5 times.

I would hope that you get similar results. That's sorta the basis for how all testing is done right now. I'm still surprised SPRT does what it does.
Yes.

Are you going to do it on OpenBench? Otherwise I will when my PCs are free.
--

As an aside, you've reminded me that I need to convert those PGNs to EPDs. The Cutechess version I was using at the time had no EPD support. After getting custom builds to fix a castling bug, EPDs are now supported and I can ensure that Clients can run them. EPDs are smaller, and would have saved me from compressing the opening books to bypass filesize limits on GitHub.
I shuffled the 1.8 million positions and split them into 18 parts of 100,000. Download - http://rebel13.nl/dump/noob.7z

Pick a set and test it sequentially. Pick another set, test it sequentially, and compare whether the results match better.
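
The shuffle-and-split step is straightforward; a sketch of how it might be done (file names assumed):

Code:

import random

with open("4moves_noob.epd") as f:
    lines = f.readlines()

random.shuffle(lines)

CHUNK = 100_000
for i in range(0, len(lines), CHUNK):
    # 1.8M lines -> noob.001.epd ... noob.018.epd, 100,000 positions each.
    with open(f"noob.{i // CHUNK + 1:03d}.epd", "w") as out:
        out.writelines(lines[i:i + CHUNK])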
90% of coding is debugging, the other 10% is writing bugs.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: OpenBench question

Post by Rebel »

Rebel wrote: Wed Apr 14, 2021 8:26 am I shuffled the 1.8 million positions and split them into 18 parts of 100,000. Download - http://rebel13.nl/dump/noob.7z

Pick a set and test it sequentially. Pick another set, test it sequentially, and compare whether the results match better.
I did both. First: 10,000 games with random openings, 10 times, divided over 2 PCs. TC=40/10, EPD=noob.epd (1.8 million)

Code:

                            RANDOM                                     
         Intel i7 3.6 Ghz               Intel i7 3.2 Ghz
Round   Ethereal  Rubichess    Round   Ethereal  Rubichess       
  1      *49.3%*   50.7%        1       50.4%     49.6%
  2       49.9%    50.1%        2       50.6%     49.4%
  3       49.8%    50.2%        3       50.1%     49.9%
  4       49.9%    50.1%        4       50.5%     49.5%
  5      *50.3%*   49.7%        5       50.8%     49.2% 
* After 10 runs, two of them (marked) already differ by 1% (= 7 Elo).
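
For reference, the 1% = 7 Elo figure follows from the usual logistic score-to-Elo conversion; a quick check:

Code:

import math

def elo(p):
    # Standard logistic conversion from expected score to Elo difference.
    return 400 * math.log10(p / (1 - p))

print(elo(0.503) - elo(0.493))  # ~6.95: a 1% score swing near 50% is ~7 Elo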

Second: 10,000 games with sequential openings, 10 times, divided over 2 PCs (stopped after 6 runs). TC=40/10, EPD=noob.001.epd (100,000 shuffled)

Code:

                           SEQUENTIAL                                     
         Intel i7 3.6 Ghz               Intel i7 3.2 Ghz
Round   Ethereal  Rubichess    Round   Ethereal  Rubichess       
  1       50.0%    50.0%         1       50.5%    49.5%
  2       50.3%    49.7%         2      *51.1%*   48.9%
  3       49.4%    50.6%         3      *50.1%*   49.9%
  4                              4
  5                              5
* Just as bad as random: after 6 runs, two of them (marked) already differ by 1% (= 7 Elo).

What have I proven? That 10,000 games is not enough for a reliable test, but that is old news.
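
That conclusion also matches a back-of-the-envelope error bar. Assuming (hypothetically, not measured from these runs) a draw rate around 50% at this TC, the per-game score deviation is 0.5 * sqrt(1 - draw_rate), so:

Code:

import math

N, draw_rate = 10_000, 0.50          # draw rate is an assumption
sigma_game = 0.5 * math.sqrt(1 - draw_rate)
se = sigma_game / math.sqrt(N)       # standard error of the match score
elo_slope = 400 / math.log(10) * 4   # dElo/dscore at 50%: ~695 Elo per 1.0
print(f"1-sigma: +/-{se:.2%} score, +/-{se * elo_slope:.1f} Elo")

One run is then only good to roughly +/-2.5 Elo (1 sigma), so two independent 10,000-game runs routinely differ by about 7 Elo at the 2-sigma level.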

BTW, I used the latest versions of Ethereal and Rubi.
90% of coding is debugging, the other 10% is writing bugs.