The scaling with time of opening books

Dan Cooper · Post by **Dan Cooper** » Sat Sep 24, 2016 9:55 pm

Laskos wrote:
Dann Corbit wrote:Stockfish also greatly benefits from a book.
My quick stockfish variant test shows that Brainfish outperformed all the other versions, even though the only difference was the brainfish book.
Code:
  Program                          Elo    +   -   Games   Score   Av.Op.  Draws

1 BrainFish_160914_x64_modern    : 3161   21  20   500    75.5 %   2965   48.6 %
2 Cfish-modern                   : 2989   14  14   500    48.5 %   3000   78.2 %
3 Cfish-vanilla                  : 2987   13  13   500    48.1 %   3000   81.0 %
4 Cfish                          : 2986   13  14   500    48.0 %   3000   80.0 %
5 Stockfish-x64-mingw            : 2964   14  14   500    44.1 %   3005   78.2 %
6 Matefinder-x64-mingw           : 2913   15  16   500    35.8 %   3015   70.8 %
So about 170 Elo for a book at time control 1 minute + 1 second time increment with 10 threads at 3.2 GHz.
Completely confirmed here. I took the same BookX.bin because I don't know how to randomize the Cerebellum book for variety: it doesn't respond to the "srand" switch in Cutechess-CLI and might always play the same winning openings against a "naked" Stockfish without a book. The Cerebellum book used by Brainfish also has more knowledge than a usual "bin" book used with Stockfish. With BookX.bin, the results are even more conclusive than with Komodo: the improvement is large, and significantly larger at the longer time control:

ELO difference:

20s+0.2s --> 98.1
360s+3.6s --> 132.7
Code:
20s+0.2s
Score of SF2 vs SF1: 138 - 28 - 234  [0.637] 400
ELO difference: 98.07 +/- 21.46

360s+3.6s
Score of SF2 vs SF1: 55 - 4 - 81  [0.682] 140
ELO difference: 132.66 +/- 35.78
In practice, at the longer time control, Stockfish without a book has difficulty winning at all against Stockfish with BookX and loses about 40% of the games (55 of 140). A doubling in time is worth about 80 ELO at this time control, so the gain of about 133 ELO corresponds to an effective time factor of 3, or 200% more time. The 20% of time saved with a book cannot account for a 200% improvement by itself; the book has other effects too (like simply outplaying the no-book SF).

Can you see the average amount of time bookless K and SF use in the opening book stage?

It would be interesting to see whether SF uses much more time than K in the opening while achieving much worse results.

Laskos · Post by **Laskos** » Sun Sep 25, 2016 8:53 am

Guenther wrote:
Laskos wrote:
Uri Blass wrote:
Laskos wrote:There is not much talk here about the importance of opening books, although online events and WCCC are heavily dependent on them. I was curious a bit about their impact. I saw two dismissive opinions made here:

1/ Opening books only save time.
2/ With increased time control their importance diminishes, as engines themselves become better at openings.

I took a good (and small) Polyglot opening book BookX.bin by Adam Hair with Komodo and played at different time controls against Komodo no book. Games are from ultra-fast 20s+0.2s to blitz 360s+3.6s on one core. The results seem to contradict both 1/ and 2/.

ELO difference:

20s+0.2s --> 61.7
60s+0.6s --> 59.6
180s+1.8s --> 70.4
360s+3.6s --> 79.5
Code:
20s+0.2s
Score of K2 vs K1: 105 - 34 - 261  [0.589] 400
ELO difference: 61.68 +/- 20.17

60s+0.6s
Score of K2 vs K1: 109 - 41 - 250  [0.585] 400
ELO difference: 59.64 +/- 20.62

180s+1.8s
Score of K2 vs K1: 20 - 4 - 56  [0.600] 80
ELO difference: 70.44 +/- 40.59

360s+3.6s
Score of K2 vs K1: 22 - 4 - 54  [0.613] 80
ELO difference: 79.53 +/- 42.10
Opening books do indeed save time, about 15-20% of the total time used. But the gain at 360s+3.6s is about equal to a doubling in time (a 100% difference), which is worth about 80 ELO points for Komodo at this blitz time control. So the book does more than save time. Also, the importance of the book seems to increase with longer time control, not decrease.
I think that the time control was not long enough for the importance to go down.

Common sense tells me that if the engines can find the right moves by themselves they do not need a book, so it is obvious that the importance should go down.

I guess that the 360+3.6 time control is not long enough for the importance to go down, but I believe that the importance goes down if you use the TCEC time control.

If the time control is long enough, books can even be counterproductive, because the engine may find better moves than the book moves.
It's nice and easy to make unverifiable and unfalsifiable claims. Might it be that all engines misplay the openings even when given an almost infinite time control? As of now, the verified claim is that the book factor is important and increases with longer time control; you had better present some experimental rebuttal or a solid theory behind the claim.
Kai, why this harsh answer? I have the same common-sense expectation: at long time controls, which are of course much longer than 6m+3.6s (still a short time control), there will be a turning point.

It might happen, and in fact at 3600s + 36s in 60 games with Stockfish, I got a less impressive +9 -0 =51 for the book BookX.bin. Still some 50+ ELO points for the book, and no-book Stockfish seems to not be able to win a single game.
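
The "50+ ELO" figure can be checked from the raw result. The sketch below converts a win/loss/draw record into an Elo difference with the standard logistic formula and adds an approximate 95% margin from the per-game score variance. The function name `elo_from_wld` and the margin method are assumptions made for illustration; the tools used in this thread may compute their intervals differently.

```python
import math

def elo_from_wld(wins, losses, draws, z=1.959964):
    """Return (elo_difference, approximate_margin) for a W-L-D record.

    Uses the logistic Elo model. The margin is half the width of the
    interval obtained by shifting the mean score by z standard errors.
    Assumes the shifted scores stay strictly between 0 and 1; very
    lopsided results would need clamping.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games

    def to_elo(s):
        # Logistic model: expected score s <-> Elo difference.
        return -400.0 * math.log10(1.0 / s - 1.0)

    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(variance / games)

    low, high = score - z * stderr, score + z * stderr
    margin = (to_elo(high) - to_elo(low)) / 2.0
    return to_elo(score), margin

# +9 -0 =51 at 3600s+36s, as reported above.
elo, margin = elo_from_wld(9, 0, 51)
print(f"{elo:.1f} +/- {margin:.1f}")
```

Applied to the 20s+0.2s Stockfish result (138 - 28 - 234), the same function gives values close to the posted 98.07 +/- 21.46.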

It cannot be proven without immense time/work, but there are indications from games in TCEC, SSDF and elsewhere where books produced moves that were weaker than the moves the programs would have played by themselves.
It also depends, of course, on book length. The more moves there are in a book, the greater the chance of non-optimal moves.

The TCEC book is a book for variety, not a competitive one. Also, there is a misconception that competitive opening books are sequences of individually best moves. That is false: the important thing is how you exit the book after the whole sequence. It might be that most of the moves from a good, competitive book that an engine considers sub-optimal belong to sequences the engine understands poorly (even at very long TC).

BTW, even your test does not prove much so far, because it is based on only 80 games for the time controls above 60s+0.6s.

Based on both Komodo and Stockfish results, it is beyond error margins that book increases importance from 20+0.2 to 360+3.6. I have this latest result which shows a diminishing importance to 3600 + 36, but it is still unclear that the engine alone can equalize the matters against the opening book. Even this LTC test shows that naked Stockfish cannot win a game in 60 against BookX Stockfish.

And a last word for the Brainfish test. This cannot be generalized, because the brainfish book is exclusively (and intensively) tuned against/with SF,
which means it does best against SF. (Also its depth is much larger than in your book test - up to ply 50 IIRC)

We can do limited tests up to certain depths and for certain time controls,
but this should also include a few more programs.

Anyway this kind of test is always interesting. Did you mention the depth of that BookX? I cannot find a number.

Thanks for your tests.

Laskos · Post by **Laskos** » Sun Sep 25, 2016 9:00 am

Dan Cooper wrote:
Laskos wrote:
Dann Corbit wrote:Stockfish also greatly benefits from a book.
My quick stockfish variant test shows that Brainfish outperformed all the other versions, even though the only difference was the brainfish book.
Code:
  Program                          Elo    +   -   Games   Score   Av.Op.  Draws

1 BrainFish_160914_x64_modern    : 3161   21  20   500    75.5 %   2965   48.6 %
2 Cfish-modern                   : 2989   14  14   500    48.5 %   3000   78.2 %
3 Cfish-vanilla                  : 2987   13  13   500    48.1 %   3000   81.0 %
4 Cfish                          : 2986   13  14   500    48.0 %   3000   80.0 %
5 Stockfish-x64-mingw            : 2964   14  14   500    44.1 %   3005   78.2 %
6 Matefinder-x64-mingw           : 2913   15  16   500    35.8 %   3015   70.8 %
So about 170 Elo for a book at time control 1 minute + 1 second time increment with 10 threads at 3.2 GHz.
Completely confirmed here. I took the same BookX.bin because I don't know how to randomize the Cerebellum book for variety: it doesn't respond to the "srand" switch in Cutechess-CLI and might always play the same winning openings against a "naked" Stockfish without a book. The Cerebellum book used by Brainfish also has more knowledge than a usual "bin" book used with Stockfish. With BookX.bin, the results are even more conclusive than with Komodo: the improvement is large, and significantly larger at the longer time control:

ELO difference:

20s+0.2s --> 98.1
360s+3.6s --> 132.7
Code:
20s+0.2s
Score of SF2 vs SF1: 138 - 28 - 234  [0.637] 400
ELO difference: 98.07 +/- 21.46

360s+3.6s
Score of SF2 vs SF1: 55 - 4 - 81  [0.682] 140
ELO difference: 132.66 +/- 35.78
In practice, at the longer time control, Stockfish without a book has difficulty winning at all against Stockfish with BookX and loses about 40% of the games (55 of 140). A doubling in time is worth about 80 ELO at this time control, so the gain of about 133 ELO corresponds to an effective time factor of 3, or 200% more time. The 20% of time saved with a book cannot account for a 200% improvement by itself; the book has other effects too (like simply outplaying the no-book SF).
Can you see the average amount of time bookless K and SF use in the opening book stage?

It would be interesting to see whether SF uses much more time than K in the opening while achieving much worse results.

I cannot see that, but based on my experience, an engine spends some 20% of its time in the opening book stage. It might be that Komodo spends a little less than Stockfish, but this is most striking at TCEC and less so in blitz games.
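
The time-equivalence argument in the quoted post can be written out. This is a minimal sketch assuming that Elo gain is linear in the logarithm of thinking time, with about 80 Elo per doubling as stated above. The function names are illustrative, and treating a 20% saving as 1/(1 - 0.2) times more time for the remaining moves is a simplification made here, not something measured in the thread.

```python
import math

ELO_PER_DOUBLING = 80.0  # figure quoted in the thread for this time control

def time_factor_for_elo(elo_gain, elo_per_doubling=ELO_PER_DOUBLING):
    """Time multiplier that would produce elo_gain under the log-linear model."""
    return 2.0 ** (elo_gain / elo_per_doubling)

def elo_for_time_saved(fraction_saved, elo_per_doubling=ELO_PER_DOUBLING):
    """Elo worth of saved clock time.

    Simplification: saving a fraction f of the total time is treated as
    giving the remaining moves 1 / (1 - f) times as much time.
    """
    return elo_per_doubling * math.log2(1.0 / (1.0 - fraction_saved))

print(f"{time_factor_for_elo(132.66):.2f}x time for the 360s+3.6s Stockfish gain")
print(f"{elo_for_time_saved(0.20):.1f} Elo from a 20% time saving alone")
```

Under these assumptions the 132.66 Elo gain corresponds to a time factor of roughly 3, while a 20% time saving alone is worth only about 26 Elo.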

Guenther · Post by **Guenther** » Sun Sep 25, 2016 9:28 am

Laskos wrote:Based on both Komodo and Stockfish results, it is beyond error margins that book increases importance from 20+0.2 to 360+3.6. I have this latest result which shows a diminishing importance to 3600 + 36, but it is still unclear that the engine alone can equalize the matters against the opening book. Even this LTC test shows that naked Stockfish cannot win a game in 60 against BookX Stockfish.

I have two questions. One I asked already before. How 'small' in plies is that BookX really? I did only find one BookX with unknown origin for download,
which in fact was quite deep and over 11MB which is big for Polyglot. I also read somewhere it was handtuned for a longer period.

How did you randomize the opening moves for the _no book_ version? Otherwise you don't test the book but only a very few lines.

I started another booktest for myself and instead of playing against a no book version I provided an extremely small 4 plies (only common opening moves, booksize = 9KB)
for the no book to get enough variety to really test the book version.

Laskos · Post by **Laskos** » Sun Sep 25, 2016 10:08 am

Guenther wrote:
Laskos wrote:Based on both Komodo and Stockfish results, it is beyond error margins that book increases importance from 20+0.2 to 360+3.6. I have this latest result which shows a diminishing importance to 3600 + 36, but it is still unclear that the engine alone can equalize the matters against the opening book. Even this LTC test shows that naked Stockfish cannot win a game in 60 against BookX Stockfish.
I have two questions. One I asked already before. How 'small' in plies is that BookX really? I did only find one BookX with unknown origin for download,
which in fact was quite deep and over 11MB which is big for Polyglot. I also read somewhere it was handtuned for a longer period.

How did you randomize the opening moves for the _no book_ version? Otherwise you don't test the book but only a very few lines.

I started another booktest for myself and instead of playing against a no book version I provided an extremely small 4 plies (only common opening moves, booksize = 9KB)
for the no book to get enough variety to really test the book version.

I don't think BookX.bin has a fixed depth: some lines are shallow, others deep. It's not that big; there are 70+ MB bin books. I randomized only the book, with a varying seed for the "srand" command on each run; the no-book Stockfish is not randomized. This might be a problem; let's see your results. I might also try testing with the bookdepth=2 option on a broad book like MyFriends.bin as the "no-book" engine.
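
For reference, a setup like the one described (BookX.bin on one side, a different "srand" seed per run, and optionally a shallow bookdepth=2 book for the "no-book" side) could be scripted roughly as follows. This is a sketch only: the engine and book paths are placeholders, the helper `build_match_command` is hypothetical, and the exact options used for the tests in this thread were not posted.

```python
import random

def build_match_command(seed, tc="20+0.2", rounds=200, variety_book=None):
    """Build a cutechess-cli argument list; nothing is executed here.

    All file names are placeholders and must be adapted to the local setup.
    """
    book_engine = ["-engine", "name=SF2", "cmd=./stockfish",
                   "book=BookX.bin"]
    plain_engine = ["-engine", "name=SF1", "cmd=./stockfish"]
    if variety_book is not None:
        # Give the "no-book" side a very shallow book, only for variety.
        plain_engine += [f"book={variety_book}", "bookdepth=2"]
    return (["cutechess-cli"] + book_engine + plain_engine +
            ["-each", "proto=uci", f"tc={tc}",
             "-rounds", str(rounds),
             "-srand", str(seed),           # seed for book move selection
             "-pgnout", f"match_{seed}.pgn"])

cmd = build_match_command(seed=random.randrange(1, 2**31),
                          variety_book="MyFriends.bin")
print(" ".join(cmd))
```

Leaving `variety_book` as `None` reproduces the "naked" opponent, which then receives no randomization at all.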

Guenther · Post by **Guenther** » Sun Sep 25, 2016 11:01 am

Laskos wrote:
Guenther wrote:
Laskos wrote:Based on both Komodo and Stockfish results, it is beyond error margins that book increases importance from 20+0.2 to 360+3.6. I have this latest result which shows a diminishing importance to 3600 + 36, but it is still unclear that the engine alone can equalize the matters against the opening book. Even this LTC test shows that naked Stockfish cannot win a game in 60 against BookX Stockfish.
I have two questions. One I asked already before. How 'small' in plies is that BookX really? I did only find one BookX with unknown origin for download,
which in fact was quite deep and over 11MB which is big for Polyglot. I also read somewhere it was handtuned for a longer period.

How did you randomize the opening moves for the _no book_ version? Otherwise you don't test the book but only a very few lines.

I started another booktest for myself and instead of playing against a no book version I provided an extremely small 4 plies (only common opening moves, booksize = 9KB)
for the no book to get enough variety to really test the book version.
I don't think BookX.bin has a fixed depth: some lines are shallow, others deep. It's not that big; there are 70+ MB bin books. I randomized only the book, with a varying seed for the "srand" command on each run; the no-book Stockfish is not randomized. This might be a problem; let's see your results. I might also try testing with the bookdepth=2 option on a broad book like MyFriends.bin as the "no-book" engine.

I think it should have a max depth set during the make process.
If it is this:

Code:

14,9 MB (15.716.240 Bytes)

I can check for myself. (There is a Polyglot dump tool which extracts all lines to ascii)

You can have my 4plies book if you like. We can compare our data later.
I will try to play 1000 games at various time controls. The first test started at 20+0.25 and is nearly finished.
That was against my old medium (general) book, which has a maximum of 30 plies and was also a bit hand-tuned, but it sometimes leaves book early
when uncommon lines are played that had too few games or too low a score percentage to make it into the book during the make process.

From the results so far, the difference is smaller than expected. I guess the average depth at which this book is left is too small; the other conclusion,
which I share with you, is that the time to be gained is too small at that TC.
BTW, because I test under WB, I can provide data on the average score and depth on leaving book.
(The annotation tag is 'abused' for this, since I wanted that info somewhere at the time A. Scotti changed the old WB.)
Do you have some book-leaving data too?

I will do a second test with the recently updated Noomen2016.ctg via the Aquarium book adapter under the same conditions.

Laskos · Post by **Laskos** » Sun Sep 25, 2016 4:37 pm

Guenther wrote:
Laskos wrote:
Guenther wrote:
Laskos wrote:Based on both Komodo and Stockfish results, it is beyond error margins that book increases importance from 20+0.2 to 360+3.6. I have this latest result which shows a diminishing importance to 3600 + 36, but it is still unclear that the engine alone can equalize the matters against the opening book. Even this LTC test shows that naked Stockfish cannot win a game in 60 against BookX Stockfish.
I have two questions. One I asked already before. How 'small' in plies is that BookX really? I did only find one BookX with unknown origin for download,
which in fact was quite deep and over 11MB which is big for Polyglot. I also read somewhere it was handtuned for a longer period.

How did you randomize the opening moves for the _no book_ version? Otherwise you don't test the book but only a very few lines.

I started another booktest for myself and instead of playing against a no book version I provided an extremely small 4 plies (only common opening moves, booksize = 9KB)
for the no book to get enough variety to really test the book version.
I don't think BookX.bin has a fixed depth: some lines are shallow, others deep. It's not that big; there are 70+ MB bin books. I randomized only the book, with a varying seed for the "srand" command on each run; the no-book Stockfish is not randomized. This might be a problem; let's see your results. I might also try testing with the bookdepth=2 option on a broad book like MyFriends.bin as the "no-book" engine.
I think it should have a max depth set during the make process.
If it is this:
Code:
14,9 MB (15.716.240 Bytes)

Yes, same.

Code:

14.9 MB (15,716,240 bytes)

I can check for myself. (There is a Polyglot dump tool which extracts all lines to ascii)

You can have my 4plies book if you like. We can compare our data later.
I will try to play 1000 games at various time controls. The first test started at 20+0.25 and is nearly finished.
That was against my old medium (general) book, which has a maximum of 30 plies and was also a bit hand-tuned, but it sometimes leaves book early
when uncommon lines are played that had too few games or too low a score percentage to make it into the book during the make process.

From the results so far, the difference is smaller than expected. I guess the average depth at which this book is left is too small; the other conclusion,
which I share with you, is that the time to be gained is too small at that TC.
BTW, because I test under WB, I can provide data on the average score and depth on leaving book.
(The annotation tag is 'abused' for this, since I wanted that info somewhere at the time A. Scotti changed the old WB.)
Do you have some book-leaving data too?

I will do a second test with the recently updated Noomen2016.ctg via the Aquarium book adapter under the same conditions.

I played BookX against a generic book, Formula12.bin, limited with bookdepth=2 (a Cutechess-CLI option). It is a 12-move generic book built from CCRL games. I got the link from the Rybka forum, https://drive.google.com/open?id=0B4pL2 ... 29BWXFscEk
First results seem different from those with "naked" Stockfish:

Code:

20s+0.2s
Score of SF2 vs SF1: 142 - 18 - 240  [0.655] 400
ELO difference: 111.37 +/- 20.77

360s+3.6s
Score of SF2 vs SF1: 23 - 6 - 131  [0.553] 160
ELO difference: 37.05 +/- 22.45

Now I am testing at 3600s+36s, sparse results probably tomorrow. Apparently, now the importance of the book diminishes with time control. But I have to check if everything is ok.

I don't have any book-leaving info. Your data on that would be important. If you are able to get it, a faster test might be to play games only until BookX exits and compare just the exit scores. The time scaling of the exit scores might run parallel to the game outcomes, but with much shorter "games".
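
Whether the drop between the two time controls above is larger than the noise can be checked roughly. The sketch assumes that the two matches are independent, that the reported +/- values are half-widths of comparable (about 95%) intervals, and that they combine in quadrature. `separated_beyond_margins` is an illustrative name, not part of any tool used here.

```python
import math

def separated_beyond_margins(elo_a, err_a, elo_b, err_b):
    """Return (difference, combined_margin, is_separated) for two results."""
    diff = elo_a - elo_b
    combined = math.sqrt(err_a ** 2 + err_b ** 2)  # independent errors assumed
    return diff, combined, abs(diff) > combined

# BookX vs. the bookdepth=2 book: 20s+0.2s against 360s+3.6s, as posted above.
diff, combined, separated = separated_beyond_margins(111.37, 20.77, 37.05, 22.45)
print(f"difference {diff:.1f}, combined margin {combined:.1f}, "
      f"separated: {separated}")
```

With these inputs the difference (about 74 Elo) is more than twice the combined margin (about 31 Elo), which is consistent with the book's advantage shrinking at the longer time control in this setup.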

jdart · Post by **jdart** » Sun Sep 25, 2016 5:15 pm

there are indications from games like TCEC and SSDF or others, when books produced moves which were weaker than the programs moves, if they had played for themselves.

TCEC does not use the engines' own books, but SSDF does. I see many examples where the first move out of book by an engine is a bad one, based on known opening theory. Don't dismiss theory because it is a human construct. It includes, for example, moves that strong correspondence players prefer: they have days to select a move, with computer assistance.

I think engines in general still play the opening poorly. They will find a tactical shot if there is one. But many openings involve steering the game into one of a large set of possible strategic configurations. It is hard to look forward and see through this maze. The advantage of opening theory is that you have a body of previous games that allows you to look backwards from the outcome of the game and the endgame that led up to it and select moves that lead from the starting point into a favorable result.

--Jon

Guenther · Post by **Guenther** » Sun Sep 25, 2016 8:01 pm

Code:

StockfishASM_150916NB-64 => TestNB.bin
StockfishASM_150916-64 => GS_medium.bin
StockfishASM_150916AB-64 => Noomen2016.ctg via Aquarium bookadapter

Book details:
----------------
GS_medium.bin = general book, a bit handtuned, max depth 30 plies, 3.8MB

TestNB.bin = just a general 4 plies book for variation/randomization for the 'NB' engine (avoid dupes and only few lines), 9KB

Noomen2016.ctg = set to max 80 plies, tuned ex-commercial book, 1.05GB! (one caveat is that some settings cannot be set in the book adapter,
e.g. min games, but at least it seems to play only marked tournament moves)

Time control 20s+0.25s, 1000 games, 2 cores, Ponder Off, 256MB (per core it seems? Taskmanager says so), Syzygy 5 men, WB 4.80b, adjudication at move 120, manual result correction (BTW no time losses)

Code:

   # PLAYER                      :   RATING  ERROR  POINTS  PLAYED   (%)
   1 StockfishASM_150916-64      :  3208.53   4.97   524.0    1000  52.4
   2 StockfishASM_150916NB-64    :  3191.47   4.97   476.0    1000  47.6

White advantage = 40.94 +/- 5.23
Draw rate (equal opponents) = 79.67 % +/- 1.26

Head to head statistics:

1) StockfishASM_150916-64   3208.53 :   1000 (+140,=768,-92),  52.4 %

   vs.                              :  games (   +,   =,  -),   (%) :     Diff,     SD, CFS (%)
   StockfishASM_150916NB-64         :   1000 ( 140, 768, 92),  52.4 :   +17.07,   5.07,  100.0

Same conditions as above, but in progress, 411 games played, no manual correction yet

Code:

   # PLAYER                      :   RATING  ERROR  POINTS  PLAYED   (%)
   1 StockfishASM_150916AB-64    :  3247.81   9.02   259.0     411  63.0
   2 StockfishASM_150916NB-64    :  3152.19   9.02   152.0     411  37.0

White advantage = 52.87 +/- 9.37
Draw rate (equal opponents) = 76.32 % +/- 2.37

Head to head statistics:

1) StockfishASM_150916AB-64 3247.81 :    411 (+125,=268,-18),  63.0 %

   vs.                              :  games (   +,   =,  -),   (%) :     Diff,     SD, CFS (%)
   StockfishASM_150916NB-64         :    411 ( 125, 268, 18),  63.0 :   +95.61,   9.21,  100.0

-17 for NB SFAsm against GS_medium
-100(ca.) against Noomen2016
More detailed data in the next days.

Ozymandias · Post by **Ozymandias** » Mon Sep 26, 2016 9:26 am

Laskos wrote:
Uri Blass wrote:
Laskos wrote:There is not much talk here about the importance of opening books, although online events and WCCC are heavily dependent on them. I was curious a bit about their impact. I saw two dismissive opinions made here:

1/ Opening books only save time.
2/ With increased time control their importance diminishes, as engines themselves become better at openings.

I took a good (and small) Polyglot opening book BookX.bin by Adam Hair with Komodo and played at different time controls against Komodo no book. Games are from ultra-fast 20s+0.2s to blitz 360s+3.6s on one core. The results seem to contradict both 1/ and 2/.

ELO difference:

20s+0.2s --> 61.7
60s+0.6s --> 59.6
180s+1.8s --> 70.4
360s+3.6s --> 79.5
Code:
20s+0.2s
Score of K2 vs K1: 105 - 34 - 261  [0.589] 400
ELO difference: 61.68 +/- 20.17

60s+0.6s
Score of K2 vs K1: 109 - 41 - 250  [0.585] 400
ELO difference: 59.64 +/- 20.62

180s+1.8s
Score of K2 vs K1: 20 - 4 - 56  [0.600] 80
ELO difference: 70.44 +/- 40.59

360s+3.6s
Score of K2 vs K1: 22 - 4 - 54  [0.613] 80
ELO difference: 79.53 +/- 42.10
Opening books do indeed save time, about 15-20% of the total time used. But the gain at 360s+3.6s is about equal to a doubling in time (a 100% difference), which is worth about 80 ELO points for Komodo at this blitz time control. So the book does more than save time. Also, the importance of the book seems to increase with longer time control, not decrease.
I think that the time control was not long enough for the importance to go down.

Common sense tells me that if the engines can find the right moves by themselves they do not need a book, so it is obvious that the importance should go down.

I guess that the 360+3.6 time control is not long enough for the importance to go down, but I believe that the importance goes down if you use the TCEC time control.

If the time control is long enough, books can even be counterproductive, because the engine may find better moves than the book moves.
It's nice and easy to make unverifiable and unfalsifiable claims. Might it be that all engines misplay the openings even when given an almost infinite time control? As of now, the verified claim is that the book factor is important and increases with longer time control; you had better present some experimental rebuttal or a solid theory behind the claim.

It's one of the most difficult things to ascertain, and probably the most decisive factor in successful centaur play. When to start trusting your engines and stop following your book/database is an art in itself. You'll never be able to test that properly.
