ICGA's 2015 World Computer Chess Championship/Events

jdart · Post by **jdart** » Fri Feb 27, 2015 3:47 pm

Personally I think opening books are important. The shallow fixed book competitions have their place and they do provide a uniform platform for testing. But sometimes even good engines quickly go into known sub-optimal lines. Opening theory is very deep in some areas now, especially with correspondence play that is engine-aided. Search during a game does not easily replace this accumulated knowledge, especially in the early opening stages where the end points of known lines are far away.

Some development teams (Hiarcs in particular) have put a huge amount of effort into book development including finding strong novelties. However, I know some engine authors don't care about books and don't want to spend any effort there.

--Jon

Laskos · Post by **Laskos** » Fri Feb 27, 2015 5:04 pm

People not directly involved tend to underestimate the importance of books. Last night I tested a newer polyglot book than the ones I had. Komodo 8 in a round-robin:

   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 K8 Oops.bin           : 3066.3    1221.0    2000   61.0%
   2 K8 RpC.bin            : 3033.2    1111.5    2000   55.6%
   3 K8 Komodo.bin         : 3003.9    1013.0    2000   50.6%
   4 K8 SF Book.bin        : 2992.9     976.0    2000   48.8%
   5 K8 Performance.bin    : 2977.5     924.0    2000   46.2%
   6 K8 No Book            : 2926.2     754.5    2000   37.7%

The newer book, Oops.bin, performs 33 Elo points above RpC.bin, the best of the few books I had tested before. Komodo 8's own book performs about 62 Elo points below Oops.bin, which suggests its authors don't care much about it. Stockfish's book (SF Book.bin) is also weak. And these are only generalist books like Oops.bin and RpC.bin; there are also books that specialize for or against a particular engine. And there are book fights: for example, the even newer MyFriends.bin is tuned against Oops.bin and beats it in a direct match using both Stockfish and K8.

Book building involves so much complexity (and matters so much strength-wise) that it's a serious profession: some people involved with the Playchess Engine Room spend several hours a day working on books. And books age in a matter of days; after some 2 weeks, a top book can slide to being significantly weaker than the new ones. Here is an extract from the recent "BooksWar" rating list, played with Stockfish, which includes some powerful CTG books:

   # PLAYER                        : RATING    POINTS  PLAYED    (%)
   1 Rising Star.ctg               : 3387.7     676.5    1140   59.3%
   2 Book X.bin                    : 3359.0     287.5     540   53.2%
   3 PlaychessNightmare.ctg        : 3355.0     284.0     540   52.6%
   4 Anaconda.ctg                  : 3337.6     268.5     540   49.7%
   5 Invisible Stars.ctg           : 3336.7     580.0    1140   50.9%
   6 Hiarcs 14l.ctg                : 3333.7     265.0     540   49.1%
   7 King Asad Vipre.ctg           : 3322.0     650.0    1200   54.2%
   8 Sicilian Avenged.ctg          : 3320.2     646.5    1200   53.9%
   9 IPmanbook.ctg                 : 3301.4     276.5     600   46.1%
  10 Stormworm 5.0.bin             : 3281.9     312.5     600   52.1%
  11 Rodent.bin                    : 3279.6     254.5     600   42.4%
  12 My Friends 10.1.bin           : 3264.6     204.5     540   37.9%
   .............
  78 Monks 1.4.ctg                 : 2745.7     516.5    1050   49.2%
  79 RedKep.ctg                    : 2738.0     249.5     450   55.4%
  80 AMALIA frANCINE.ctg           : 2737.9     194.0     420   46.2%
  81 The Sniper 6.ctg              : 2729.7     503.0    1020   49.3%
  82 Amalia Power.ctg              : 2724.4     309.0     600   51.5%
  83 Doom Re-Born.ctg              : 2719.3     181.5     420   43.2%
  84 Boom Lite 3.ctg               : 2678.9     199.5     450   44.3%
  85 Doom 3 revisited.ctg          : 2673.6     195.0     450   43.3%
  86 Doom FX.ctg                   : 2670.9     253.0     600   42.2%
  87 Rock 7.ctg                    : 2662.7     244.5     600   40.8%

With Stockfish as the engine, the books span an amazing range of some 700 Elo points. Note Hiarcs 14l.ctg near the top: the Hiarcs team puts a lot of effort into their book, and its version is labeled "l" because they have updated it many times to keep it competitive.

Laskos · Post by **Laskos** » Fri Feb 27, 2015 8:54 pm

Milos wrote: Also there is plenty of good books available so even +/- 50 Elo range on books is improbable.
Your assumptions are simply grossly exaggerated.

You seem to know little about opening books. One cannot just pick a "good book" and expect it to hold up against anything. With books it often happens that A>B>C>A; I will illustrate with some tests:

Suppose one has a new book MyFriends.bin, performing as in the following test:

   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 K8 MyFriends.bin      : 3057.6    1192.5    2000   59.6%
   2 K8 RpC.bin            : 3033.2    1111.5    2000   55.6%
   3 K8 Komodo.bin         : 3005.1    1017.0    2000   50.9%
   4 K8 SF Book.bin        : 2992.8     975.5    2000   48.8%
   5 K8 Performance.bin    : 2984.8     948.5    2000   47.4%
   6 K8 No Book            : 2926.5     755.0    2000   37.8%

And another new book, Oops.bin, against the same opponents:

   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 K8 Oops.bin           : 3066.3    1221.0    2000   61.0%
   2 K8 RpC.bin            : 3033.2    1111.5    2000   55.6%
   3 K8 Komodo.bin         : 3003.9    1013.0    2000   50.6%
   4 K8 SF Book.bin        : 2992.9     976.0    2000   48.8%
   5 K8 Performance.bin    : 2977.5     924.0    2000   46.2%
   6 K8 No Book            : 2926.2     754.5    2000   37.7%

You will conclude that Oops.bin is better by 9 Elo points, and pick that book. But MyFriends.bin was specifically tuned against Oops.bin, and the direct match-up is the following:

   # PLAYER              : RATING    POINTS  PLAYED    (%)
   1 K8 MyFriends.bin    : 3022.5    1127.5    2000   56.4%
   2 K8 Oops.bin         : 2977.5     872.5    2000   43.6%

So you picked a book which loses to another book by 45 Elo points, but performs better against other opponents.

Hence, despite your bragging, a variability of at least 50 Elo points due to books is to be expected; books are tricky.
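The Elo gaps quoted in this thread follow from match scores via the logistic Elo curve. A minimal sketch, assuming the standard 400-point logistic model (Ordo's exact fitting procedure may differ slightly): the 56.4% head-to-head result of MyFriends.bin against Oops.bin comes out at about 44.5 Elo, close to the 45-point gap in the table above.

```python
import math

def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

def expected_score(elo_diff: float) -> float:
    """Inverse mapping: expected score for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# MyFriends.bin vs Oops.bin head-to-head: 1127.5 points out of 2000 games
print(f"{elo_diff_from_score(1127.5 / 2000):.1f}")
```

The same two functions work in either direction, so a claimed Elo gap can be turned back into the score it predicts.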

Adam Hair · Post by **Adam Hair** » Sat Feb 28, 2015 12:26 am

I cannot believe that Book X is ranked that high among the publicly available books. I created it by playing thousands of games against other books and by adding games won on Playchess and Infinity. I did not add any lines to it by hand.

Anyway, after the 2 months I spent creating Book X, I agree with Kai. There is a lot of variability associated with books. Transitivity is not a given property.
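Because transitivity cannot be assumed, a pool rating alone can hide A>B>C>A cycles. A minimal sketch of how one might flag them from head-to-head results, assuming a hypothetical input format (a dict mapping `(a, b)` to a's score fraction against b); the demo scores are invented, not measured.

```python
from itertools import permutations

def beats(results, a, b):
    """True if a scored more than 50% against b in their head-to-head match."""
    if (a, b) in results:
        return results[(a, b)] > 0.5
    if (b, a) in results:
        return results[(b, a)] < 0.5
    return False  # the pair never met

def find_cycles(results):
    """Return every 3-cycle A>B>C>A, each reported once, rotated to start at its smallest name."""
    players = sorted({p for pair in results for p in pair})
    cycles = set()
    for trio in permutations(players, 3):
        a, b, c = trio
        if beats(results, a, b) and beats(results, b, c) and beats(results, c, a):
            k = min(range(3), key=lambda i: trio[i])
            cycles.add(trio[k:] + trio[:k])
    return sorted(cycles)

# Hypothetical head-to-head score fractions (illustrative only)
demo = {("A", "B"): 0.56, ("B", "C"): 0.54, ("C", "A"): 0.53}
print(find_cycles(demo))
```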

APassionForCriminalJustic · Sat Feb 28, 2015 4:24 am

Kai, what is your opinion of the 1337 ChessPRO series and the KING ASAD books? I use the 1337 commercial books.

Laskos · Post by **Laskos** » Sat Feb 28, 2015 6:50 am

Book X is indeed a good book, with a peculiarity of its own. From the few games I played, I can show how it behaves:

In a general pool of polyglot books, it is outdone by MyFriends and Oops:

   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 K8 MyFriends.bin      : 3047.8     574.5     997   57.6%
   2 K8 Oops.bin           : 3034.0     556.0    1003   55.4%
   3 K8 Book X.bin         : 3031.7     549.0     997   55.1%
   4 K8 RpC.bin            : 3012.8     522.0    1003   52.0%
   5 K8 Performance.bin    : 2992.1     485.5     997   48.7%
   6 K8 SF Book.bin        : 2987.1     477.5     997   47.9%
   7 K8 Komodo.bin         : 2977.9     465.5    1003   46.4%
   8 K8 No Book            : 2916.7     367.0     997   36.8%

But it's unbeaten head to head:

3 K8 Book X.bin             :   31  1750 (+544,=838,-368), 55.0 %

K8 Komodo.bin                 : 250 (+ 96,=103,- 51), 59.0 %
K8 SF Book.bin                : 250 (+ 69,=124,- 57), 52.4 %
K8 Performance.bin            : 250 (+ 69,=124,- 57), 52.4 %
K8 RpC.bin                    : 250 (+ 82,=108,- 60), 54.4 %
K8 No Book                    : 250 (+110,=109,- 31), 65.8 %
K8 MyFriends.bin              : 250 (+ 53,=144,- 53), 50.0 %
K8 Oops.bin                   : 250 (+ 65,=126,- 59), 51.2 %
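A quick cross-check of the per-opponent lines above, as a small sketch: counting a win as 1 point and a draw as half a point, the per-opponent results sum to the +544 =838 -368 total (55.0%), and Book X scores at least 50% against every opponent.

```python
def score_pct(wins: int, draws: int, losses: int) -> float:
    """Percentage score: a win is 1 point, a draw half a point."""
    return 100.0 * (wins + 0.5 * draws) / (wins + draws + losses)

# Book X.bin's per-opponent results (wins, draws, losses), copied from the table above
book_x = {
    "K8 Komodo.bin": (96, 103, 51),
    "K8 SF Book.bin": (69, 124, 57),
    "K8 Performance.bin": (69, 124, 57),
    "K8 RpC.bin": (82, 108, 60),
    "K8 No Book": (110, 109, 31),
    "K8 MyFriends.bin": (53, 144, 53),
    "K8 Oops.bin": (65, 126, 59),
}

for name, (w, d, l) in book_x.items():
    print(f"{name:20s} {score_pct(w, d, l):5.1f} %")

# Column-wise totals over all opponents
wins, draws, losses = (sum(r[i] for r in book_x.values()) for i in range(3))
print(wins, draws, losses, wins + draws + losses, f"{score_pct(wins, draws, losses):.1f} %")
```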

And replaying direct matches against my best public books, for more confidence:

   # PLAYER              : RATING    POINTS  PLAYED    (%)
   1 K8 Book X.bin       : 3012.5    1071.0    2000   53.5%
   2 K8 MyFriends.bin    : 2987.5     929.0    2000   46.5%

   # PLAYER           : RATING    POINTS  PLAYED    (%)
   1 K8 Book X.bin    : 3006.8    1038.5    2000   51.9%
   2 K8 Oops.bin      : 2993.2     961.5    2000   48.1%

So it's a tough book, whose only drawback is that it doesn't beat other books by larger margins. And it's a small book compared to some of the dinosaurs out there.

Laskos · Post by **Laskos** » Sat Feb 28, 2015 6:58 am

I am no expert, and it depends on the version used. AFAIK King Asad is larger and tends to be better with White. In BooksWar, the older version of 1337 performs worse than the public (and older) version of King Asad. That's all I can suggest; my bet would be on an updated King Asad.

Laskos · Post by **Laskos** » Sat Feb 28, 2015 5:07 pm

After a long while (several years), I tested some good CTG books again; I had got a taste for it after seeing the polyglot ones. Yes, using the latest Stockfish at 1'+1'', King Asad Vipre from January clobbers both ChessPro 1337 from November (300 games, +39 -9, the rest draws, high draw rate) and the BooksWar champion Rising Star from December (100 games, +36 -2, the rest draws, lower draw rate). I suspect that the later King Asad Vipre (January) is tuned against these two strong but earlier books. It's important to stay up to date with these books.
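To put "clobbers" into numbers, a minimal sketch that turns a win/loss/draw count into a score, an Elo difference and a likelihood of superiority (LOS). The LOS uses the common normal approximation `0.5 * (1 + erf((W - L) / sqrt(2 (W + L))))`, which ignores draws; that choice of approximation is an assumption of this sketch, not something stated in the post.

```python
import math

def match_stats(wins: int, losses: int, draws: int):
    """Score fraction, implied Elo difference and likelihood of superiority (LOS)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    elo = 400.0 * math.log10(score / (1.0 - score))
    # Normal approximation for LOS; it depends only on decisive games.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return score, elo, los

# King Asad Vipre (January) vs ChessPro 1337 (November): 300 games, +39 -9
print(match_stats(39, 9, 300 - 39 - 9))
# King Asad Vipre (January) vs Rising Star (December): 100 games, +36 -2
print(match_stats(36, 2, 100 - 36 - 2))
```

With these counts the first match works out to a 55% score (about +35 Elo) and the second to 67% (about +123 Elo), both with LOS very close to 1.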

APassionForCriminalJustic · Sat Feb 28, 2015 5:30 pm

Thank you, Kai. I have been committed to 1337 for some time, but King Asad Vipre version 5, if I am not mistaken, has just been released this February, so I will probably go with that. The book is a lot cheaper too. The only issue is that I do not use CTG; I only use .bin books in Winboard. I do not know if the author is willing to create polyglot versions of his books. I've heard it's easy.
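For reference, the polyglot (.bin) side is simple to read. A minimal sketch, assuming the commonly documented Polyglot layout: 16-byte big-endian entries (64-bit key, 16-bit move, 16-bit weight, 32-bit learn) sorted by key, with moves packed as to-file/to-row/from-file/from-row/promotion bit fields. Computing the position key (Zobrist hashing with Polyglot's fixed table of 781 random numbers) is omitted here, so the caller must supply it; the demo key is made up. Castling is stored as king-takes-rook (e.g. `e1h1`), which a real engine would have to translate. CTG is a different, proprietary format and is not covered.

```python
import random
import struct

# One Polyglot entry: 64-bit key, 16-bit move, 16-bit weight, 32-bit learn, big-endian
ENTRY = struct.Struct(">QHHI")
FILES = "abcdefgh"
PROMOTION = ["", "n", "b", "r", "q"]

def entries_for_key(book: bytes, key: int):
    """Return all (move, weight) pairs stored for `key`; entries are sorted by key."""
    n = len(book) // ENTRY.size
    lo, hi = 0, n
    while lo < hi:  # lower-bound binary search for the first entry with this key
        mid = (lo + hi) // 2
        if ENTRY.unpack_from(book, mid * ENTRY.size)[0] < key:
            lo = mid + 1
        else:
            hi = mid
    found = []
    for i in range(lo, n):
        k, move, weight, _learn = ENTRY.unpack_from(book, i * ENTRY.size)
        if k != key:
            break
        found.append((move, weight))
    return found

def decode_move(move: int) -> str:
    """Bits 0-2 to-file, 3-5 to-row, 6-8 from-file, 9-11 from-row, 12-14 promotion."""
    to_file, to_row = move & 7, (move >> 3) & 7
    from_file, from_row = (move >> 6) & 7, (move >> 9) & 7
    promo = (move >> 12) & 7
    return f"{FILES[from_file]}{from_row + 1}{FILES[to_file]}{to_row + 1}{PROMOTION[promo]}"

def pick_move(book: bytes, key: int, rng=random):
    """Weighted random choice among the book moves for `key`, or None if out of book."""
    candidates = entries_for_key(book, key)
    total = sum(weight for _, weight in candidates)
    if total == 0:
        return None
    r = rng.randrange(total)
    for move, weight in candidates:
        if r < weight:
            return decode_move(move)
        r -= weight

# Hypothetical two-entry book for a made-up key (not a real position hash)
demo = b"".join(ENTRY.pack(k, m, w, 0) for k, m, w in [(0x1234, 731, 1), (0x1234, 796, 3)])
print(entries_for_key(demo, 0x1234))
print(pick_move(demo, 0x1234, random.Random(0)))
```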

michiguel · Post by **michiguel** » Sat Feb 28, 2015 11:12 pm

Laskos wrote:
michiguel wrote:
Laskos wrote:
michiguel wrote:
I have the feeling that using books may increase White's advantage. Anyway, simulations with a 30 Elo White advantage are below.
Tie-breaks are not applied; those cases are listed as "shared". When there was only one winner, it is listed as "outright".
I can try to implement some SB rules.

Miguel
===========

Color when 1 plays white against 2

Total engines = 11
Total games = 55
Total rounds = 11
Total boards = 5
Total cycles = 1000000
draw rate (equal strength) = 64.0%
White advantage = 30.0
rating[0]=3200
rating[1]=3100
rating[2]=3050
rating[3]=3000
rating[4]=2950
rating[5]=2900
rating[6]=2700
rating[7]=2700
rating[8]=2200
rating[9]=2200
rating[10]=2200

won    = 622003
shared = 177919
loss   = 200078
total  = 1000000
won outright % = 62.2  <===============
won shared   % = 17.8

========

Reversed colors

Total engines = 11
Total games = 55
Total rounds = 11
Total boards = 5
Total cycles = 1000000
draw rate (equal strength) = 64.0%
White advantage = 30.0
rating[0]=3200
rating[1]=3100
rating[2]=3050
rating[3]=3000
rating[4]=2950
rating[5]=2900
rating[6]=2700
rating[7]=2700
rating[8]=2200
rating[9]=2200
rating[10]=2200

won    = 586411
shared = 186268
loss   = 227321
total  = 1000000
won outright % = 58.6 <===============
won shared   % = 18.6
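The kind of round-robin simulation summarized above can be sketched as follows. This is not Miguel's simulator; it is a minimal sketch assuming a BayesElo-style game model (a `drawelo` parameter plus a fixed White bonus), the ratings listed above, and a hypothetical colour rule (lower index has White when i + j is odd, so engine 0 is White against engine 1 unless `reverse_colors` is set). Converting the 64% equal-strength draw rate to `drawelo` ignores the White bonus, which is also an assumption. As above, no tie-break is applied and ties for first count as "shared".

```python
import math
import random

def drawelo_from_draw_rate(draw_rate: float) -> float:
    """drawelo giving `draw_rate` between equal opponents (colour bonus ignored)."""
    return 400.0 * math.log10(2.0 / (1.0 - draw_rate) - 1.0)

def game_probs(r_white: float, r_black: float, white_adv: float, drawelo: float):
    """BayesElo-style (win, draw, loss) probabilities from White's point of view."""
    delta = r_white - r_black + white_adv
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - delta) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + delta) / 400.0))
    return p_win, 1.0 - p_win - p_loss, p_loss

def simulate(ratings, draw_rate=0.64, white_adv=30.0, cycles=10_000,
             reverse_colors=False, seed=1):
    """Single round robin; fractions of runs engine 0 wins outright, shares first, misses."""
    rng = random.Random(seed)
    drawelo = drawelo_from_draw_rate(draw_rate)
    n = len(ratings)
    games = []
    for i in range(n):
        for j in range(i + 1, n):
            # Hypothetical colour rule: the lower index has White when i + j is odd.
            white, black = (i, j) if (i + j) % 2 == 1 else (j, i)
            if reverse_colors:
                white, black = black, white
            probs = game_probs(ratings[white], ratings[black], white_adv, drawelo)
            games.append((white, black, probs))

    won = shared = lost = 0
    for _ in range(cycles):
        points = [0.0] * n
        for white, black, (p_win, p_draw, _p_loss) in games:
            u = rng.random()
            if u < p_win:
                points[white] += 1.0
            elif u < p_win + p_draw:
                points[white] += 0.5
                points[black] += 0.5
            else:
                points[black] += 1.0
        best = max(points)
        leaders = [k for k in range(n) if points[k] == best]
        if leaders == [0]:
            won += 1
        elif 0 in leaders:
            shared += 1
        else:
            lost += 1
    return won / cycles, shared / cycles, lost / cycles

ratings = [3200, 3100, 3050, 3000, 2950, 2900, 2700, 2700, 2200, 2200, 2200]
won, shared, lost = simulate(ratings)
print(f"won outright {100 * won:.1f}%, shared {100 * shared:.1f}%, lost {100 * lost:.1f}%")
```

Exact percentages will differ from the runs above, since the colour schedule and the draw model here are assumptions.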
I used a drawelo of 200, which gives a lower draw rate than your 64% for equal opponents. 64% seems a bit high; it's valid only for a few top engines, and the rest have lower draw rates. Can you use 56%? Probably my drawelo of 200 is a bit too low, though.
I used 64% just to compare with Peter's. Just to make it clear, this is between equal opponents. Between opponents of different strength, it scales down automatically.

One of the limitations of Ordo (and any other rating software for that matter) is that the draw rate (between equal opponents) is assumed constant throughout the rating spectrum, and this is not true. So, an average needs to be used (the model will be improved when I work with the wilo model).

Here are several combinations (reversed colors, #2 plays white against #1)

draw rate (equal strength) = 56.0%
White advantage = 30.0
won = 564497
shared = 179396
loss = 256107
total = 1000000
won outright % = 56.4
won shared % = 17.9

draw rate (equal strength) = 50.0%
White advantage = 30.0
won = 548826
shared = 173783
loss = 277391
total = 1000000
won outright % = 54.9
won shared % = 17.4

draw rate (equal strength) = 40.0%
White advantage = 30.0
won = 524417
shared = 164144
loss = 311439
total = 1000000
won outright % = 52.4
won shared % = 16.4
Thanks, that is very close to my results (52% draw rate for drawelo 200 and equal opponents). In fact, I used a simulator for TCEC that I had hacked together in 5 minutes, where drawelo_var was a function of the opponents' Elos, drawelo*Elo/3200, with SF and Komodo at 3200 Elo and a drawelo of 260. This is a gross approximation to take the strength of the engines into account. When adapting the sim for WCCC, I set drawelo to 200 but inadvertently kept the dependence on Elos.
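The drawelo/draw-rate relation used here can be checked with a short sketch, assuming the BayesElo-style model in which equal opponents each win with probability 1/(1 + 10^(drawelo/400)). It reproduces the roughly 52% draw rate for drawelo 200, and inverting it shows that 64% corresponds to a drawelo of about 263. The `drawelo_var` helper follows the drawelo*Elo/3200 scaling described above; using the mean Elo of the two opponents is an assumption, since the post does not say which Elo is meant.

```python
import math

def draw_rate_equal(drawelo: float) -> float:
    """Draw probability between equal opponents, colour ignored, in the BayesElo-style model."""
    p_win = 1.0 / (1.0 + 10.0 ** (drawelo / 400.0))  # loss probability is the same
    return 1.0 - 2.0 * p_win

def drawelo_for(draw_rate: float) -> float:
    """Inverse of draw_rate_equal."""
    return 400.0 * math.log10(2.0 / (1.0 - draw_rate) - 1.0)

def drawelo_var(drawelo: float, elo_a: float, elo_b: float, ref_elo: float = 3200.0) -> float:
    """drawelo * Elo / 3200 scaling; taking the pair's mean Elo is a hypothetical choice."""
    return drawelo * 0.5 * (elo_a + elo_b) / ref_elo

print(f"{draw_rate_equal(200):.3f}")          # 0.519
print(f"{drawelo_for(0.64):.0f}")             # 263
print(f"{drawelo_var(260, 3200, 3200):.0f}")  # 260
```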

Then, in case of ties, I assigned places at random: say 2 engines are tied, then each of them gets 50% of a win in that run. I didn't use colors.
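The tie handling described here can be expressed two ways that agree on average: draw one winner uniformly among the tied leaders, or give each of k tied leaders 1/k of a win. A minimal sketch of both; the function names and the demo scores are hypothetical.

```python
import random
from fractions import Fraction

def leaders(points):
    """Indices of all engines tied for the top score."""
    best = max(points)
    return [k for k, p in enumerate(points) if p == best]

def random_winner(points, rng):
    """Pick the tournament winner uniformly at random among the engines tied for first."""
    return rng.choice(leaders(points))

def win_credit(points):
    """Expected-value version: each of k tied leaders gets 1/k of the win."""
    tied = leaders(points)
    return {k: Fraction(1, len(tied)) for k in tied}

scores = [7.5, 7.5, 6.0]  # hypothetical final scores with a two-way tie
print(win_credit(scores))
print(random_winner(scores, random.Random(1)))
```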

I cleaned it up, and I am releasing the code and binaries.
http://www.talkchess.com/forum/viewtopic.php?t=55514

Miguel