ICGA's 2015 World Computer Chess Championship/Events

Milos · Post by **Milos** » Thu Feb 26, 2015 8:51 pm

hgm wrote:
IanO wrote:There is a difference: in this case the opponents themselves are using their own logic to resign, offer draws, and accept draws. To that I have no objection (but does the UCI protocol allow draw negotiation? I forget...) What I am against is a third-party making the decisions (TCEC interpretation of engine scores).
There is no obligation to use UCI, right? It is a choice of the participants. If my engine is defective in other respects, say it cannot find checkmate in KQK, that would be no reason to alter the rules of Chess and declare KQK won by the 'baring rule'. I should bloody well fix my engine...

Chess programmers and tournament organizers should also have a responsibility towards the people watching the games.
As the rules are now, both engines have to agree, through their reported scores, that the game is won or lost.
If the other engine also says (with its score) that your KQK is lost, it is practically resigning. If your engine cannot see the mate, it should have displayed a score of zero, and the game would have continued until the 50-move rule kicked in.
What you are suggesting would be the human equivalent of "I have a bare king in KQK, but I won't resign, because I want to see if my opponent is able to checkmate me". Now imagine this at GM level, and tell me what you would think of a GM who behaved like this, never resigning and instead forcing his opponents to mate him?
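The "both engines must agree" rule Milos describes can be sketched as below. The threshold and the number of consecutive moves are hypothetical values, not the settings of TCEC, ICGA or any other event, and the function name is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class AdjudicationRule:
    # Hypothetical values, not the settings of any real tournament.
    threshold_cp: int = 1000      # |score| in centipawns that counts as "decisive"
    consecutive_moves: int = 5    # both engines must agree for this many moves in a row

def adjudicate_win(white_scores: Sequence[int], black_scores: Sequence[int],
                   rule: Optional[AdjudicationRule] = None) -> Optional[str]:
    """Return "white", "black" or None (play on).

    Each list holds one engine's own reported scores, in centipawns from that
    engine's point of view, one entry per move it played.
    """
    rule = rule or AdjudicationRule()
    n, t = rule.consecutive_moves, rule.threshold_cp
    if len(white_scores) < n or len(black_scores) < n:
        return None
    w, b = white_scores[-n:], black_scores[-n:]
    # The winner must claim a big advantage AND the loser must concede it.
    if all(s >= t for s in w) and all(s <= -t for s in b):
        return "white"
    if all(s <= -t for s in w) and all(s >= t for s in b):
        return "black"
    return None
```

With such a rule, an engine that reports 0 because it cannot see the mate is never adjudicated lost: `adjudicate_win([1200] * 5, [0] * 5)` returns `None`, and the game is played out on the board.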

hgm · Post by **hgm** » Thu Feb 26, 2015 9:21 pm

Milos wrote:
hgm wrote:
IanO wrote:There is a difference: in this case the opponents themselves are using their own logic to resign, offer draws, and accept draws. To that I have no objection (but does the UCI protocol allow draw negotiation? I forget...) What I am against is a third-party making the decisions (TCEC interpretation of engine scores).
There is no obligation to use UCI, right? It is a choice of the participants. If my engine is defective in other respects, say it cannot find checkmate in KQK, that would be no reason to alter the rules of Chess and declare KQK won by the 'baring rule'. I should bloody well fix my engine...
Chess programmers and tournament organizers should also have a responsibility towards the people watching the games.
As the rules are now, both engines have to agree, through their reported scores, that the game is won or lost.
If the other engine also says (with its score) that your KQK is lost, it is practically resigning. If your engine cannot see the mate, it should have displayed a score of zero, and the game would have continued until the 50-move rule kicked in.
What you are suggesting would be the human equivalent of "I have a bare king in KQK, but I won't resign, because I want to see if my opponent is able to checkmate me". Now imagine this at GM level, and tell me what you would think of a GM who behaved like this, never resigning and instead forcing his opponents to mate him?

I think it would not be smart to resign KBNK even against a GM.

And in OTB computer tournaments I have scored a draw as the weaker side in KQBK, against a 2700+ engine.

One thing is for sure: if I were to make an engine to participate in tourneys that abuse the score for such practices, I would NEVER have it display scores below -1. I made that mistake once (with Joker), naively displaying the heuristic evaluation score, with the result that it was frequently assigned a loss in obviously drawn positions.
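What hgm describes amounts to clamping the displayed score from below while the search keeps using its real evaluation. A minimal sketch, assuming scores in centipawns (one pawn = 100 cp); `reported_score` is a hypothetical helper, not code from Joker or any other engine.

```python
def reported_score(internal_cp: int, floor_cp: int = -100) -> int:
    """Score sent to the GUI: the internal evaluation, clamped from below.

    floor_cp = -100 corresponds to "never display scores below -1".
    The search itself still uses internal_cp; only the displayed value changes.
    """
    return max(internal_cp, floor_cp)

# Under a rule that needs the losing side to report e.g. <= -1000 cp,
# such an engine never concedes, so the game is played out on the board.
internal_scores = [-250, -600, -1400, -3000]
displayed = [reported_score(s) for s in internal_scores]
print(displayed)  # [-100, -100, -100, -100]
```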

Michel · Post by **Michel** » Fri Feb 27, 2015 7:07 am

bob wrote:
Michel wrote:
I should have been more specific. I was referring to the initial observation(s) that provide the genesis of a new theory:
Yes sorry, I now realize that. I was thinking in terms of established scientific theories (which exist in many fields).

I assume you are specifically referring to computer chess. In computer chess there is currently simply no theory to speak of (although many people here present their dogmas as indisputable truth). To start with, nobody understands why minimax search is so effective in chess. The success of the tuning methods in Gaviota (to which you are contributing!) and Texel "suggests" that it is good to have a static evaluation reflecting the statistical properties of a position. But there is no theoretical reason why such an "objective evaluation" would propagate through search (min and max are functions which are notoriously difficult to handle statistically). In other words, if you think of the static evaluation as somehow statistically summarizing what a deeper search would reveal, then you run into contradictions.
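The statistical idea behind Texel-style tuning can be sketched as follows, assuming the commonly used logistic mapping from a centipawn score to an expected game result with a scaling constant k. The sample positions and function names are hypothetical, and this is not the actual code of Gaviota or Texel. Note that it only fits the static evaluation to game results; how that fit survives min/max propagation through search is exactly the open question raised above.

```python
def predicted_result(score_cp: float, k: float = 1.0) -> float:
    """Map a static evaluation (centipawns, White's view) to an expected result in [0, 1]."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))

def texel_error(samples, k: float = 1.0) -> float:
    """Mean squared error between game results and predicted results.

    samples: list of (static_eval_cp, result) pairs, result in {1.0, 0.5, 0.0}
    from White's point of view. Tuning means adjusting the evaluation weights
    (and first the scale k) to minimise this error.
    """
    return sum((r - predicted_result(q, k)) ** 2 for q, r in samples) / len(samples)

# Toy, made-up positions: (static eval in cp, game result)
samples = [(150, 1.0), (-80, 0.0), (10, 0.5), (300, 1.0), (-20, 0.5)]
# Crude grid search for the scale constant k before tuning the weights themselves.
best_k = min((k / 10 for k in range(1, 31)), key=lambda k: texel_error(samples, k))
```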

So yes. Personally I appreciate very much the experiments you (and Kai Laskos) are doing!

BTW, what people refer to as "theory" in computer chess is actually tree search algorithms, which is really mathematics (or perhaps theoretical computer science). But as I said above, nobody understands why tree search + static evaluation produces a good chess program.
Actually quite a few people DO understand why this works. In fact, Claude Shannon understood it back in the 1950's...

I do not understand how you can write that. Shannon described an architecture for game playing programs based on minimax search and static evaluation. He did not explain why it would work.

He probably thought this was self-evident, but in fact minimax search can be completely pathological even in very simple games:

http://www.cs.umd.edu/~nau/papers/nau1983decision.pdf

The reason is precisely what I wrote above: min and max are very badly behaving functions from the point of view of probability theory.
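One simple way to see why max (and min) behave badly statistically: take several moves that are all truly worth 0 and estimate each with an unbiased but noisy evaluation. The maximum of the estimates is then biased upwards, and the bias grows with the number of moves. The sketch below enumerates all error combinations exactly, with a made-up error model of ±1 pawn; it only shows this simplest symptom and is not the pathology construction from the linked paper.

```python
from itertools import product

def expected_max(true_values, noise_levels):
    """Exact E[max_i (v_i + e_i)], each error e_i independently and uniformly
    drawn from noise_levels (all combinations enumerated, no sampling)."""
    outcomes = [max(v + e for v, e in zip(true_values, errors))
                for errors in product(noise_levels, repeat=len(true_values))]
    return sum(outcomes) / len(outcomes)

# Every move is truly worth 0 and every single estimate is unbiased (error -1 or +1),
# yet the max over the estimates is biased upwards, more so with more moves.
print(expected_max([0, 0], [-1, 1]))      # 0.5
print(expected_max([0, 0, 0], [-1, 1]))   # 0.75
```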

Adam Hair · Post by **Adam Hair** » Fri Feb 27, 2015 11:57 am

Milos wrote:
Adam Hair wrote:White advantage increases as the average strength of the opponents increases. I am not saying that 50 Elo is a good assumption under the assumed conditions, for I have not had time to follow what the 3 of you are doing. But, you should not just assume that the advantage should be under 30 Elo.
True, but White's advantage also decreases as the Elo gap between engines increases (i.e. a higher White advantage artificially increases the Elo gap), and in the hypothetical case we were discussing (SF participating in the ICGA event), the gap would be quite substantial.

White advantage is not as sensitive to Elo distance as you may think. I do not think that it is a given that White advantage should be equal to or less than 30 Elo with the given conditions.

jdart · Post by **jdart** » Fri Feb 27, 2015 3:47 pm

Personally I think opening books are important. The shallow fixed book competitions have their place and they do provide a uniform platform for testing. But sometimes even good engines quickly go into known sub-optimal lines. Opening theory is very deep in some areas now, especially with correspondence play that is engine-aided. Search during a game does not easily replace this accumulated knowledge, especially in the early opening stages where the end points of known lines are far away.

Some development teams (Hiarcs in particular) have put a huge amount of effort into book development including finding strong novelties. However, I know some engine authors don't care about books and don't want to spend any effort there.

--Jon

Laskos · Post by **Laskos** » Fri Feb 27, 2015 5:04 pm

jdart wrote:Personally I think opening books are important. The shallow fixed book competitions have their place and they do provide a uniform platform for testing. But sometimes even good engines quickly go into known sub-optimal lines. Opening theory is very deep in some areas now, especially with correspondence play that is engine-aided. Search during a game does not easily replace this accumulated knowledge, especially in the early opening stages where the end points of known lines are far away.

Some development teams (Hiarcs in particular) have put a huge amount of effort into book development including finding strong novelties. However, I know some engine authors don't care about books and don't want to spend any effort there.

--Jon

People not directly involved tend to underestimate the importance of books. Last night I tested a newer polyglot book than the ones I had. Komodo 8 in a round-robin:

   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 K8 Oops.bin           : 3066.3    1221.0    2000   61.0%
   2 K8 RpC.bin            : 3033.2    1111.5    2000   55.6%
   3 K8 Komodo.bin         : 3003.9    1013.0    2000   50.6%
   4 K8 SF book.bin        : 2992.9     976.0    2000   48.8%
   5 K8 Performance.bin    : 2977.5     924.0    2000   46.2%
   6 K8 No Book            : 2926.2     754.5    2000   37.7%

The newer book, Oops.bin, adds 33 Elo points over the best of what little I had tested before. Komodo 8's own book performs 63 Elo points lower, showing that the authors don't care much. Stockfish's book.bin is also weak, compared to generalist books like Oops.bin and RpC.bin. There are books which specialize for and against a particular engine. And there are book fights: for example, the even newer MyFriends.bin is tuned against Oops.bin, and beats it in a direct match using both Stockfish and K8.
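For readers who have not looked inside a polyglot `.bin` book: it is a flat file of 16-byte entries (position key, move, weight, learn value, big-endian) sorted by key, and the engine or GUI picks among the moves stored for the current position, usually in proportion to their weights. A minimal lookup sketch follows. It assumes the 64-bit position key has already been computed elsewhere with Polyglot's own Zobrist scheme (not shown), and the function names are hypothetical.

```python
import bisect
import random
import struct

# Polyglot .bin entry: 64-bit key, 16-bit move, 16-bit weight, 32-bit learn, big-endian.
ENTRY = struct.Struct(">QHHI")

def load_entries(data: bytes):
    """Parse a book image into (key, move, weight) tuples, in file order (sorted by key)."""
    usable = len(data) - len(data) % ENTRY.size
    return [ENTRY.unpack_from(data, off)[:3] for off in range(0, usable, ENTRY.size)]

def square_name(sq: int) -> str:
    return "abcdefgh"[sq & 7] + str((sq >> 3) + 1)

def decode_move(move: int) -> str:
    """Bits 0-5: to-square, 6-11: from-square, 12-14: promotion piece.
    Note: Polyglot writes castling as king-takes-own-rook (e1h1, not e1g1)."""
    to_sq, from_sq, promo = move & 0x3F, (move >> 6) & 0x3F, (move >> 12) & 0x7
    return square_name(from_sq) + square_name(to_sq) + ("", "n", "b", "r", "q")[promo]

def book_moves(entries, key: int):
    """All entries for one position, found by binary search on the sorted keys."""
    keys = [e[0] for e in entries]
    return entries[bisect.bisect_left(keys, key):bisect.bisect_right(keys, key)]

def pick_book_move(entries, key: int, rng=random):
    """Weighted random choice among moves with non-zero weight; None if out of book."""
    cands = [(move, weight) for _, move, weight in book_moves(entries, key) if weight > 0]
    if not cands:
        return None
    moves, weights = zip(*cands)
    return decode_move(rng.choices(moves, weights=weights)[0])
```

The weights are where the book author's work shows up: a tuned book mostly differs from a naive one in which moves get non-zero weight and how much.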

There is so much complexity (and importance strength-wise) in book building that it has become a serious profession: some people involved with the Playchess Engine Room spend several hours a day working on books. And books age in a matter of days. After about two weeks, a top book might slide to be significantly weaker than the new ones. Here is an extract from the recent "BooksWar" rating list using Stockfish, a list which includes some powerful CTG books:

   # PLAYER                        : RATING    POINTS  PLAYED    (%)
   1 Rising Star.ctg               : 3387.7     676.5    1140   59.3%
   2 Book X.bin                    : 3359.0     287.5     540   53.2%
   3 PlaychessNightmare.ctg        : 3355.0     284.0     540   52.6%
   4 Anaconda.ctg                  : 3337.6     268.5     540   49.7%
   5 Invisible Stars.ctg           : 3336.7     580.0    1140   50.9%
   6 Hiarcs 14l.ctg                : 3333.7     265.0     540   49.1%
   7 King Asad Vipre.ctg           : 3322.0     650.0    1200   54.2%
   8 Sicilian Avenged.ctg          : 3320.2     646.5    1200   53.9%
   9 IPmanbook.ctg                 : 3301.4     276.5     600   46.1%
  10 Stormworm 5.0.bin             : 3281.9     312.5     600   52.1%
  11 Rodent.bin                    : 3279.6     254.5     600   42.4%
  12 My Friends 10.1.bin           : 3264.6     204.5     540   37.9%
   ...
  78 Monks 1.4.ctg                 : 2745.7     516.5    1050   49.2%
  79 RedKep.ctg                    : 2738.0     249.5     450   55.4%
  80 AMALIA frANCINE.ctg           : 2737.9     194.0     420   46.2%
  81 The Sniper 6.ctg              : 2729.7     503.0    1020   49.3%
  82 Amalia Power.ctg              : 2724.4     309.0     600   51.5%
  83 Doom Re-Born.ctg              : 2719.3     181.5     420   43.2%
  84 Boom Lite 3.ctg               : 2678.9     199.5     450   44.3%
  85 Doom 3 revisited.ctg          : 2673.6     195.0     450   43.3%
  86 Doom FX.ctg                   : 2670.9     253.0     600   42.2%
  87 Rock 7.ctg                    : 2662.7     244.5     600   40.8%

With Stockfish as the engine, the books span an amazing range of some 700 Elo points. Note Hiarcs 14l.ctg near the top: the Hiarcs team puts a lot of effort into their book, and this version is labeled "l" because they have updated it many times to keep it competitive.

Laskos · Post by **Laskos** » Fri Feb 27, 2015 8:54 pm

Milos wrote: Also there are plenty of good books available, so even a +/- 50 Elo range on books is improbable.
Your assumptions are simply grossly exaggerated.

You seem to know little about opening books. One cannot just pick a "good book" and expect it to hold against anything. With books it often happens that A>B>C>A, and I will illustrate this with some tests here:

Suppose one has a new book MyFriends.bin, performing as in the following test:

   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 K8 MyFriends.bin      : 3057.6    1192.5    2000   59.6%
   2 K8 RpC.bin            : 3033.2    1111.5    2000   55.6%
   3 K8 Komodo.bin         : 3005.1    1017.0    2000   50.9%
   4 K8 SF Book.bin        : 2992.8     975.5    2000   48.8%
   5 K8 Performance.bin    : 2984.8     948.5    2000   47.4%
   6 K8 No Book            : 2926.5     755.0    2000   37.8%

And another new book, Oops.bin, against the same opponents:

   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 K8 Oops.bin           : 3066.3    1221.0    2000   61.0%
   2 K8 RpC.bin            : 3033.2    1111.5    2000   55.6%
   3 K8 Komodo.bin         : 3003.9    1013.0    2000   50.6%
   4 K8 SF Book.bin        : 2992.9     976.0    2000   48.8%
   5 K8 Performance.bin    : 2977.5     924.0    2000   46.2%
   6 K8 No Book            : 2926.2     754.5    2000   37.7%

You will conclude that Oops.bin is better by 9 Elo points, and pick that book. But MyFriends.bin was specifically tuned against Oops.bin, and the direct match-up is the following:

   # PLAYER              : RATING    POINTS  PLAYED    (%)
   1 K8 MyFriends.bin    : 3022.5    1127.5    2000   56.4%
   2 K8 Oops.bin         : 2977.5     872.5    2000   43.6%

So you picked a book which loses to another book by 45 Elo points, but performs better against other opponents.
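The step from a match score to an Elo gap uses the usual logistic Elo model. A minimal sketch (function names are made up for illustration): 1127.5 points out of 2000 games is a 56.4% score, which corresponds to roughly 45 Elo, in line with the 3022.5 vs 2977.5 ratings in the head-to-head table.

```python
import math

def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a match score fraction (logistic Elo model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

def expected_score(elo_diff: float) -> float:
    """Inverse mapping: expected score fraction for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

gap = elo_diff_from_score(1127.5 / 2000)   # MyFriends.bin vs Oops.bin, about 44.5 Elo
```

The 9-point pool difference between Oops.bin and MyFriends.bin corresponds to a score difference of only about 1.3 percentage points, so it is easily swamped by a head-to-head effect like this one.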

Hence a variability of at least 50 Elo points is to be expected for books, despite your bragging, as books are tricky.
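The A>B>C>A situation can be checked mechanically once direct match results are available. A minimal sketch, with hypothetical book names A, B, C and a plain "scored over 50% head-to-head" definition of "beats" that ignores error bars:

```python
from itertools import permutations

def intransitive_cycles(beats):
    """Find 3-cycles A>B, B>C, C>A in head-to-head results.

    beats: set of (winner, loser) pairs, e.g. from direct book-vs-book matches.
    Each cycle is reported once, rotated so its alphabetically first member leads.
    """
    players = sorted({p for pair in beats for p in pair})
    cycles = set()
    for a, b, c in permutations(players, 3):
        if (a, b) in beats and (b, c) in beats and (c, a) in beats:
            cyc = (a, b, c)
            i = cyc.index(min(cyc))
            cycles.add(cyc[i:] + cyc[:i])
    return sorted(cycles)
```

If such a cycle exists, no single rating list can order the books consistently, which is why a pool rating alone does not tell you which book to bring to a tournament.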

Adam Hair · Post by **Adam Hair** » Sat Feb 28, 2015 12:26 am

I cannot believe that Book X is ranked that high among the publicly available engines. I created it by playing thousands of games against other books and by adding Playchess and Infinity won games to it. I did not add any lines to it by hand.

Anyway, after the 2 months I spent creating Book X, I agree with Kai. There is a lot of variability associated with books. Transitivity is not a given property.

APassionForCriminalJustic · Sat Feb 28, 2015 4:24 am

Laskos wrote:
Milos wrote: Also there are plenty of good books available, so even a +/- 50 Elo range on books is improbable.
Your assumptions are simply grossly exaggerated.
You seem to know little about opening books. One cannot just pick a "good book" and expect it to hold against anything. With books it often happens that A>B>C>A, and I will illustrate this with some tests here:

Suppose one has a new book MyFriends.bin, performing as in the following test:
   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 K8 MyFriends.bin      : 3057.6    1192.5    2000   59.6%
   2 K8 RpC.bin            : 3033.2    1111.5    2000   55.6%
   3 K8 Komodo.bin         : 3005.1    1017.0    2000   50.9%
   4 K8 SF Book.bin        : 2992.8     975.5    2000   48.8%
   5 K8 Performance.bin    : 2984.8     948.5    2000   47.4%
   6 K8 No Book            : 2926.5     755.0    2000   37.8%
And another new book, Oops.bin, against the same opponents:
   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 K8 Oops.bin           : 3066.3    1221.0    2000   61.0%
   2 K8 RpC.bin            : 3033.2    1111.5    2000   55.6%
   3 K8 Komodo.bin         : 3003.9    1013.0    2000   50.6%
   4 K8 SF Book.bin        : 2992.9     976.0    2000   48.8%
   5 K8 Performance.bin    : 2977.5     924.0    2000   46.2%
   6 K8 No Book            : 2926.2     754.5    2000   37.7%
You will conclude that Oops.bin is better by 9 Elo points, and pick that book. But MyFriends.bin was specifically tuned against Oops.bin, and the direct match-up is the following:
   # PLAYER              : RATING    POINTS  PLAYED    (%)
   1 K8 MyFriends.bin    : 3022.5    1127.5    2000   56.4%
   2 K8 Oops.bin         : 2977.5     872.5    2000   43.6%
So you picked a book which loses to another book by 45 Elo points, but performs better against other opponents.

Hence a variability of at least 50 Elo points is to be expected for books, despite your bragging, as books are tricky.

Kai, what is your opinion of the 1337 ChessPRO series and the KING ASAD books? I use the 1337 commercial books.

Laskos · Post by **Laskos** » Sat Feb 28, 2015 6:50 am

Adam Hair wrote:I cannot believe that Book X is ranked that high among the publicly available engines. I created it by playing thousands of games against other books and by adding Playchess and Infinity won games to it. I did not add any lines to it by hand.

Anyway, after the 2 months I spent creating Book X, I agree with Kai. There is a lot of variability associated with books. Transitivity is not a given property.

Book X is indeed a good book, with a peculiarity of its own. From the few games I played, I can show how it behaves:

In a general pool of polyglot books, it is outdone by MyFriends and Oops:

   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 K8 MyFriends.bin      : 3047.8     574.5     997   57.6%
   2 K8 Oops.bin           : 3034.0     556.0    1003   55.4%
   3 K8 Book X.bin         : 3031.7     549.0     997   55.1%
   4 K8 RpC.bin            : 3012.8     522.0    1003   52.0%
   5 K8 Performance.bin    : 2992.1     485.5     997   48.7%
   6 K8 SF Book.bin        : 2987.1     477.5     997   47.9%
   7 K8 Komodo.bin         : 2977.9     465.5    1003   46.4%
   8 K8 No Book            : 2916.7     367.0     997   36.8%

But it's unbeatable head to head:

3 K8 Book X.bin             :   31  1750 (+544,=838,-368), 55.0 %

K8 Komodo.bin                 : 250 (+ 96,=103,- 51), 59.0 %
K8 SF Book.bin                : 250 (+ 69,=124,- 57), 52.4 %
K8 Performance.bin            : 250 (+ 69,=124,- 57), 52.4 %
K8 RpC.bin                    : 250 (+ 82,=108,- 60), 54.4 %
K8 No Book                    : 250 (+110,=109,- 31), 65.8 %
K8 MyFriends.bin              : 250 (+ 53,=144,- 53), 50.0 %
K8 Oops.bin                   : 250 (+ 65,=126,- 59), 51.2 %
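Each line above is a win/draw/loss record, and the percentage is (wins + draws/2) / games. How much such a 250-game percentage can be trusted follows from the per-game variance; a minimal sketch, treating games as independent (the function name is made up for illustration):

```python
import math

def score_and_error(wins: int, draws: int, losses: int):
    """Score fraction and its standard error from a W/D/L record,
    treating each game as an independent 1 / 0.5 / 0 outcome."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    return s, math.sqrt(var / n)

s, se = score_and_error(65, 126, 59)   # the Book X vs Oops.bin line
```

For the Oops.bin line this gives a score of 0.512 with a standard error of about 0.022, i.e. an approximate 95% interval of about ±4.4 percentage points, so a 250-game mini-match cannot separate these two books and longer replays are needed.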

And replaying the direct matches against my best public books, for confidence:

   # PLAYER              : RATING    POINTS  PLAYED    (%)
   1 K8 Book X.bin       : 3012.5    1071.0    2000   53.5%
   2 K8 MyFriends.bin    : 2987.5     929.0    2000   46.5%

   # PLAYER           : RATING    POINTS  PLAYED    (%)
   1 K8 Book X.bin    : 3006.8    1038.5    2000   51.9%
   2 K8 Oops.bin      : 2993.2     961.5    2000   48.1%

So, a tough book, whose only drawback is that it doesn't beat other books by larger margins. And it's a small book compared to some of the dinosaurs out there.
