Sylwy wrote:Take a look - please - to this paper:
http://rahular.com/phoenix/
The source-code is here:
https://github.com/rahular/phoenix/archive/master.zip
Interesting ?
I was recently made aware of this engine, and as the author of CuckooChess I got interested and decided to investigate it a bit. It initially seemed interesting, but I got the first clue that something was not right when I read the abstract:
However there are still many areas in which humans excel in comparison with the machines. One such area is chess. Even with great advances in the speed and computational power of modern machines, Grandmasters often beat the best chess programs in the world with relative ease.
Making such a claim in 2014 (the thesis is dated June 2014) is ridiculous.
The results section claims that CuckooChess is rated 2530 and that the modified engine, based on 1000 games, got a rating of 2546, and that the modified engine is therefore 19 Elo stronger. How the author got 19 from 2546-2530 is a mystery. The report also contains output from elostat, which says the rating difference is 2546-2514 = 32. However, if you look at the actual games, it can be seen that the modified engine played white in every game. If the games are instead fed into bayeselo, which takes the white advantage into account, the result is:
Code:
Rank Name           Elo    +    - games score oppo. draws
   1 mgGenes4         1   10   10  1000   55%    -1   25%
   2 defaultGenes    -1   10   10  1000   45%     1   25%
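To see why a 55% score from all-white games is consistent with no real strength difference, the score can be converted to an Elo difference with the standard logistic formula. Here is a small sketch in Python (not code from the thesis or CuckooChess, just an illustration); the white-advantage figure of 35 Elo is an assumed, time-control-dependent ballpark:

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction under the
    logistic model: D = 400 * log10(s / (1 - s))."""
    return 400 * math.log10(score / (1 - score))

# mgGenes4 scored 55% over 1000 games, playing white in every game.
raw = elo_diff(0.55)
print(round(raw, 1))  # ~ +34.9 Elo, in line with elostat's +32

# Assuming white's first-move advantage is worth roughly 35 Elo
# (an assumed figure), essentially nothing is left over --
# matching bayeselo's +1 / -1 in the table above.
WHITE_ADVANTAGE = 35.0
print(round(raw - WHITE_ADVANTAGE, 1))  # ~ -0.1
```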
It does not end there, though. Appendix A.2 of the report lists the modified piece square tables that performed best. I inserted these values into CuckooChess and played a match against the unmodified CuckooChess.
Code:
Rank Name            Elo    +    - games score oppo. draws
   1 Cuckoo113a9     213   23   21   525   93%  -213    7%
   2 CuckooPhoenix  -213   21   23   525    7%   213    7%
From reading the paper and the source code, I believe what the author intended to do was the following:
1. Remove all evaluation features except material and piece square tables from CuckooChess.
2. Add a framework for deep learning and genetic algorithms to be able to train the piece square table values.
3. Perform the training.
4. Insert the modified values into the original engine and measure the change in playing strength.
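To make steps 1 and 4 concrete, here is a minimal, hypothetical sketch of the material-plus-piece-square-table evaluation that the thesis reduces the engine to. This is not CuckooChess's actual code; the piece encoding and table values below are invented for illustration:

```python
# Hypothetical trained table for a knight, indexed 0..63 (a1..h8);
# the values are illustrative, not the thesis's numbers.
KNIGHT_PST = [0] * 64
KNIGHT_PST[27] = 20   # d4: central squares score higher
KNIGHT_PST[0] = -30   # a1: corner squares score lower

PIECE_VALUE = {"N": 300}  # material value in centipawns (assumed)

def evaluate(pieces):
    """pieces: list of (piece_type, square) for the side to move.
    Score = material + piece-square bonus, nothing else."""
    score = 0
    for piece, sq in pieces:
        score += PIECE_VALUE[piece]
        score += KNIGHT_PST[sq] if piece == "N" else 0
    return score

print(evaluate([("N", 27)]))  # 300 + 20 = 320
print(evaluate([("N", 0)]))   # 300 - 30 = 270
```

Step 4 then amounts to replacing the engine's default tables with the trained values and re-running a match; if that replacement silently fails, the engine plays identically to the default and any measured "gain" is pure noise, which is what the bayeselo result above suggests happened.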
However, based on my findings above, it seems like step 4 failed and the modified values were not actually used. Possibly this bug affected the training step too.
The patch containing the new piece square table values is available here.
It is not clear to me if the author intentionally made his method look good, or if he just made serious mistakes, but I suppose Hanlon's razor applies.