New results in the ERET chess test

Glarean
Posts: 262
Joined: Sun Oct 05, 2008 1:04 pm
Location: Switzerland
Full name: Walter Eigenmann

New results in the ERET chess test

Post by Glarean »

The so-called ERET chess test (Eigenmann Rapid Engine Test) was created in spring 2017. It is a collection of 111 difficult chess puzzles, specially selected to allow an approximate estimate of the playing strength of a (new) chess engine within a very short time, without having to play long engine tournaments. The engines should solve as many of the tasks as possible in a given time.

The 111-position collection covers a very broad spectrum of chess motifs, and the individual positions were chosen as paradigms, so that the puzzles represent thousands of analogous positional patterns.
All chess-related and technical basics of the ERET are explained in detail in GLAREAN MAGAZIN:
https://glarean-magazin.ch/2017/03/05/c ... test-eret/

For your own tests, the author recommends the following settings:

a) 10-15 sec. per position & engine
b) 1-8 CPU / 1024 MB Hash / Syzygy-Tablebases / No Books
c) GPU for Leela-(NN-)Engines: from RTX 2060 on
d) GUI: "Fritz" (up to ver. 13) / "Arena" / "Shredder" / "Chess Assistant"

The author already published earlier test results in March 2019:
https://glarean-magazin.ch/2019/03/07/e ... mann-test/

It is remarkable that artificial chess intelligence, in the form of the NN program Leela, is still advancing;
the next ranking in a few months could already show a new ERET winner...

The ranking list of 41 engines below was generated in April 2019 with this (new) hardware and software:
- 15 sec. per position / AMD-Ryzen-7-2070x / GPU RTX 2080
- 1-4 CPU / 1024 MB Hash / Syzygy-5men-TBs / No Opening Books
- Fritz 12 / Windows 10

P R O G R A M                            S O L U T I O N S

01. Stockfish 200419 4CPU                84/111
02. Leela Chess Zero 21.1 (41812) 2CPU   82/111
03. Houdini 6.03 4CPU                    82/111
04. Komodo 12.3 4CPU                     77/111
05. Ethereal 11.25 4CPU                  67/111
06. Deep Shredder 13 4CPU                65/111
07. Sting 14 4CPU                        65/111
08. Xiphos 0.5 4CPU                      64/111
09. Andscacs 0.95 4CPU                   63/111
10. Booot 6.3 4CPU                       63/111
11. Fizbo 2 4CPU                         62/111
12. Laser 1.7 4CPU                       62/111
13. Critter 1.6a 4CPU                    59/111
14. Fire 7.1 4CPU                        59/111
15. Fritz 16 4CPU                        55/111
16. Gull.3.1 4CPU                        51/111
17. Equinox 3.30 4CPU                    51/111
18. Chiron 4 4CPU                        51/111
19. Deep Rybka 4.1 4CPU                  48/111
20. Wasp 3.50 4CPU                       47/111
21. Naum 4.6 4CPU                        41/111
22. Rybka WinFinder 2.2 4CPU             38/111
23. Spike 1.4 4CPU                       34/111
24. Senpai 1.0 4CPU                      34/111
25. Deep Junior Yokohama 4CPU            33/111
26. Crafty 25.2 4CPU                     30/111
27. Deep Gandalf 7.0 2CPU                29/111
28. Deep Sjeng WC2008 4CPU               27/111
29. Bright 0.5c 4CPU                     25/111
30. Deep Fritz 10 4CPU                   24/111
31. Deep Onno 1-2-70 4CPU                23/111
32. SOS 5.1 1CPU                         22/111
33. Comet B68 1CPU                       19/111
34. Pharaon 3.5.1 4CPU                   19/111
35. LittleGoliath2000 2.9a 1CPU          18/111
36. Nimzo 2000b 1CPU                     16/111
37. ProDeo 2.6 1CPU                      16/111
38. Monarch 1.7 1CPU                     14/111
39. Warrior 1.03 1CPU                    11/111
40. Clueless 1.4 1CPU                    11/111
41. Roce 0.0360 1CPU                     10/111

Downloads:

- Excel table with all ranking list solution times (zip)
https://glarean-magazin.ch/wp-content/u ... l-2019.zip

- Fritz database with all engine results (zip)
https://glarean-magazin.ch/wp-content/u ... l-2019.zip

- EPD file for importing the test into GUIs (zip)
https://glarean-magazin.ch/wp-content/u ... -Chess.zip

- PGN file with all analyses of the 111 tasks (CB-Reader)
http://view.chessbase.com/cbreader/2019 ... 64031.html

Have fun!

Walter

ThatsIt
Posts: 991
Joined: Thu Mar 09, 2006 2:11 pm

Re: New results in the ERET chess test

Post by ThatsIt »

Fire 7.1 behind Critter, Laser, Fizbo, Booot, Andscacs, Xiphos, Sting, Deep Shredder and Ethereal?
Critter 1.6 better than Fritz, Equinox and Gull?
Deep Shredder 13 better than Xiphos?
And so on ...

Looks like a lottery.
Glarean
Posts: 262
Joined: Sun Oct 05, 2008 1:04 pm
Location: Switzerland
Full name: Walter Eigenmann

Re: New results in the ERET chess test

Post by Glarean »

ThatsIt wrote: Tue Apr 23, 2019 4:01 pm Fire 7.1 behind Critter, Laser, Fizbo, Booot, Andscacs, Xiphos, Sting, Deep Shredder and Ethereal?
Maybe the Fire test should be repeated with other versions (bmi, pop, default?), maybe the Fritz GUI is to blame, maybe Fire calculates worse on an AMD processor, maybe Fire is just an outlier - I don't know. Who cares...
ThatsIt wrote: Tue Apr 23, 2019 4:01 pm Critter 1.6 better than Fritz...?
Yes:
20. Critter 1.6a
22. Fritz 16
http://ccrl.chessdom.com/ccrl/404/
ThatsIt wrote: Tue Apr 23, 2019 4:01 pm Critter 1.6 better than Equinox...?
Yes:
6. Critter 1.6
8. Equinox 3.3
http://www.fastgm.de/240-2.40.html
ThatsIt wrote: Tue Apr 23, 2019 4:01 pm Critter 1.6 better than Gull...?
Maybe - there's only about 20 Elo between Gull and Critter:
Gull 3 3180
Critter 3159
http://ccrl.chessdom.com/ccrl/4040/rating_list_all.html
ThatsIt wrote: Tue Apr 23, 2019 4:01 pm And so on ...
Nope, most relations in ERET are very plausible and comparable with the tournament results. Furthermore, the ERET is only meant to offer an approximation, no more and no less.
ThatsIt wrote: Tue Apr 23, 2019 4:01 pm Looks like a lottery.
Nope - all engine rankings differ from each other.

This is proven by the following comparison of various older lists:
https://www.glarean-verlag.ch/schach/Ei ... atings.png

Many results of the CEGT list differ from the CCRL list - who cares... ;-)

Each ranking list depends on its test environment; an absolute ranking list doesn't exist.

Greetings: Walter

Branko Radovanovic
Posts: 89
Joined: Sat Sep 13, 2014 4:12 pm
Location: Zagreb, Croatia
Full name: Branko Radovanović

Re: New results in the ERET chess test

Post by Branko Radovanovic »

ThatsIt wrote: Tue Apr 23, 2019 4:01 pm Looks like a lottery.
Quite the contrary: the list appears to be remarkably well correlated with playing strength. The above-mentioned German-language article says "surprisingly accurate" (überraschend genau), and personally I cannot but agree. Frankly, I thought such tests were outdated by now, but consider this scenario: you receive an unknown engine, and you have to assess its strength in a reliable way in 15 minutes. Well, good luck with other methods!

It would be interesting to see how this test was assembled from a purely statistical point of view. Perhaps, by using a stepwise method of elimination of certain items, it might be possible to reduce the number of items while maintaining or even improving its predictive ability.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: New results in the ERET chess test

Post by Laskos »

Branko Radovanovic wrote: Tue Apr 23, 2019 7:27 pm
ThatsIt wrote: Tue Apr 23, 2019 4:01 pm Looks like a lottery.
Quite the contrary: the list appears to be remarkably well correlated with playing strength. The above-mentioned German-language article says "surprisingly accurate" (überraschend genau), and personally I cannot but agree. Frankly, I thought such tests were outdated by now, but consider this scenario: you receive an unknown engine, and you have to assess its strength in a reliable way in 15 minutes. Well, good luck with other methods!

It would be interesting to see how this test was assembled from a purely statistical point of view. Perhaps, by using a stepwise method of elimination of certain items, it might be possible to reduce the number of items while maintaining or even improving its predictive ability.
In these suites, the rule of thumb is to take the square root of the total number of positions and divide the result by two (more or less, mostly an empirical factor) to get the two-standard-deviation error of the result (number of correct solutions).
So the results should read like 75 +/- 5 (2SD), for example. Further reducing the number of items will hardly improve the predictive ability. I do not believe in some "silver bullet" test positions that significantly reduce the dispersion while measuring the _strength_, even more so with the arrival of diverse NN MCTS engines.
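
The rule of thumb above can be written out as a short Python sketch. The function names and the `within_error` helper are made up for illustration; in particular, treating two scores as "not clearly separated" when they differ by no more than one error band is a crude reading added here, not something stated in the post.

```python
import math

ERET_POSITIONS = 111  # size of the ERET suite, from the first post


def two_sd_error(num_positions: int, empirical_factor: float = 2.0) -> float:
    """Rule of thumb from the post: sqrt(N) / 2 approximates the 2SD error."""
    return math.sqrt(num_positions) / empirical_factor


def within_error(score_a: int, score_b: int,
                 num_positions: int = ERET_POSITIONS) -> bool:
    """Crude check (an assumption of this sketch): do two scores differ
    by no more than one 2SD error band?"""
    return abs(score_a - score_b) <= two_sd_error(num_positions)


error = two_sd_error(ERET_POSITIONS)
print(f"2SD error for {ERET_POSITIONS} positions: +/- {error:.1f}")

# Scores taken from the April 2019 ranking list
print("Stockfish 84 vs Leela 82 within error:", within_error(84, 82))
print("Stockfish 84 vs Komodo 77 within error:", within_error(84, 77))
```

For 111 positions this gives roughly +/- 5.3, which matches the "75 +/- 5" example in the post.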
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: New results in the ERET chess test

Post by Dann Corbit »

I assume that Leela is 2 GPU, not 2 CPU. Is that correct?
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
j.korhonen
Posts: 19
Joined: Tue Feb 26, 2019 12:34 am
Full name: Juhani Korhonen

Re: New results in the ERET chess test

Post by j.korhonen »

2 CPU + GPU, I believe
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: New results in the ERET chess test

Post by Uri Blass »

ThatsIt wrote: Tue Apr 23, 2019 4:01 pm Fire 7.1 behind Critter, Laser, Fizbo, Booot, Andscacs, Xiphos, Sting, Deep Shredder and Ethereal?
Critter 1.6 better than Fritz, Equinox and Gull?
Deep Shredder 13 better than Xiphos?
And so on ...

Looks like a lottery.
Not to me.

The probability that an engine 200 Elo stronger ends up in a better place in the list is clearly more than 90%, not 50%.
I think there is not a single case where A is more than 400 Elo better than B and the list has B in a better place than A.
Glarean
Posts: 262
Joined: Sun Oct 05, 2008 1:04 pm
Location: Switzerland
Full name: Walter Eigenmann

Re: New results in the ERET chess test

Post by Glarean »

j.korhonen wrote: Tue Apr 23, 2019 8:57 pm 2 CPU + GPU, I believe
Exactly.
ThatsIt
Posts: 991
Joined: Thu Mar 09, 2006 2:11 pm

Re: New results in the ERET chess test

Post by ThatsIt »

Uri Blass wrote: Tue Apr 23, 2019 9:01 pm
ThatsIt wrote: Tue Apr 23, 2019 4:01 pm Fire 7.1 behind Critter, Laser, Fizbo, Booot, Andscacs, Xiphos, Sting, Deep Shredder and Ethereal?
Critter 1.6 better than Fritz, Equinox and Gull?
Deep Shredder 13 better than Xiphos?
And so on ...

Looks like a lottery.
Not to me.

The probability that an engine 200 Elo stronger ends up in a better place in the list is clearly more than 90%, not 50%.
I think there is not a single case where A is more than 400 Elo better than B and the list has B in a better place than A.
Those things are enough to talk about a lottery:

30. Deep Fritz 10 4CPU                   24/111   ---   2676*
31. Deep Onno 1-2-70 4CPU                23/111   ---   2793* <---
32. SOS 5.1 1CPU                         22/111   ---   2382* <---
33. Comet B68 1CPU                       19/111   ---   2202*
34. Pharaon 3.5.1 4CPU                   19/111   ---   n/a
35. LittleGoliath2000 2.9a 1CPU          18/111   ---   n/a
36. Nimzo 2000b 1CPU                     16/111   ---   2286*
* = CEGT 40/4 + ...
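
One way to put a number on the "lottery" question for the excerpt above is to count how many engine pairs are ordered the same way by ERET score and by rating. The Python sketch below does this for the five engines that have a rating in the excerpt (Pharaon and LittleGoliath are "n/a" and left out). The ratings are copied as quoted, the function name is made up for illustration, and five engines are far too few for a firm conclusion.

```python
from itertools import combinations

# (engine, ERET solutions out of 111, rating as quoted in the excerpt above)
RATED = [
    ("Deep Fritz 10", 24, 2676),
    ("Deep Onno 1-2-70", 23, 2793),
    ("SOS 5.1", 22, 2382),
    ("Comet B68", 19, 2202),
    ("Nimzo 2000b", 16, 2286),
]


def compare_pairs(entries):
    """Count pairs where ERET score and rating agree on the order,
    and collect the pairs where they disagree."""
    concordant = 0
    discordant = []
    for (name_a, score_a, elo_a), (name_b, score_b, elo_b) in combinations(entries, 2):
        if score_a == score_b or elo_a == elo_b:
            continue  # ties carry no ordering information
        if (score_a - score_b) * (elo_a - elo_b) > 0:
            concordant += 1
        else:
            discordant.append((name_a, name_b, abs(elo_a - elo_b)))
    return concordant, discordant


concordant, discordant = compare_pairs(RATED)
total = concordant + len(discordant)
print(f"{concordant}/{total} pairs ordered the same way by ERET and rating")
for name_a, name_b, gap in discordant:
    print(f"inverted: {name_a} vs {name_b} ({gap} Elo apart)")
```

On this small sample, 8 of 10 pairs agree, and the two inversions involve rating gaps of 117 and 84 Elo.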