Marco wrote: I am amazed that the idea that your tests are deeply flawed does not cross your mind even for a moment.

Why do you assume the test is "deeply flawed"? If it was correctly executed then the test shows that the elo model fails in certain specific cases when large elo differences are involved. We all know that the elo model is only an approximation and this gives some extra confirmation that it is not exact.
Top 3 engines have TB implementations obeying Bible morals
Moderators: hgm, Rebel, chrisw
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: Top 3 engines have TB implementations obeying Bible mora
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 937
- Joined: Fri Mar 10, 2006 4:29 pm
- Location: Germany
Re: Top 3 engines have TB implementations obeying Bible mora
Marco wrote: I am amazed that the idea that your tests are deeply flawed does not cross your mind even for a moment.
Michel wrote: Why do you assume the test is "deeply flawed"? If it was correctly executed then the test shows that the elo model fails in certain specific cases when large elo differences are involved. We all know that the elo model is only an approximation and this gives some extra confirmation that it is not exact.

Even access on a fast SSD does not come for free.
Nor does the caching of TB info.
Now the question arises: what exactly does this test measure?
Jörg Oster
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: Top 3 engines have TB implementations obeying Bible mora
Joerg wrote: Even access on a fast SSD does not come for free. Nor does the caching of TB info. Now the question arises: what exactly does this test measure?

If the test was correctly executed it shows that the elo model gives a drastically wrong prediction in this case.
To be totally convincing there should also be a measurement of elo(TB vs no-TB) under exactly the same conditions. I have assumed it is positive but it would be better to be sure. If it were negative that would yield a simpler explanation of Kai's result.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Top 3 engines have TB implementations obeying Bible mora
Joerg wrote: Even access on a fast SSD does not come for free. Nor does the caching of TB info. Now the question arises: what exactly does this test measure?
Michel wrote: If the test was correctly executed it shows that the elo model gives a drastically wrong prediction in this case. To be totally convincing there should also be a measurement of elo(TB vs no-TB) under exactly the same conditions. I have assumed it is positive but it would be better to be sure. If it were negative that would yield a simpler explanation of Kai's result.

The results in identical conditions TB vs non-TB for top-3 are here (the sole difference here is that it is Round-Robin):
http://www.talkchess.com/forum/viewtopi ... 92&start=0
Code: Select all
   # PLAYER                   :  RATING  ERROR  POINTS  PLAYED   (%)  CFS(next)
   1 Houdini_602 Syzygy-6     :    29.6   13.7   550.5    1000  55.0      91
   2 BrainFish Syzygy-6       :    15.2   13.7   526.0    1000  52.6      85
   3 Komodo 11.2 Syzygy-6     :     3.8   13.7   506.5    1000  50.6      93
   4 Komodo 11.2 NO TB        :   -12.6   13.6   478.5    1000  47.9      52
   5 BrainFish NO TB          :   -13.2   13.7   477.5    1000  47.8      81
   6 Houdini_602 NO TB        :   -22.9   13.7   461.0    1000  46.1     ---
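As a rough cross-check, the gain each engine gets from 6-men Syzygy can be read off the table above as the difference between its TB and no-TB ratings. The sketch below only illustrates that arithmetic and is not part of the original test; the names `ratings`, `gains` and `mean_gain` are made up here, and differencing round-robin ratings only approximates a head-to-head result.

```python
# Ratings copied from the round-robin table above (TB vs no-TB entries).
# The names `ratings`, `gains` and `mean_gain` are illustrative only.
ratings = {
    "Houdini_602": {"tb": 29.6, "no_tb": -22.9},
    "BrainFish": {"tb": 15.2, "no_tb": -13.2},
    "Komodo 11.2": {"tb": 3.8, "no_tb": -12.6},
}

# Gain from tablebases = rating with TB minus rating without TB.
gains = {name: r["tb"] - r["no_tb"] for name, r in ratings.items()}
mean_gain = sum(gains.values()) / len(gains)

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}")
print(f"mean gain: {mean_gain:+.1f}")
```

With these table values the mean comes out at about 32 ELO points, with the individual gains ranging from roughly 16 (Komodo) to roughly 52 (Houdini).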
I am a bit tired of these Stockfish developers, who combine extreme "rigour" and "scientific scepticism" with arbitrary things like "naturalness", and who sometimes utter complete nonsense ("DTZ are useless") without being sanctioned inside their community (I have browsed several pull requests for NTB to see that; I am not following them closely).
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: Top 3 engines have TB implementations obeying Bible mora
Joerg wrote: Even access on a fast SSD does not come for free. Nor does the caching of TB info. Now the question arises: what exactly does this test measure?
Michel wrote: If the test was correctly executed it shows that the elo model gives a drastically wrong prediction in this case. To be totally convincing there should also be a measurement of elo(TB vs no-TB) under exactly the same conditions. I have assumed it is positive but it would be better to be sure. If it were negative that would yield a simpler explanation of Kai's result.
Laskos wrote: The results in identical conditions TB vs non-TB for top-3 are here (the sole difference here is that it is Round-Robin):
http://www.talkchess.com/forum/viewtopi ... 92&start=0
Code: Select all
   # PLAYER                   :  RATING  ERROR  POINTS  PLAYED   (%)  CFS(next)
   1 Houdini_602 Syzygy-6     :    29.6   13.7   550.5    1000  55.0      91
   2 BrainFish Syzygy-6       :    15.2   13.7   526.0    1000  52.6      85
   3 Komodo 11.2 Syzygy-6     :     3.8   13.7   506.5    1000  50.6      93
   4 Komodo 11.2 NO TB        :   -12.6   13.6   478.5    1000  47.9      52
   5 BrainFish NO TB          :   -13.2   13.7   477.5    1000  47.8      81
   6 Houdini_602 NO TB        :   -22.9   13.7   461.0    1000  46.1     ---
The average gain of top-3 engines from 6-men Syzygy on this same suite and time control is 32 +/- 5 (pentanomial) ELO points. I mentioned 30 ELO points gain among top dogs with 6-men Syzygy on this suite in the opening post here.
I am a bit tired of these Stockfish developers, who combine extreme "rigour" and "scientific scepticism" with arbitrary things like "naturalness", and who sometimes utter complete nonsense without being sanctioned inside their community (I have browsed several pull requests for NTB to see that; I am not following them closely).

Thanks for clarifying! I knew you had done tests of TB versus no-TB but I was not sure that they were under the same conditions.
BTW, I would expect that people who call your tests "deeply flawed" would provide some arguments to back up their claims. No such arguments seem to be forthcoming.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Top 3 engines have TB implementations obeying Bible mora
Michel wrote: Chess engine development is based on the idea that elo improvements are additive.

From my point of view, all the reducing/pruning and also the static eval shape a subtree of the game's search space that has some concrete properties. One of those properties is that this search space tends to contain lines of play that are intrinsically safe for heavy pruning. This has a lot of implications, such as some specific drawing chances.
Change the shaping of the tree and you will probably see differences even in how well different rating formulas fit the results.
Daniel José - http://www.andscacs.com
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Top 3 engines have TB implementations obeying Bible mora
Michel wrote: Chess engine development is based on the idea that elo improvements are additive. The experience with fishtest shows that to a large extent this hypothesis is correct. However Kai's examples show it is not an absolute truth as
elo(TB vs no-TB)+elo(no-TB vs Fruit) != elo(TB vs Fruit)
Contempt, if it works as advertised, also causes non-additivity.
As far as I know no one has reported non-transitivity yet. I.e.
engineA is stronger than engineB
engineB is stronger than engineC
engineC is stronger than engineA
Non-transitivity has been reported for opening books, but I do not know how reliable this information is.

With books, non-transitivity can be achieved more easily than with no-book pure engines. Using the same engine, book B can be tuned against book A, C against B, and A against C. Two and a half years ago I did some tests that show this only indirectly, but book builders know these issues better.
http://www.talkchess.com/forum/viewtopi ... &start=226
http://www.talkchess.com/forum/viewtopi ... &start=229

Michel wrote: A well-known game exhibiting non-transitive behavior is Penney's game https://en.wikipedia.org/wiki/Penney%27s_game .
Perhaps one can make a non-transitive example by starting from a non-additive example and then tweaking the time controls for each engine so that they become approximately of equal strength.
Instead of tweaking the time controls one can also tweak the engines' nps if one has access to their source code.

I tried some improvisations yesterday, but still have not been able to show non-transitivity due to TBs. I took BrainFish at 0.1s/move and Zurichess Neuchatel at 0.4s/move. In general play from normal openings at these time controls, they are separated by some 200 ELO points. Zurichess is improving rapidly with each new version and, in my impression, it has a very strong search but not that strong an eval. My intuition says that it has a better chance of producing such inversions in performance against TB vs no-TB.
The suite is the same unbalanced 7-8-9 men one, with 6-men Syzygy on a fast SSD.
Pentanomial error margins are a factor of 1.8-2.0 smaller than those shown by Cutechess in all the following results, because the positions are unbalanced. No time losses.
First, I had to check whether BrainFish itself benefits from TBs at this fast time control (0.1s/move), as access times and other issues might interfere:
Code: Select all
Score of BrainFish NO TB vs BrainFish Syzygy-6: 261 - 335 - 404 [0.463] 1000
ELO difference: -25.76 +/- 16.63
Finished match
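The ELO difference printed above can be recovered from the win/loss/draw counts with the usual logistic Elo formula. A minimal sketch, assuming that is the formula behind the printed figure (it reproduces it for this result); the function name `elo_from_wld` is made up here:

```python
import math

def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Logistic Elo difference implied by a win/loss/draw record."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # score fraction, must be in (0, 1)
    return -400.0 * math.log10(1.0 / score - 1.0)

# BrainFish NO TB vs BrainFish Syzygy-6: 261 - 335 - 404
diff = elo_from_wld(261, 335, 404)
print(round(diff, 2))  # prints -25.76
```

The score fraction here is (261 + 404/2) / 1000 = 0.463, matching the bracketed value in the output.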
But against Zurichess Neuchatel at 0.4s/move (about 200 ELO points weaker than BrainFish at 0.1s/move) the result is:
Code: Select all
Rank Name                   ELO  +/-  Games  Score  Draws
   3 Zurichess Neuchatel   -101   11   2000    36%    49%
   1 BrainFish NO TB        117   16   1000    66%    47%
   2 BrainFish Syzygy-6      85   15   1000    62%    50%
Finished match
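If ELO differences were strictly additive, the two BrainFish results against Zurichess would differ by the head-to-head TB vs no-TB result. The sketch below spells out that arithmetic using only the numbers printed above. The variable names are made up here, error margins are ignored, and it assumes each BrainFish row reflects its score against Zurichess alone.

```python
# Figures taken from the two results above (error margins ignored).
elo_tb_vs_notb = 25.76    # BrainFish Syzygy-6 vs BrainFish NO TB, head to head
elo_notb_vs_zuri = 117.0  # BrainFish NO TB against Zurichess Neuchatel
elo_tb_vs_zuri = 85.0     # BrainFish Syzygy-6 against Zurichess Neuchatel

# Additivity would predict: elo(TB vs Z) = elo(no-TB vs Z) + elo(TB vs no-TB)
predicted = elo_notb_vs_zuri + elo_tb_vs_notb
gap = elo_tb_vs_zuri - predicted

print(f"predicted {predicted:.1f}, observed {elo_tb_vs_zuri:.1f}, gap {gap:+.1f}")
```

On these figures the TB version scores roughly 58 ELO points below the additive prediction against Zurichess, which is well outside the quoted error margins even before the pentanomial correction.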
-
- Posts: 937
- Joined: Fri Mar 10, 2006 4:29 pm
- Location: Germany
Re: Top 3 engines have TB implementations obeying Bible mora
Hello Kai,
would you mind sharing your unbalanced 7,8,9-men opening set?
(You could also send it via PM.)
I would love to do my own test-runs.
Jörg Oster
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Top 3 engines have TB implementations obeying Bible mora
Joerg Oster wrote: Hello Kai, would you mind sharing your unbalanced 7,8,9-men opening set? (You could also send it via PM.) I would love to do my own test-runs.

I have sent you the link via PM.
-
- Posts: 937
- Joined: Fri Mar 10, 2006 4:29 pm
- Location: Germany
Re: Top 3 engines have TB implementations obeying Bible mora
Joerg Oster wrote: Hello Kai, would you mind sharing your unbalanced 7,8,9-men opening set? (You could also send it via PM.) I would love to do my own test-runs.
Laskos wrote: I have sent you the link via PM.

Thank you!
Jörg Oster