Top 3 engines have TB implementations obeying Bible morals

Michel · Post by **Michel** » Wed Oct 11, 2017 10:50 am

Marco wrote:I am amazed that the idea that your tests are deeply flawed does not cross your mind even for a moment.

Why do you assume the test is "deeply flawed"? If it was correctly executed, then the test shows that the elo model fails in certain specific cases when large elo differences are involved. We all know that the elo model is only an approximation, and this gives some extra confirmation that it is not exact.

Joerg Oster · Post by **Joerg Oster** » Wed Oct 11, 2017 12:01 pm

Michel wrote:
Marco wrote:I am amazed that the idea that your tests are deeply flawed does not cross your mind even for a moment.
Why do you assume the test is "deeply flawed"? If it was correctly executed, then the test shows that the elo model fails in certain specific cases when large elo differences are involved. We all know that the elo model is only an approximation, and this gives some extra confirmation that it is not exact.

Even access to a fast SSD does not come for free.
Nor does the caching of TB info.

Now the question arises what exactly does this test measure?

Michel · Post by **Michel** » Wed Oct 11, 2017 12:16 pm

Joerg wrote:Even access to a fast SSD does not come for free.
Nor does the caching of TB info.

Now the question arises what exactly does this test measure?

If the test was correctly executed it shows that the elo model gives a drastically wrong prediction in this case.

To be totally convincing there should also be a measurement of elo(TB vs no-TB) under exactly the same conditions. I have assumed it is positive, but it would be better to be sure. If it were negative, that would yield a simpler explanation of Kai's result.

Laskos · Post by **Laskos** » Wed Oct 11, 2017 1:43 pm

Michel wrote:
Joerg wrote:Even access to a fast SSD does not come for free.
Nor does the caching of TB info.

Now the question arises what exactly does this test measure?
If the test was correctly executed it shows that the elo model gives a drastically wrong prediction in this case.

To be totally convincing there should also be a measurement of elo(TB vs no-TB) under exactly the same conditions. I have assumed it is positive, but it would be better to be sure. If it were negative, that would yield a simpler explanation of Kai's result.

The results in identical conditions TB vs non-TB for top-3 are here (the sole difference here is that it is Round-Robin):
http://www.talkchess.com/forum/viewtopi ... 92&start=0

Code: Select all

   # PLAYER                  : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next)

   1 Houdini_602 Syzygy-6    :   29.6   13.7     550.5    1000    55.0      91
   2 BrainFish Syzygy-6      :   15.2   13.7     526.0    1000    52.6      85
   3 Komodo 11.2 Syzygy-6    :    3.8   13.7     506.5    1000    50.6      93
   4 Komodo 11.2 NO TB       :  -12.6   13.6     478.5    1000    47.9      52
   5 BrainFish NO TB         :  -13.2   13.7     477.5    1000    47.8      81
   6 Houdini_602 NO TB       :  -22.9   13.7     461.0    1000    46.1     ---

The average gain of top-3 engines from 6-men Syzygy on this same suite and time control is 32 +/- 5 (pentanomial) ELO points. I mentioned 30 ELO points gain among top dogs with 6-men Syzygy on this suite in the opening post of this thread.
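As a rough cross-check, the per-engine gain can be read off the rating table by subtracting each engine's NO TB rating from its Syzygy-6 rating. This is only a sketch: the ratings are copied from the table, and taking a plain mean of the three differences is an assumption about how the 32 ELO figure arises (the pentanomial error margin is not reproduced here).

```python
# Ratings copied from the round-robin table (Syzygy-6 vs NO TB).
ratings = {
    "Houdini_602": {"tb": 29.6, "no_tb": -22.9},
    "BrainFish": {"tb": 15.2, "no_tb": -13.2},
    "Komodo 11.2": {"tb": 3.8, "no_tb": -12.6},
}


def tb_gain(entry):
    """Rating difference between the TB and no-TB version of one engine."""
    return entry["tb"] - entry["no_tb"]


gains = {name: tb_gain(entry) for name, entry in ratings.items()}

# Assumption: the quoted average is the plain mean of the three gains.
average_gain = sum(gains.values()) / len(gains)

for name, gain in gains.items():
    print(f"{name:12s} gain from 6-men Syzygy: {gain:5.1f}")
print(f"average gain: {average_gain:.1f}")
```

The individual gains differ quite a bit between engines, so the mean should be read together with the error margins in the table.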

I am a bit tired of these Stockfish developers, who combine extreme "rigour" and "scientific scepticism" with arbitrary things like "naturalness", and who sometimes utter complete nonsense ("DTZ are useless") without being sanctioned inside their community (I have browsed several pull requests for NTB to see that; I am not following them closely).

Michel · Post by **Michel** » Wed Oct 11, 2017 1:58 pm

Laskos wrote:
Michel wrote:
Joerg wrote:Even access to a fast SSD does not come for free.
Nor does the caching of TB info.

Now the question arises what exactly does this test measure?
If the test was correctly executed it shows that the elo model gives a drastically wrong prediction in this case.

To be totally convincing there should also be a measurement of elo(TB vs no-TB) under exactly the same conditions. I have assumed it is positive, but it would be better to be sure. If it were negative, that would yield a simpler explanation of Kai's result.
The results in identical conditions TB vs non-TB for top-3 are here (the sole difference here is that it is Round-Robin):
http://www.talkchess.com/forum/viewtopi ... 92&start=0
Code: Select all
   # PLAYER                  : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next)

   1 Houdini_602 Syzygy-6    :   29.6   13.7     550.5    1000    55.0      91
   2 BrainFish Syzygy-6      :   15.2   13.7     526.0    1000    52.6      85
   3 Komodo 11.2 Syzygy-6    :    3.8   13.7     506.5    1000    50.6      93
   4 Komodo 11.2 NO TB       :  -12.6   13.6     478.5    1000    47.9      52
   5 BrainFish NO TB         :  -13.2   13.7     477.5    1000    47.8      81
   6 Houdini_602 NO TB       :  -22.9   13.7     461.0    1000    46.1     ---
The average gain of top-3 engines from 6-men Syzygy on this same suite and time control is 32 +/- 5 (pentanomial) ELO points. I mentioned 30 ELO points gain among top dogs with 6-men Syzygy on this suite in the opening post here.

I am a bit tired of these Stockfish developers, who combine extreme "rigour" and "scientific scepticism" with arbitrary things like "naturalness", and who sometimes utter complete nonsense without being sanctioned inside their community (I have browsed several pull requests for NTB to see that; I am not following them closely).

Thanks for clarifying! I knew you had done tests of TB versus no-TB but I was not sure that they were under the same conditions.

BTW. I would expect that people that call your tests "deeply flawed" would provide some arguments to back up their claims. No such arguments seem to be forthcoming.

cdani · Post by **cdani** » Wed Oct 11, 2017 2:37 pm

Michel wrote:Chess engine development is based on the idea that elo improvements are additive.

From my point of view, all the reducing/pruning and also the static eval are shaping a subtree of the search game space that has some concrete properties. One of those properties is that this search space tends to contain lines of play that are intrinsically safe for heavy pruning. This has a lot of implications, like some specific drawing chances.
Change the shape of the tree and you will probably see differences even in how well different rating formulas fit the results.

Laskos · Post by **Laskos** » Fri Oct 13, 2017 12:04 pm

Michel wrote:Chess engine development is based on the idea that elo improvements are additive. The experience with fishtest shows that to a large extent this hypothesis is correct. However Kai's examples show it is not an absolute truth as

elo(TB vs no-TB)+elo(no-TB vs Fruit) != elo(TB vs Fruit)

Contempt, if it works as advertised, also causes non-additivity.

As far as I know no one has reported non-transitivity yet. I.e.

engineA is stronger than engineB
engineB is stronger than engineC
engineC is stronger than engineA

Non-transitivity has been reported for opening books, but I do not know how reliable this information is.

With books, non-transitivity can be achieved more easily than with no-book pure engines. Using the same engine, book B can be tuned against book A, C against B, and A against C. Two and a half years ago I ran some tests that showed this only indirectly, but book builders know these issues better.
http://www.talkchess.com/forum/viewtopi ... &start=226
http://www.talkchess.com/forum/viewtopi ... &start=229

A well-known game exhibiting non-transitive behavior is Penney's game: https://en.wikipedia.org/wiki/Penney%27s_game

Perhaps one can make a non-transitive example by starting from a non-additive example and then tweak the time controls for each engine so that they become approximately of equal strength.

Instead of tweaking the time controls one can also tweak the engines' nps if one has access to their source code.

I tried some improvisations yesterday, but still have not been able to show non-transitivity due to TBs. I took BrainFish at 0.1s/move and Zurichess Neuchatel at 0.4s/move. In general play from normal openings at these time controls, they are separated by some 200 ELO points. Zurichess is improving rapidly with each new version and, in my impression, it has a very strong search but not that strong an eval. My intuition says that it has a better chance of producing such inversions in performance against TB vs no-TB.

Suite is the same unbalanced 7-8-9 men. 6-men Syzygy from fast SSD.
First, I had to check whether Brainfish itself benefits from TBs at this fast time control, as access times and other issues might interfere:

Pentanomial error margins are 1.8-2.0 times smaller than those shown in Cutechess in all the following results, because the positions are unbalanced. No time losses.

Time control 0.1s/move.

Code: Select all

Score of BrainFish NO TB vs BrainFish Syzygy-6: 261 - 335 - 404  [0.463] 1000
ELO difference: -25.76 +/- 16.63
Finished match

So, from a fast SSD, there are no problems with Syzygy-6 even at 0.1s/move; the benefit in self-play is as consistent as before.
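For readers who want to reproduce the ELO figure from the raw win/loss/draw counts, here is a minimal sketch. The logistic ELO formula is standard; the error margin uses a plain trinomial variance with a 95% normal interval, which is an assumption about how Cutechess-style tools arrive at their margin and is not the pentanomial margin mentioned above. The function names are made up for illustration.

```python
import math


def elo_from_score(score):
    """Logistic ELO difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, losses, draws, z=1.96):
    """Return (elo, margin) from W/L/D counts using a trinomial variance."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + losses * (0.0 - score) ** 2
        + draws * (0.5 - score) ** 2
    ) / games
    stderr = math.sqrt(variance / games)
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return elo_from_score(score), (high - low) / 2.0


# Counts from the BrainFish NO TB vs BrainFish Syzygy-6 match at 0.1s/move.
elo, margin = elo_with_margin(wins=261, losses=335, draws=404)
print(f"ELO difference: {elo:.2f} +/- {margin:.2f}")
```

With unbalanced openings played with reversed colours, the trinomial margin overstates the uncertainty, which is why the pentanomial margins quoted in the post are smaller.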

But against Zurichess Neuchatel at 0.4s/move (about 200 ELO points weaker than BrainFish at 0.1s/move) the result is:

Code: Select all

Rank Name                          ELO     +/-   Games   Score   Draws

   3 Zurichess Neuchatel          -101      11    2000     36%     49%

   1 BrainFish NO TB               117      16    1000     66%     47%
   2 BrainFish Syzygy-6             85      15    1000     62%     50%

Finished match

A 32 ELO point loss, instead of a 26 ELO point gain. So, even though I didn't achieve non-transitivity, Stockfish might lose ELO points because of the presence of TBs even against an engine only 200 ELO points weaker.
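The size of the non-additivity can be made explicit with a little arithmetic. The sketch below assumes that the ELO column for each BrainFish variant in the table is its difference against Zurichess Neuchatel (each variant played all 1000 of its games against it), and it ignores the error margins; the variable names are chosen only for illustration.

```python
# Figures quoted in this post; all are ELO differences.
elo_tb_vs_no_tb = 25.76       # BrainFish Syzygy-6 vs BrainFish NO TB (self-play)
elo_no_tb_vs_zurichess = 117  # BrainFish NO TB vs Zurichess Neuchatel
elo_tb_vs_zurichess = 85      # BrainFish Syzygy-6 vs Zurichess Neuchatel

# If ELO were additive, the TB version should score the no-TB result
# plus the self-play gain against Zurichess.
predicted_tb_vs_zurichess = elo_no_tb_vs_zurichess + elo_tb_vs_no_tb

# What was actually measured when switching TBs on against Zurichess.
observed_change = elo_tb_vs_zurichess - elo_no_tb_vs_zurichess

# Gap between the additive prediction and the measurement.
shortfall = predicted_tb_vs_zurichess - elo_tb_vs_zurichess

print(f"additive prediction: {predicted_tb_vs_zurichess:.0f}")
print(f"observed change from TBs: {observed_change:+d}")
print(f"shortfall vs additive model: {shortfall:.0f}")
```

Even allowing for error margins of roughly 15 ELO points on each measurement, the gap between prediction and observation is what the elo(TB vs no-TB)+elo(no-TB vs X) != elo(TB vs X) remark refers to.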

Joerg Oster · Post by **Joerg Oster** » Fri Oct 13, 2017 12:57 pm

Hello Kai,

would you mind sharing your unbalanced 7,8,9-men opening set?
(You could also send it via PM.)

I would love to do my own test-runs.

Laskos · Post by **Laskos** » Fri Oct 13, 2017 3:16 pm

Joerg Oster wrote:Hello Kai,

would you mind sharing your unbalanced 7,8,9-men opening set?
(You could also send it via PM.)

I would love to do my own test-runs.

I have sent you via PM the link.

Joerg Oster · Post by **Joerg Oster** » Fri Oct 13, 2017 3:51 pm

Laskos wrote:
Joerg Oster wrote:Hello Kai,

would you mind sharing your unbalanced 7,8,9-men opening set?
(You could also send it via PM.)

I would love to do my own test-runs.
I have sent you via PM the link.

Thank you!
