Top 3 engines have TB implementations obeying Bible morals

corres · Post by **corres** » Tue Oct 10, 2017 8:48 pm

In general, if you enhance an engine's knowledge of drawn positions, the engine will steer toward draws. This behavior is disadvantageous for a strong engine when it plays against a weaker one. And the positions investigated by the engines contain many more drawn positions than won positions...
Moreover, when an engine uses TBs it plays as if its opponent also used TBs. Because of this, the stronger engine cannot exploit the weaker engine's mistakes well.

petero2 · Post by **petero2** » Tue Oct 10, 2017 9:01 pm

Laskos wrote:Imagine a cunning, swindling implementation of Syzygy against hapless Fruit. I would think it can gain 100+ ELO points compared to NO TB instead of losing 10 ELO points.

Texel contains TB swindle code but I have not tried to measure how efficient it is in terms of ELO.

syzygy · Post by **syzygy** » Tue Oct 10, 2017 9:25 pm

Laskos wrote:
syzygy wrote:The problem here is that knowing too much (and expecting the opponent to know the same) can hurt against an engine that knows a lot less.
Yes, but the same applies to simply stronger engine without any TBs. Say Stockfish at 0.35s/move will perform better in these conditions against Fruit at 0.25s/move than the same Stockfish at 0.25s/move.

Yes, if you increase SF's thinking time, also without TBs at some point it will start doing worse against Fruit in positions that are theoretically drawn or theoretically lost.

Swindling with TBs is different, an engine using "TB swindling" can be made to theoretically not lose anything, but possibly gain in score significantly.

If the root position is in the TBs, swindling is relatively easy: use the TBs only to avoid moves that change the game-theoretic outcome and let the regular search do the rest. This is essentially what SF does.
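
A minimal sketch of the root filter described above, as an illustration only and not Stockfish's actual code. The names `pick_root_move`, `wdl_after` and `search_score` are hypothetical, and the WDL values are assumed to be given from the root side's point of view (+1 win, 0 draw, -1 loss).

```python
from typing import Callable, Dict, List


def pick_root_move(
    moves: List[str],
    wdl_after: Dict[str, int],
    search_score: Callable[[str], float],
) -> str:
    """Pick a move at a root position that is covered by the tablebases."""
    # Keep only the moves that preserve the best game-theoretic outcome.
    best_wdl = max(wdl_after[m] for m in moves)
    safe_moves = [m for m in moves if wdl_after[m] == best_wdl]
    # Among those, let the regular (non-TB) search score decide.
    return max(safe_moves, key=search_score)


# Toy drawn position: "c" looks best to the search but throws away the draw.
moves = ["a", "b", "c"]
wdl_after = {"a": 0, "b": 0, "c": -1}
scores = {"a": 10.0, "b": 50.0, "c": 300.0}
chosen = pick_root_move(moves, wdl_after, lambda m: scores[m])
print(chosen)
```

The TBs act only as a veto here; which of the outcome-preserving moves is most awkward for the opponent is left entirely to the search score.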

Things are more complicated if the root position is not yet in the TBs. If the opponent does not use TBs, then a line leading to a complicated 6-piece loss might give the best practical chances to draw. But if the opponent uses TBs, that same complicated 6-piece TB loss is a certain loss with zero practical chances.

Current SF+TB basically assumes that the opponent plays TB endings as well as it can play them itself. This is probably the optimal strategy against Komodo+TB and against Houdini+TB. But not against Fruit...

That said, there is a scenario where SF could do better against weak opponents without losing strength against strong opponents: once it has determined that all moves lead to a TB loss, there is nothing to lose and it could re-search the position without TBs.
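
A rough sketch of that fallback, under stated assumptions: `tb_verdict` holds what the TB-aware search concluded for each root move, and `search_with_tb` / `search_without_tb` are hypothetical callbacks standing in for the two search modes. None of these names come from a real engine.

```python
from typing import Callable, Dict, List

TB_LOSS = -1  # assumed encoding: the move leads to a tablebase loss


def choose_move(
    moves: List[str],
    tb_verdict: Dict[str, int],
    search_with_tb: Callable[[List[str]], str],
    search_without_tb: Callable[[List[str]], str],
) -> str:
    """Re-search without TBs only when every root move is already a TB loss."""
    if all(tb_verdict[m] == TB_LOSS for m in moves):
        # Nothing left to lose: pick the move the plain search likes best,
        # hoping a weaker opponent goes wrong in the complications.
        return search_without_tb(moves)
    # Otherwise keep the normal TB-aware behaviour.
    return search_with_tb(moves)


# Toy usage: every move loses according to the TBs, so the plain search decides.
all_lost = choose_move(
    ["a", "b"], {"a": -1, "b": -1}, lambda ms: ms[0], lambda ms: ms[-1]
)
# Here "a" still holds a draw, so the TB-aware search decides.
not_lost = choose_move(
    ["a", "b"], {"a": 0, "b": -1}, lambda ms: ms[0], lambda ms: ms[-1]
)
print(all_lost, not_lost)
```

Against an opponent with TBs the result is a loss either way, which is why this should cost nothing against strong opponents.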

Jouni · Post by **Jouni** » Tue Oct 10, 2017 9:59 pm

Obviously TBs give even less ELO gain than previously thought. Maybe 0-3 ELO for 5 piece and 4-6 ELO for 6 piece? And it cannot be measured properly in self-play!

Laskos · Post by **Laskos** » Wed Oct 11, 2017 8:26 am

petero2 wrote:
Laskos wrote:Imagine a cunning, swindling implementation of Syzygy against hapless Fruit. I would think it can gain 100+ ELO points compared to NO TB instead of losing 10 ELO points.
Texel contains TB swindle code but I have not tried to measure how efficient it is in terms of ELO.

Yes, it does seem to have some effect. I tested under the same conditions, with the same fairly unbalanced 7-8-9-men opening suite, against Zurichess Bern (no TBs), an engine about 1000 ELO points weaker:

Rank Name                          ELO     +/-   Games   Score   Draws

   3 Zurichess Bern               -169      10    2000     27%     51%

   1 Texel_Syzygy_Gaviota          173      15    1000     73%     50%
   2 Texel NO TB                   165      15    1000     72%     51%

Texel 1.07 with 6-men Syzygy and 5-men Gaviota enabled does seem to gain some ELO points (not many), instead of losing them as the top 3 engines do.

Laskos · Post by **Laskos** » Wed Oct 11, 2017 8:42 am

syzygy wrote:
Laskos wrote:
syzygy wrote:The problem here is that knowing too much (and expecting the opponent to know the same) can hurt against an engine that knows a lot less.
Yes, but the same applies to simply stronger engine without any TBs. Say Stockfish at 0.35s/move will perform better in these conditions against Fruit at 0.25s/move than the same Stockfish at 0.25s/move.
Yes, if you increase SF's thinking time, also without TBs at some point it will start doing worse against Fruit in positions that are theoretically drawn or theoretically lost.

Well, one has to find such time controls, opening suite and strength difference, and it's probably a rare event, while TBs losing ELO points compared to non-TB against a much weaker engine seems systematic.

I took BrainFish NO TB at 0.35s/move instead of the old 0.25s/move, against the same Fruit 2.1 at 0.25s/move from the same suite, and I got:

Score of Fruit 2.1 vs BrainFish NO TB 0.35s/move: 54 - 427 - 519 [0.314] 1000
ELO difference: -136.16 +/- 14.62
Finished match
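
For reference, the score and ELO difference in that output follow from the win/loss/draw counts via the usual logistic formula. A small sketch (the function name `elo_from_wld` is made up for illustration; the W - L - D order is assumed to be from Fruit's side, which matches the reported 0.314):

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int) -> tuple:
    """Return (score fraction, ELO difference) for a W - L - D result."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Logistic ELO model: score = 1 / (1 + 10 ** (-elo / 400)).
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return score, elo


# Fruit 2.1 vs BrainFish NO TB 0.35s/move: 54 - 427 - 519
score, elo = elo_from_wld(54, 427, 519)
print(f"score = {score:.4f}, ELO difference = {elo:.2f}")
```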

With all engines at 0.25s/move, the results were:

Rank Name                          ELO     +/-   Games   Score   Draws

   7 Fruit 2.1                    -116       6    6000     34%     56%

   1 BrainFish NO TB               127      14    1000     67%     55%
   2 Komodo 11.2 NO TB             119      14    1000     66%     58%
   3 Houdini_602 NO TB             116      14    1000     66%     55%
   4 Komodo 11.2 Syzygy-6          113      14    1000     66%     57%
   5 BrainFish Syzygy-6            112      14    1000     66%     56%
   6 Houdini_602 Syzygy-6          108      14    1000     65%     56%

Finished match

So, at 0.35s/move Stockfish does perform better than at 0.25s/move. Note also the lower draw rate.

syzygy wrote:Swindling with TBs is different, an engine using "TB swindling" can be made to theoretically not lose anything, but possibly gain in score significantly.
If the root position is in the TBs, swindling is relatively easy: use the TBs only to avoid moves that change the game-theoretic outcome and let the regular search do the rest. This is essentially what SF does.

Regular search is itself very moral. Swindling can be done in some clever way from root positions in TB, just in case the opposing engine doesn't have Syzygy too.

syzygy wrote:Things are more complicated if the root position is not yet in the TBs. If the opponent does not use TBs, then a line leading to a complicated 6-piece loss might give the best practical chances to draw. But if the opponent uses TBs, that same complicated 6-piece TB loss is a certain loss with zero practical chances.

Current SF+TB basically assumes that the opponent plays TB endings as well as it can play them itself. This is probably the optimal strategy against Komodo+TB and against Houdini+TB. But not against Fruit...

That said, there is a scenario where SF could do better against weak opponents without losing strength against strong opponents: once it has determined that all moves lead to a TB loss, there is nothing to lose and it could re-search the position without TBs.

Laskos · Post by **Laskos** » Wed Oct 11, 2017 8:55 am

Jouni wrote:Obviously TBs give even less ELO gain than previously thought. Maybe 0-3 ELO for 5 piece and 4-6 ELO for 6 piece? And it cannot be measured properly in self-play!

It can be measured in self-play from regular openings, and the gain is probably higher in self-play than your numbers suggest. It is just that TB implementations are too tame against very weak engines with no TBs.

Laskos · Post by **Laskos** » Wed Oct 11, 2017 9:04 am

Laskos wrote:More precisely, top 3 engines using Syzygy-6 are following two of the "Ten Commandments":

"Thou shalt not kill"
"Thou shalt not covet"

Last night, while improvising on how to improve the sensitivity of my fairly unbalanced 7-8-9-men opening suite to 6-men Syzygy (on a fast SSD), I decided to let my desktop play games overnight (0.25s/move) from this suite: the top 3 engines with 6-men Syzygy enabled, and the same engines without any TBs, in a gauntlet against a much weaker engine, Fruit 2.1 (800 or so ELO points weaker). My idea was that the ELO benefit due to Syzygy-6 would increase, but the error margins would also increase, and after the test I would see whether the sensitivity (ELO difference over error margins) increases overall.

Well, on seeing the results in Cutechess-Cli, I first thought I had done something wrong with my batch file. After checking and re-checking everything, I came to the conclusion that the Syzygy implementations in the top 3 engines are well mannered, almost pious.

The result in Cutechess-Cli:
Rank Name                          ELO     +/-   Games   Score   Draws

   7 Fruit 2.1                    -116       6    6000     34%     56%

   1 BrainFish NO TB               127      14    1000     67%     55%
   2 Komodo 11.2 NO TB             119      14    1000     66%     58%
   3 Houdini_602 NO TB             116      14    1000     66%     55%
   4 Komodo 11.2 Syzygy-6          113      14    1000     66%     57%
   5 BrainFish Syzygy-6            112      14    1000     66%     56%
   6 Houdini_602 Syzygy-6          108      14    1000     65%     56%

Finished match

The correct pentanomial error margins are about 2 times smaller than those shown in Cutechess. Combining the results: 3 engines enabled with 6-men Syzygy are WEAKER than 3 engines NO TB by 10 +/- 4 ELO points against Fruit 2.1 on 7-8-9 men suite.

Conclusions:
The NO TB engines are themselves not very cunning at killing the weak. Contempt=0 was set in all 6 engines, TB and NO TB alike.
The 6-men Syzygy engines are simply moral fanatics, performing worse against Fruit 2.1 than the NO TB engines. When the top 3 face each other as worthy opponents, as we saw earlier, the benefit of 6-men Syzygy from this suite is about +30 ELO points on average. Imagine a cunning, swindling implementation of Syzygy against hapless Fruit. I would think it can gain 100+ ELO points compared to NO TB instead of losing 10 ELO points.

I took an even weaker engine, Zurichess Bern, about 1200 ELO points weaker than top dogs, in the same conditions, and got even more spectacular results:

Rank Name                          ELO     +/-   Games   Score   Draws

   7 Zurichess Bern               -178       6    6000     26%     49%

   1 BrainFish NO TB               196      15    1000     76%     46%
   2 Komodo 11.2 NO TB             186      15    1000     74%     47%
   3 Houdini_602 NO TB             185      15    1000     74%     48%
   4 Komodo 11.2 Syzygy-6          172      14    1000     73%     51%
   5 BrainFish Syzygy-6            167      14    1000     72%     51%
   6 Houdini_602 Syzygy-6          160      14    1000     72%     52%

Finished match

Averaged over the top 3 engines, the TB implementations cost 23 +/- 5 ELO points.
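
The averaged ELO losses against both opponents can be reproduced from the ELO columns of the two tables. This sketch only computes the differences of the means; it does not attempt the pentanomial error margins, and the variable names are illustrative.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# ELO columns copied from the two gauntlet tables above
# (order: BrainFish, Komodo 11.2, Houdini_602).
no_tb_vs_fruit = [127, 119, 116]
tb_vs_fruit = [113, 112, 108]
no_tb_vs_zurichess = [196, 186, 185]
tb_vs_zurichess = [172, 167, 160]

loss_vs_fruit = mean(no_tb_vs_fruit) - mean(tb_vs_fruit)
loss_vs_zurichess = mean(no_tb_vs_zurichess) - mean(tb_vs_zurichess)

print(f"ELO lost with Syzygy-6 vs Fruit 2.1:      {loss_vs_fruit:.1f}")
print(f"ELO lost with Syzygy-6 vs Zurichess Bern: {loss_vs_zurichess:.1f}")
```

The two differences round to the 10 and 23 ELO points quoted in the post.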

Michel · Post by **Michel** » Wed Oct 11, 2017 10:28 am

Chess engine development is based on the idea that elo improvements are additive. The experience with fishtest shows that to a large extent this hypothesis is correct. However Kai's examples show it is not an absolute truth as

elo(TB vs no-TB)+elo(no-TB vs Fruit) != elo(TB vs Fruit)
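
Plugging Kai's numbers into that relation gives a feel for the size of the gap. This is a sketch under two assumptions: the gauntlet ELO column is read as the ELO difference against Fruit 2.1, and the "about +30" figure quoted for TB vs no-TB among the top 3 is taken at face value.

```python
# Averages over BrainFish, Komodo 11.2 and Houdini_602 from the Fruit 2.1 gauntlet.
elo_no_tb_vs_fruit = (127 + 119 + 116) / 3
elo_tb_vs_fruit = (113 + 112 + 108) / 3

# Reported benefit of 6-men Syzygy when the top 3 engines play each other.
elo_tb_vs_no_tb = 30.0

# What additivity would predict for TB engines against Fruit 2.1.
predicted_tb_vs_fruit = elo_tb_vs_no_tb + elo_no_tb_vs_fruit
gap = predicted_tb_vs_fruit - elo_tb_vs_fruit

print(f"predicted {predicted_tb_vs_fruit:.1f}, observed {elo_tb_vs_fruit:.1f}, gap {gap:.1f}")
```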

Contempt, if it works as advertised, also causes non-additivity.

As far as I know no one has reported non-transitivity yet. I.e.

engineA is stronger than engineB
engineB is stronger than engineC
engineC is stronger than engineA

Non-transitivity has been reported for opening books, but I do not know how reliable this information is. A well-known game exhibiting non-transitive behavior is Penney's game https://en.wikipedia.org/wiki/Penney%27s_game .
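
The non-transitive cycle in Penney's game can be checked exactly with Conway's leading-number formula. A short sketch for a fair coin and equal-length sequences; the helper names are illustrative.

```python
from fractions import Fraction


def leading_number(x: str, y: str) -> int:
    """Sum of 2**(k-1) over every k where the last k symbols of x equal the first k of y."""
    return sum(2 ** (k - 1) for k in range(1, len(x) + 1) if x[-k:] == y[:k])


def win_probability(b: str, a: str) -> Fraction:
    """Probability that sequence b shows up before sequence a in fair coin tosses."""
    odds_for_b = leading_number(a, a) - leading_number(a, b)
    odds_for_a = leading_number(b, b) - leading_number(b, a)
    return Fraction(odds_for_b, odds_for_a + odds_for_b)


# Each sequence beats the next one, and the last beats the first.
cycle = ["THH", "HHT", "HTT", "TTH"]
for i, b in enumerate(cycle):
    a = cycle[(i + 1) % len(cycle)]
    print(f"{b} beats {a} with probability {win_probability(b, a)}")
```

Every sequence in the cycle is a favourite against its successor, so there is no "strongest" choice, which is the kind of behaviour the engine question is about.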

Perhaps one can make a non-transitive example by starting from a non-additive example and then tweak the time controls for each engine so that they become approximately of equal strength.

Instead of tweaking the time controls one can also tweak the engines' nps if one has access to their source code.

mcostalba · Post by **mcostalba** » Wed Oct 11, 2017 10:40 am

Laskos wrote:Combining the results: 3 engines enabled with 6-men Syzygy are WEAKER than 3 engines NO TB

I am amazed that the idea that your tests are deeply flawed does not cross your mind even for a moment.
