In general, if you enhance an engine's knowledge of drawn positions, the engine will strive for a draw. This is a disadvantage for a strong engine when it plays against a weaker one. And the positions investigated by the engines contain many more drawn positions than won positions...
Moreover, when an engine uses TBs it plays as if its opponent were also using TBs. Because of this, the stronger engine cannot fully exploit the mistakes of the weaker engine.
Top 3 engines have TB implementations obeying Bible morals
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
-
- Posts: 690
- Joined: Mon Apr 19, 2010 7:07 pm
- Location: Sweden
- Full name: Peter Osterlund
Re: Top 3 engines have TB implementations obeying Bible mora
Laskos wrote: Imagine a cunning, swindling implementation of Syzygy against hapless Fruit. I would think it can gain 100+ ELO points compared to NO TB instead of losing 10 ELO points.
Texel contains TB swindle code but I have not tried to measure how efficient it is in terms of ELO.
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Top 3 engines have TB implementations obeying Bible mora
syzygy wrote: The problem here is that knowing too much (and expecting the opponent to know the same) can hurt against an engine that knows a lot less.
Laskos wrote: Yes, but the same applies to a simply stronger engine without any TBs. Say Stockfish at 0.35s/move will perform better in these conditions against Fruit at 0.25s/move than the same Stockfish at 0.25s/move.
Yes, if you increase SF's thinking time, then even without TBs it will at some point start doing worse against Fruit in positions that are theoretically drawn or theoretically lost.
Laskos wrote: Swindling with TBs is different: an engine using "TB swindling" can be made to theoretically not lose anything, but possibly gain in score significantly.
If the root position is in the TBs, swindling is relatively easy: use the TBs only to avoid moves that change the game-theoretic outcome and let the regular search do the rest. This is essentially what SF does.
Things are more complicated if the root position is not yet in the TBs. If the opponent does not use TBs, then a line leading to a complicated 6-piece loss might give the best practical chances to draw. But if the opponent uses TBs, that same complicated 6-piece TB loss is a certain loss with zero practical chances.
Current SF+TB basically assumes that the opponent plays TB endings as well as it can play them itself. This is probably the optimal strategy against Komodo+TB and against Houdini+TB. But not against Fruit...
That said, there is a scenario where SF could do better against weak opponents without losing strength against strong opponents: once it has determined that all moves lead to a TB loss, there is nothing to lose and it could re-search the position without TBs.
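The root-in-TB swindling idea described above can be sketched roughly as follows. This is only an illustration, not Stockfish's or Texel's actual code: `pick_root_move`, `tb_outcome` and `search_score` are hypothetical names, the tablebase probe and the regular search are passed in as plain callables, and moves are just strings. The TBs are used only to discard root moves that would worsen the game-theoretic outcome; the regular (TB-free) search score then chooses among the remaining moves. When every move is a TB loss, no move is discarded, so the regular search alone decides, which is the "nothing to lose" case at the root.

```python
from typing import Callable, Hashable, Iterable, List, Optional

# Game-theoretic outcome for the side to move at the root,
# AFTER it plays the given move: +1 = win, 0 = draw, -1 = loss.
WIN, DRAW, LOSS = 1, 0, -1


def pick_root_move(
    root_moves: Iterable[Hashable],
    tb_outcome: Callable[[Hashable], Optional[int]],   # hypothetical TB probe
    search_score: Callable[[Hashable], float],         # hypothetical TB-free search
) -> Hashable:
    """Pick the move the regular search likes best among the moves
    that keep the best available tablebase outcome."""
    moves: List[Hashable] = list(root_moves)
    if not moves:
        raise ValueError("no legal moves")

    outcomes = {m: tb_outcome(m) for m in moves}
    if any(v is None for v in outcomes.values()):
        # Some move leaves the tablebases: no filtering, search decides.
        candidates = moves
    else:
        # Keep only moves that preserve the best game-theoretic outcome.
        # If all moves lose, this keeps every move.
        best = max(outcomes.values())
        candidates = [m for m in moves if outcomes[m] == best]

    return max(candidates, key=search_score)


# Toy example: the search prefers the flashy Rxa7, but it is a TB loss.
tb = {"Kd2": DRAW, "Ke2": DRAW, "Rxa7": LOSS}
scores = {"Kd2": 0.1, "Ke2": 0.6, "Rxa7": 2.5}
print(pick_root_move(tb, tb.get, scores.get))  # Ke2
```

The harder case from the post, where the root is not yet in the TBs, is not covered by this sketch.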
-
- Posts: 3293
- Joined: Wed Mar 08, 2006 8:15 pm
Re: Top 3 engines have TB implementations obeying Bible mora
Obviously TBs give even less ELO gain than previously thought. Maybe 0-3 ELO for 5 piece and 4-6 ELO for 6 piece? And it cannot be measured in selfplay properly!
Jouni
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Top 3 engines have TB implementations obeying Bible mora
Laskos wrote: Imagine a cunning, swindling implementation of Syzygy against hapless Fruit. I would think it can gain 100+ ELO points compared to NO TB instead of losing 10 ELO points.
petero2 wrote: Texel contains TB swindle code but I have not tried to measure how efficient it is in terms of ELO.
Yes, it does seem to have some effect. I tested in the same conditions, with the same fairly unbalanced 7-8-9-men suite of openings, against an engine about 1000 ELO points weaker, Zurichess Bern (no TBs):
Rank Name                    ELO   +/-   Games   Score   Draws
   3 Zurichess Bern         -169    10    2000     27%     51%
   1 Texel_Syzygy_Gaviota    173    15    1000     73%     50%
   2 Texel NO TB             165    15    1000     72%     51%
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Top 3 engines have TB implementations obeying Bible mora
syzygy wrote: The problem here is that knowing too much (and expecting the opponent to know the same) can hurt against an engine that knows a lot less.
Laskos wrote: Yes, but the same applies to simply stronger engine without any TBs. Say Stockfish at 0.35s/move will perform better in these conditions against Fruit at 0.25s/move than the same Stockfish at 0.25s/move.
syzygy wrote: Yes, if you increase SF's thinking time, also without TBs at some point it will start doing worse against Fruit in positions that are theoretically drawn or theoretically lost.
Well, one has to find such time controls, opening suite and strength difference, and it's probably a rare event, while TBs losing ELO points compared to non-TB against a much weaker engine seems systematic.
I took BrainFish NO TB at 0.35s/move instead of the old 0.25s/move, against the same Fruit 2.1 at 0.25s/move from the same suite, and I got:
Score of Fruit 2.1 vs BrainFish NO TB 0.35s/move: 54 - 427 - 519 [0.314] 1000
ELO difference: -136.16 +/- 14.62
Finished match
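For readers who want to check the numbers in that output, here is a minimal sketch of how an ELO difference and its error margin can be derived from a wins - losses - draws line. It uses the standard logistic ELO formula and a plain trinomial (per-game) variance with a 95% interval; that cutechess-cli computes its margin exactly this way is an assumption here, and the function names are made up for the example.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic ELO difference corresponding to an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int, z: float = 1.959964):
    """Return (elo, margin) using a per-game trinomial variance.

    z is the normal quantile for a 95% two-sided interval (assumption).
    """
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    score = w + 0.5 * d
    # Variance of a single game result (1, 0.5 or 0) around the mean score.
    var = w * (1.0 - score) ** 2 + d * (0.5 - score) ** 2 + l * score ** 2
    se = math.sqrt(var / n)
    lo = elo_from_score(score - z * se)
    hi = elo_from_score(score + z * se)
    return elo_from_score(score), (hi - lo) / 2.0


# Fruit 2.1 vs BrainFish NO TB 0.35s/move: 54 - 427 - 519
elo, margin = elo_with_margin(wins=54, losses=427, draws=519)
print(f"{elo:.2f} +/- {margin:.2f}")
```

With the figures above, the score is (54 + 519/2) / 1000 = 0.3135, which gives the reported -136.16, and the margin comes out close to the reported 14.62.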
With all engines at 0.25s/move, the results were:
Rank Name                    ELO   +/-   Games   Score   Draws
   7 Fruit 2.1              -116     6    6000     34%     56%
   1 BrainFish NO TB         127    14    1000     67%     55%
   2 Komodo 11.2 NO TB       119    14    1000     66%     58%
   3 Houdini_602 NO TB       116    14    1000     66%     55%
   4 Komodo 11.2 Syzygy-6    113    14    1000     66%     57%
   5 BrainFish Syzygy-6      112    14    1000     66%     56%
   6 Houdini_602 Syzygy-6    108    14    1000     65%     56%
Finished match
Laskos wrote: Swindling with TBs is different, an engine using "TB swindling" can be made to theoretically not lose anything, but possibly gain in score significantly.
syzygy wrote: If the root position is in the TBs, swindling is relatively easy: use the TBs only to avoid moves that change the game-theoretic outcome and let the regular search do the rest. This is essentially what SF does.
Things are more complicated if the root position is not yet in the TBs. If the opponent does not use TBs, then a line leading to a complicated 6-piece loss might give the best practical chances to draw. But if the opponent uses TBs, that same complicated 6-piece TB loss is a certain loss with zero practical chances.
Current SF+TB basically assumes that the opponent plays TB endings as well as it can play them itself. This is probably the optimal strategy against Komodo+TB and against Houdini+TB. But not against Fruit...
That said, there is a scenario where SF could do better against weak opponents without losing strength against strong opponents: once it has determined that all moves lead to a TB loss, there is nothing to lose and it could re-search the position without TBs.
Regular search is itself very moral. Swindling can be done in some clever way from root positions in TB, just in case the opposing engine doesn't have Syzygy too.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Top 3 engines have TB implementations obeying Bible mora
Jouni wrote: Obviously TBs give even less ELO gain than previously thought. Maybe 0-3 ELO for 5 piece and 4-6 ELO for 6 piece? And it cannot be measured in selfplay properly!
It can be measured in self-play from regular openings, and it's probably higher in self-play than your numbers. It's just that TB implementations are too tame against very weak engines with no TBs.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Top 3 engines have TB implementations obeying Bible mora
Laskos wrote: More precisely, top 3 engines using Syzygy-6 are following two of the "Ten Commandments":
"Thou shalt not kill"
"Thou shalt not covet"
Last night, improvising on how to improve the sensitivity of my fairly unbalanced 7-8-9 men openings suite to 6-men Syzygy (on a fast SSD), I took the plunge and left my desktop playing games overnight (0.25s/move) from this suite: the top 3 engines with 6-men Syzygy enabled, and the same engines without any TBs, in a gauntlet against a much weaker engine, Fruit 2.1 (800 or so ELO points weaker). My idea was that the ELO benefit due to Syzygy-6 would increase, but the error margins would also increase, and after the test I would see whether, all in all, the sensitivity (ELO difference over error margins) increases.
Well, when I saw the results in Cutechess-Cli, I first thought I had done something wrong with my batch file. After checking and re-checking everything, I came to the conclusion that the Syzygy implementations in the top 3 engines are well mannered, almost pious.
The result in Cutechess-Cli:
Rank Name                    ELO   +/-   Games   Score   Draws
   7 Fruit 2.1              -116     6    6000     34%     56%
   1 BrainFish NO TB         127    14    1000     67%     55%
   2 Komodo 11.2 NO TB       119    14    1000     66%     58%
   3 Houdini_602 NO TB       116    14    1000     66%     55%
   4 Komodo 11.2 Syzygy-6    113    14    1000     66%     57%
   5 BrainFish Syzygy-6      112    14    1000     66%     56%
   6 Houdini_602 Syzygy-6    108    14    1000     65%     56%
Finished match
The correct pentanomial error margins are about 2 times smaller than those shown in Cutechess. Combining the results: 3 engines enabled with 6-men Syzygy are WEAKER than 3 engines NO TB by 10 +/- 4 ELO points against Fruit 2.1 on the 7-8-9 men suite.
Conclusions:
Engines NO TB are themselves not very cunning in killing the weak. Contempt=0 was set in all 6 TB or NON TB engines.
Engines with 6-men Syzygy are simply moral fanatics, performing worse against Fruit 2.1 than the NO TB engines. When the top 3 are facing each other as worthy opponents, as we saw earlier, the benefit of 6-men Syzygy is on average about +30 ELO points from this suite. Imagine a cunning, swindling implementation of Syzygy against hapless Fruit. I would think it can gain 100+ ELO points compared to NO TB instead of losing 10 ELO points.
I took an even weaker engine, Zurichess Bern, about 1200 ELO points weaker than the top dogs, in the same conditions, and got even more spectacular results:
Rank Name                    ELO   +/-   Games   Score   Draws
   7 Zurichess Bern         -178     6    6000     26%     49%
   1 BrainFish NO TB         196    15    1000     76%     46%
   2 Komodo 11.2 NO TB       186    15    1000     74%     47%
   3 Houdini_602 NO TB       185    15    1000     74%     48%
   4 Komodo 11.2 Syzygy-6    172    14    1000     73%     51%
   5 BrainFish Syzygy-6      167    14    1000     72%     51%
   6 Houdini_602 Syzygy-6    160    14    1000     72%     52%
Finished match
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: Top 3 engines have TB implementations obeying Bible mora
Chess engine development is based on the idea that elo improvements are additive. The experience with fishtest shows that to a large extent this hypothesis is correct. However Kai's examples show it is not an absolute truth as
elo(TB vs no-TB)+elo(no-TB vs Fruit) != elo(TB vs Fruit)
Contempt, if it works as advertised, also causes non-additivity.
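As a rough illustration of the inequality, the figures posted earlier in this thread can be plugged in directly. This is only a back-of-the-envelope sketch under two assumptions: the ELO column of the Fruit 2.1 gauntlet table is read as each engine's ELO against Fruit, and the "about +30 ELO points" quoted for 6-men Syzygy between the top engines is used for elo(TB vs no-TB). Error margins are ignored, and the variable names are made up for the example.

```python
# ELO against Fruit 2.1, taken from the gauntlet table in this thread.
no_tb = {"BrainFish": 127, "Komodo 11.2": 119, "Houdini_602": 116}
syzygy6 = {"BrainFish": 112, "Komodo 11.2": 113, "Houdini_602": 108}

# Approximate benefit of 6-men Syzygy among the top engines themselves,
# as quoted in the thread ("about +30 ELO points from this suite").
tb_vs_no_tb = 30.0

mean_no_tb = sum(no_tb.values()) / len(no_tb)      # elo(no-TB vs Fruit)
mean_tb = sum(syzygy6.values()) / len(syzygy6)     # elo(TB vs Fruit), observed

predicted_tb = tb_vs_no_tb + mean_no_tb            # what additivity would predict
observed_gain = mean_tb - mean_no_tb               # actual TB effect against Fruit
additivity_gap = predicted_tb - mean_tb            # shortfall versus additivity

print(f"elo(no-TB vs Fruit)          = {mean_no_tb:.1f}")
print(f"elo(TB vs Fruit), predicted  = {predicted_tb:.1f}")
print(f"elo(TB vs Fruit), observed   = {mean_tb:.1f}")
print(f"observed TB gain vs Fruit    = {observed_gain:+.1f}")
print(f"gap to the additive estimate = {additivity_gap:.1f}")
```

On these numbers the observed TB effect against Fruit is about -10 ELO, while additivity would have predicted about +30, so the two sides of the equation differ by roughly 40 ELO.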
As far as I know no one has reported non-transitivity yet. I.e.
engineA is stronger than engineB
engineB is stronger than engineC
engineC is stronger than engineA
Non-transitivity has been reported for opening books, but I do not know how reliable this information is. A well-known game exhibiting non-transitive behavior is Penney's game https://en.wikipedia.org/wiki/Penney%27s_game .
Perhaps one can make a non-transitive example by starting from a non-additive example and then tweak the time controls for each engine so that they become approximately of equal strength.
Instead of tweaking the time controls one can also tweak the engines' nps if one has access to their source code.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Top 3 engines have TB implementations obeying Bible mora
Laskos wrote: Combining the results: 3 engines enabled with 6-men Syzygy are WEAKER than 3 engines NO TB
I am amazed that the idea that your tests are deeply flawed does not cross your mind even for a moment.