Top 3 engines have TB implementations obeying Bible morals

Laskos · Post by **Laskos** » Tue Oct 10, 2017 5:43 pm

More precisely, top 3 engines using Syzygy-6 are following two of the "Ten Commandments":

"Thou shalt not kill"
"Thou shalt not covet"

Last night, looking for ways to improve the sensitivity of my fairly unbalanced 7-8-9 men openings suite to 6-men Syzygy (on a fast SSD), I left my desktop running games overnight (0.25s/move) from this suite: the top 3 engines with 6-men Syzygy enabled, and the same engines without any TBs, in a gauntlet against a much weaker engine, Fruit 2.1 (800 or so ELO points weaker). My idea was that the ELO benefit from Syzygy-6 would increase, but the error margins would increase too, and after the test I would see whether, all in all, the sensitivity (ELO difference over error margins) increases.

Well, on seeing the results in Cutechess-Cli, I first thought I had done something wrong with my batch file. After checking and re-checking everything, I came to the conclusion that the Syzygy implementations in the top 3 engines are well mannered, almost pious.

The result in Cutechess-Cli:

Rank Name                          ELO     +/-   Games   Score   Draws

   7 Fruit 2.1                    -116       6    6000     34%     56%

   1 BrainFish NO TB               127      14    1000     67%     55%
   2 Komodo 11.2 NO TB             119      14    1000     66%     58%
   3 Houdini_602 NO TB             116      14    1000     66%     55%
   4 Komodo 11.2 Syzygy-6          113      14    1000     66%     57%
   5 BrainFish Syzygy-6            112      14    1000     66%     56%
   6 Houdini_602 Syzygy-6          108      14    1000     65%     56%

Finished match

The correct pentanomial error margins are about 2 times smaller than those shown in Cutechess. Combining the results: the 3 engines with 6-men Syzygy enabled are WEAKER than the 3 NO TB engines by 10 +/- 4 ELO points against Fruit 2.1 on the 7-8-9 men suite.
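As a rough check of the combined figure, here is a minimal Python sketch that averages the ELO values from the table for each group and takes the difference. The +/- 4 margin is copied from the post as stated, not derived here (the pentanomial computation needs per-pair game data that the table does not show). The function and variable names are illustrative only; `elo_from_score` is the usual logistic conversion from a score fraction to an ELO difference, included to show that the rounded score percentages are roughly consistent with the ELO column.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic ELO difference implied by a score fraction in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# ELO figures copied from the Cutechess-Cli table in the post.
no_tb = {"BrainFish": 127, "Komodo 11.2": 119, "Houdini_602": 116}
syzygy6 = {"Komodo 11.2": 113, "BrainFish": 112, "Houdini_602": 108}

no_tb_mean = mean(no_tb.values())
syzygy_mean = mean(syzygy6.values())

# Negative value: the Syzygy-6 group scored lower against Fruit 2.1.
tb_gain = syzygy_mean - no_tb_mean

# Assumption: the +/- 4 margin is taken from the post, not computed here.
stated_margin = 4.0

# "Sensitivity" in the sense of the post: ELO difference over error margin.
sensitivity = abs(tb_gain) / stated_margin

print(f"NO TB mean:    {no_tb_mean:.1f}")
print(f"Syzygy-6 mean: {syzygy_mean:.1f}")
print(f"TB gain:       {tb_gain:.1f} ELO")
print(f"Sensitivity:   {sensitivity:.2f}")
print(f"ELO implied by a 67% score: {elo_from_score(0.67):.0f}")
```

The score column in the table is rounded to whole percents, so the ELO implied by a printed score only approximates the ELO column.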

Conclusions:
The NO TB engines are themselves not very cunning at killing the weak. Contempt=0 was set in all 6 engines, TB and NO TB alike.
The 6-men Syzygy engines are simply moral fanatics, performing weaker against Fruit 2.1 than the NO TB engines. When the top 3 face each other as worthy opponents, as we saw earlier, the benefit of 6-men Syzygy is on average about +30 ELO points from this suite. Imagine a cunning, swindling implementation of Syzygy against hapless Fruit. I would think it could gain 100+ ELO points compared to NO TB instead of losing 10 ELO points.

jhellis3 · Post by **jhellis3** » Tue Oct 10, 2017 5:50 pm

And what happens when you use a time control that people might actually use in the real world?

Not to mention giving the defending engine access to TBs... (Fruit committing seppuku is hardly a reflection of the other engines).

Laskos · Post by **Laskos** » Tue Oct 10, 2017 5:56 pm

jhellis3 wrote:And what happens when you use a time control that people might actually use in the real world?

0.25s/move is real world for the Stockfish Testing Framework: 10''+0.1'' is on average 0.25s/move. But I don't need the time control subtleties of the often idiotic Stockfish tests from 2moves_v1.pgn for every sort of patch; these are 7-8-9 men endgames, not full games. Also, I have no time losses at all with the settings I use in Cutechess-Cli, even if later Stockfish versions have a time-loss issue.
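The equivalence between 10''+0.1'' and 0.25s/move can be checked with a little arithmetic. The sketch below assumes (this is not stated in the post) that an engine spends its whole base time evenly over its own moves and uses every increment; under that assumption the average only comes out at 0.25s/move for games of roughly 67 moves per side. The function names are illustrative.

```python
def average_seconds_per_move(base_s: float, increment_s: float, moves: float) -> float:
    """Average time per move if the whole base time is spread over `moves` moves."""
    return base_s / moves + increment_s


def moves_for_average(base_s: float, increment_s: float, target_s: float) -> float:
    """Game length (moves per side) at which the average equals `target_s`."""
    return base_s / (target_s - increment_s)


# 10 seconds base time plus 0.1 seconds increment, target 0.25 s/move.
game_length = moves_for_average(10.0, 0.1, 0.25)
print(f"Moves per side for a 0.25 s/move average: {game_length:.1f}")

# Shorter games give a higher average, longer games a lower one.
for moves in (40, 67, 100):
    print(moves, round(average_seconds_per_move(10.0, 0.1, moves), 3))
```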

jhellis3 · Post by **jhellis3** » Tue Oct 10, 2017 6:00 pm

Real world for end users. Nobody is seriously analyzing games/positions at 0.25 seconds per move. Period.

You continually perform "scientific" tests and come to some rather bizarre conclusions, holding the subjects of your tests completely responsible without ever considering that maybe it is your procedures or perspective which is in error.

But I am sure that little kernel of truth won't stop you....

Laskos · Post by **Laskos** » Tue Oct 10, 2017 6:07 pm

jhellis3 wrote:Real world for end users. Nobody is seriously analyzing games/positions at 0.25 seconds per move. Period.

You continually perform "scientific" tests and come to some rather bizarre conclusions, holding the subjects of your tests completely responsible without ever considering that maybe it is your procedures or perspective which is in error.

But I am sure that little kernel of truth won't stop you....

I hope you won't start spamming this thread.

Volker Pittlik · Post by **Volker Pittlik** » Tue Oct 10, 2017 6:40 pm

jhellis3 wrote:Real world for end users. Nobody is seriously analyzing games/positions at 0.25 seconds per move. Period. ...

I do. I even use faster TCs! It always depends on what I want to test. The alternatives would be to invest years of my time or more money than I'm willing to spend on this hobby. The quality of the games played does not devalue the results of the test if the test is designed properly.

Volker

syzygy · Post by **syzygy** » Tue Oct 10, 2017 6:46 pm

The problem here is that knowing too much (and expecting the opponent to know the same) can hurt against an engine that knows a lot less.

jhellis3 · Post by **jhellis3** » Tue Oct 10, 2017 6:53 pm

I do.

No, you don't. Re-read what I actually wrote. Think about it. Think some more. Or not.... IDGAF.

Volker Pittlik · Post by **Volker Pittlik** » Tue Oct 10, 2017 7:36 pm

jhellis3 wrote:
I do.
No, you don't. ..

Happy Easter!

Laskos · Post by **Laskos** » Tue Oct 10, 2017 8:00 pm

syzygy wrote:The problem here is that knowing too much (and expecting the opponent to know the same) can hurt against an engine that knows a lot less.

Yes, but the same applies to a simply stronger engine without any TBs. Say, Stockfish at 0.35s/move will perform better in these conditions against Fruit at 0.25s/move than the same Stockfish at 0.25s/move. The second issue is that swindling into unknown territory with heuristic, not very decisive scores (the simplest form of such swindling is Contempt) cannot improve the score against all opponents: a positive Contempt will be good against weaker opponents, but bad against equal or stronger ones. Swindling with TBs is different: an engine using "TB swindling" can be made to theoretically lose nothing, while possibly gaining significantly in score.

So, the play with TB implementations in these 3 top engines seems not only more Biblical than the NO TB heuristic engines' play, but the potential of cunning and swindling play in endgames with TBs is much higher than without TBs.