My troubles with MultiPV and Syzygy in Stockfish 7


syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: My troubles with MultiPV and Syzygy in Stockfish 7

Post by syzygy »

petero2 wrote:Another difference is that texel has "TB swindle mode" permanently enabled, which means that it does not give a 0 score for a TB draw. Cursed wins are scored between 0.35 and 0.70 depending on how far away they are from a real win, and "normal draws" are scored between 0 and 0.34 depending on how good texel's normal search+eval think the position is. (And correspondingly for negative scores between 0 and -0.70).
This is not going to help if both engines play with TBs. SF+TB will know that a cursed win is a draw and that a blessed loss is a draw.

SF+TB does have a slight preference for reaching a cursed TB win over a TB draw and for reaching a TB draw over a cursed TB loss, but against an engine that correctly uses the same TBs that will not help at all. SF won't be able to swindle the opponent engine and SF won't be swindled by the opponent engine.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: My troubles with MultiPV and Syzygy in Stockfish 7

Post by syzygy »

Laskos wrote:Stockfish's play, even with 5-men bases enabled, is far from perfect; in fact it misses about half of the hard 6-men wins. I am not saying, though, that the effect you describe is necessarily small; that's why I asked Peter whether Texel might have a different game-play implementation (not just aesthetically different).
Once a 5-men position is reached, and assuming that Texel's implementation is correct, both engines will play out the game perfectly.

Before the 5-men position is reached, SF gets "perfect information" out of the TBs as well. Assuming probedepth is kept at 1 and the TBs are essentially fully cached in RAM (pretty likely with 5-piece TBs), the only thing that could give Texel an advantage is if it probed even more aggressively, i.e. also in the qsearch. I doubt that Texel does that (but did not check). And I'm not convinced that probing in qsearch would actually improve play (I think the Komodo team has tested this and did not find it to be a win).

The "optimality" of SF's TB implementation, when playing an engine correctly using the same TBs, is almost "mathematically" certain.

Anyway, it is a bit surprising that 5-men TBs have a big effect on 6-men play. I would expect 5-piece positions are not relevant for most hard 6-men positions. The trick will usually be to force (or prevent) an exchange into a won/drawn/lost 4-men position (which will normally be just 1 forced ply away from the intermediate 5-men position). I assume both SF and Texel evaluate and play almost all 4-men positions perfectly.

But maybe 4-men TBs like KBNvK, KQvKR, KQvKP play a big role in your hard 6-men positions. It might be interesting to repeat your test with 4-men TBs only. The result might be close to your 5-men TB test.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: My troubles with MultiPV and Syzygy in Stockfish 7

Post by Laskos »

syzygy wrote:
Laskos wrote:Stockfish's play, even with 5-men bases enabled, is far from perfect; in fact it misses about half of the hard 6-men wins. I am not saying, though, that the effect you describe is necessarily small; that's why I asked Peter whether Texel might have a different game-play implementation (not just aesthetically different).
Once a 5-men position is reached, and assuming that Texel's implementation is correct, both engines will play out the game perfectly.

Before the 5-men position is reached, SF gets "perfect information" out of the TBs as well. Assuming probedepth is kept at 1 and the TBs are essentially fully cached in RAM (pretty likely with 5-piece TBs), the only thing that could give Texel an advantage is if it probed even more aggressively, i.e. also in the qsearch. I doubt that Texel does that (but did not check). And I'm not convinced that probing in qsearch would actually improve play (I think the Komodo team has tested this and did not find it to be a win).

The "optimality" of SF's TB implementation, when playing an engine correctly using the same TBs, is almost "mathematically" certain.

Anyway, it is a bit surprising that 5-men TBs have a big effect on 6-men play. I would expect 5-piece positions are not relevant for most hard 6-men positions. The trick will usually be to force (or prevent) an exchange into a won/drawn/lost 4-men position (which will normally be just 1 forced ply away from the intermediate 5-men position). I assume both SF and Texel evaluate and play almost all 4-men positions perfectly.

But maybe 4-men TBs like KBNvK, KQvKR, KQvKP play a big role in your hard 6-men positions. It might be interesting to repeat your test with 4-men TBs only. The result might be close to your 5-men TB test.
Interesting, I performed the proposed test from hard 6-men positions:

Rank Name                          ELO   Games   Score   Draws
   1 SF Syzygy345                   55    1436     58%     39%
   2 SF Syzygy34                   -18    1436     47%     47%
   3 SF No TB                      -37    1436     45%     47%
It seems 5-men are contributing decisively.
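
For anyone wanting to sanity-check tables like the one above: a minimal sketch of the usual logistic conversion from a score fraction to an Elo difference. The function names are made up for illustration. Rating tools that fit all games of a multi-engine tournament jointly can report somewhat different numbers, so this should only land close to the table, not reproduce it exactly.

```python
import math


def score_from_wld(wins: int, losses: int, draws: int) -> float:
    """Score fraction of the first engine: a win counts 1, a draw 0.5."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Score percentages taken from the table above (already rounded there).
for name, score in [("SF Syzygy345", 0.58), ("SF Syzygy34", 0.47), ("SF No TB", 0.45)]:
    print(f"{name:14s} {elo_from_score(score):+6.1f}")

# A head-to-head match line of the form "wins - losses - draws" works the same way.
print(round(elo_from_score(score_from_wld(291, 74, 353))))
```

Because the table percentages are rounded to whole points, a difference of a few Elo from the listed values is expected.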
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: My troubles with MultiPV and Syzygy in Stockfish 7

Post by syzygy »

Laskos wrote:Interesting, I performed the proposed test from hard 6-men positions:

Rank Name                          ELO   Games   Score   Draws
   1 SF Syzygy345                   55    1436     58%     39%
   2 SF Syzygy34                   -18    1436     47%     47%
   3 SF No TB                      -37    1436     45%     47%
It seems 5-men are contributing decisively.
Very interesting.

I see two explanations:
- in a significant number of positions the winning strategy involves capturing or sacrificing the right pawn at the right moment (without an immediate recapture), or
- the availability of 5-men TBs allows SF to reach much higher depths because each capture results in an immediate cutoff.

Of course it could be a combination of both.
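
The second explanation can be illustrated with a toy search. This is only a sketch under stated assumptions, not Stockfish code: `Node`, `negamax`, `count_nodes` and the tiny hand-built tree are all hypothetical, and the search is a plain negamax without alpha-beta. The point is just that a position with few enough men returns the TB value at once, so the subtree behind a capture is never visited.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    pieces: int                  # men on the board in this position
    tb_score: float = 0.0        # value a TB probe would return (side to move)
    static_eval: float = 0.0     # heuristic eval used at the search horizon
    children: List["Node"] = field(default_factory=list)


def negamax(node: Node, depth: int, tb_men: int, visited: List[int]) -> float:
    """Plain negamax; positions with <= tb_men pieces are resolved by the TB."""
    visited[0] += 1
    if node.pieces <= tb_men:
        return node.tb_score     # immediate cutoff, subtree is skipped
    if depth == 0 or not node.children:
        return node.static_eval
    return max(-negamax(c, depth - 1, tb_men, visited) for c in node.children)


# Toy tree: a 6-men root with one quiet move and one capture into 5 men.
after_capture = Node(5, tb_score=-1.0, children=[Node(5) for _ in range(3)])
root = Node(6, children=[Node(6), after_capture])


def count_nodes(tb_men: int) -> int:
    """Number of nodes visited by a depth-2 search with the given TB size."""
    visited = [0]
    negamax(root, 2, tb_men, visited)
    return visited[0]


print(count_nodes(5), count_nodes(4))
```

With 5-men TBs the search stops at the capture; with only 4-men TBs it has to search the positions behind it as well. The saved nodes can then be spent on extra depth in the 6-men lines.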
petero2
Posts: 688
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: My troubles with MultiPV and Syzygy in Stockfish 7

Post by petero2 »

Laskos wrote:
petero2 wrote:
Laskos wrote:I also tried to tentatively test the quality of implementation by using a file of hard 6-men Wins having only 5-men Syzygy. On these openings, the sensitivity to Syzygy is high. It seems Syzygy helps Texel significantly more than they help Stockfish:

Score of SF7 No TB vs Texel 1.06 No TB: 291 - 74 - 353  [0.651] 718
ELO difference: 108
Finished match

Score of SF7 Syzygy vs Texel 1.06 Syzygy: 257 - 166 - 295  [0.563] 718
ELO difference: 44
Finished match
Do you have a reason why Texel's implementation of Syzygy would be better than Stockfish's?
I would, like Ronald, guess that this is mainly caused by texel being weaker to begin with.

Another difference is that texel has "TB swindle mode" permanently enabled, which means that it does not give a 0 score for a TB draw. Cursed wins are scored between 0.35 and 0.70 depending on how far away they are from a real win, and "normal draws" are scored between 0 and 0.34 depending on how good texel's normal search+eval think the position is. (And correspondingly for negative scores between 0 and -0.70).

I don't know if this swindle mode gives any elo advantage in your setup. I can prepare a special version that lets you disable swindle mode if you want to test this theory.
Yes, I will test it in the evening.
Here is a test version of texel with an extra UCI option "TBSwindle". If you set it to false all TB probes that return "draw" are evaluated as 0. Note that this is still not equivalent to what stockfish does, because Ronald said stockfish prefers cursed wins over "regular" draws.
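
To make the score bands concrete, here is a rough sketch of the mapping described above. It is not texel's actual code. The function name, the `closeness` parameter, the linear interpolation inside the cursed-win band and the saturating map for normal draws are all assumptions; only the band limits (0 to 0.34, 0.35 to 0.70, mirrored for negative scores) and the "everything is 0 when TBSwindle is false" behaviour come from the description. WDL values follow the Syzygy convention (-1 blessed loss, 0 draw, 1 cursed win).

```python
def swindle_score(wdl: int, closeness: float = 0.0,
                  search_eval: float = 0.0, tb_swindle: bool = True) -> float:
    """Score in pawns for a TB result that is a draw under the 50-move rule.

    wdl:         -1 blessed loss, 0 draw, 1 cursed win
    closeness:   assumed measure in [0, 1]; 1.0 means nearly a real win/loss
    search_eval: normal search+eval score in pawns, used only for plain draws
    """
    if wdl not in (-1, 0, 1):
        raise ValueError("only TB results that count as draws are handled here")
    if not tb_swindle:
        return 0.0                       # all such results score as a dead draw
    if wdl == 0:
        # Assumed squashing of the normal eval into the open range (-0.34, 0.34).
        return 0.34 * search_eval / (1.0 + abs(search_eval))
    c = min(max(closeness, 0.0), 1.0)
    magnitude = 0.35 + 0.35 * c          # assumed linear band 0.35 .. 0.70
    return magnitude if wdl > 0 else -magnitude


print(swindle_score(1, closeness=0.5))
print(swindle_score(0, search_eval=2.0))
print(swindle_score(1, closeness=0.5, tb_swindle=False))
```

Whatever the real interpolation looks like, the key property is that every cursed win outranks every normal draw, and every normal draw outranks every blessed loss.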
petero2
Posts: 688
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: My troubles with MultiPV and Syzygy in Stockfish 7

Post by petero2 »

syzygy wrote:
petero2 wrote:Another difference is that texel has "TB swindle mode" permanently enabled, which means that it does not give a 0 score for a TB draw. Cursed wins are scored between 0.35 and 0.70 depending on how far away they are from a real win, and "normal draws" are scored between 0 and 0.34 depending on how good texel's normal search+eval think the position is. (And correspondingly for negative scores between 0 and -0.70).
This is not going to help if both engines play with TBs. SF+TB will know that a cursed win is a draw and that a blessed loss is a draw.

SF+TB does have a slight preference for reaching a cursed TB win over a TB draw and for reaching a TB draw over a cursed TB loss, but against an engine that correctly uses the same TBs that will not help at all. SF won't be able to swindle the opponent engine and SF won't be swindled by the opponent engine.
I think it could in theory help if the root position is not in the TB and texel searches deep enough to hit the TB but the opponent does not search deep enough to hit the TB. Texel's swindle mode is enabled even if the root position is not a TB position.

In practice though this seems extremely unlikely given that stockfish is much stronger than texel and texel (like stockfish) only probes the TB when remaining depth is >= 1.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: My troubles with MultiPV and Syzygy in Stockfish 7

Post by syzygy »

petero2 wrote:Here is a test version of texel with an extra UCI option "TBSwindle". If you set it to false all TB probes that return "draw" are evaluated as 0. Note that this is still not equivalent to what stockfish does, because Ronald said stockfish prefers cursed wins over "regular" draws.
SF evaluates cursed TB wins found within the search tree as 0.01, blessed losses as -0.01.

As I said, if the engine is playing another engine using the same TBs, then cursed TB wins, TB draws and blessed TB losses will (or at least should) all end in a draw. But it would be silly to let the engine voluntarily get itself into a blessed loss situation if it could also go for a regular draw.

In general it is certainly better to award a higher score to a cursed win. Most opponents won't be able to defend such positions. (But cursed losses, from the TB-using engine's point of view, should of course be evaluated as close to a draw as possible, since the engine will be able to defend that draw.)
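
As a rough illustration of the scoring described in this post (a sketch, not the actual Stockfish source): the ±0.01 values are the ones quoted above, the WDL numbering is the Syzygy convention, and everything else is a placeholder. In particular `TB_WIN`, `swindle_bonus`, `asymmetric_score` and the move names are hypothetical.

```python
# Syzygy WDL convention: -2 loss, -1 blessed loss, 0 draw, 1 cursed win, 2 win.
TB_WIN = 100.0  # placeholder magnitude for a real TB win, in pawns


def sf_style_score(wdl: int) -> float:
    """Symmetric scoring: cursed win +0.01, blessed loss -0.01, draw 0."""
    return {-2: -TB_WIN, -1: -0.01, 0: 0.0, 1: 0.01, 2: TB_WIN}[wdl]


def asymmetric_score(wdl: int, swindle_bonus: float = 0.5) -> float:
    """Engine's own point of view: reward cursed wins, keep the positions
    it has to defend itself (wdl == -1) next to a draw."""
    if wdl == 1:
        return swindle_bonus
    return sf_style_score(wdl)


# Hypothetical root moves with the TB result each one leads to (engine's view).
moves = {"Kd2": 0, "Rxa7": -1, "Kf3": 1}
best = max(moves, key=lambda m: sf_style_score(moves[m]))
print(best)
```

Even a 0.01 margin is enough to order the three drawn outcomes; a larger bonus only matters when the cursed win has to compete with ordinary evaluation scores elsewhere in the tree.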
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: My troubles with MultiPV and Syzygy in Stockfish 7

Post by syzygy »

petero2 wrote:
syzygy wrote:
petero2 wrote:Another difference is that texel has "TB swindle mode" permanently enabled, which means that it does not give a 0 score for a TB draw. Cursed wins are scored between 0.35 and 0.70 depending on how far away they are from a real win, and "normal draws" are scored between 0 and 0.34 depending on how good texel's normal search+eval think the position is. (And correspondingly for negative scores between 0 and -0.70).
This is not going to help if both engines play with TBs. SF+TB will know that a cursed win is a draw and that a blessed loss is a draw.

SF+TB does have a slight preference for reaching a cursed TB win over a TB draw and for reaching a TB draw over a cursed TB loss, but against an engine that correctly uses the same TBs that will not help at all. SF won't be able to swindle the opponent engine and SF won't be swindled by the opponent engine.
I think it could in theory help if the root position is not in the TB and texel searches deep enough to hit the TB but the opponent does not search deep enough to hit the TB. Texel's swindle mode is enabled even if the root position is not a TB position.
Well, if Texel finds a cursed win deep in the tree and manages to steer the game to that cursed win position, against SF+TB the game is certain to end in a draw. This does not depend on how well SF+TB searches but results from SF+TB being able to defend that (from its point of view) blessed loss.

Against SF without TBs it will of course help to prefer cursed TB wins over TB draws.

So your approach seems sound. Except that I would personally not recommend probing DTZ tables from within the search (and if you do, make sure to probe them under a lock).
petero2
Posts: 688
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: My troubles with MultiPV and Syzygy in Stockfish 7

Post by petero2 »

syzygy wrote:In general it is certainly better to award a higher score to a cursed win. Most opponents won't be able to defend such positions. (But cursed losses, from the TB-using engine's point of view, should of course be evaluated as close to a draw as possible, since the engine will be able to defend that draw.)
I agree, but treating cursed wins differently for texel and its opponent would introduce an asymmetry that makes the search a bit more complicated. I have not tried to handle that yet in texel, but I suppose I should implement it some time in the future.

Another weakness in texel's swindle logic is that in a lost position texel tries to maximize DTM, although in practice it would probably often be better to maximize DTZ, in the hope that the opponent plays inaccurately and lets a real win get converted to a cursed win.
petero2
Posts: 688
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: My troubles with MultiPV and Syzygy in Stockfish 7

Post by petero2 »

syzygy wrote:
petero2 wrote:
syzygy wrote:
petero2 wrote:Another difference is that texel has "TB swindle mode" permanently enabled, which means that it does not give a 0 score for a TB draw. Cursed wins are scored between 0.35 and 0.70 depending on how far away they are from a real win, and "normal draws" are scored between 0 and 0.34 depending on how good texel's normal search+eval think the position is. (And correspondingly for negative scores between 0 and -0.70).
This is not going to help if both engines play with TBs. SF+TB will know that a cursed win is a draw and that a blessed loss is a draw.

SF+TB does have a slight preference for reaching a cursed TB win over a TB draw and for reaching a TB draw over a cursed TB loss, but against an engine that correctly uses the same TBs that will not help at all. SF won't be able to swindle the opponent engine and SF won't be swindled by the opponent engine.
I think it could in theory help if the root position is not in the TB and texel searches deep enough to hit the TB but the opponent does not search deep enough to hit the TB. Texel's swindle mode is enabled even if the root position is not a TB position.
Well, if Texel finds a cursed win deep in the tree and manages to steer the game to that cursed win position, against SF+TB the game is certain to end in a draw. This does not depend on how well SF+TB searches but results from SF+TB being able to defend that (from its point of view) blessed loss.
I guess steering towards a cursed win against a weak opponent that is using TB could increase the chances of the opponent making a mistake before the position gets into the TB, and that that mistake would let texel find a forced TB win. I think this is extremely unlikely to happen when texel plays against stockfish though.
syzygy wrote:So your approach seems sound. Except that I would personally not recommend probing DTZ tables from within the search (and if you do, make sure to probe them under a lock).
I have modified the syzygy code quite a bit in texel, so DTZ probes are lockless in the same way WDL probes are. A lock is only needed when a DTZ file has to be mmapped. There is still a performance hit of course because the DTZ files are one-sided, and because they require probing the corresponding WDL table first.

The texel probing code tries to only probe DTZ (and DTM) tables when it is really needed though. Sometimes (depending on alpha, beta, and the WDL probe result), the DTZ probe can be proven unnecessary even when texel is trying to optimize the DTM score.
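
One way such a check could work, sketched under assumptions rather than taken from texel's source: after the WDL probe, the exact score is known to lie in some interval, and if that whole interval is already outside the (alpha, beta) window, the more expensive distance probe cannot change the outcome at this node. The function name and the score bounds `MIN_TB_WIN` / `MAX_TB_WIN` are hypothetical, and only real wins and losses are handled.

```python
MIN_TB_WIN = 100.0   # hypothetical score of the slowest possible TB win
MAX_TB_WIN = 200.0   # hypothetical score of the fastest possible TB win


def needs_distance_probe(wdl: int, alpha: float, beta: float) -> bool:
    """Return True if a DTZ/DTM probe could still affect the search result.

    wdl uses the Syzygy convention; only real wins (2) and losses (-2)
    are handled in this sketch.
    """
    if wdl == 2:
        lo, hi = MIN_TB_WIN, MAX_TB_WIN
    elif wdl == -2:
        lo, hi = -MAX_TB_WIN, -MIN_TB_WIN
    else:
        raise ValueError("sketch only covers real wins and losses")
    if lo >= beta:
        return False   # even the slowest win fails high
    if hi <= alpha:
        return False   # even the best case fails low
    return True        # exact distance decides the score inside the window


print(needs_distance_probe(2, alpha=-1.0, beta=1.0))
print(needs_distance_probe(2, alpha=150.0, beta=160.0))
```

With a typical window around zero, a WDL win or loss already decides the node, so the distance tables are only touched when the window itself sits inside the win or loss range.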