Stockfish Rollercoaster Effect

Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Stockfish Rollercoaster Effect

Post by Ovyron »

Stockfish's Contempt implementation seems to be causing problems for users because of the "Rollercoaster Effect": on a given position, analyzing with white you may get a score of around 0.30 (roller coaster goes up), but after you make the move, Stockfish shows around -0.30 (roller coaster goes down).

People are suggesting that the solution is to turn analysis contempt to 0 and be done with it. In this thread, I claim that's wrong, that analysis contempt is good, and that it's the users that need to learn to analyze with Contempt on. I also suggest code changes to the engine that would make the transition to this paradigm easier.

From here:
BeyondCritics wrote: Sun Dec 02, 2018 7:56 pm
MikeB wrote: Sun Dec 02, 2018 7:50 pm
BeyondCritics wrote: Wed Nov 21, 2018 8:12 pm Please can you explain, why setting "Contempt=Both" is so unbearable in analysis mode?
At the very least it relieves us from useless 3-fold repetitions in the pv, which were really a plague, as you might recall.
You can't use it in analysis mode when playing moves back and forth: it pollutes the hash table and it takes much longer to get an accurate score. This is because with "Both" on, the score is different from the white POV as compared to the black POV. You like "Both", that's fine, different strokes for different folks, but for many of us, for the way we use the engine, it's totally useless.
I fully agree, it pollutes the hash table and this is likely _the_ serious issue with it. At the time I wrote this, I wasn't really into the source code, so I just asked.
Nope! The reason you're getting that behavior is that you're using the engine wrong. The paradigm shift introduced by Stockfish Contempt is that White and Black should hold different evaluations.

There's no such thing as a correct evaluation of a position

(I mean, other than 0.00 for mostly everything and +M or -M otherwise, because a side is getting mated)

What you should be doing is opening 2 instances of your GUI, each with a Stockfish loaded: you analyze White on the first instance and Black on the second. This will give you the best move choices, and each instance will not pollute the other one.

I actually have a private Stockfish where this isn't necessary because it has different hashes for each side, but more on that later.

The old paradigm is flawed

The old paradigm says that if a position is advantageous for white then it's disadvantageous for black. This has huge problems in equal positions, and, surprise, surprise! Most positions are near equal. Actually, if one of the sides already has a significant advantage, so much that Stockfish Contempt says this side has the edge no matter which side you're analyzing, both paradigms agree and it doesn't matter which one you use.

Otherwise:

You're in an equal position (though this is harder to achieve for black because of white's 1 ply advantage), which means what you want to do is increase your winning chances.

Under the old paradigm, a position that would be 20% wins for each side with 60% draws would score as 0.00 (the advantages cancel out), while a position that would be 30% wins for each side with 40% draws would score as 0.00 as well! The engine can't tell the difference between the moves, so its choice will be random. This isn't good. C=0 will approach this behavior.

In the new paradigm, the 30% move increases the chances of white winning, and increases the chance of black winning, so it makes sense that white would show a bigger score for itself, and black would show a bigger score for itself, causing the roller coaster effect. The problem is the user wanting to use white's score for black, and black's score for white.
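
To make the arithmetic concrete, here is a minimal sketch in Python. The function name `expected_score` and the draw weight of 0.45 are assumptions picked for illustration; this is not how Stockfish computes Contempt, only a toy model in which the analyzing side values a draw at slightly less than half a point.

```python
def expected_score(win, draw, loss, draw_weight=0.5):
    """Expected result for the analyzing side on a 0..1 scale.

    draw_weight is what a draw is worth to that side: 0.5 is the classic
    symmetric view, anything below 0.5 is a toy stand-in for Contempt.
    """
    assert abs(win + draw + loss - 1.0) < 1e-9, "probabilities must sum to 1"
    return win + draw * draw_weight

quiet = (0.20, 0.60, 0.20)  # 20% win, 60% draw, 20% loss
sharp = (0.30, 0.40, 0.30)  # 30% win, 40% draw, 30% loss

# Old paradigm: both positions get the same, dead-equal value.
print(expected_score(*quiet), expected_score(*sharp))

# Toy Contempt: the sharp position is worth more to whoever is analyzing.
print(expected_score(*quiet, draw_weight=0.45),
      expected_score(*sharp, draw_weight=0.45))
```

Since both example positions are symmetric, the same numbers come out whichever side is doing the analysis, which is the point: each side prefers the sharp position for itself.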

The new paradigm is right, and that's why it leads to better move choices and more elo.

But Dann Corbit is being lied to

Oh, apparently Dann is a big proponent of Contempt=0 and his biggest argument is that, if the "correct score" of a position is 0.00 (because no side has more chance to win than the other) and the engine shows 0.16 or -0.16 for either side, it's a lie.

No, you just have a different score for each side.

Now, if white thinks the position is 0.16, and black thinks the position is -0.16, both sides would be happy aiming for this position. While the old paradigm would say that any advantage white gets reduces black's advantage, the new one claims both sides can have the advantage.

The scores are actually irrelevant

The only relevant thing about a chess position is the centipawn difference between the moves. If the difference is 0, both moves are of similar quality. This allows one to sort the moves from best to worst. If the best move's score falls behind the second move's, the second one becomes best. If any non-best move's score rises above the top move's score, it becomes best.

And that's it: the actual scores don't matter in a position, and it's the same thing if the engine shows 1.16 or 0.16. What you care about is that the second-best move's score is not higher than the top move's. The only reason we want scores close to 0.00 is so that some 1.16 score move in a later position doesn't lead to a side wanting to reach it at all costs (because all of white's alternatives before this are <1.16), but an engine that multiplied all scores by 10 (and showed 1.60 for the opening position) would be functionally the same (even though users wouldn't know what to do if black's responses showed some -1.40 scores; they'd deem the wiggling too strong due to its magnitude).

Knowing this, is it possible to make it so that we get all the benefits of the new paradigm without confusing users? I think so.

Fixing the Rollercoaster Effect

Since this is about the magnitude, would users be happy if the swings went from 0.01 to -0.01 instead? My proposal is to just have 2 new configurable settings that do not change the engine's functionality, only what the user sees:

White Score Offset
Black Score Offset

This is a value, in centipawns, that would be added to, or subtracted from, the scores of each side before showing them to the user. Each user would be able to configure how wild the rollercoaster gets, while still enjoying Contempt's better move choices.
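
A rough sketch of the idea in Python. The names `displayed_score` and `rank_moves`, the move scores and the offset of -29 are all made up for illustration; no such options exist in Stockfish today. The offset is applied only to what gets printed, so the ordering of the moves stays the same.

```python
def displayed_score(internal_cp, side, white_offset=0, black_offset=0):
    """Score shown to the user in centipawns; the search keeps internal_cp."""
    offset = white_offset if side == "white" else black_offset
    return internal_cp + offset

def rank_moves(scores_cp):
    """Moves ordered best-first by score."""
    return sorted(scores_cp, key=scores_cp.get, reverse=True)

# Hypothetical internal scores for three candidate moves, analyzing White.
internal = {"e4": 30, "d4": 27, "Nf3": 22}
shown = {move: displayed_score(cp, "white", white_offset=-29)
         for move, cp in internal.items()}

print(shown)                                      # what the user would see
print(rank_moves(internal) == rank_moves(shown))  # same ordering either way
```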

The Draw Problem

A problem with offsets is that you can't really implement them while keeping the same functionality, as that would cause draws to be scored weirdly (say White Score Offset -15 makes draws be shown with -0.15 value). Keeping the draw value as 0.00 would cause a change in functionality (say White Score Offset -15 makes a move that after Contempt is 0.14 be -0.01, so Stockfish would rather go for one that is 0.00 from a draw instead.)

The solution to this is to have one last setting:

Draw Value

Because having a draw value of 0.00 is arbitrary anyway. With this setting, new magical things are available, like setting Draw value to 200 and seeing Stockfish abandon any advantage less than 200.00 if it can force a draw (Drawfinder), or being able to use very high Contempt but still go for a forced draw if its score isn't high enough, or being able to use low Contempt, so Stockfish tries to play in a drawish style, but still avoid draws on its face if the draw value is set very low. All things that one can't do currently.

Note: Most likely we'd need White Draw Value and Black Draw Value for this to work.
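
The interaction between offset and draw value can be sketched in Python with the numbers from above (a move worth 0.14 after Contempt, an offset of -15). The function `choose`, its parameters and the move name "Qe2" are hypothetical; the sketch only shows why the draw value has to move together with the offset if the move choice is to stay the same.

```python
def choose(candidates_cp, draw_value_cp=0, offset_cp=0, offset_in_search=False):
    """Pick the best option among non-drawing moves and a forced draw.

    candidates_cp: scores of the non-drawing moves (after Contempt).
    A forced draw is always available under the name "draw".
    """
    shift = offset_cp if offset_in_search else 0
    options = {move: cp + shift for move, cp in candidates_cp.items()}
    options["draw"] = draw_value_cp
    return max(options, key=options.get)

moves = {"Qe2": 14}  # hypothetical move scored 0.14 after Contempt

print(choose(moves))                                      # no offset at all
print(choose(moves, offset_cp=-15, offset_in_search=True))  # draw stays 0.00
print(choose(moves, draw_value_cp=-15, offset_cp=-15,
             offset_in_search=True))                      # draw value shifted too
```

With the draw left at 0.00 the engine switches to the draw; once the draw value is shifted by the same amount, the original choice comes back.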

I have never seen an engine that has Contempt (Houdini solution), Score Offset (Rybka 4 solution) and Draw Value (Komodo solution) all in one. I believe some settings could be found so that in games Stockfish plays functionally the same (same move choices) while keeping everyone happy with much better rollercoasting :)

The Double Hash

Finally, as I've said, a double hash could be implemented, so that this paradigm can function without the user needing to fire up an instance for each side. Stockfish would just keep one hash file for white and another for black and use them accordingly so one can analyze without polluting the hash table.
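
As a very rough illustration of the double hash, here is a Python sketch. The class `DoubleHash`, its methods, the keys and the scores are all invented for this example and say nothing about how Stockfish's real transposition table works; the only point is that entries stored while analyzing for one side are never returned when analyzing for the other.

```python
class DoubleHash:
    """Two independent transposition tables, selected by the analyzing side."""

    def __init__(self):
        self._tables = {"white": {}, "black": {}}

    def store(self, pov, position_key, score_cp, depth):
        table = self._tables[pov]
        old = table.get(position_key)
        # Simple replacement policy: keep the entry searched at least as deep.
        if old is None or depth >= old[1]:
            table[position_key] = (score_cp, depth)

    def probe(self, pov, position_key):
        """Return (score_cp, depth) for this side's table, or None."""
        return self._tables[pov].get(position_key)

tt = DoubleHash()
tt.store("white", 0xABC, 30, 20)   # same position, analyzed for White
tt.store("black", 0xABC, -30, 20)  # same position, analyzed for Black

print(tt.probe("white", 0xABC))
print(tt.probe("black", 0xABC))
```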
Your beliefs create your reality, so be careful what you wish for.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Stockfish Rollercoaster Effect

Post by MikeB »

Ovyron wrote: Mon Dec 03, 2018 1:45 am
[...]
Hi,

First, starting a new thread while quoting other threads is rather silly. This is really not an issue that can be debated to a conclusion that satisfies everyone. It's great that you found a way to use it in a way that you like. I also have a way to use it to my satisfaction. Best is to amicably agree to disagree and move on.
Marek Soszynski
Posts: 581
Joined: Wed May 10, 2006 7:28 pm
Location: Birmingham, England

Re: Stockfish Rollercoaster Effect

Post by Marek Soszynski »

Ovyron wrote: Mon Dec 03, 2018 1:45 am
[...]

What you should be doing is opening 2 instances of your GUI, each with a Stockfish loaded, you'll analyze White on the first instance, and Black with the second instance. This will give you the best move choices, and each instance will not pollute the other one.

I actually have a private Stockfish where this isn't necessary because it has different hashes for each side, but more on that later.

[...]

The Double Hash

Finally, as I've said, a double hash could be implemented, so that this paradigm can function without the user needing to fire up an instance for each side. Stockfish would just keep one hash file for white and another for black and use them accordingly so one can analyze without polluting the hash table.
So, if I am running 2 Stockfishes, it isn't enough for their filenames to be different and for them to be in different directories: they will use the same hashfile? 2 GUIs are required.

And the same goes for their access to the one tablebase directory?
Marek Soszynski
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Stockfish Rollercoaster Effect

Post by corres »

Some notes:
1, Every type of contempt weakens the engines.
2, In Stockfish we can switch off only the static contempt; the dynamic contempt cannot be switched off. To switch off dynamic contempt one has to modify the Stockfish source. This is a real issue that should be solved by the Stockfish developers.
3, Everybody uses Stockfish for analysis in whatever manner he wants and can.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Stockfish Rollercoaster Effect

Post by Ovyron »

Marek Soszynski wrote: Mon Dec 03, 2018 9:34 am So, if I am running 2 Stockfishes, it isn't enough for their filenames to be different and for them to be in different directories: they will use the same hashfile? 2 GUIs are required.
I was talking about users that use the same Stockfish instance to analyze a game from both sides of the board, where using 2 GUIs would solve the problem (one GUI for white and one for black), but using 2 Stockfishes would also work so each side doesn't contaminate the other's hash.
Marek Soszynski wrote: Mon Dec 03, 2018 9:34 am And the same goes for their access to the one tablebase directory?
There's no problem with all your Stockfishes accessing the same directory.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Stockfish Rollercoaster Effect

Post by Ovyron »

corres wrote: Mon Dec 03, 2018 10:11 am 1, Every type of contempt weakens the engines.
No, if this were the case it'd have lower elo on the rating lists. This contempt is just a draw-avoider. For analysis, the user is expected to take care of the cases where avoiding a draw is worse than the alternatives; for the rest of the scenarios Contempt is superior.
corres wrote: Mon Dec 03, 2018 10:11 am 2, In Stockfish we can switch off only the static contempt; the dynamic contempt cannot be switched off. To switch off dynamic contempt one has to modify the Stockfish source. This is a real issue that should be solved by the Stockfish developers.
My point is people doing that are hurting themselves because their new compile provides worse move choices than the one with both Contempts on.
corres wrote: Mon Dec 03, 2018 10:11 am 3, Everybody uses Stockfish for analysis in whatever manner he wants and can.
Yes, but this is as if nobody was taking advantage of all their CPUs by keeping the Default at 1. If you have more than one core, it's best to adjust the setting to use all your cores, just like it's best to learn how to work with the new Contempt On paradigm, instead of mish-mashing the evaluations of white and black like with the other engines.
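
For the cores part, a small Python sketch of the UCI command involved. The helper name `uci_setoption` is made up; `Threads` is the Stockfish option that defaults to 1, and the line follows the standard UCI `setoption name ... value ...` form. Using every core reported by `os.cpu_count()` is just one possible choice.

```python
import os

def uci_setoption(name, value):
    """Format a UCI 'setoption' command line."""
    return f"setoption name {name} value {value}"

cores = os.cpu_count() or 1  # cpu_count() can return None
print(uci_setoption("Threads", cores))
```

Most GUIs expose this in the engine settings dialog, so typing the command by hand is only needed when talking to the engine directly.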
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: Stockfish Rollercoaster Effect

Post by Robert Pope »

Ovyron wrote: Tue Dec 18, 2018 12:00 pm
corres wrote: Mon Dec 03, 2018 10:11 am 1, Every type of contempt weakens the engines.
No, if this was the case it'd have lower elo on the rating lists. This contempt is just a draw-avoider, for analysis, the user is expected to take care of the cases where avoiding draw is worse than the alternatives, for the rest of scenarios Contempt is superior.
Contempt does weaken the engine, relative to perfect play. By avoiding draw when draw is the best outcome, contempt opens you up to blunders that lose the game. The only reason it doesn't lower the elo is that it is being used against even weaker engines. Avoiding draws extends the games and exposes Stockfish to the risk of blunders, but it happens that the other engines will blunder more often and it is a net gain against those engines. But from a perfect-player perspective, it is playing sub-optimal moves and will score worse against a better player.
Eelco de Groot
Posts: 4561
Joined: Sun Mar 12, 2006 2:40 am

Re: Stockfish Rollercoaster Effect

Post by Eelco de Groot »

I think it is more than a year since contempt was introduced, for the second time, but I'm not sure. All tuning is done against Stockfish itself, not against other engines. It is safe to say that by now, if you switch contempt off, you are using a Stockfish that is out of tune with itself :P See the last post from Jon Dart about the Benoni, where he pointed out to me that Stockfish scores of about one and a half pawns still mean nothing; the opening is a draw. People place far too much reliance on the scores they get from their computer. Me included, I will not deny it.
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: Stockfish Rollercoaster Effect

Post by carldaman »

Robert Pope wrote: Wed Dec 19, 2018 3:58 pm
Ovyron wrote: Tue Dec 18, 2018 12:00 pm
corres wrote: Mon Dec 03, 2018 10:11 am 1, Every type of contempt weakens the engines.
No, if this was the case it'd have lower elo on the rating lists. This contempt is just a draw-avoider, for analysis, the user is expected to take care of the cases where avoiding draw is worse than the alternatives, for the rest of scenarios Contempt is superior.
Contempt does weaken the engine, relative to perfect play. By avoiding draw when draw is the best outcome, contempt opens you up to blunders that lose the game. The only reason it doesn't lower the elo is that it is being used against even weaker engines. Avoiding draws extends the games and exposes Stockfish to the risk of blunders, but it happens that the other engines will blunder more often and it is a net gain against those engines. But from a perfect-player perspective, it is playing sub-optimal moves and will score worse against a better player.

Contempt, if small, introduces the risk of making (slightly) inaccurate moves, not outright blunders. It's a risk I'm willing to live with, because the moves chosen will be of better use - from my perspective. Very often the contempt-induced moves will also be of better quality, leading to favorable situations that the stronger engine can exploit. Quite useful for analysis as well.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Stockfish Rollercoaster Effect

Post by Ovyron »

Robert Pope wrote: Wed Dec 19, 2018 3:58 pm Contempt does weaken the engine, relative to perfect play.
Perfect play is huge. Most moves are perfect. Games at the highest levels are won in places where the opponent is given a very tiny slice to walk on, where perfect moves are scarce, and engines sometimes give similar scores to moves that keep the draw and moves that lose.

We have the advantage of hindsight, and what I have seen from the games I've lost is that I'd have lost them even if I had used Contempt=0. The ones I drew, I'd just have drawn differently with Contempt=0. But in the ones I've won, Contempt on has been critical, and I've only been happier with the higher Contempt and more aggressive style of recent Stockfish ("recent" since S9.)

In conclusion, if analyzing with Contempt on would be risky because you could play a non-perfect move (a move that allows the opponent to mate in X), that's a problem with your analysis method, and not Contempt. Using Contempt would just highlight the analysis method's flaws.