
Re: Komodo vs. Larry K on chess.com

Posted: Sun Sep 08, 2019 1:21 am
by lkaufman
Ovyron wrote: Sat Sep 07, 2019 9:14 pm
lkaufman wrote: Thu Sep 05, 2019 6:11 pm With SF, what has changed many times is the basic definition of what score to return when one side is a clean pawn ahead on a full board. I usually use the opening position with f2 removed to measure this.
I really don't think that works in practice for actual chess positions. I have kept full records of actual chess positions (usually openings) that I've analyzed since 2007, with Glaurung and progressively newer Stockfish versions, and the clear trend has been scores moving closer to 0.00 - positions where old Stockfish used to say "0.80" and today's Stockfish says "0.19". And I hadn't seen any jump in the scale in recent years.

If anything, over the years Stockfish's eval has become disconnected from the material on the board: 1.00 has not meant "a pawn advantage" for a long while now, and I don't even know what it means anymore. All I know is that, all things being equal, you'd rather have that 1.00 advantage than a 0.90 advantage, and it usually correlates with winning chances, especially against weak opposition.
lkaufman wrote: Thu Sep 05, 2019 6:11 pm I certainly don't think that a 0.1 change has no significance, just that it is small enough that a 2800 player's strongly held belief that the second move is better can overrule it.
I'd agree, especially if the 0.1-better move trades off a very important piece that may be key to winning, or breaks a pawn structure into one that you just know is worse than the alternative, etc. My question is: how many such overrules are you going to allow per game, and why would the overrules closer to the limit be more significant than the others? (Say you allow 5 such overrules; once you have burned through 4, the last one is significant because it means you can't do another in the game.)

My claim is that whether to overrule depends on the advantage or disadvantage you already have in the game, so a "0.1" score difference can't be treated the same in all situations. The difference between a 0.90 and a 1.00 disadvantage might be the difference between saving and losing the game, while the difference between 0.00 and 0.10 might be the difference between increasing your winning chances and giving the opponent an easy draw. Both are significant scenarios at the edges, and there's no point I can discern where a difference of this magnitude becomes insignificant.
A 0.1 difference is always enough to take seriously. I think the question is, if we are playing a correspondence game (if it's OTB we won't know the engine eval without cheating), we can consider lots of evidence: the evals from multiple engines, the database stats if it's a known position, the opinion of the player if he is strong enough for that to be relevant, and the result of interactive analysis. The number of previous "overrules" is irrelevant, just as knowing the history of dice throws is irrelevant to predicting the next one. The absolute score may have something to do with the decision, but there's no obvious rule about it. I'm just saying that if the score diff is 0.2, especially if it's not contradicted by another strong engine, then you probably shouldn't waste much time on the decision, but if it is 0.1, it is probably worthwhile to look further if you are a very strong player or at least very skilled in interactive analysis.
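
As a rough illustration of the heuristic described above (an invented sketch, not a procedure Larry actually uses; the function name, the exact thresholds and the single "second engine" check are all assumptions made for the example):

Code:

from typing import Optional

def worth_investigating(primary_diff: float,
                        second_engine_diff: Optional[float] = None) -> bool:
    """Decide whether a move choice deserves further interactive analysis.

    primary_diff: eval gap (in pawns) between the engine's top move and the
        alternative the human prefers, according to the primary engine.
    second_engine_diff: the same gap according to a second strong engine,
        or None if no second opinion is available.
    """
    if primary_diff >= 0.2:
        # A ~0.2 gap that a second engine does not contradict is treated as
        # settled: don't waste much time on the decision.
        if second_engine_diff is None or second_engine_diff >= 0.2:
            return False
    # Around 0.1 (or when the engines disagree), a strong player's judgment
    # or further interactive analysis can still tip the decision.
    return True

print(worth_investigating(0.10, 0.05))  # True: worth a closer look
print(worth_investigating(0.25, 0.30))  # False: probably just play the engine move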

Re: Komodo vs. Larry K on chess.com

Posted: Sun Sep 08, 2019 1:35 am
by jp
lkaufman wrote: Sun Sep 08, 2019 1:21 am A 0.1 difference is always enough to take seriously. I think the question is, if we are playing a correspondence game (if it's OTB we won't know the engine eval without cheating), we can consider lots of evidence.
Let's say correspondence or freestyle, where it's similar to OTB (timewise, at least) but not cheating.

Re: Komodo vs. Larry K on chess.com

Posted: Sun Sep 08, 2019 4:12 am
by Ovyron
lkaufman wrote: Sun Sep 08, 2019 1:21 am A 0.1 difference is always enough to take seriously. I think the question is, if we are playing a correspondence game (if it's OTB we won't know the engine eval without cheating), we can consider lots of evidence: the evals from multiple engines, the database stats if it's a known position, the opinion of the player if he is strong enough for that to be relevant, and the result of interactive analysis. The number of previous "overrules" is irrelevant, just as knowing the history of dice throws is irrelevant to predicting the next one. The absolute score may have something to do with the decision, but there's no obvious rule about it. I'm just saying that if the score diff is 0.2, especially if it's not contradicted by another strong engine, then you probably shouldn't waste much time on the decision, but if it is 0.1, it is probably worthwhile to look further if you are a very strong player or at least very skilled in interactive analysis.
The "0.1" I'm talking about is the one you get after going through all that. At this point you already have a tree of all important variations (from all relevant engines, your human variations, books and databases if we're before a novelty, etc.), and that "0.1" is some backsolved score from a distant position that you currently predict will be reached if both you and your opponent play best.

IF there's another move that, after all this, leads to 0.2, then you'd be out of your mind to play the 0.1 one instead of the 0.2 one! Because if the 0.1 move is actually better, your analysis method and the scores of your leaf nodes should already reflect that.
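
To make the "backsolved score" idea concrete, here is a tiny sketch of how root scores like 0.2 and 0.1 fall out of minimax over the leaf evaluations of such a tree (the tree shape, the moves and the numbers are invented for illustration):

Code:

def backsolve(node, maximizing=True):
    """Minimax value of a node: either a leaf evaluation (a number from the
    root player's point of view) or a dict mapping moves to child nodes."""
    if isinstance(node, (int, float)):
        return float(node)
    values = [backsolve(child, not maximizing) for child in node.values()]
    return max(values) if maximizing else min(values)

# Two candidate moves at the root; the opponent (minimizing) picks the reply
# that is worst for us, so each candidate inherits the score of a distant leaf.
tree = {
    "move A": {"reply x": 0.2, "reply y": 0.6},
    "move B": {"reply x": 0.1, "reply y": 1.5},
}
for move, replies in tree.items():
    print(move, backsolve(replies, maximizing=False))
# move A 0.2
# move B 0.1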

What I'm talking about is actually the greatest flaw in the IDeA analysis method of the Aquarium interface, as exemplified by the 2011 Correspondence Chess World Champion José Sanz in the article by ChessOK:

(José Sanz's comments quoted below)

[Image: the position under discussion]

Quite a few moves had similar scores in this position, as the following screenshot of the IDeA tree shows.

[Image: screenshot of the IDeA tree showing the candidate moves and their scores]

15…a4! is the right move here, even if the engines may prefer others. They do not seem to understand what is happening, since the alternatives lead to positions with plenty of traps for Black. I am not saying Black is lost, but I would have suffered a lot against such a strong opponent if I had not played 15…a4!

So after a while of guiding the engine and building a huge tree of positions, the best move is hidden in a sea of others, and the human has to overrule the engine and play the move that appears 0.08 worse. He plays 15...a4! and goes on to win the game. My claim is that when your analysis method is good, you can make it sync with the truth you already know; in this case, 15...a4 should appear with the best score in the tree.
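
A rough sketch of what "making the tree sync with what you already know" could look like (all moves and numbers here are invented, not taken from Sanz's game): once deeper analysis of the trappy alternative replaces its optimistic leaf, the backsolved ranking puts 15...a4 on top without any manual overrule.

Code:

def backsolve(node, maximizing=True):
    """Minimax value of a node, as in the earlier sketch."""
    if isinstance(node, (int, float)):
        return float(node)
    values = [backsolve(child, not maximizing) for child in node.values()]
    return max(values) if maximizing else min(values)

# Scores from Black's point of view; White (the opponent) minimizes them.
tree = {
    "15...a4":  {"16.Nd5": -0.12, "16.h3": 0.05},
    "15...Qc7": {"16.g4": -0.04, "16.Nd5": 0.10},
}
print({move: backsolve(replies, maximizing=False) for move, replies in tree.items()})
# {'15...a4': -0.12, '15...Qc7': -0.04}  -> the tree ranks the trappy move 0.08 "better"

# Deeper analysis of 16.g4 replaces the optimistic leaf with real continuations:
tree["15...Qc7"]["16.g4"] = {"16...h5": -0.40, "16...Nc6": -0.35}
print({move: backsolve(replies, maximizing=False) for move, replies in tree.items()})
# {'15...a4': -0.12, '15...Qc7': -0.35}  -> now 15...a4 is on top; no overrule needed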

The point, then, is that only one of these statements can be true:

A) Whenever the two best moves in a position score 0.2 and 0.1, the 0.2 one really is better and you must play it.

OR

B) Whenever the move that is actually best in a position shows 0.1 while a worse move shows 0.2, there's something wrong with the analysis method, because either the first move should appear with a score above 0.2 or the second one should appear with a score below 0.1.

In all the cases where the GM overrules the engine and plays the 0.1 move over the 0.2 one, and it's case B), he'd be better off using an analysis method that shows the 0.1 move, the one that is really best, as best from the get-go (the absolute scores are irrelevant, as I've used Komodo with Contempt anywhere from -100 to 100 for this; what matters is the ranking of the moves and the differences between their scores), so he shouldn't need to overrule anything.

In the cases where it's A) (the ones I've been arguing about), the GM is playing a suboptimal move, and it's significant because by the 10th time he does it he'll have given up a full 1.00 of evaluation.