Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?

mmt · Post by **mmt** » Wed Feb 05, 2020 4:16 pm

jp wrote: ↑Wed Feb 05, 2020 4:11 pm
mmt wrote: ↑Wed Feb 05, 2020 10:21 am 3-1-0 soccer-like scoring system has been used in this tournament starting in 2008. It was also used in 2nd London Classic in 2010. I think it's a superior method.
In soccer and Bilbao, the scoring is designed to encourage going for wins. It's not designed to be better at determining the elos of the players.

I know, please see what paragraph I was replying to. It was an aside.

mmt · Post by **mmt** » Wed Feb 05, 2020 4:28 pm

As to what score corresponds to what win %, Laskos has already done the type of calculation I was thinking about: http://www.talkchess.com/forum3/viewtop ... =2&t=68072. We can normalize engines' centipawn scores this way into W/D/L %, so engines showing high evals won't have an edge.

Defining exactly which scores are used, and how, might open up other ways for engines to game this scoring system (I'll call it "Maz score" here if it hasn't been named yet), but I really doubt any engine authors would bother unless it becomes really popular. The right way to do it would be to find a function that maximizes the accuracy of predicting normal W/D/L results from the engine scores reported during drawn games.

We can see example graphs of how scores change during games here: https://www.chess.com/computer-chess-championship.
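
As a rough illustration of the normalization idea above, here is a minimal sketch that maps a centipawn eval to an expected score. The logistic shape and the `scale` of 400 are assumptions for illustration only; they are not the fitted values from Laskos's calculation, and the function names are hypothetical.

```python
def cp_to_expected_score(cp: float, scale: float = 400.0) -> float:
    """Map a centipawn eval (White's point of view) to an expected score in [0, 1].

    The logistic curve and `scale` are illustrative assumptions, not a fit
    to real game data.
    """
    # Clamp extreme evals (e.g. mate scores) so the power below cannot overflow.
    cp = max(-10000.0, min(10000.0, cp))
    return 1.0 / (1.0 + 10.0 ** (-cp / scale))


def normalized_scores(cps, scale: float = 400.0):
    """Convert a sequence of per-ply centipawn evals to expected scores."""
    return [cp_to_expected_score(cp, scale) for cp in cps]
```

In practice `scale` (or a different curve altogether) would be fitted per engine against actual W/D/L results, which is what keeps an engine that reports inflated evals from gaining an edge.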

jp · Post by **jp** » Wed Feb 05, 2020 4:33 pm

mmt wrote: ↑Wed Feb 05, 2020 2:29 pm Can anybody point me to a good size set of tournament engine games with scores at each ply using current engines? Or matches?
Edit: LC0 Discord has a good set of LC0 games.

Lc0 evals have in the past been a bit crazy. The equation to translate to cp was changed, so maybe that's fixed, but you should be careful using those.

mmt · Post by **mmt** » Wed Feb 05, 2020 4:35 pm

jp wrote: ↑Wed Feb 05, 2020 4:33 pm Lc0 evals have in the past been a bit crazy. The equation to translate to cp was changed, so maybe that's fixed, but you should be careful using those.

Yes, I actually just started a little tournament with the newest nets.

jp · Post by **jp** » Wed Feb 05, 2020 4:42 pm

mmt wrote: ↑Wed Feb 05, 2020 4:35 pm Yes, I actually just started a little tournament with newest nets

Well, the nets will affect the actual % eval, but the formula to convert to cp eval was changed in Lc0 v0.21.2 (2019-06-09).

New: cp = 111.714640912 * tan(1.5620688421 * Q).
Old: cp = 290.680623072 * tan(1.548090806 * Q).

(So if they have cp evals for games on Lc0 versions prior to Lc0 v0.21.2 (2019-06-09), they could be converted to the new cp scoring.)
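
A minimal sketch of that conversion, using only the two formulas quoted above: invert the old formula to recover Q, then apply the new one. The function names are hypothetical, and clamping Q to [-1, 1] is an added assumption to keep very large old evals inside the tangent's valid range.

```python
import math

# Constants taken from the two formulas quoted above.
NEW_A, NEW_B = 111.714640912, 1.5620688421
OLD_A, OLD_B = 290.680623072, 1.548090806


def q_to_cp_new(q: float) -> float:
    """Lc0 v0.21.2 and later: cp = 111.714640912 * tan(1.5620688421 * Q)."""
    return NEW_A * math.tan(NEW_B * q)


def q_to_cp_old(q: float) -> float:
    """Before Lc0 v0.21.2: cp = 290.680623072 * tan(1.548090806 * Q)."""
    return OLD_A * math.tan(OLD_B * q)


def old_cp_to_q(cp: float) -> float:
    """Invert the old formula to recover Q from an old-style cp eval."""
    q = math.atan(cp / OLD_A) / OLD_B
    # Assumption: clamp to the valid Q range so the new formula stays monotonic.
    return max(-1.0, min(1.0, q))


def old_cp_to_new_cp(cp: float) -> float:
    """Re-express an old-style cp eval on the new cp scale."""
    return q_to_cp_new(old_cp_to_q(cp))
```

For example, `old_cp_to_new_cp(q_to_cp_old(0.3))` gives the same value as `q_to_cp_new(0.3)`, up to floating-point error.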

mmt · Post by **mmt** » Wed Feb 05, 2020 4:51 pm

jp wrote: ↑Wed Feb 05, 2020 4:42 pm Well, the nets will affect the actual % eval, but the formula to convert to cp eval was changed in Lc0 v0.21.2 (2019-06-09).

New: cp = 111.714640912 * tan(1.5620688421 * Q).
Old: cp = 290.680623072 * tan(1.548090806 * Q).

(So if they have cp evals for games on Lc0 versions prior to Lc0 v0.21.2 (2019-06-09), they could be converted to the new cp scoring.)

Yes, I'm using the latest 0.23.1. Just a test run for this idea so far.

Uri Blass · Post by **Uri Blass** » Wed Feb 05, 2020 10:12 pm

hgm wrote: ↑Wed Feb 05, 2020 12:24 pm One should never award a result on the basis of the evaluation. That would just encourage engines to lie about their evaluation.

I agree.

I can add that I am also against giving more points to the engine that got the better position in the endgame based on the evaluation of a third engine, because it means giving a higher rating to engines that get winning positions but do not know how to win them.

An engine that often gets winning endgames but blunders into a draw should get 0.5 points for those endgames, not 0.6.

lkaufman · Post by **lkaufman** » Wed Feb 05, 2020 11:57 pm

mmt wrote: ↑Wed Feb 05, 2020 2:28 am There have been various proposals over the years to change the rules of chess to reduce the number of draws. Correspondence players want something done https://en.chessbase.com/post/how-many- ... for-a-draw and there is a precedent in other games like Janggi (Korean chess) https://en.wikipedia.org/wiki/Janggi#Mi ... eous_rules.

I'm not proposing any rule changes, but we could look at the draws in engine vs engine games and determine who had an advantage near the end (e.g. if both engines scored the positions preceding the draw as >0.5 for White, then White should get 0.6-0.7 points instead of 0.5). The idea is that 90% of the time spent playing engine matches goes to draws and we get no information from them. If one side can consistently get an advantage, this should be rewarded. If chess engines optimize towards this scoring, it becomes a worthless measure. But the current engines don't, so it's possible.

It is quite interesting for me that the tournament version of a major form of chess (Janggi) is played with rules that eliminate draws. The chess analog would be if we were to break ties with a point count like (3,10,11,16,31 for p,n,b,r,q) with Black winning in case of a tied score. White's goal would be to win the bishop pair (or bishop for knight), without getting into a bad position, so defenses like the Nimzo would disappear. In shogi in some clubs or amateur events no-draw rules are also used, with reps (other than illegal perps) being wins for the second player and impasse draws (like 50 move rule in chess) being decided by material point count with second player winning ties.
I agree with the post that information can be extracted from draws that is useful for improving engines, but probably it should be done in a way that is independent of the engine eval. At the very least, if one side has material and is on-move vs a lone king, giving him 3/4 of the point (as proposed in the correspondence chess thread) (perhaps conditioned on the pawn or piece being not lost by force) in testing seems reasonable.
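
The point-count tiebreak described above can be sketched as follows. This only illustrates the hypothetical chess analog (3, 10, 11, 16, 31 for p, n, b, r, q, with Black winning a tied count); the function names and the choice of a FEN string as input are assumptions.

```python
# Point values from the post: pawn 3, knight 10, bishop 11, rook 16, queen 31.
POINTS = {"p": 3, "n": 10, "b": 11, "r": 16, "q": 31}


def material_points(fen: str):
    """Return (white_points, black_points) for the position in a FEN string."""
    placement = fen.split()[0]  # use only the piece-placement field
    white = black = 0
    for ch in placement:
        value = POINTS.get(ch.lower())
        if value is None:
            continue  # skip kings, digits and rank separators
        if ch.isupper():
            white += value
        else:
            black += value
    return white, black


def tiebreak_winner(fen: str) -> str:
    """Decide a drawn game by point count; Black wins a tied count."""
    white, black = material_points(fen)
    return "white" if white > black else "black"
```

With these values a bishop outscores a knight by one point, which is why trading knight for bishop (or winning the bishop pair) would become White's main goal under such a rule.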

Ovyron · Post by **Ovyron** » Thu Feb 06, 2020 12:40 am

lkaufman wrote: ↑Wed Feb 05, 2020 11:57 pm At the very least, if one side has material and is on-move vs a lone king, giving him 3/4 of the point (as proposed in the correspondence chess thread) (perhaps conditioned on the pawn or piece being not lost by force) in testing seems reasonable.

So an engine that doesn't know two knights can't mate a king and gives away its advantage to go for it, against an engine that knows it's a draw (and that's why it allows it), would get more points than one that knows so it tries harder to win on positions with equal material or even material deficiency?

The former is rewarding the engine's ignorance, and sacrificing material to get a draw is a beautiful part of chess, so I'd rather give 3/4 of a point to engines that managed to draw the game with material deficiency, because at least it was an interesting game (the dullest draws tend to end with equal material.)

lkaufman · Post by **lkaufman** » Thu Feb 06, 2020 4:55 am

Ovyron wrote: ↑Thu Feb 06, 2020 12:40 am
lkaufman wrote: ↑Wed Feb 05, 2020 11:57 pm At the very least, if one side has material and is on-move vs a lone king, giving him 3/4 of the point (as proposed in the correspondence chess thread) (perhaps conditioned on the pawn or piece being not lost by force) in testing seems reasonable.
So an engine that doesn't know two knights can't mate a king and gives away its advantage to go for it, against an engine that knows it's a draw (and that's why it allows it), would get more points than one that knows so it tries harder to win on positions with equal material or even material deficiency?

The former is rewarding the engine's ignorance, and sacrificing material to get a draw is a beautiful part of chess, so I'd rather give 3/4 of a point to engines that managed to draw the game with material deficiency, because at least it was an interesting game (the dullest draws tend to end with equal material.)

This is not a relevant argument today. Every decent engine knows what is mating material and what is not. The typical situation that ends up with (for example) king and knight vs king is that one side gets a favorable ending, neither engine can read out to the end whether Black can reach the drawn king vs king and knight or not, but they both know that otherwise White will win. So they score it something like plus 1.00. If it turns out that Black can eliminate the last pawn and reach the draw, it doesn't change the fact that both engines assessed that White had outplayed Black, and Black was just lucky to have enough resources (unknown to both engines) to reach the draw. White should be considered the stronger engine in this example if there is no other information available, though by a lesser margin than if White had actually won.
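
One way to express "stronger, but by a lesser margin" is a partial score for the draw, along the lines of the 0.6 example mentioned earlier in the thread. The sketch below is an assumption-laden illustration, not a worked-out proposal: the function name, the 0.5-pawn threshold, the 0.6 award, and the requirement that both engines agree on every sampled position are all hypothetical choices.

```python
def partial_draw_score(white_engine_evals, black_engine_evals,
                       threshold=0.5, bonus_score=0.6):
    """Return White's score for a drawn game.

    Both lists hold evals in pawns, from White's point of view, for the last
    few positions before the draw, as reported by the engine playing White
    and the engine playing Black respectively.
    """
    all_evals = list(white_engine_evals) + list(black_engine_evals)
    if not white_engine_evals or not black_engine_evals:
        return 0.5  # no eval information from one side: plain draw
    if all(e > threshold for e in all_evals):
        return bonus_score  # both engines agreed White was better
    if all(e < -threshold for e in all_evals):
        return 1.0 - bonus_score  # both engines agreed Black was better
    return 0.5
```

Because the award depends on the engines' own evals, this remains open to the objection raised earlier in the thread that engines could be tuned to misreport them.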
