I know, please see what paragraph I was replying to. It was an aside.
Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?
-
- Posts: 343
- Joined: Sun Aug 25, 2019 8:33 am
- Full name: .
Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?
As to which eval corresponds to which win %, Laskos has already done the type of calculation I was thinking about: http://www.talkchess.com/forum3/viewtop ... =2&t=68072. We can use it to normalize each engine's centipawn scores into W/D/L %, so engines that tend to show high evals won't get an edge.
Depending on exactly which scores are used and how, there may be other ways for engines to game this scoring system (I'll call it the "Maz score" here if it hasn't been named yet), but I really doubt any engine would bother unless it becomes really popular. The right way to do it would be to find the function of the engine scores recorded during drawn games that predicts normal W/D/L results most accurately.
We can see example graphs of how the scores change during games here: https://www.chess.com/computer-chess-championship.
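A minimal sketch of the normalization and fitting idea above, under loudly stated assumptions: it uses a plain logistic curve with one tunable `scale` in place of Laskos's actual eval-to-W/D/L data, fits that scale by grid search on log-loss (one possible reading of "maximize the accuracy of predictions"), and defines a hypothetical `maz_score` that nudges a draw's 0.5 toward the average expected score of the last few evals. All function names and parameters here are illustrative, not an agreed definition.

```python
import math


def expected_score(cp: float, scale: float = 400.0) -> float:
    """White's expected score (0..1) for a centipawn eval from White's point of view.

    Assumption: a plain logistic curve stands in for an empirically
    fitted eval-to-W/D/L table.
    """
    # Clamp the exponent so huge (e.g. mate) scores cannot overflow.
    x = max(-300.0, min(300.0, -cp / scale))
    return 1.0 / (1.0 + 10.0 ** x)


def log_loss(scale: float, samples: list[tuple[float, float]]) -> float:
    """Mean log-loss of predicting game results (1, 0.5 or 0) from evals."""
    total = 0.0
    for cp, result in samples:
        # Keep p away from 0 and 1 so log() is always defined.
        p = min(max(expected_score(cp, scale), 1e-9), 1.0 - 1e-9)
        total -= result * math.log(p) + (1.0 - result) * math.log(1.0 - p)
    return total / len(samples)


def fit_scale(samples, candidates=range(100, 1001, 25)) -> int:
    """Grid-search the scale whose predictions best match the recorded results."""
    return min(candidates, key=lambda s: log_loss(s, samples))


def maz_score(white_pov_evals, scale, tail=10, weight=0.5) -> float:
    """Hypothetical 'Maz score' for White in a drawn game.

    Averages the expected score over the last `tail` evals and moves the
    0.5 draw credit part of the way toward it; with the default weight
    the result stays between 0.25 and 0.75.
    """
    recent = white_pov_evals[-tail:]
    avg = sum(expected_score(cp, scale) for cp in recent) / len(recent)
    return 0.5 + weight * (avg - 0.5)
```

Using log-loss means drawn games still contribute to the fit as 0.5 targets, and a scale that makes overconfident predictions is penalized, which is one way to keep engines that show high evals from getting an edge.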
Last edited by mmt on Wed Feb 05, 2020 4:37 pm, edited 2 times in total.
-
- Posts: 1470
- Joined: Mon Apr 23, 2018 7:54 am
Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?
Lc0 evals have been a bit crazy in the past. The equation used to translate them to cp was changed, so that may be fixed, but you should be careful using them.
-
- Posts: 343
- Joined: Sun Aug 25, 2019 8:33 am
- Full name: .
-
- Posts: 1470
- Joined: Mon Apr 23, 2018 7:54 am
Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?
Well, the nets will affect the actual % eval, but the formula to convert to cp eval was changed in Lc0 v0.21.2 (2019-06-09).
New: cp = 111.714640912 * tan(1.5620688421 * Q).
Old: cp = 290.680623072 * tan(1.548090806 * Q).
(So if they have cp evals for games on Lc0 versions prior to v0.21.2 (2019-06-09), they could be converted to the new cp scoring.)
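The conversion described above can be sketched by inverting the old formula with `atan` to recover Q and then applying the new formula. This assumes Q is Lc0's expected-outcome value in [-1, 1]; the inversion is plain algebra on the two formulas quoted in the post, not an Lc0 API, and it does not account for differences between nets.

```python
import math

# Constants quoted in the thread for Lc0's Q-to-centipawn conversion.
NEW_A, NEW_B = 111.714640912, 1.5620688421  # Lc0 v0.21.2 and later
OLD_A, OLD_B = 290.680623072, 1.548090806   # versions before v0.21.2


def q_to_cp(q: float, a: float, b: float) -> float:
    """cp = a * tan(b * Q)."""
    return a * math.tan(b * q)


def cp_to_q(cp: float, a: float, b: float) -> float:
    """Inverse of q_to_cp: Q = atan(cp / a) / b, clamped to [-1, 1]."""
    q = math.atan(cp / a) / b
    return max(-1.0, min(1.0, q))


def old_cp_to_new_cp(cp_old: float) -> float:
    """Re-express a pre-v0.21.2 Lc0 cp eval on the v0.21.2+ cp scale."""
    return q_to_cp(cp_to_q(cp_old, OLD_A, OLD_B), NEW_A, NEW_B)


if __name__ == "__main__":
    for q in (0.1, 0.5, 0.9):
        old = q_to_cp(q, OLD_A, OLD_B)
        print(f"Q={q}: old {old:.0f} cp -> new {old_cp_to_new_cp(old):.0f} cp")
```

The clamp in `cp_to_q` matters: without it, a very large old cp value maps to |Q| slightly above 1, and the new formula's `tan` would then wrap past pi/2 and flip sign.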
-
- Posts: 343
- Joined: Sun Aug 25, 2019 8:33 am
- Full name: .
Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?
jp wrote: ↑Wed Feb 05, 2020 4:42 pm
Well, the nets will affect the actual % eval, but the formula to convert to cp eval was changed in Lc0 v0.21.2 (2019-06-09).
New: cp = 111.714640912 * tan(1.5620688421 * Q).
Old: cp = 290.680623072 * tan(1.548090806 * Q).
(So if they have cp evals for games on Lc0 versions prior to v0.21.2 (2019-06-09), they could be converted to the new cp scoring.)
Yes, I'm using the latest 0.23.1. Just a test run for this idea so far.
-
- Posts: 10309
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?
I agree
I can add that I am also against giving more points to the engine that got the better endgame position based on a third engine's evaluation, because that means giving a higher rating to engines that get winning positions but do not know how to win them.
An engine that often gets winning endgames but blunders and draws them should get 0.5 points for those endgames, not 0.6.
-
- Posts: 5960
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?
mmt wrote: ↑Wed Feb 05, 2020 2:28 am There have been various proposals over the years to change the rules of chess to reduce the number of draws. Correspondence players want something done https://en.chessbase.com/post/how-many- ... for-a-draw and there is a precedent in other games like Janggi (Korean chess) https://en.wikipedia.org/wiki/Janggi#Mi ... eous_rules.
I'm not proposing any rule changes, but we could look at the draws in engine vs engine games and determine who had an advantage near the end (e.g. if both engines scored the positions before the draw as >0.5 for White, then White should get 0.6-0.7 points instead of 0.5). The idea is that 90% of the time spent playing engine matches goes to draws, and we get no information from them. If one side can consistently get an advantage, this should be rewarded. If chess engines optimize towards this scoring, it becomes a worthless measure. But current engines don't, so it's possible.
It is quite interesting to me that the tournament version of a major form of chess (Janggi) is played with rules that eliminate draws. The chess analog would be breaking ties with a point count like (3, 10, 11, 16, 31 for p, n, b, r, q), with Black winning if the totals are tied. White's goal would then be to win the bishop pair (or bishop for knight) without getting into a bad position, so defenses like the Nimzo would disappear. In shogi, some clubs and amateur events also use no-draw rules: repetitions (other than illegal perpetual checks) are wins for the second player, and impasse draws (analogous to the 50-move rule in chess) are decided by a material point count, with the second player winning ties.
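The Janggi-style point-count tiebreak described in the post can be sketched as follows. This is only an illustration of the thought experiment, not a proposed rule: it reads nothing but the piece-placement field of a FEN string, uses the post's values (3, 10, 11, 16, 31 for p, n, b, r, q), ignores kings, and gives ties to Black. The function names are hypothetical.

```python
# Point values from the post's thought experiment; kings are not counted.
PIECE_POINTS = {"p": 3, "n": 10, "b": 11, "r": 16, "q": 31}


def material_points(fen: str) -> tuple[int, int]:
    """Sum point values for White and Black from a FEN's piece-placement field."""
    white = black = 0
    for ch in fen.split()[0]:
        value = PIECE_POINTS.get(ch.lower())
        if value is None:  # digits, '/' and kings carry no points
            continue
        if ch.isupper():
            white += value
        else:
            black += value
    return white, black


def tiebreak_winner(fen: str) -> str:
    """Decide a drawn game by point count; Black wins ties."""
    white, black = material_points(fen)
    return "white" if white > black else "black"
```

Because a bishop is worth one point more than a knight here, and a tie goes to Black, trading a knight for a bishop is exactly the kind of small edge White would need, which matches the post's point about the bishop pair and the Nimzo.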
I agree with the post that information useful for improving engines can be extracted from draws, but it should probably be done in a way that is independent of the engine eval. At the very least, if one side has material left and is on move against a lone king, giving that side 3/4 of the point in testing (as proposed in the correspondence chess thread, perhaps conditioned on the pawn or piece not being lost by force) seems reasonable.
Komodo rules!
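A sketch of the two partial-scoring rules discussed in this exchange, under several assumptions: the final position is given as a FEN string, both engines' evals are in pawns from White's point of view (the Black engine's own-perspective evals are assumed to be negated already), the quoted ">0.5 for White" is read as +0.50 pawns, and the bonus of 0.1 is the low end of the quoted 0.6-0.7 range. The lone-king rule does not check whether the extra material can be lost by force. The function name and parameters are hypothetical.

```python
def draw_score_for_white(final_fen, white_engine_evals, black_engine_evals,
                         last_n=10, threshold=0.5, bonus=0.1):
    """Hypothetical partial score for White in a drawn game (Black gets 1 minus this)."""
    fields = final_fen.split()
    board, side_to_move = fields[0], fields[1]
    white_men = [c for c in board if c.isupper() and c != "K"]
    black_men = [c for c in board if c.islower() and c != "k"]

    # Lone-king rule: the side to move has material against a bare king -> 3/4.
    # Not checked here: whether that material can be lost by force.
    if side_to_move == "w" and white_men and not black_men:
        return 0.75
    if side_to_move == "b" and black_men and not white_men:
        return 0.25

    # Eval rule: both engines' recent evals favour the same side beyond the threshold.
    if white_engine_evals and black_engine_evals:
        recent = white_engine_evals[-last_n:] + black_engine_evals[-last_n:]
        if all(e > threshold for e in recent):
            return 0.5 + bonus
        if all(e < -threshold for e in recent):
            return 0.5 - bonus
    return 0.5
```

Requiring agreement from both engines, rather than asking a third engine, is one way to address the objection raised above about rewarding positions an engine cannot actually convert, although it does not remove it.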
-
- Posts: 4556
- Joined: Tue Jul 03, 2007 4:30 am
Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?
So an engine that doesn't know two knights can't force mate against a lone king, and gives away its advantage to go for it against an engine that knows it's a draw (and allows it for that reason), would get more points than an engine that knows this and therefore tries harder to win in positions with equal material or even a material deficit?
That rewards the engine's ignorance. Sacrificing material to get a draw is a beautiful part of chess, so I'd rather give 3/4 of a point to engines that managed to draw the game with a material deficit, because at least it was an interesting game (the dullest draws tend to end with equal material).
-
- Posts: 5960
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?
Ovyron wrote: ↑Thu Feb 06, 2020 12:40 am
So an engine that doesn't know two knights can't force mate against a lone king, and gives away its advantage to go for it against an engine that knows it's a draw (and allows it for that reason), would get more points than an engine that knows this and therefore tries harder to win in positions with equal material or even a material deficit?
That rewards the engine's ignorance. Sacrificing material to get a draw is a beautiful part of chess, so I'd rather give 3/4 of a point to engines that managed to draw the game with a material deficit, because at least it was an interesting game (the dullest draws tend to end with equal material).
This is not a relevant argument today. Every decent engine knows what is mating material and what is not. The typical way a game ends up as (for example) king and knight vs king is that one side gets a favorable ending, and neither engine can read out to the end whether Black can reach the drawn king and knight vs king, but both know that otherwise White will win. So they score it at something like +1.00. If it turns out that Black can eliminate the last pawn and reach the draw, that doesn't change the fact that both engines assessed White as having outplayed Black, and Black was just lucky to have enough resources (unknown to both engines) to reach the draw. With no other information available, White should be considered the stronger engine in this example, though by a smaller margin than if White had actually won.
Komodo rules!