Why Lc0 eval (in cp) is asymmetric against AB engines?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Why Lc0 eval (in cp) is asymmetric against AB engines?

Post by Laskos »

With the talk about supervised learning escalating, I would like to point to a case where reinforcement learning seems to fail pretty badly: converting endgames. I presented the middlegame graph, where things often linger into endgames (aside from issues with tactics in both middlegames and endgames), and the graph of Lc0 was shifted with respect to the axis origin. From the shapes of these plots, one can suspect that the shift in the middlegames is partly due to poor endgame evaluation and performance. Here supervised learning would help a lot, and learning from Stockfish or Komodo games (with 6-men Syzygy enabled, without adjudications) from endgame starting positions would help greatly (in both eval and tactics). Simply put, hand-crafted endgame eval terms and TBs are very helpful in traditional engines, and self-play with a NN isn't that productive here. Here are the plots of performance versus eval for three cases (games of Leela against Komodo, equalized in strength):

Image


Image


Image
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Why Lc0 eval (in cp) is asymmetric against AB engines?

Post by Laskos »

New shape of performance versus eval with new hardware (RTX 2070 GPU), new test30 net, time control 15s + 0.25s. In these conditions, Lc0 on RTX 2070 and SF10 on 4 strong i7 cores are equally matched in strength:

Score of lc0_v201_32721 vs SF10: 49 - 49 - 102 [0.500] 200
Elo difference: 0.00 +/- 33.76
Finished match
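The Elo difference and error bar in the match report above can be recovered from the raw W/D/L counts; a minimal sketch of the usual cutechess-style calculation (logistic Elo from the mean score, ~95% confidence interval from the per-game score variance):

```python
import math

def elo_and_margin(wins, losses, draws, z=1.96):
    """Elo difference and ~95% error bar from W/L/D counts."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n          # mean score per game
    # sample variance of the per-game score (outcomes are 1, 0, 0.5)
    var = (wins * (1 - score) ** 2
           + losses * (0 - score) ** 2
           + draws * (0.5 - score) ** 2) / n
    stderr = math.sqrt(var / n)

    def to_elo(s):
        return -400.0 * math.log10(1.0 / s - 1.0)

    elo = to_elo(score)
    margin = (to_elo(min(score + z * stderr, 0.999999))
              - to_elo(max(score - z * stderr, 0.000001))) / 2
    return elo, margin

elo, margin = elo_and_margin(49, 49, 102)
print(f"Elo difference: {elo:.2f} +/- {margin:.2f}")
# ≈ 0.00 +/- ≈ 33.8, close to cutechess's reported 33.76
# (tools differ slightly in the variance estimate they use)
```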

And the performance versus eval in opening - early midgame looks like this:

Image


Lc0, to a smaller degree, still has difficulty converting even the large evals it shows. It exhibits a sort of Contempt of some 60cp; SF10 has a built-in Contempt of about 20cp. This difficulty of Lc0 in converting is probably again related to bad endgames and occasional blunders.
But for negative evals Lc0 shows almost exactly the same performance as SF10; it behaves regularly when it sees itself at a disadvantage.
megamau
Posts: 37
Joined: Wed Feb 10, 2016 6:20 am
Location: Singapore

Re: Why Lc0 eval (in cp) is asymmetric against AB engines?

Post by megamau »

Laskos wrote: Sat Jan 19, 2019 1:36 pm

Lc0, to a smaller degree, still has difficulty converting even the large evals it shows. It exhibits a sort of Contempt of some 60cp; SF10 has a built-in Contempt of about 20cp. This difficulty of Lc0 in converting is probably again related to bad endgames and occasional blunders.
But for negative evals Lc0 shows almost exactly the same performance as SF10; it behaves regularly when it sees itself at a disadvantage.
What you see as "contempt" is actually "failure to convert". Since you balanced the overall Elo, LC0 will in general be stronger and obtain better positions, but fail to convert them or blunder them away and lose. The overall result is a wash (by design of the experiment).
The match in the negative evals is because the AB engine will not blunder away its better positions.
yanquis1972
Posts: 1766
Joined: Wed Jun 03, 2009 12:14 am

Re: Why Lc0 eval (in cp) is asymmetric against AB engines?

Post by yanquis1972 »

I think that’s the major factor, especially at this TC, but “contempt” (relatively dramatic cp scores) also plays a part... when possible I exclusively use win% with Leela now, because after a short time you can understand her evaluation in traditional “+-, +/-, +/=“ etc. terms. If the cp value is intended to mimic a/b evals, it does so poorly...

What I don’t immediately understand is why Leela’s negative evals plot so well with Stockfish’s, but I’d bet that again precise conversion is the primary factor.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Why Lc0 eval (in cp) is asymmetric against AB engines?

Post by Laskos »

yanquis1972 wrote: Sun Jan 20, 2019 11:20 am I think that’s the major factor, especially at this TC, but “contempt” (relatively dramatic cp scores) also plays a part... when possible I exclusively use win% with Leela now, because after a short time you can understand her evaluation in traditional “+-, +/-, +/=“ etc. terms. If the cp value is intended to mimic a/b evals, it does so poorly...

What I don’t immediately understand is why Leela’s negative evals plot so well with Stockfish’s, but I’d bet that again precise conversion is the primary factor.
Well, it is still a contempt of some 60cp. Say, against an equal or stronger regular engine, if Leela's eval is +40cp, it should rather force a draw if it can than hope for a better result, as +40cp corresponds to some 48% performance, or a -14 Elo points average outcome (against an equal regular engine; even worse against a superior one).
It's a good thing against significantly weaker regular engines. All in all, it acts like a regular Contempt.
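The "+40cp → 48% → -14 Elo" arithmetic above follows from the standard logistic Elo model; a quick check (the 48% figure itself is read off the plot, not computed here):

```python
import math

def score_to_elo(score):
    """Expected score -> Elo difference under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

print(round(score_to_elo(0.48), 1))  # -13.9, i.e. the "-14 Elo points" above
```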

I am not sure why, for negative evals, Leela's performance shape is so similar to that of SF10. It seems that what Leela considers an inferior position is seen by SF10 as inferior too, with a similar (negative) cp value.

Sure, for those high (positive) Leela's eval values, the main reason is the failure to convert, including blunders.

I am not using the win% output with Leela; what does it say for 300cp-400cp values in the early midgame? Here it is around 80-85% performance against an SF10 equal in strength. Do they use some general logistic conversion, or is it more complicated and dependent on each position?
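For what it's worth, Lc0's win probability comes from the value head output q ∈ [-1, 1], and the cp figure is derived from q by one fixed tangent-shaped curve, not per position. A sketch of the mapping; the constants are taken from the lc0 source of roughly this era and should be treated as an assumption, since they have changed between versions:

```python
import math

# Constants as in lc0 v0.20-era source (assumption; changed across versions)
LC0_SCALE = 290.680623072
LC0_SLOPE = 1.548090806

def q_to_cp(q):
    """lc0 value-head output q in [-1, 1] -> reported centipawns."""
    return LC0_SCALE * math.tan(LC0_SLOPE * q)

def cp_to_winprob(cp):
    """Invert the mapping and rescale q to a win probability in [0, 1]."""
    q = math.atan(cp / LC0_SCALE) / LC0_SLOPE
    return (q + 1.0) / 2.0

for cp in (300, 400):
    print(cp, round(cp_to_winprob(cp), 3))
# With these constants, 300-400cp comes out around 76-80%,
# in the same ballpark as the 80-85% performance quoted above.
```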
Last edited by Laskos on Sun Jan 20, 2019 12:08 pm, edited 2 times in total.
chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: Why Lc0 eval (in cp) is asymmetric against AB engines?

Post by chrisw »

Laskos wrote: Sun Jan 20, 2019 11:41 am
yanquis1972 wrote: Sun Jan 20, 2019 11:20 am I think that’s the major factor, especially at this TC, but “contempt” (relatively dramatic cp scores) also plays a part... when possible I exclusively use win% with Leela now, because after a short time you can understand her evaluation in traditional “+-, +/-, +/=“ etc. terms. If the cp value is intended to mimic a/b evals, it does so poorly...

What I don’t immediately understand is why Leela’s negative evals plot so well with Stockfish’s, but I’d bet that again precise conversion is the primary factor.
Well, it is still a contempt of some 60cp. Say, against an equal or stronger regular engine, if Leela's eval is +40cp, it should rather force a draw if it can than hope for a better result, as +40cp corresponds to some 48% performance, or a -14 Elo points average outcome (against an equal regular engine; even worse against a superior one).

I am not sure why, for negative evals, Leela's performance shape is so similar to that of SF10. It seems that what Leela considers an inferior position is seen by SF10 as inferior too, with a similar (negative) cp value.

Sure, for those high (positive) Leela eval values, the main reason is failure to convert, including blunders.

I am not using the win% output with Leela; what does it say for 300cp-400cp values in the early midgame? Here it is around 80-85% performance against an SF10 equal in strength. Do they use some general logistic conversion, or is it more complicated and dependent on each position?
I think this “asymmetry” is just an effect of the way the search finds an evaluation. The explore/exploit algorithm favours searching “good” moves. The search at any node doesn’t go full width. The backed-up evaluation averages what has been looked at, so the score tends to be an average over the better lines in the tree. Hence positively asymmetric.
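The averaging effect described above can be seen in a toy backup at a single opponent node: alpha-beta backs up the opponent's best reply (a min, from Leela's perspective), while a PUCT-style search backs up the visit-weighted mean, which includes the opponent's weaker replies. The numbers below are purely hypothetical:

```python
# Values of the opponent's replies, from Leela's perspective, at one
# opponent node (hypothetical numbers). Visits favour the opponent's
# best reply, but the search still spends time on the weaker ones.
reply_values = [0.10, 0.40, 0.55]   # opponent's best reply first
reply_visits = [700, 200, 100]

ab_value = min(reply_values)  # minimax: assume the opponent plays best
mcts_value = (sum(v * n for v, n in zip(reply_values, reply_visits))
              / sum(reply_visits))  # averaged backup includes weak replies

print(f"minimax backup: {ab_value:.2f}, averaged backup: {mcts_value:.3f}")
# The averaged value sits above the minimax one: optimistic,
# hence a positive shift in the reported eval.
```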
grahamj
Posts: 43
Joined: Thu Oct 11, 2018 2:26 pm
Full name: Graham Jones

Re: Why Lc0 eval (in cp) is asymmetric against AB engines?

Post by grahamj »

I have found a quite different type of asymmetry.

I use board positions taken randomly from various sets of games, evaluate them with both SF and LC0 using python-chess, convert to prob(win), and graph them. I gave each 150ms. The graph uses 1000 positions from 1000 games from LC0's Test20 self-play games. These include a lot of positions with an obvious winner, so most points are clustered near (0,0) and (1,1).

For LC0 I use its prob(win) value [though I'm actually calculating it by reversing the prob(win) -> cp mapping], and for SF I use (math.atan(int(text)/100.0) / (math.pi/2) + 1) / 2. This makes most values roughly follow a straight line from (0,0) to (1,1) (though there is a characteristic waviness that you can see if you squint). However, there are a few board positions where LC0 and SF strongly disagree, with LC0 always being pessimistic.
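The two conversions above, made runnable for comparison: the SF rescaling is exactly the atan expression quoted, while the LC0 inverse mapping is sketched with the tan-curve constants from the lc0 source of that era (an assumption, as the constants have varied between versions):

```python
import math

LC0_SCALE = 290.680623072   # assumption: lc0 v0.20-era constants
LC0_SLOPE = 1.548090806

def sf_cp_to_prob(cp):
    """Stockfish centipawns -> prob(win), via the atan rescaling above."""
    return (math.atan(cp / 100.0) / (math.pi / 2) + 1) / 2

def lc0_cp_to_prob(cp):
    """Reverse LC0's prob(win) -> cp mapping to recover prob(win)."""
    q = math.atan(cp / LC0_SCALE) / LC0_SLOPE
    return (q + 1.0) / 2.0

print(round(sf_cp_to_prob(100), 3))   # 0.75: +1 pawn maps to 75% here
```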
Graham Jones, www.indriid.com