Uri Blass wrote: ↑Tue Sep 25, 2018 3:44 am
Laskos wrote: ↑Tue Sep 25, 2018 12:27 am
chrisw wrote: ↑Fri Sep 21, 2018 10:29 pm
139 rounds
Engine     Tournament  Init  Score  1st   2nd   3rd   4th   5th   6th   7th   8th
Stockfish  3500        3467  26.5   0.97  0.03  0.00  0.00  0.00  0.00  0.00  0.00
Houdini    3448        3428  22.0   0.03  0.74  0.22  0.01  0.00  0.00  0.00  0.00
Komodo     3441        3456  18.5   0.00  0.22  0.68  0.09  0.00  0.00  0.00  0.00
Lc0        3381        3356  19.5   0.00  0.01  0.10  0.84  0.05  0.01  0.00  0.00
Ethereal   3333        3335  15.0   0.00  0.00  0.00  0.03  0.46  0.34  0.16  0.01
Fire       3341        3369  13.0   0.00  0.00  0.00  0.02  0.37  0.38  0.21  0.02
Booot      3312        3319  13.5   0.00  0.00  0.00  0.00  0.12  0.26  0.53  0.09
Andscacs   3275        3286  11.0   0.00  0.00  0.00  0.00  0.00  0.02  0.11  0.87
Lc0 is marked down by the sim compared to its actual ranking because of its Elo.
Yes, your predictions stand well.
It's interesting that after 60/70 games played by each engine, counting only the games the top 4 played against one another, Lc0 would have been second behind Stockfish: equal on points with Houdini, but with a better SB. It would have entered the final, but with 8 engines it does not fare as well, as it has problems with weaker engines.
I think that we do not have enough games to say that with only 4 engines Lc0 could enter the final.
Note that with only 4 engines we could presumably also have more games, which would reduce statistical noise.
My best guess is that if you start a new stage with only 4 engines you are not going to see Lc0 in the top 2, and that Lc0 was simply unlucky against the weak opponents (I cannot believe it is a normal result for Lc0 to lose 2 games against Andscacs without losing more games against Stockfish, when Stockfish is significantly stronger).
I computed the rating achieved by Lc0 after 65/70 games: 27/30 against the top 3 and 38/40 against the bottom 4. I took CCRL 40/4 ratings for the regular engines and treated Lc0's rating as unknown.
The performance of Lc0 against top 3 (Stockfish, Houdini, Komodo) is:
3470 +/- 50 Elo points (2 standard deviations)
The performance of Lc0 against bottom 4 (Ethereal, Fire, Booot, Andscacs) is:
3350 +/- 65 Elo points (2 standard deviations)
And the difference between two performances is:
120 +/- 80 Elo points (2 standard deviations), which is outside the 2 SD error margin
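As a rough check on the error bar of the difference, the two margins can be added in quadrature. This is only a sketch: it assumes the two performance estimates are independent and roughly normal, and the function name is made up for illustration.

```python
import math

def diff_with_margin(perf_a, margin_a, perf_b, margin_b):
    """Difference of two independent estimates and its combined margin.

    Both margins must use the same multiple of the standard deviation
    (here 2 SD); independence of the two estimates is an assumption.
    """
    diff = perf_a - perf_b
    margin = math.sqrt(margin_a ** 2 + margin_b ** 2)
    return diff, margin

# Figures from the post: top 3 vs bottom 4 performance of Lc0.
diff, margin = diff_with_margin(3470, 50, 3350, 65)
print(f"{diff} +/- {margin:.0f} Elo (2 SD)")
print("outside the 2 SD margin:", diff > margin)
```

The combined margin comes out at about 82, which rounds to the 80 quoted above, and the 120 point gap is larger than it.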
Lc0 doesn't follow the Elo curve well against regular engines. Its Elo performance is significantly better against stronger opponents than against weaker opponents. Combined with scaling and hardware issues, it's pretty useless to talk about Lc0's strength against regular engines in general.
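To give a feel for what a 120 point gap means on the Elo curve, here is a small sketch using the standard logistic Elo formula. The function names are illustrative, and this is not necessarily how the performances above were computed.

```python
import math

def expected_score(rating_diff):
    """Expected score (0..1) for a player rated rating_diff above the opponent,
    using the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

def rating_diff_from_score(score):
    """Inverse of expected_score: the Elo difference implied by a score
    strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# An engine that follows the curve has one rating against everybody.
# A 120 point mismatch is like the gap between an even score and this:
print(f"{expected_score(120):.3f}")
```

An engine that obeys the curve would show roughly the same performance rating whichever group of opponents it is measured against; a 120 point mismatch corresponds to the difference between scoring 50% and about 67% against the same opponent.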
Also, I ran a round-robin of the top 3 + Lc0, and although Lc0 loses to each of these 3 in individual matches, in the round-robin standings it is above Komodo, in third place. So the CCCC behavior is somewhat confirmed, although my hardware is much weaker and very different. Lc0's draw rate is also very high in my top 4 tournament compared to the other engines, similar to what happens in CCCC. One of the causes of this "Elo disobedience" is Lc0's endgame play: it fails to convert clear wins in the endgame against strong and weak engines alike. I just saw a completely won K+R+R vs. K+R for Lc0 against Sf dev end in a draw. It also sometimes blunders away clearly drawn endgames.