CCCC Stage 2 results simulation

chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

CCCC Stage 2 results simulation

Post by chrisw »

139 rounds

Code: Select all

Engine     Tournament  Init  Score   1st   2nd   3rd   4th   5th   6th   7th   8th
Stockfish        3500  3467   26.5  0.97  0.03  0.00  0.00  0.00  0.00  0.00  0.00
Houdini          3448  3428   22.0  0.03  0.74  0.22  0.01  0.00  0.00  0.00  0.00
Komodo           3441  3456   18.5  0.00  0.22  0.68  0.09  0.00  0.00  0.00  0.00
Lc0              3381  3356   19.5  0.00  0.01  0.10  0.84  0.05  0.01  0.00  0.00
Ethereal         3333  3335   15.0  0.00  0.00  0.00  0.03  0.46  0.34  0.16  0.01
Fire             3341  3369   13.0  0.00  0.00  0.00  0.02  0.37  0.38  0.21  0.02
Booot            3312  3319   13.5  0.00  0.00  0.00  0.00  0.12  0.26  0.53  0.09
Andscacs         3275  3286   11.0  0.00  0.00  0.00  0.00  0.00  0.02  0.11  0.87
Lc0 is marked down by the sim compared to its actual ranking because of its Elo.
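The post does not say how the sim works, so the following is only a minimal sketch of one way to get placing probabilities like those in the table. It assumes the standard logistic Elo expectation, a constant draw rate (`draw_rate`, a made-up value), a full round-robin with 10 games per pairing (70 games per engine) simulated from scratch, and random tie-breaks instead of SB. The ratings are the "Init" column above; all function names are hypothetical.

```python
import random
from itertools import combinations

# "Init" ratings copied from the table above.
RATINGS = {
    "Stockfish": 3467, "Houdini": 3428, "Komodo": 3456, "Lc0": 3356,
    "Ethereal": 3335, "Fire": 3369, "Booot": 3319, "Andscacs": 3286,
}

def expected_score(r_a, r_b):
    """Logistic Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def play_game(r_a, r_b, draw_rate, rng):
    """Return A's result (1, 0.5 or 0). draw_rate is an assumed constant."""
    e = expected_score(r_a, r_b)
    # Cap the draw probability so win and loss probabilities stay non-negative.
    d = min(draw_rate, 2 * e, 2 * (1 - e))
    p_win = e - d / 2
    u = rng.random()
    if u < p_win:
        return 1.0
    if u < p_win + d:
        return 0.5
    return 0.0

def placing_probabilities(ratings, games_per_pair=10, draw_rate=0.6,
                          trials=2000, seed=1):
    """Estimate P(engine finishes in place k) by Monte Carlo simulation."""
    rng = random.Random(seed)
    names = list(ratings)
    counts = {n: [0] * len(names) for n in names}
    for _ in range(trials):
        score = dict.fromkeys(names, 0.0)
        for a, b in combinations(names, 2):
            for _ in range(games_per_pair):
                res = play_game(ratings[a], ratings[b], draw_rate, rng)
                score[a] += res
                score[b] += 1.0 - res
        # Ties are broken randomly here (an assumption), not by SB.
        order = sorted(names, key=lambda n: (score[n], rng.random()),
                       reverse=True)
        for place, n in enumerate(order):
            counts[n][place] += 1
    return {n: [c / trials for c in counts[n]] for n in names}

if __name__ == "__main__":
    for name, row in placing_probabilities(RATINGS).items():
        print(f"{name:<10}", " ".join(f"{p:.2f}" for p in row))
```

A sim that starts from the scores already on the board and only plays out the remaining rounds would give sharper probabilities than this from-scratch version.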
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CCCC Stage 2 results simulation

Post by Laskos »

chrisw wrote: Fri Sep 21, 2018 10:29 pm 139 rounds

Code: Select all

Engine     Tournament  Init  Score   1st   2nd   3rd   4th   5th   6th   7th   8th
Stockfish        3500  3467   26.5  0.97  0.03  0.00  0.00  0.00  0.00  0.00  0.00
Houdini          3448  3428   22.0  0.03  0.74  0.22  0.01  0.00  0.00  0.00  0.00
Komodo           3441  3456   18.5  0.00  0.22  0.68  0.09  0.00  0.00  0.00  0.00
Lc0              3381  3356   19.5  0.00  0.01  0.10  0.84  0.05  0.01  0.00  0.00
Ethereal         3333  3335   15.0  0.00  0.00  0.00  0.03  0.46  0.34  0.16  0.01
Fire             3341  3369   13.0  0.00  0.00  0.00  0.02  0.37  0.38  0.21  0.02
Booot            3312  3319   13.5  0.00  0.00  0.00  0.00  0.12  0.26  0.53  0.09
Andscacs         3275  3286   11.0  0.00  0.00  0.00  0.00  0.00  0.02  0.11  0.87
Lc0 is marked down by the sim compared to its actual ranking because of its Elo.
Yes, your predictions hold up well.

It's interesting that after 60 of the 70 games played by each engine, counting only the games the top 4 played against one another, Lc0 would have been second behind Stockfish: equal on points with Houdini, but with a better SB. It would have entered the final, but with 8 engines it does not fare so well, as it has problems with weaker engines.
chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: CCCC Stage 2 results simulation

Post by chrisw »

Laskos wrote: Tue Sep 25, 2018 12:27 am
Yes, your predictions hold up well.

It's interesting that after 60 of the 70 games played by each engine, counting only the games the top 4 played against one another, Lc0 would have been second behind Stockfish: equal on points with Houdini, but with a better SB. It would have entered the final, but with 8 engines it does not fare so well, as it has problems with weaker engines.
Crikey, you're right, I hadn't noticed that. I'll make a second table, tracking the top four only ....
Uri Blass
Posts: 10281
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: CCCC Stage 2 results simulation

Post by Uri Blass »

Laskos wrote: Tue Sep 25, 2018 12:27 am
Yes, your predictions hold up well.

It's interesting that after 60 of the 70 games played by each engine, counting only the games the top 4 played against one another, Lc0 would have been second behind Stockfish: equal on points with Houdini, but with a better SB. It would have entered the final, but with 8 engines it does not fare so well, as it has problems with weaker engines.
I think we do not have enough games to say that Lc0 could enter the final with only 4 engines.
Note that with only 4 engines there could also be more games, which would reduce the statistical noise.

My best guess is that if you start a new stage with only 4 engines, you will not see Lc0 in the top 2, and that Lc0 was simply unlucky against the weak opponents. (I cannot believe it is a normal result for Lc0 to lose 2 games against Andscacs without losing more games against Stockfish, when Stockfish is significantly stronger.)
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CCCC Stage 2 results simulation

Post by Laskos »

Uri Blass wrote: Tue Sep 25, 2018 3:44 am
I think we do not have enough games to say that Lc0 could enter the final with only 4 engines.
Note that with only 4 engines there could also be more games, which would reduce the statistical noise.

My best guess is that if you start a new stage with only 4 engines, you will not see Lc0 in the top 2, and that Lc0 was simply unlucky against the weak opponents. (I cannot believe it is a normal result for Lc0 to lose 2 games against Andscacs without losing more games against Stockfish, when Stockfish is significantly stronger.)
I computed the performance rating achieved by Lc0 after 65 of its 70 games: 27 of 30 against the top 3 and 38 of 40 against the bottom 4. I used the CCRL 40/4 ratings for the regular engines and treated Lc0's rating as unknown.
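The post does not say which tool produced these numbers. As a rough sketch of the idea, assuming the standard logistic Elo model, a performance rating can be found as the rating whose expected total score against the known opponents equals the actual score. The function names are hypothetical, and the numbers in the usage lines are illustrative only, not CCCC results.

```python
def expected_score(r_a, r_b):
    """Logistic Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def performance_rating(opponent_ratings, total_score, lo=0.0, hi=6000.0):
    """Solve sum(expected_score(R, opp)) == total_score for R by bisection.

    opponent_ratings holds one entry per game played.
    """
    if not 0 < total_score < len(opponent_ratings):
        raise ValueError("undefined for 0% or 100% scores")
    for _ in range(100):
        mid = (lo + hi) / 2
        expected = sum(expected_score(mid, r) for r in opponent_ratings)
        if expected < total_score:
            lo = mid   # rating guess too low
        else:
            hi = mid   # rating guess too high (or exact)
    return (lo + hi) / 2

# Illustrative only: 10 games against 3400-rated opposition.
even = performance_rating([3400] * 10, 5.0)
strong = performance_rating([3400] * 10, 7.5)
```

With an even score the result equals the opponents' rating, and a 75% score lands about 191 Elo above it.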

The performance of Lc0 against the top 3 (Stockfish, Houdini, Komodo) is:
3470 +/- 50 Elo points (2 standard deviations)

The performance of Lc0 against the bottom 4 (Ethereal, Fire, Booot, Andscacs) is:
3350 +/- 65 Elo points (2 standard deviations)

And the difference between the two performances is:
120 +/- 80 Elo points (2 standard deviations), which is outside the 2SD error margins
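The arithmetic behind that last line can be checked in a few lines. This sketch assumes the two performance estimates are independent (they come from different games), so their error margins add in quadrature; the function name is made up for illustration.

```python
import math

def difference_with_margin(perf_a, margin_a, perf_b, margin_b):
    """Difference of two independent estimates and its combined margin."""
    diff = perf_a - perf_b
    margin = math.hypot(margin_a, margin_b)  # sqrt(margin_a**2 + margin_b**2)
    return diff, margin

diff, margin = difference_with_margin(3470, 50, 3350, 65)
print(f"{diff} +/- {margin:.0f} Elo, outside margins: {abs(diff) > margin}")
```

The combined margin comes out at about 82 Elo, which matches the rounded 80 quoted above, and the 120 Elo difference exceeds it.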

Lc0 doesn't follow the Elo curve well against regular engines. Its Elo performance is significantly better against stronger opponents than against weaker ones. Combined with scaling and hardware issues, this makes it pretty useless to talk about Lc0's strength against regular engines in general.

Also, I ran a round-robin of the top 3 plus Lc0, and although Lc0 loses its individual match against each of these 3, in the round-robin standings it is above Komodo, in third place. So the CCCC behavior is somewhat confirmed, although my hardware is much weaker and very different. Lc0's draw rate is also very high in my top-4 tournament compared to the other engines, similar to what happens in CCCC. One cause of this "Elo disobedience" is Lc0's endgame play: it fails to convert clearly won endgames against strong and weak engines alike. I just saw a completely won K+R+R for Lc0 against K+R for Sf dev end in a draw. It also sometimes blunders away clearly drawn endgames.
Last edited by Laskos on Tue Sep 25, 2018 2:43 pm, edited 6 times in total.
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: CCCC Stage 2 results simulation

Post by CMCanavessi »

Why is the +/- smaller for Leela vs the top 3 than vs the bottom 4, when it has played more games against the bottom 4? Maybe the variance of the results?
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CCCC Stage 2 results simulation

Post by Laskos »

CMCanavessi wrote: Tue Sep 25, 2018 2:22 pm Why is the +/- smaller for Leela vs the top 3 than vs the bottom 4, when it has played more games against the bottom 4? Maybe the variance of the results?
Lc0's draw rate against the top 3 is humongous, and a high draw rate shrinks the error margins.
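To see why, here is a rough sketch under the usual trinomial model (independent games with fixed win/draw/loss probabilities). The per-game variance of the result is w + d/4 - s^2, where s = w + d/2 is the mean score, and the Elo margin follows from the slope of the logistic curve. The win/draw/loss counts below are invented for illustration, not taken from CCCC, and the function name is hypothetical.

```python
import math

def elo_margin(wins, draws, losses, z=2.0):
    """Approximate z-sigma Elo error margin from a W/D/L record."""
    n = wins + draws + losses
    w, d = wins / n, draws / n
    s = w + d / 2                    # mean score per game
    var = w + d / 4 - s * s          # per-game variance of the result
    se_score = math.sqrt(var / n)    # standard error of the mean score
    # Slope d(Elo)/d(score) of the logistic curve at score s.
    slope = 400.0 / (math.log(10) * s * (1 - s))
    return z * se_score * slope

# Same 50% score over 30 games, very different draw rates (made-up records).
drawish = elo_margin(3, 24, 3)
decisive = elo_margin(12, 6, 12)
print(f"80% draws: +/- {drawish:.0f}   20% draws: +/- {decisive:.0f}")
```

In this example the drawish record has half the error margin of the decisive one, so 30 mostly drawn games can give a tighter interval than 40 games with more decisive results.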
chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: CCCC Stage 2 results simulation

Post by chrisw »

Laskos wrote: Tue Sep 25, 2018 2:20 pm
I computed the performance rating achieved by Lc0 after 65 of its 70 games: 27 of 30 against the top 3 and 38 of 40 against the bottom 4. I used the CCRL 40/4 ratings for the regular engines and treated Lc0's rating as unknown.

The performance of Lc0 against the top 3 (Stockfish, Houdini, Komodo) is:
3470 +/- 50 Elo points (2 standard deviations)

The performance of Lc0 against the bottom 4 (Ethereal, Fire, Booot, Andscacs) is:
3350 +/- 65 Elo points (2 standard deviations)

And the difference between the two performances is:
120 +/- 80 Elo points (2 standard deviations), which is outside the 2SD error margins

Lc0 doesn't follow the Elo curve well against regular engines. Its Elo performance is significantly better against stronger opponents than against weaker ones. Combined with scaling and hardware issues, this makes it pretty useless to talk about Lc0's strength against regular engines in general.

Also, I ran a round-robin of the top 3 plus Lc0, and although Lc0 loses its individual match against each of these 3, in the round-robin standings it is above Komodo, in third place. So the CCCC behavior is somewhat confirmed, although my hardware is much weaker and very different. Lc0's draw rate is also very high in my top-4 tournament compared to the other engines, similar to what happens in CCCC. One cause of this "Elo disobedience" is Lc0's endgame play: it fails to convert clearly won endgames against strong and weak engines alike. I just saw a completely won K+R+R for Lc0 against K+R for Sf dev end in a draw. It also sometimes blunders away clearly drawn endgames.
I lazily added up the points the engines scored against each other. For example, Lc0 has 4 pts against SF and 4 pts against Houdini. You'd expect some sort of gradation in points as you travel across the crosstable.
I got as far as thinking that Lc0's gradient across the crosstable looks flatter than the others'. But then I decided this was a job for Laskos and a spreadsheet ....
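For reference, the gradation that the Elo model itself predicts can be sketched quickly. This assumes the logistic Elo expectation, the "Init" ratings from the table at the top of the thread, and 10 games per pairing; the function names are hypothetical, and the actual crosstable rows would still need to be set against these numbers.

```python
# "Init" ratings copied from the simulation table in the first post.
INIT = {
    "Stockfish": 3467, "Houdini": 3428, "Komodo": 3456, "Lc0": 3356,
    "Ethereal": 3335, "Fire": 3369, "Booot": 3319, "Andscacs": 3286,
}

def expected_score(r_a, r_b):
    """Logistic Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def expected_row(engine, ratings, games=10):
    """Expected points for `engine` against every other engine."""
    return {
        opp: games * expected_score(ratings[engine], r)
        for opp, r in ratings.items() if opp != engine
    }

row = expected_row("Lc0", INIT)
# Print from strongest opponent to weakest to show the expected gradient.
for opp in sorted(row, key=lambda o: INIT[o], reverse=True):
    print(f"Lc0 vs {opp:<10} {row[opp]:.1f} / 10")
```

An engine whose real crosstable row is flatter than this expected row is scoring relatively better against strong opponents and worse against weak ones.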