TCEC Division 3 results simulator

Milos · Post by **Milos** » Tue Aug 14, 2018 2:32 pm

Uri Blass wrote: ↑Tue Aug 14, 2018 7:02 am
Milos wrote: ↑Tue Aug 14, 2018 12:19 am
CMCanavessi wrote: ↑Mon Aug 13, 2018 8:46 pm I really don't care if it wins TCEC or if it finishes last in div4, but the time management issue is real and the leela team really screwed up. Check my stream, I'm currently doing lc0 (on a gtx 1080 non ti) vs SF8 (on 8 cores) and currently they are 6-6. Why do you think leela went from dominating in div4 to playing awful in div3? It's no magic, but also, as you say, excuses are useless. Leela team will learn the lesson for next season.
Fanboys hype.
Come back when you run 200 or 500 games. SF 8 used to lose to engines 300 Elo weaker in TCEC. So what?
On my setup, a recent Lc0 test net on an OCed 1060 vs. SFdev on a single very slow core: 44% after 200 games at 10'+6'' TC.
After appropriate scaling one can see that SF8 on 8 fast cores should be around 80 Elo stronger than the strongest Lc0 test net on a 1080.
Let's see, you have an equal score and error margins of 120 Elo, what a surprise.
Now you can talk about trend...
What about the following result, which I read in another thread, with the same net that plays at TCEC:

Leela Lc0 with 10520 net running on 1 GV100, versus Stockfish dev 18080112 running on 6 threads i7 5820k @4.2 GHz

TC was 2+2 and Arena perfect2010 opening book was used and the result was:
Leela 10520 - SF dev +8 -8 =69
0 Elo difference with confidence interval [-32, +32] (95% c.l.).
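
For readers who want to check such an interval, here is a minimal sketch of one common way to get an Elo difference and a 95% interval from a win/loss/draw count. It assumes the logistic Elo model and a normal approximation on the mean score; the function names are made up for illustration, and the tool that produced the figure quoted above may well use a different method.

```python
import math

def score_to_elo(score: float) -> float:
    """Convert a mean score in (0, 1) to an Elo difference (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_with_interval(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (elo, low, high) using a normal approximation on the mean score."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the result around its mean (win=1, draw=0.5, loss=0).
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    half_width = z * math.sqrt(var / n)
    # Only valid while mean +/- half_width stays strictly inside (0, 1).
    return (score_to_elo(mean),
            score_to_elo(mean - half_width),
            score_to_elo(mean + half_width))

elo, low, high = elo_with_interval(wins=8, losses=8, draws=69)
print(f"{elo:+.0f} Elo, 95% interval [{low:+.0f}, {high:+.0f}]")
```

With +8 -8 =69 this gives an interval of roughly ±32 Elo around zero, in line with the quoted figure. The high draw rate is what keeps the interval that narrow for only 85 games.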

I know that lc0 has some hardware advantage relative to TCEC but can other engines in division 3 achieve this type of result with the same hardware advantage?

In this particular case the GV100 (we don't know which one) is roughly 8x my OC'ed 1060, and 6 threads on an i7 5820K @ 4.2 GHz are around 8x in nps compared to my single core. My TC (10'+6'') is around 4x longer than 2'+2'', so the scaling should be roughly the same. So this 40 Elo discrepancy between my result and this one is totally within error margins once the SMP loss when going from 1 to 6 cores is included. I would say it is even spot on.
The problem with Carlos' results is that they are from a way too small sample and totally off the expected value, and he is just hyping them.

And the problem with comparing a GV100 to some random CPU is that current SFdev on, for example, a Ryzen 7 1700 is still +100 Elo compared to the strongest Lc0 on a GV100, while the GV100 is more than 10x the price and at least 5x the TDP of a Ryzen 7 1700, so it is a totally unfair competition.

chrisw · Post by **chrisw** » Tue Aug 14, 2018 3:20 pm

Laskos wrote: ↑Tue Aug 14, 2018 12:31 pm
chrisw wrote: ↑Tue Aug 14, 2018 10:56 am After 13 games each, Sim prediction of promotion:
Ethereal 100%
Pedone 41%
Arasan 27%
DeusX 22%
Nemorino 3%
LC0 3%
Code:
 1 Ethereal 10.81    3176 10.5 13 59.00 0 +113 80.8 ·· == 1= 1= 1  1= 11 11
 2 Pedone 1.8        3104  7.5 13 44.50 0  +61 57.7 == ·· == 10 0= =  11 =1
 3 Arasan TCEC13     3142  7.0 13 43.00 0  +26 53.8 0= == ·· 01 1= == == 1 
 4 DeusX 1.0         3200  7.0 13 41.75 0  -19 53.8 0= 01 10 ·· =1 == =  1=
 5 Nemorino 5.01     3104  6.5 13 37.25 0  +31 50.0 0  1= 0= =0 ·· =1 1= 01
 6 lc0 16.10520      3219  6.0 13 35.00 0  -59 46.2 0= =  == == =0 ·· == 1=
 7 Hannibal 20180806 3193  5.0 13 24.75 0  -78 38.5 00 00 == =  0= == ·· 11
 8 Bobcat 8          3072  2.5 13 16.75 0  -75 19.2 00 =0 0  0= 10 0= 00 ··
How are these simulations done? They seem a bit extreme. Gut feeling, an educated gut feeling from a person literally feeling the numbers in an extraordinary tactile and colorful way, like Kai (myself), would say that Lc0 has some 10% to qualify and DeusX some 20%.

Kai,
Thanks for this btw, it prodded me to take a closer look. The reason LC0 and Nemorino score lower in the Sim than one might expect is that I adjusted their initial given Elo to reflect:
a) Nemorino tendency to hang, shown in a couple of games
b) LC0 supposedly playing weaker than Division4 because of poor time control handler changes (allegedly)

I've also made a few gut assessments of my own and adjusted the other initial Elos a little too. I will print the initial Elo and the tournament-calculated Elo when posting next; you're welcome to suggest different figures.
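
For anyone wondering how a simulation of this kind can be put together, the following is a minimal Monte Carlo sketch, not the simulator used for the figures in this thread. It assumes the logistic Elo expectation, a made-up constant `draw_rate`, no colour advantage, random tie-breaking on equal points, and a hypothetical remaining schedule of one more double round-robin. The ratings and points are the initial Elo values and 14-game scores posted later in the thread.

```python
import random
from itertools import permutations

def expected_score(elo_a: float, elo_b: float) -> float:
    """Logistic Elo expectation for A against B."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))

def play_game(elo_a, elo_b, rng, draw_rate=0.5):
    """Return A's result (1, 0.5 or 0). draw_rate is an assumed constant."""
    e = expected_score(elo_a, elo_b)
    # Reserve probability mass for draws while keeping A's expected score at e.
    p_draw = draw_rate * 2.0 * min(e, 1.0 - e)
    p_win = e - 0.5 * p_draw
    r = rng.random()
    if r < p_win:
        return 1.0
    if r < p_win + p_draw:
        return 0.5
    return 0.0

def promotion_chances(elo, points, remaining, slots=2, runs=10_000, seed=1):
    """Estimate each engine's chance of finishing in the top `slots`."""
    rng = random.Random(seed)
    hits = {name: 0 for name in elo}
    for _ in range(runs):
        total = dict(points)
        for a, b in remaining:
            result = play_game(elo[a], elo[b], rng)
            total[a] += result
            total[b] += 1.0 - result
        # Equal points are broken at random here; the real rules differ.
        order = sorted(total, key=lambda n: (total[n], rng.random()), reverse=True)
        for name in order[:slots]:
            hits[name] += 1
    return {name: hits[name] / runs for name in elo}

# Initial Elo and points after 14 games each, as posted in the thread.
elo = {"Ethereal": 3300, "Pedone": 3130, "DeusX": 3130, "Arasan": 3134,
       "lc0": 3140, "Nemorino": 3099, "Hannibal": 3100, "Bobcat": 3029}
points = {"Ethereal": 11.5, "Pedone": 8.0, "DeusX": 7.5, "Arasan": 7.5,
          "lc0": 6.5, "Nemorino": 6.5, "Hannibal": 5.5, "Bobcat": 3.0}
# Hypothetical remaining schedule: every ordered pair meets once more.
remaining = list(permutations(elo, 2))

chances = promotion_chances(elo, points, remaining, runs=2000)
for name, p in sorted(chances.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {p:.3f}")
```

Lowering an engine's entry in `elo`, as described above for LC0 and Nemorino, directly lowers its simulated promotion chance, which is why the choice of initial ratings matters so much for the output.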

chrisw · Post by **chrisw** » Tue Aug 14, 2018 3:38 pm

57 games played, Sim predictions to promote:
Ethereal 100%
Pedone 45%
DeusX 24%
Arasan 22%
LC0 4%
Nemorino 2%

Code:

 1 Ethereal 10.81    3176 11.5 14 70.25 0 +126 82.1 ·· == 1= 1= 1= 11 11 11
 2 Pedone 1.8        3104  8.0 14 51.75 0  +66 57.1 == ·· 10 == == 0= 11 =1
 3 DeusX 1.0         3200  7.5 14 47.50 0  -19 53.6 0= 01 ·· 10 == =1 == 1=
 4 Arasan TCEC13     3142  7.5 14 47.50 0  +23 53.6 0= == 01 ·· == 1= == 1=
 5 lc0 16.10520      3219  6.5 14 42.00 0  -64 46.4 0= == == == ·· =0 == 1=
 6 Nemorino 5.01     3104  6.5 14 40.50 0  +18 46.4 00 1= =0 0= =1 ·· 1= 01
 7 Hannibal 20180806 3193  5.5 14 30.75 0  -78 39.3 00 00 == == == 0= ·· 11
 8 Bobcat 8          3072  3.0 14 21.25 0  -72 21.4 00 =0 0= 0= 0= 10 00 ··

Code:

Division 3,
Column 'First' is chance of winning, Column 'First two' is chance of being in first two
Engine,   Tournament Elo, Initial Elo, First, First two
Ethereal       3333         3300       0.999  1.000
Pedone         3160         3130       0.001  0.454
DeusX          3145         3130       0.000  0.247
Arasan         3147         3134       0.000  0.220
lc0            3122         3140       0.000  0.047
Nemorino       3104         3099       0.000  0.027
Hannibal       3076         3100       0.000  0.005
Bobcat         2974         3029       0.000  0.000

Guenther · Post by **Guenther** » Tue Aug 14, 2018 4:25 pm

chrisw wrote: ↑Tue Aug 14, 2018 3:20 pm
a) Nemorino tendency to hang, shown in a couple of games

This should have been zero for quite a while now.
First the threads were reduced to 16. Later even Syzygy was disabled (both losses came with very high TB hits while playing Black).

chrisw · Post by **chrisw** » Tue Aug 14, 2018 5:14 pm

Guenther wrote: ↑Tue Aug 14, 2018 4:25 pm
chrisw wrote: ↑Tue Aug 14, 2018 3:20 pm
a) Nemorino tendency to hang, shown in a couple of games

This should have been zero for quite a while now.
First the threads were reduced to 16. Later even Syzygy was disabled (both losses came with very high TB hits while playing Black).

Oh, okay, thanks. Good news. I'll go back to the CCRL figure. Are you the programmer? Or, if you have a better figure taking everything into account, I can use that.

chrisw · Post by **chrisw** » Tue Aug 14, 2018 5:21 pm

Guenther wrote: ↑Tue Aug 14, 2018 4:25 pm
chrisw wrote: ↑Tue Aug 14, 2018 3:20 pm
a) Nemorino tendency to hang, shown in a couple of games

This should have been zero for quite a while now.
First the threads were reduced to 16. Later even Syzygy was disabled (both losses came with very high TB hits while playing Black).

Sorry, one other question: is TCEC counting the disconnects as strikes under the tie-break rule? I also coded disconnects=2, which currently causes the sim to give an automatic loss under the tie-break rule. Should I put those back to disconnects=0, or leave them in?

Guenther · Post by **Guenther** » Tue Aug 14, 2018 5:47 pm

chrisw wrote: ↑Tue Aug 14, 2018 5:21 pm
Guenther wrote: ↑Tue Aug 14, 2018 4:25 pm
chrisw wrote: ↑Tue Aug 14, 2018 3:20 pm
a) Nemorino tendency to hang, shown in a couple of games

This should have been zero for quite a while now.
First the threads were reduced to 16. Later even Syzygy was disabled (both losses came with very high TB hits while playing Black).
Sorry, one other question: is TCEC counting the disconnects as strikes under the tie-break rule? I also coded disconnects=2, which currently causes the sim to give an automatic loss under the tie-break rule. Should I put those back to disconnects=0, or leave them in?

If I read the rules correctly, crashes (the disconnects are practically crashes) are in fact counted as the first tie-break rule (after the points).
Thanks for mentioning that; so far I had never thought about it. This means there should still be some penalty in case of being equal with
place 2, and it needs a half point more than all the others to promote.
(I already take place 1 for granted for Ethereal.)
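
As a minimal sketch of what "crashes as the first tie-break after points" means for the ordering, assuming only these two criteria (any further tie-breaks in the actual TCEC rules are not modelled, and the engine names and numbers below are hypothetical):

```python
from typing import NamedTuple

class Entry(NamedTuple):
    name: str
    points: float
    crashes: int

def rank(entries):
    """Order by points (high first), then by crashes (few first)."""
    return sorted(entries, key=lambda e: (-e.points, e.crashes))

table = [
    Entry("EngineA", 7.5, 2),  # hypothetical engine with two crashes on record
    Entry("EngineB", 7.5, 0),  # same points, no crashes
    Entry("EngineC", 8.0, 2),  # crashes too, but half a point ahead
]
print([e.name for e in rank(table)])
```

EngineA loses the tie to EngineB on crashes, while EngineC stays ahead of both because points are compared first. That is the "half point more" effect described above.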

I was lurking a few times at TCEC and immediately suggested thread reducing and/or disabling syzygy.
Obviously TCEC first reduced the threads and it had no more crashes, but the author wanted to be sure
and asked for removing syzygy too for safety (reason already mentioned).
That's my info.

As for your other question: no, I just share a name with Nemorino's author.
(I was running RWBC and was involved in some other cc projects/activities in the past, and I maintain the XB/UCI chronology.)

chrisw · Post by **chrisw** » Tue Aug 14, 2018 6:55 pm

Guenther wrote: ↑Tue Aug 14, 2018 5:47 pm
chrisw wrote: ↑Tue Aug 14, 2018 5:21 pm
Guenther wrote: ↑Tue Aug 14, 2018 4:25 pm
chrisw wrote: ↑Tue Aug 14, 2018 3:20 pm
a) Nemorino tendency to hang, shown in a couple of games

This should have been zero for quite a while now.
First the threads were reduced to 16. Later even Syzygy was disabled (both losses came with very high TB hits while playing Black).
Sorry, one other question: is TCEC counting the disconnects as strikes under the tie-break rule? I also coded disconnects=2, which currently causes the sim to give an automatic loss under the tie-break rule. Should I put those back to disconnects=0, or leave them in?
If I read the rules correctly, crashes (the disconnects are practically crashes) are in fact counted as the first tie-break rule (after the points).
Thanks for mentioning that; so far I had never thought about it. This means there should still be some penalty in case of being equal with
place 2, and it needs a half point more than all the others to promote.
(I already take place 1 for granted for Ethereal.)

I was lurking a few times at TCEC and immediately suggested thread reducing and/or disabling syzygy.
Obviously TCEC first reduced the threads and it had no more crashes, but the author wanted to be sure
and asked for removing syzygy too for safety (reason already mentioned).
That's my info.

As for your other question: no, I just share a name with Nemorino's author.
(I was running RWBC and was involved in some other cc projects/activities in the past, and I maintain the XB/UCI chronology.)

Okay, thanks. I put Nemorino back to its CCRL figure but left in the disconnection count, and put the others back to CCRL too. Ethereal is good at 3300 initial, I think, which left DeusX and LC0: LC0 reduced to 3100 for the damaged entry, DeusX guessed at stronger than the damaged LC0 but around Arasan/Nemorino/Pedone level, say 3130. The tie-break rule means Nemorino is effectively penalised by 0.5 pts, which is tough.

What would have been the results of its games, had it not disconnected? From the listing it looked like a win and a loss, so with another point from there it would be around fourth place with maybe a 25% chance of promotion. Shame.

This now after 59 games ...

Code:

Engine,   Tournament Elo, Initial Elo, First, First two
Ethereal       3324         3300       0.998  1.000
Pedone         3141         3092       0.001  0.378
DeusX          3149         3130       0.000  0.322
Arasan         3141         3124       0.000  0.230
Nemorino       3101         3099       0.000  0.030
lc0            3104         3100       0.000  0.029
Hannibal       3097         3158       0.000  0.009
Bobcat         2972         3029       0.000  0.000

Guenther · Post by **Guenther** » Tue Aug 14, 2018 7:01 pm

chrisw wrote: ↑Tue Aug 14, 2018 6:55 pm
What would have been the results of its games, had it not disconnected? With another couple of points from there it would be in second place.

The two games lost by rule would have scored 1/2 (one game was winning and one was lost).

chrisw · Post by **chrisw** » Tue Aug 14, 2018 7:08 pm

Guenther wrote: ↑Tue Aug 14, 2018 7:01 pm
chrisw wrote: ↑Tue Aug 14, 2018 6:55 pm
What would have been the results of its games, had it not disconnected? With another couple of points from there it would be in second place.

The two games lost by rule would have scored 1/2 (one game was winning and one was lost).

So Arasan got the extra point. Without the disconnect, that would (right now) have the effect of reversing Arasan and Nemorino in the results table: 4th and 6th swap. Okay, shudda, wudda, cudda, wot if and so on. Stuff happens.
