TCEC Division 3 results simulator

chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: TCEC Division 3 results simulator

Post by chrisw »

Whom would you explain it to?
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: TCEC Division 3 results simulator

Post by Milos »

CMCanavessi wrote: Mon Aug 13, 2018 8:46 pm I really don't care if it wins TCEC or if it finishes last in div4, but the time management issue is real and the leela team really screwed up. Check my stream, I'm currently doing lc0 (on a gtx 1080 non ti) vs SF8 (on 8 cores) and currently they are 6-6. Why do you think leela went from dominating in div4 to playing awful in div3? It's no magic, but also, as you say, excuses are useless. Leela team will learn the lesson for next season.
Fanboys hype.
Come back when you have run 200 or 500 games. SF 8 used to lose to engines 300 Elo weaker in TCEC. So what?
On my setup, a recent Lc0 test net on an overclocked 1060 scores 44% against SFdev on a single very slow core at 10'+6'' TC after 200 games.
After appropriate scaling, one can see that SF8 on 8 fast cores should be around 80 Elo stronger than the strongest Lc0 test net on a 1080.
Let's see, you have an equal score and error margins of 120 Elo, what a surprise.
Now you can talk about trend... :lol: :lol:
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: TCEC Division 3 results simulator

Post by Milos »

CMCanavessi wrote: Mon Aug 13, 2018 9:19 pm Anyways, DeusX with default settings and the same old network as in div4 is ahead of lc0 with newer (and stronger) net but stupid TM settings. How can you explain that?
Ever heard of error margins? Do you even have a clue how large they are on that kind of sample???
Only one conclusion can be drawn so far from the Division 3 games: Ethereal is head and shoulders (at least 100 Elo once a 2-sigma margin is subtracted) above the NN engine(s).
chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: TCEC Division 3 results simulator

Post by chrisw »

After 11 games each, the NNs are in the bottom half of the table and the sim's predicted promotion chances are now:

Ethereal 100%
Pedone 34%
Arasan 32%
DeusX 13%
Nemorino 13%
LC0 5%
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: TCEC Division 3 results simulator

Post by jdart »

Note that LC0 and DeusX are in positions 5 and 6 of the Div 3 standings at the moment.

--Jon
Uri Blass
Posts: 10281
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: TCEC Division 3 results simulator

Post by Uri Blass »

Milos wrote: Tue Aug 14, 2018 12:19 am
CMCanavessi wrote: Mon Aug 13, 2018 8:46 pm I really don't care if it wins TCEC or if it finishes last in div4, but the time management issue is real and the leela team really screwed up. Check my stream, I'm currently doing lc0 (on a gtx 1080 non ti) vs SF8 (on 8 cores) and currently they are 6-6. Why do you think leela went from dominating in div4 to playing awful in div3? It's no magic, but also, as you say, excuses are useless. Leela team will learn the lesson for next season.
Fanboys hype.
Come back when you have run 200 or 500 games. SF 8 used to lose to engines 300 Elo weaker in TCEC. So what?
On my setup, a recent Lc0 test net on an overclocked 1060 scores 44% against SFdev on a single very slow core at 10'+6'' TC after 200 games.
After appropriate scaling, one can see that SF8 on 8 fast cores should be around 80 Elo stronger than the strongest Lc0 test net on a 1080.
Let's see, you have an equal score and error margins of 120 Elo, what a surprise.
Now you can talk about trend... :lol: :lol:
What about the following result, which I read in another thread, with the same net that is playing in TCEC:

Leela Lc0 with 10520 net running on 1 GV100, versus Stockfish dev 18080112 running on 6 threads i7 5820k @4.2 GHz

The TC was 2+2, the Arena perfect2010 opening book was used, and the result was:
Leela 10520 - SF dev +8 -8 =69
0 Elo difference with confidence interval [-32, +32] (95% c.l.).


I know that lc0 has some hardware advantage here relative to TCEC, but could the other engines in Division 3 achieve this type of result with the same hardware advantage?
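For anyone who wants to check the interval quoted above, here is a minimal Python sketch. It assumes the usual logistic Elo formula and a normal approximation on the per-game score; the function names are made up for this example, and the tool that produced the quoted figures may compute things differently.

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (logistic model)."""
    p = min(max(p, 1e-6), 1 - 1e-6)  # clamp so tiny samples do not hit log(0)
    return 400.0 * math.log10(p / (1.0 - p))

def elo_interval(wins, losses, draws, z=1.96):
    """Elo difference and an approximate confidence interval (z=1.96 is ~95%)."""
    n = wins + losses + draws
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the score, where each game scores 1, 0.5 or 0
    var = (wins * (1 - p) ** 2 + draws * (0.5 - p) ** 2 + losses * p ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return elo_from_score(p), elo_from_score(p - z * se), elo_from_score(p + z * se)

diff, low, high = elo_interval(wins=8, losses=8, draws=69)
print(f"{diff:+.0f} Elo, 95% interval [{low:+.0f}, {high:+.0f}]")
```

With +8 -8 =69 this gives roughly 0 Elo with a margin of about 32 either way, in line with the quoted result. The high draw rate is what keeps the margin that small for only 85 games.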
chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: TCEC Division 3 results simulator

Post by chrisw »

After 13 games each, Sim prediction of promotion:
Ethereal 100%
Pedone 41%
Arasan 27%
DeusX 22%
Nemorino 3%
LC0 3%

 1 Ethereal 10.81    3176 10.5 13 59.00 0 +113 80.8 ·· == 1= 1= 1  1= 11 11
 2 Pedone 1.8        3104  7.5 13 44.50 0  +61 57.7 == ·· == 10 0= =  11 =1
 3 Arasan TCEC13     3142  7.0 13 43.00 0  +26 53.8 0= == ·· 01 1= == == 1 
 4 DeusX 1.0         3200  7.0 13 41.75 0  -19 53.8 0= 01 10 ·· =1 == =  1=
 5 Nemorino 5.01     3104  6.5 13 37.25 0  +31 50.0 0  1= 0= =0 ·· =1 1= 01
 6 lc0 16.10520      3219  6.0 13 35.00 0  -59 46.2 0= =  == == =0 ·· == 1=
 7 Hannibal 20180806 3193  5.0 13 24.75 0  -78 38.5 00 00 == =  0= == ·· 11
 8 Bobcat 8          3072  2.5 13 16.75 0  -75 19.2 00 =0 0  0= 10 0= 00 ··
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: TCEC Division 3 results simulator

Post by Laskos »

chrisw wrote: Tue Aug 14, 2018 10:56 am After 13 games each, Sim prediction of promotion:
Ethereal 100%
Pedone 41%
Arasan 27%
DeusX 22%
Nemorino 3%
LC0 3%

 1 Ethereal 10.81    3176 10.5 13 59.00 0 +113 80.8 ·· == 1= 1= 1  1= 11 11
 2 Pedone 1.8        3104  7.5 13 44.50 0  +61 57.7 == ·· == 10 0= =  11 =1
 3 Arasan TCEC13     3142  7.0 13 43.00 0  +26 53.8 0= == ·· 01 1= == == 1 
 4 DeusX 1.0         3200  7.0 13 41.75 0  -19 53.8 0= 01 10 ·· =1 == =  1=
 5 Nemorino 5.01     3104  6.5 13 37.25 0  +31 50.0 0  1= 0= =0 ·· =1 1= 01
 6 lc0 16.10520      3219  6.0 13 35.00 0  -59 46.2 0= =  == == =0 ·· == 1=
 7 Hannibal 20180806 3193  5.0 13 24.75 0  -78 38.5 00 00 == =  0= == ·· 11
 8 Bobcat 8          3072  2.5 13 16.75 0  -75 19.2 00 =0 0  0= 10 0= 00 ··
How are these simulations done? They seem a bit extreme. Gut feeling, an educated gut feeling from a person literally feeling the numbers in an extraordinary tactile and colorful way, like Kai (myself), would say that Lc0 has some 10% to qualify and Deus X some 20%.

:lol:
stavros
Posts: 165
Joined: Tue Dec 02, 2014 1:29 am

Re: TCEC Division 3 results simulator

Post by stavros »

You all forget that the Leela Zero project hasn't saturated yet after 33M training games, unlike A0.
I think the Leela NN is better than A0. Give her 2 Titan Vs and it's better than A0 with 4 TPUs.
chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: TCEC Division 3 results simulator

Post by chrisw »

Laskos wrote: Tue Aug 14, 2018 12:31 pm
chrisw wrote: Tue Aug 14, 2018 10:56 am After 13 games each, Sim prediction of promotion:
Ethereal 100%
Pedone 41%
Arasan 27%
DeusX 22%
Nemorino 3%
LC0 3%

 1 Ethereal 10.81    3176 10.5 13 59.00 0 +113 80.8 ·· == 1= 1= 1  1= 11 11
 2 Pedone 1.8        3104  7.5 13 44.50 0  +61 57.7 == ·· == 10 0= =  11 =1
 3 Arasan TCEC13     3142  7.0 13 43.00 0  +26 53.8 0= == ·· 01 1= == == 1 
 4 DeusX 1.0         3200  7.0 13 41.75 0  -19 53.8 0= 01 10 ·· =1 == =  1=
 5 Nemorino 5.01     3104  6.5 13 37.25 0  +31 50.0 0  1= 0= =0 ·· =1 1= 01
 6 lc0 16.10520      3219  6.0 13 35.00 0  -59 46.2 0= =  == == =0 ·· == 1=
 7 Hannibal 20180806 3193  5.0 13 24.75 0  -78 38.5 00 00 == =  0= == ·· 11
 8 Bobcat 8          3072  2.5 13 16.75 0  -75 19.2 00 =0 0  0= 10 0= 00 ··
How are these simulations done? They seem a bit extreme. Gut feeling, an educated gut feeling from a person literally feeling the numbers in an extraordinary tactile and colorful way, like Kai (myself), would say that Lc0 has some 10% to qualify and Deus X some 20%.

:lol:
It's a home-brew Monte Carlo sim. I'm still working on it, and I agree it's too extreme (working on that right now).
Basically, it takes the results so far, then uses the Elo difference plus a randomiser to predict win/loss/draw for the remaining games.
For each simulated completion of the division, it sorts the engines (by total score, strike rate, wins, SB) and then bumps a couple of counters: one for first place and one for finishing first or second.
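As a rough illustration of the loop just described (not the actual sim code), a minimal Python sketch might look like the following. The fixed draw rate, the random tie-break in place of strike rate/wins/SB, and the names `wdl_probs` and `simulate` are all assumptions made for the example, and the engines and numbers in the usage lines are invented.

```python
import random

def expected_score(elo_a, elo_b):
    """Logistic Elo expectation for A against B."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

def wdl_probs(elo_a, elo_b, draw_rate=0.5):
    """Split the expected score into win/draw/loss probabilities.
    Assumption: a fixed draw rate, reduced when needed to keep all three >= 0."""
    e = expected_score(elo_a, elo_b)
    d = min(draw_rate, 2 * e, 2 * (1 - e))
    win = e - d / 2
    return win, d, 1.0 - win - d

def simulate(points, elo, remaining, runs=10000, seed=0):
    """Play out the remaining games many times; count 1st and top-2 finishes."""
    rng = random.Random(seed)
    first = dict.fromkeys(points, 0)
    top2 = dict.fromkeys(points, 0)
    for _ in range(runs):
        total = dict(points)  # start each run from the results so far
        for a, b in remaining:
            w, d, _ = wdl_probs(elo[a], elo[b])
            r = rng.random()
            if r < w:
                total[a] += 1.0
            elif r < w + d:
                total[a] += 0.5
                total[b] += 0.5
            else:
                total[b] += 1.0
        # Simplification: ties on points are broken at random
        ranking = sorted(total, key=lambda e: (total[e], rng.random()), reverse=True)
        first[ranking[0]] += 1
        top2[ranking[0]] += 1
        top2[ranking[1]] += 1
    return ({e: first[e] / runs for e in points},
            {e: top2[e] / runs for e in points})

# Invented three-engine example
points = {"EngineA": 7.5, "EngineB": 7.0, "EngineC": 6.0}
elo = {"EngineA": 3150, "EngineB": 3200, "EngineC": 3100}
remaining = [("EngineA", "EngineB"), ("EngineB", "EngineC"), ("EngineC", "EngineA")]
first, top2 = simulate(points, elo, remaining)
print(first, top2)
```

The ratings passed in as `elo` are where most of the sensitivity lies: feed in ratings that already lean heavily on the results so far and the leaders get pushed toward 100%.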
The home-brew parts are the initial Elo estimate and then an Elo adjustment for the results so far. Those are critical, and how much one dares to adjust the Elo for the prediction process is very critical. Too much adjustment and we get too much magnification for engines already doing well. But the more games played so far, the more I can dare to adjust the tournament Elo. Also home-brew is the win/loss/draw allocator. Win/loss is pretty easy, but allocating the draw percentage is not so easy. Also home-brew is an attempt to include the time-control weakener in LC0's initial Elo, and the tendency to hang in Nemorino's Elo.
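One simple way to make the adjustment grow with the number of games played is to blend the initial estimate with a performance rating, giving the performance more weight as games accumulate. The sketch below only illustrates that idea and is not the formula used in the sim; `prior_weight` and both function names are hypothetical.

```python
import math

def performance_elo(score, games, avg_opp_elo):
    """Performance rating from points scored against an average opponent rating."""
    p = min(max(score / games, 0.01), 0.99)  # clamp to avoid infinite ratings
    return avg_opp_elo + 400.0 * math.log10(p / (1.0 - p))

def adjusted_elo(prior_elo, score, games, avg_opp_elo, prior_weight=30.0):
    """Blend the initial Elo with the tournament performance.
    prior_weight acts like a number of 'virtual games' behind the initial
    estimate: the larger it is, the less the results so far move the rating."""
    if games == 0:
        return prior_elo
    w = games / (games + prior_weight)
    return (1.0 - w) * prior_elo + w * performance_elo(score, games, avg_opp_elo)

# Invented numbers: an engine rated 3200 scoring 50% against a 3150 field
print(adjusted_elo(3200, 6.5, 13, 3150))
print(adjusted_elo(3200, 13.0, 26, 3150))
```

A small `prior_weight` gives the magnification effect described above, since an engine that started well gets a large rating bump; a large one leaves the prediction close to the initial estimates.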