TCEC Division 3 results simulator
Moderators: hgm, Rebel, chrisw
-
- Posts: 4319
- Joined: Tue Apr 03, 2012 4:28 pm
Re: TCEC Division 3 results simulator
Whom would you explain it to?
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: TCEC Division 3 results simulator
CMCanavessi wrote: ↑Mon Aug 13, 2018 8:46 pm
I really don't care if it wins TCEC or if it finishes last in div4, but the time management issue is real and the leela team really screwed up. Check my stream, I'm currently doing lc0 (on a gtx 1080 non ti) vs SF8 (on 8 cores) and currently they are 6-6. Why do you think leela went from dominating in div4 to playing awful in div3? It's no magic, but also, as you say, excuses are useless. Leela team will learn the lesson for next season.

Fanboys hype.
Come back when you have run 200 or 500 games. SF 8 used to lose to engines 300 Elo weaker in TCEC. So what?
On my setup, a recent Lc0 test net on an overclocked 1060 scores 44% against SFdev on a single, very slow core at 10'+6'' TC after 200 games.
After appropriate scaling, one can see that SF8 on 8 fast cores should be around 80 Elo stronger than the strongest Lc0 test net on a 1080.
Let's see: you have an equal score and error margins of 120 Elo. What a surprise.
Now you can talk about trend...
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: TCEC Division 3 results simulator
CMCanavessi wrote: ↑Mon Aug 13, 2018 9:19 pm
Anyways, DeusX with default settings and the same old network as in div4 is ahead of lc0 with newer (and stronger) net but stupid TM settings. How can you explain that?

Ever heard of error margins? Do you even have a clue how large they are on that kind of sample???
Only one conclusion can be drawn so far from the Division 3 games: Ethereal is head and shoulders above the NN engine(s) (by at least 100 Elo, even after a 2-sigma margin is subtracted).
-
- Posts: 4319
- Joined: Tue Apr 03, 2012 4:28 pm
Re: TCEC Division 3 results simulator
After 11 games each, the NNs are in the bottom half of the table and the Sim predicted promotion stats are now:
Ethereal 100%
Pedone 34%
Arasan 32%
DeusX 13%
Nemorino 13%
LC0 5%
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: TCEC Division 3 results simulator
Note that LC0 and DeusX are in positions 5 and 6 of the Div 3 standings at the moment.
--Jon
-
- Posts: 10297
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: TCEC Division 3 results simulator
Milos wrote: ↑Tue Aug 14, 2018 12:19 am
CMCanavessi wrote: ↑Mon Aug 13, 2018 8:46 pm
I really don't care if it wins TCEC or if it finishes last in div4, but the time management issue is real and the leela team really screwed up. Check my stream, I'm currently doing lc0 (on a gtx 1080 non ti) vs SF8 (on 8 cores) and currently they are 6-6. Why do you think leela went from dominating in div4 to playing awful in div3? It's no magic, but also, as you say, excuses are useless. Leela team will learn the lesson for next season.
Fanboys hype.
Come back when you have run 200 or 500 games. SF 8 used to lose to engines 300 Elo weaker in TCEC. So what?
On my setup, a recent Lc0 test net on an overclocked 1060 scores 44% against SFdev on a single, very slow core at 10'+6'' TC after 200 games.
After appropriate scaling, one can see that SF8 on 8 fast cores should be around 80 Elo stronger than the strongest Lc0 test net on a 1080.
Let's see: you have an equal score and error margins of 120 Elo. What a surprise.
Now you can talk about trend...

What about the following result, which I read in another thread, with the same net that plays in TCEC:
Leela Lc0 with the 10520 net running on 1 GV100, versus Stockfish dev 18080112 running on 6 threads of an i7 5820k @ 4.2 GHz.
TC was 2+2, the Arena perfect2010 opening book was used, and the result was:
Leela 10520 - SF dev +8 -8 =69
0 Elo difference with confidence interval [-32, +32] (95% c.l.).
I know that lc0 has some hardware advantage relative to TCEC, but can other engines in division 3 achieve this type of result with the same hardware advantage?
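For anyone who wants to check the interval quoted above, here is a minimal sketch of one common way to get it. It assumes the usual logistic Elo model and a normal approximation for the mean score; the function names are made up for this illustration and are not taken from any rating tool.

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1), logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_interval(wins, losses, draws, z=1.96):
    """Return (low, mid, high) Elo estimates; z=1.96 gives roughly 95% confidence.

    Assumption: normal approximation for the mean score, so it only works
    while score +/- margin stays strictly between 0 and 1.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + losses * score ** 2
           + draws * (0.5 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    return (elo_from_score(score - margin),
            elo_from_score(score),
            elo_from_score(score + margin))

# The +8 -8 =69 result from the post above.
low, mid, high = elo_interval(8, 8, 69)
print(round(low), round(mid), round(high))  # -32 0 32
```

The many draws are what keep the margin this small for only 85 games: a draw adds nothing to the variance when the mean score is 50%.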
-
- Posts: 4319
- Joined: Tue Apr 03, 2012 4:28 pm
Re: TCEC Division 3 results simulator
After 13 games each, Sim prediction of promotion:
Ethereal 100%
Pedone 41%
Arasan 27%
DeusX 22%
Nemorino 3%
LC0 3%
1 Ethereal 10.81 3176 10.5 13 59.00 0 +113 80.8 ·· == 1= 1= 1 1= 11 11
2 Pedone 1.8 3104 7.5 13 44.50 0 +61 57.7 == ·· == 10 0= = 11 =1
3 Arasan TCEC13 3142 7.0 13 43.00 0 +26 53.8 0= == ·· 01 1= == == 1
4 DeusX 1.0 3200 7.0 13 41.75 0 -19 53.8 0= 01 10 ·· =1 == = 1=
5 Nemorino 5.01 3104 6.5 13 37.25 0 +31 50.0 0 1= 0= =0 ·· =1 1= 01
6 lc0 16.10520 3219 6.0 13 35.00 0 -59 46.2 0= = == == =0 ·· == 1=
7 Hannibal 20180806 3193 5.0 13 24.75 0 -78 38.5 00 00 == = 0= == ·· 11
8 Bobcat 8 3072 2.5 13 16.75 0 -75 19.2 00 =0 0 0= 10 0= 00 ··
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: TCEC Division 3 results simulator
chrisw wrote: ↑Tue Aug 14, 2018 10:56 am
After 13 games each, Sim prediction of promotion:
Ethereal 100%
Pedone 41%
Arasan 27%
DeusX 22%
Nemorino 3%
LC0 3%
1 Ethereal 10.81 3176 10.5 13 59.00 0 +113 80.8 ·· == 1= 1= 1 1= 11 11
2 Pedone 1.8 3104 7.5 13 44.50 0 +61 57.7 == ·· == 10 0= = 11 =1
3 Arasan TCEC13 3142 7.0 13 43.00 0 +26 53.8 0= == ·· 01 1= == == 1
4 DeusX 1.0 3200 7.0 13 41.75 0 -19 53.8 0= 01 10 ·· =1 == = 1=
5 Nemorino 5.01 3104 6.5 13 37.25 0 +31 50.0 0 1= 0= =0 ·· =1 1= 01
6 lc0 16.10520 3219 6.0 13 35.00 0 -59 46.2 0= = == == =0 ·· == 1=
7 Hannibal 20180806 3193 5.0 13 24.75 0 -78 38.5 00 00 == = 0= == ·· 11
8 Bobcat 8 3072 2.5 13 16.75 0 -75 19.2 00 =0 0 0= 10 0= 00 ··

How are these simulations done? Seem a bit extreme. Gut feeling, an educated gut feeling from a person literally feeling the numbers in an extraordinary tactile and colorful way, like Kai (myself), would say that Lc0 has some 10% to qualify and Deus X some 20%.
-
- Posts: 165
- Joined: Tue Dec 02, 2014 1:29 am
Re: TCEC Division 3 results simulator
You all forget that the Leela Zero project hasn't saturated yet after 33M training games, unlike A0.
I think the Leela NN is better than A0. Give her 2 Titan V and it's better than A0 with 4 TPUs.
-
- Posts: 4319
- Joined: Tue Apr 03, 2012 4:28 pm
Re: TCEC Division 3 results simulator
Laskos wrote: ↑Tue Aug 14, 2018 12:31 pm
chrisw wrote: ↑Tue Aug 14, 2018 10:56 am
After 13 games each, Sim prediction of promotion:
Ethereal 100%
Pedone 41%
Arasan 27%
DeusX 22%
Nemorino 3%
LC0 3%
1 Ethereal 10.81 3176 10.5 13 59.00 0 +113 80.8 ·· == 1= 1= 1 1= 11 11
2 Pedone 1.8 3104 7.5 13 44.50 0 +61 57.7 == ·· == 10 0= = 11 =1
3 Arasan TCEC13 3142 7.0 13 43.00 0 +26 53.8 0= == ·· 01 1= == == 1
4 DeusX 1.0 3200 7.0 13 41.75 0 -19 53.8 0= 01 10 ·· =1 == = 1=
5 Nemorino 5.01 3104 6.5 13 37.25 0 +31 50.0 0 1= 0= =0 ·· =1 1= 01
6 lc0 16.10520 3219 6.0 13 35.00 0 -59 46.2 0= = == == =0 ·· == 1=
7 Hannibal 20180806 3193 5.0 13 24.75 0 -78 38.5 00 00 == = 0= == ·· 11
8 Bobcat 8 3072 2.5 13 16.75 0 -75 19.2 00 =0 0 0= 10 0= 00 ··
How are these simulations done? Seem a bit extreme. Gut feeling, an educated gut feeling from a person literally feeling the numbers in an extraordinary tactile and colorful way, like Kai (myself), would say that Lc0 has some 10% to qualify and Deus X some 20%.

It's a home brew MonteCarlo sim, I'm still working on it, and agree it's too extreme (working on that right now).
Basically, it takes the results so far, then uses the Elo difference plus a randomiser to predict win/loss/draw for the remaining games.
For each MC completion of the division, it sorts the engines (using total score, strike rate, wins, SB) and then bumps a couple of counters: one for first place and one for finishing either first or second.
The homebrew parts are the initial Elo estimate and then an Elo adjustment for the results so far. Those are critical, and how much one dares to adjust the Elo for the prediction process is very critical: too much adjustment and we get too much magnification for engines already doing well. But the more games played so far, the more I can dare to adjust the tournament Elo. Also homebrew is the win/loss/draw allocator: win/loss is pretty easy, but allocating the draw percentage is not so easy. Finally, there is a homebrew attempt at including the time-control weakener in the LC0 initial Elo, and the tendency to hang in the Nemorino Elo.
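To make the description above concrete, here is a rough sketch of that kind of Monte Carlo loop. It is not the actual simulator: the fixed draw rate, the random tie-break (instead of strike rate, wins, SB), the missing Elo adjustment for results so far, and the three-engine toy data are all assumptions made up for the illustration.

```python
import random

def expected_score(elo_a, elo_b):
    """Expected score of A against B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

def play(elo_a, elo_b, draw_rate, rng):
    """Return A's result (1, 0.5 or 0). draw_rate is an assumed constant."""
    e = expected_score(elo_a, elo_b)
    # Cap the draw share so win and loss probabilities never go negative.
    p_draw = min(draw_rate, 2 * e, 2 * (1 - e))
    p_win = e - p_draw / 2  # keeps the expected score equal to e
    r = rng.random()
    if r < p_win:
        return 1.0
    if r < p_win + p_draw:
        return 0.5
    return 0.0

def simulate(points, elo, remaining, runs=10000, draw_rate=0.5, seed=1):
    """Estimate P(first) and P(first or second) for each engine."""
    rng = random.Random(seed)
    first = {name: 0 for name in points}
    top_two = {name: 0 for name in points}
    for _ in range(runs):
        total = dict(points)  # start every run from the results so far
        for a, b in remaining:
            res = play(elo[a], elo[b], draw_rate, rng)
            total[a] += res
            total[b] += 1.0 - res
        # Sort by points; ties are broken randomly in this sketch.
        order = sorted(total, key=lambda n: (total[n], rng.random()), reverse=True)
        first[order[0]] += 1
        for name in order[:2]:
            top_two[name] += 1
    return ({n: first[n] / runs for n in points},
            {n: top_two[n] / runs for n in points})

# Hypothetical toy division: three engines, one game left per pairing.
points = {"A": 3.0, "B": 2.0, "C": 1.0}
elo = {"A": 3200, "B": 3100, "C": 3000}
remaining = [("A", "B"), ("B", "C"), ("A", "C")]
p_first, p_top2 = simulate(points, elo, remaining)
print(p_first)
print(p_top2)
```

The draw allocator is where most of the judgement goes: with a fixed `draw_rate` the predictions get more extreme than with a draw rate that depends on the Elo gap, which is one way percentages can look too sharp.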