Strange Lc0 TCEC performance


Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Strange Lc0 TCEC performance

Post by Milos »

Laskos wrote: Wed Aug 15, 2018 12:27 am Yes, weird, more so because Lc0 is known to have LOW draw rate even at fairly strong level. Deus, OTOH has 7. I do not know what is this.
What does that even mean??? Low draw rate at fairly strong level, seriously?
You just invented that claim.
In my test of Lc0 (recent net, OC'ed 1060) vs. a slow single-core SFdev, Lc0 scored 35% with a draw rate of 50% at 1'+0.6'', 44.5% with a draw rate of 58.5% at 4'+2.4'', and 44.5% with a draw rate of 64% at 10'+6''.
This is exactly the draw rate one can expect between, let's say, K and SFdev on a single core. And already past the 4'+2.4'' TC there is no more scaling advantage. In all cases 200 games were played.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Strange Lc0 TCEC performance

Post by Laskos »

Milos wrote: Wed Aug 15, 2018 12:40 am
Laskos wrote: Wed Aug 15, 2018 12:27 am Yes, weird, more so because Lc0 is known to have LOW draw rate even at fairly strong level. Deus, OTOH has 7. I do not know what is this.
What does that even mean??? Low draw rate at fairly strong level, seriously?
You just invented that claim.
In my test of Lc0 (recent net, OC'ed 1060) vs. a slow single-core SFdev, Lc0 scored 35% with a draw rate of 50% at 1'+0.6'', 44.5% with a draw rate of 58.5% at 4'+2.4'', and 44.5% with a draw rate of 64% at 10'+6''.
This is exactly the draw rate one can expect between, let's say, K and SFdev on a single core. And already past the 4'+2.4'' TC there is no more scaling advantage. In all cases 200 games were played.
Still a bit lower. I usually test at 2' + 2'' with draw rates of about 50-55% against an AB opponent of equal strength (SF8 on 1 core, for example).
Here are two example matches, 80 games in total:

Code:

ID10694
Score of lc0_v16 10714 vs SF8: 9 - 12 - 19 [0.463]
Elo difference: -26.11 +/- 79.29
40 of 40 games finished.

ID10714:
Score of lc0_v16 10697 vs SF 8: 10 - 7 - 23 [0.537]
Elo difference: 26.11 +/- 69.53
40 of 40 games finished.
This is a somewhat lower draw rate than that between SF8 and Komodo 11.3 on one core with no Contempt at this time control (60-63%).
Now clearer?

And anyway, a draw rate so high that it is a complete outlier for Lc0 in TCEC is strange. Still, the 15-game sample here is probably too small.
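
For reference, the score, draw rate and Elo difference of the two 40-game matches above can be recomputed from the W - L - D records. This is only a minimal Python sketch, assuming the usual logistic Elo model; the helper name match_stats is illustrative and not taken from any testing tool.

```python
import math


def match_stats(wins, losses, draws):
    """Return (score, draw_rate, elo_diff) for a W-L-D record.

    Assumes 0 < score < 1, otherwise the Elo difference is undefined.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    draw_rate = draws / games
    # Logistic Elo model: score = 1 / (1 + 10 ** (-elo / 400))
    elo = 400.0 * math.log10(score / (1.0 - score))
    return score, draw_rate, elo


# W - L - D records as posted above, from Lc0's point of view.
first = match_stats(9, 12, 19)
second = match_stats(10, 7, 23)
combined = match_stats(9 + 10, 12 + 7, 19 + 23)

for label, (score, draw_rate, elo) in [
    ("match 1", first),
    ("match 2", second),
    ("combined", combined),
]:
    print(f"{label}: score {score:.3f}, draws {draw_rate:.1%}, Elo {elo:+.2f}")
```

The combined record is 19 - 19 - 42, i.e. a 52.5% draw rate over 80 games, which is where the "about 50-55%" figure comes from.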
Branko Radovanovic
Posts: 89
Joined: Sat Sep 13, 2014 4:12 pm
Location: Zagreb, Croatia
Full name: Branko Radovanović

Re: Strange Lc0 TCEC performance

Post by Branko Radovanovic »

Laskos wrote: Wed Aug 15, 2018 12:35 am Both Lc0 (testnet and Deus) in both Div 4 and Div 3 perform consistently below expectations (although people cheered their sore promotion from Div 4). "Too few games" in some total of 80+ games (yes, different nets whatever, matters less) and under-performance of some 200 Elo points is a marginal argument. ID10520 was never weak in my normal tests. Maybe different nets exhibit different scaling and other weird behavior; I do not have much knowledge of that (I only know that 6x64 nets scale worse than current 20x256 nets).
Just did a quick calculation - if I'm not too much off, Lc0's performance in Div4 was at an upper-half-of-Div1 level. While I wasn't surprised by this, I must say that - having run no tests of my own - I have no idea whether that result is "normal" or not, so I can't really comment on that.

My argument was, rather, that the Div3-Div4 discrepancy by itself did have a precedent, and whatever caused it for Ethereal, could have caused it for Leela too. If anything, Ethereal's case should be easier to explain (straight A-B stuff, no NN mumbo jumbo). If Ethereal was not hurt by "too few games" (and possibly accidentally bolstered in Div4 by the same circumstance), what was it then?
chrisw
Posts: 4319
Joined: Tue Apr 03, 2012 4:28 pm

Re: Strange Lc0 TCEC performance

Post by chrisw »

LC0 should be competing for 2nd place. It isn't. It's two games behind right now. To get competing it needs to start winning some games.
You don't win games by drawing at a rate of 80% of games played. Something is broken.
And, in the bottom four, LC0 has Nemorino to contend with; if it weren't for the unfortunate disconnect, Nemorino would be above LC0 right now.
Okay, everything can change.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Strange Lc0 TCEC performance

Post by Milos »

chrisw wrote: Wed Aug 15, 2018 1:44 am LC0 should be competing for 2nd place. It isn't. It's two games behind right now. To get competing it needs to start winning some games.
You don't win games by drawing at a rate of 80% of games played. Something is broken.
Lol, the only broken thing is the hype and expectations of fanboys.
Div 4 was terrible. Realistically, if Ivanhoe had been playing on 43 cores instead of 1, it would have been like Ethereal here, and Lc0's performance would look much more reasonable. It would still probably have qualified for Div 3 but wouldn't be so overhyped.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Strange Lc0 TCEC performance

Post by Pio »

Laskos wrote: Wed Aug 15, 2018 12:57 am
Milos wrote: Wed Aug 15, 2018 12:40 am
Laskos wrote: Wed Aug 15, 2018 12:27 am Yes, weird, more so because Lc0 is known to have LOW draw rate even at fairly strong level. Deus, OTOH has 7. I do not know what is this.
What does that even mean??? Low draw rate at fairly strong level, seriously?
You just invented that claim.
In my test of Lc0 (recent net, OC'ed 1060) vs. a slow single-core SFdev, Lc0 scored 35% with a draw rate of 50% at 1'+0.6'', 44.5% with a draw rate of 58.5% at 4'+2.4'', and 44.5% with a draw rate of 64% at 10'+6''.
This is exactly the draw rate one can expect between, let's say, K and SFdev on a single core. And already past the 4'+2.4'' TC there is no more scaling advantage. In all cases 200 games were played.
Still a bit lower. I usually test at 2' + 2'' with draw rates of about 50-55% against an AB opponent of equal strength (SF8 on 1 core, for example).
Here are two example matches, 80 games in total:

Code:

ID10694
Score of lc0_v16 10714 vs SF8: 9 - 12 - 19 [0.463]
Elo difference: -26.11 +/- 79.29
40 of 40 games finished.

ID10714:
Score of lc0_v16 10697 vs SF 8: 10 - 7 - 23 [0.537]
Elo difference: 26.11 +/- 69.53
40 of 40 games finished.
This is a somewhat lower draw rate than that between SF8 and Komodo 11.3 on one core with no Contempt at this time control (60-63%).
Now clearer?

And anyway, a draw rate so high that it is a complete outlier for Lc0 in TCEC is strange. Still, the 15-game sample here is probably too small.
Hi Kai, I really like your posts. I think one big problem for the neural networks could be that they are forced to play openings they do not like. I really do not like that they do not play from the start position, and I also think that learning should be allowed between the games. It could be that the neural networks are highly optimised to play from the opening position. If, during self-play, they never reach the openings given in TCEC because they can avoid those positions, why should they play them well? If my hypothesis is right, Lc0 should be affected much more than DeusX.
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Strange Lc0 TCEC performance

Post by jdart »

I don't think you can make assumptions (w/o testing) about how Arasan or any engine scales from 4 cores to 43 cores.

Plus, Arasan is maybe not a typical engine because it is getting about half the NPS of most of the other a/b searchers in the 43 core setup (I am not sure why).

--Jon
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Strange Lc0 TCEC performance

Post by corres »

Laskos wrote: Wed Aug 15, 2018 12:35 am
Both Lc0 (testnet and Deus) in both Div 4 and Div 3 perform consistently below expectations (although people cheered their sore promotion from Div 4). "Too few games" in some total of 80+ games (yes, different nets whatever, matters less) and under-performance of some 200 Elo points is a marginal argument. ID10520 was never weak in my normal tests. Maybe different nets exhibit different scaling and other weird behavior; I do not have much knowledge of that (I only know that 6x64 nets scale worse than current 20x256 nets).
Maybe the dual GTX 1080 Ti setup does not deliver the expected power on the TCEC hardware, or Leela (and so DeusX) cannot use more than one GPU effectively.
I think the second case may be the true one.
As we know from CMcanavessy, CPU power has no effect on the chess strength of Leela and its derivative DeusX.
Werner
Posts: 2872
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: Strange Lc0 TCEC performance

Post by Werner »

corres wrote: Wed Aug 15, 2018 8:08 am As we know from CMcanavessy, CPU power has no effect on the chess strength of Leela and its derivative DeusX.

I wonder why the 15/192 network, running on CPUs, produces much better results here than the new 20/256 network, even when I use more CPUs.
Is it possible that the new network is only stronger when using a GPU?
Werner
Uri Blass
Posts: 10309
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Strange Lc0 TCEC performance

Post by Uri Blass »

Laskos wrote: Tue Aug 14, 2018 11:27 pm As you all can see, the TCEC performance in Div 3 of Lc0 with both nets, ID10520 and the Deus one, is severely below par compared with what we know from ordinary hardware, say a reasonable home CPU and GPU.

Code:

 N Engine            Rtng  Pts Gm    SB X  Elo Perf Et  Pe  Ar  De  Lc  Ne  Ha  Bo 

 1 Ethereal 10.81    3176 12.0 15 79.25 0 +127 80.0 ··· ==  1=  1== 1=  11  11  11 
 2 Pedone 1.8        3104  9.0 15 58.25 0  +80 60.0 ==  ··· ==  10  ==  0=  11  =11
 3 Arasan TCEC13     3142  8.5 15 56.75 0  +37 56.7 0=  ==  ··· 01  ==  1=1 ==  1= 
 4 DeusX 1.0         3200  8.0 15 56.75 0  -20 53.3 0== 01  10  ··· ==  =1  ==  1= 
 5 lc0 16.10520      3219  7.0 15 48.25 0  -65 46.7 0=  ==  ==  ==  ··· =0  === 1= 
 6 Nemorino 5.01     3104  6.5 15 44.25 2   +4 43.3 00  1=  0=0 =0  =1  ··· 1=  01 
 7 Hannibal 20180806 3193  6.0 15 36.25 0  -77 40.0 00  00  ==  ==  === 0=  ··· 11 
 8 Bobcat 8          3072  3.0 15 22.75 0  -86 20.0 00  =00 0=  0=  0=  10  00  ···
 
One might say TCEC conditions are simply hard to reproduce, GPUs are not working properly, there were bugs introduced in Div 3, and so on. Few of these explanations stand. The performance in Div 4 was bad too, as almost half of the engines there were not working properly on 43 cores and the engines were generally pretty weak (aside from IvanHoe, which was running... on one core). Few took Div 4 results at face value, and were already thinking of an almost certain promotion of Lc0 to Div 2 and Div 1, although it already had real trouble in Div 4. By now, it is likely that Lc0 ID10520 will not promote to Div 2.
I also checked the Lc0 ID10520 time management in TCEC: is it really so terrible? It might not be optimal, but it is not so terrible as to completely wreck the performance. I guess it might weaken the performance by some 30-40 Elo points at most compared to a better TM, that's all. Are there other bugs? In both ID10520 and Deus?
The remaining thing is TCEC conditions, which are really hard to reproduce (to me at least, impossible). So, I took another approach: match the CPU part with the GPU part as they are in TCEC, using the NPS shown and an assumed SMP scaling.
I took the Arasan 21 chess engine, which should be very close in strength to Arasan TCEC, and which I was already using in my gauntlets against AB engines. On my 4 cores, NPS is about 8 times lower than TCEC NPS. The efficiency of SMP on 43 cores is, even with the best SMP implementation, no higher than 60%-70% (which is very high, by the way, for 43 cores). So, all in all, the "effective speed" (inverse of time-to-strength) of the TCEC CPU for Arasan 21 is about 5.0-5.5 times that of my CPU. For the GPU part, NPS seems to be about 6 times higher in TCEC than on my GPU, giving an "effective speed" about 5 times higher (correct me on that one if you know better how to get from the NPS speed-up to the effective speed-up with 2 GPUs). All in all, I can mimic TCEC conditions to some degree by having Lc0 running on my GPU and Arasan 21 on 4 cores (maybe 5 cores would be even better, but it's not that relevant).
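
The arithmetic behind those "effective speed" figures can be written out as a small sketch. It assumes the simple model implied above, where the raw NPS ratio is multiplied by an SMP efficiency factor; the function and variable names are hypothetical, and the only inputs are the numbers quoted in the post.

```python
def effective_speedup(nps_ratio, efficiency):
    """Crude estimate: raw NPS ratio scaled by a parallel-efficiency factor."""
    return nps_ratio * efficiency


# CPU side: TCEC NPS is about 8x the 4-core home NPS, and SMP efficiency
# on 43 cores is assumed to be at most 60%-70%.
cpu_nps_ratio = 8.0
cpu_low = effective_speedup(cpu_nps_ratio, 0.60)
cpu_high = effective_speedup(cpu_nps_ratio, 0.70)

# GPU side: about 6x NPS and an estimated 5x effective speed, which
# implies an efficiency factor of 5/6 under the same model.
gpu_nps_ratio = 6.0
gpu_effective = 5.0
gpu_implied_efficiency = gpu_effective / gpu_nps_ratio

print(f"CPU effective speed-up: {cpu_low:.1f}x to {cpu_high:.1f}x")
print(f"GPU implied efficiency: {gpu_implied_efficiency:.0%}")
```

Under this model the CPU range comes out at 4.8x to 5.6x, which brackets the quoted 5.0-5.5, so the CPU and GPU sides are sped up by roughly the same factor of 5.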

I chose a time control 10 times faster than in TCEC: 3m + 1s.
Partial result:

Code:

Score of lc0_v16 10520 vs Arasan 21: +13  -2  =7 [0.750]
Elo difference: 190.85 +/- 139.14

22 of 40 games finished.
As I surely expected, Lc0 ID10520 destroys Arasan 21 in mimicked TCEC conditions, but on about 5 times slower hardware and at 10 times faster games (a total hardware * time factor of 50). I will let the match run to 40 games, but I have no doubts about the shape of the result. Sure, in TCEC conditions, the draw rate is higher, and the Elo difference compresses. Nevertheless, from this, Lc0 ID10520 should be at least at the level of Ethereal in Div 3.
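
To see why 22 games leave such a wide interval, the Elo difference and its error margin can be estimated from the +13 -2 =7 record. This is a sketch under a normal approximation with a 95% interval; the tool that printed the figures above may compute its margin slightly differently, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def elo_from_score(score):
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_with_margin(wins, losses, draws, confidence=0.95):
    """Return (elo_diff, margin) using a normal approximation of the mean score."""
    games = wins + losses + draws
    w, l, d = wins / games, losses / games, draws / games
    mu = w + 0.5 * d
    # Per-game variance of the result (1, 0.5 or 0 points).
    variance = w * (1.0 - mu) ** 2 + d * (0.5 - mu) ** 2 + l * (0.0 - mu) ** 2
    stderr = math.sqrt(variance / games)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    low, high = mu - z * stderr, mu + z * stderr
    # Half-width of the interval after mapping both ends to Elo.
    margin = (elo_from_score(high) - elo_from_score(low)) / 2.0
    return elo_from_score(mu), margin


elo, margin = elo_with_margin(13, 2, 7)
print(f"Elo difference: {elo:.2f} +/- {margin:.2f}")
```

With a margin this large, the partial result is consistent with anything from a modest edge to a crushing one, so it supports the direction of the result far more than its size.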

So, what happens? TM, "too few games" and other things seem lame excuses. Is there a serious bug affecting both Lc0 participants? Is there a hardware misconfiguration, invisible in NPS?
Or something I started to suspect: does Lc0 scale badly in this 50x time * hardware configuration?
I read that Lc0 changed its pruning, and I wonder whether in your tests you used the same value, which I read in the TCEC chat to be 0.604.