LCZero: Progress and Scaling. Relation to CCRL Elo

Guenther · Post by **Guenther** » Sun Apr 01, 2018 12:22 pm

sovaz1997 wrote: Hi! Where do I put the weights for the lczero neural networks? I can't run the UCI engine (Output: "A network weights file is required to the problem"). Thanks!

Same folder, and you have to specify it on the command line.
Moreover, the network file itself is compressed, so you first have to decompress it (e.g. with 7z).
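For anyone scripting the setup, here is a minimal Python sketch of the decompression step. It assumes the downloaded network is a gzip-compressed file (7z, `gunzip` or any archive tool does the same job). The function name and any filenames you pass in are illustrative placeholders, not names used by the lczero project.

```python
import gzip
import shutil
from pathlib import Path


def decompress_weights(src: Path, dst: Path) -> Path:
    """Decompress a gzip-compressed network file (assumed format) to dst.

    The data is streamed in chunks, so a large network never has to be
    held in memory all at once.
    """
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

Put the decompressed file in the engine's folder and pass it on the command line; check the engine's own help output for the exact option name rather than guessing it.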

sovaz1997 · Post by **sovaz1997** » Sun Apr 01, 2018 12:36 pm

Thanks! It works.

Laskos · Post by **Laskos** » Sun Apr 01, 2018 6:22 pm

Uri Blass wrote:
Laskos wrote:
Laskos wrote:
Uri Blass wrote:
Laskos wrote:
CMCanavessi wrote: Kai, how many NPS are you getting with 4 CPU threads? Would it be possible to estimate how many you would get with around 43, like TCEC uses, and the approximate strength?
NPS on 4 CPU threads ranges from 1500 to 5000 or more, depending on the position. It also increases with the allotted time and stabilizes after 30 seconds or so on a position. I think this NPS is comparable to a good GPU's NPS. I think MCTS search parallelizes very well, so going from 4 to 43 cores I would expect an improvement of 200-300 Elo points, depending on time control. Also, in TCEC LTC conditions, as LCZero seems to improve with time control (it scales better than standard engines), I expect it to be at least at the 2300 Elo level (CCRL), probably more.
In my experience, weak engines do not scale well.

That is, if engine A at 10 seconds per move is at the same level as engine B at 1 second per move, then engine A probably needs more than 100 seconds per move to be at the same level as engine B at 10 seconds per move.

I think that it may be interesting to test LCZero against Stockfish with a fixed number of nodes per move.

First find a number of nodes K such that Stockfish at K nodes per move is at the same level as LCZero at 1 second per move. Then test LCZero at 10 seconds per move against Stockfish at 10K nodes per move to see which scales better.

I suggest a fixed K nodes per move for Stockfish because I am sure that LCZero today is too weak to beat Stockfish even with a 100:1 time handicap, and hopefully 10K nodes per move takes close to 10 times as long as K nodes per move.
Well, it will be compared to weak engines, not to top ones. But I will check your theory (which seems valid to me):
I managed to equal the strength:

SF9 1000 nodes/move vs LCZero 0.25s/move:
55.5: 44.5

Now I will test
SF9 8000 nodes/move vs LCZero 2.0s/move
?

3 doublings in time (factor of 8). It will take some time; I will post the result later.
SF9 1000 nodes/move vs LCZero 0.25s/move:
55.5 : 44.5

SF9 8000 nodes/move vs LCZero 2.0s/move
83.5 : 16.5

So, SF9 indeed scales significantly better, as you thought. But keep in mind that the base time for SF9 is about 1ms, roughly 250 times smaller than the base time of LCZero. It's a bit like comparing apples to oranges: doublings at a hugely shorter TC obviously give more Elo. Comparing the scaling against modern engines of similar strength, like Zurichess App. and Predateur, at similar time controls for all, gives better scaling for LCZero. My guess is that if LCZero becomes comparable in strength to SF9 at the same time used, it will scale better than SF9. But let's see.
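To put the two match scores on an Elo scale, here is a short Python sketch using the usual logistic Elo formula, where a score fraction p corresponds to a rating difference of -400·log10(1/p - 1). It is a rough point estimate only: it assumes 100 games per match (55.5 + 44.5) and ignores error bars.

```python
import math


def elo_diff(score: float, games: float) -> float:
    """Elo difference implied by a match score, using the standard logistic Elo curve.

    Positive means the side that scored `score` points is stronger.
    """
    p = score / games
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0 and games")
    return -400.0 * math.log10(1.0 / p - 1.0)


# Scores are from SF9's point of view, 100 games per match.
base = elo_diff(55.5, 100)    # SF9 1000 nodes/move vs LCZero 0.25 s/move
scaled = elo_diff(83.5, 100)  # SF9 8000 nodes/move vs LCZero 2.0 s/move
gain = scaled - base          # extra Elo SF9 gained over 3 doublings (factor 8)

print(f"base {base:.0f} Elo, scaled {scaled:.0f} Elo")
print(f"SF9 gained {gain:.0f} Elo relative to LCZero, about {gain / 3:.0f} per doubling")
```

With these scores the gap grows from roughly 40 to roughly 280 Elo, i.e. on the order of 80 Elo per doubling in SF9's favour. With only 100 games per match the uncertainty is several tens of Elo, so this is indicative only.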
I think that for correspondence players it may be interesting if some weaker program scales better than Stockfish in this type of test, because in that case there is a reason to use it for analysis (hoping that using 24 hours for that program in some position is better, even if it is worse in the rating lists, because no rating list uses 24 hours per move).

I think that we are both right. Where LC0 is weaker (say, in Elo), it should be compared to weak engines of similar Elo, and there it scales well comparatively. Where LC0 is strong (the particular case of positional understanding in the opening), it should be compared to strong engines on the same issue:

With the same opening positional test suite I presented before:

20s/position:

Engine                         : Correct  TotalPos  Corr%  AveT(s)  MaxT(s)  TestFile
Fritz 15       (3227 CCRL)     :     102       200   51.0      1.9     20.0  openings200beta07.epd
LCZero  *************  ID69    :      98       200   49.0      2.7     20.0  openings200beta07.epd

60s/position:

Engine                         : Correct  TotalPos  Corr%  AveT(s)  MaxT(s)  TestFile
Fritz 15       (3227 CCRL)     :     109       200   54.5      6.8     60.0  openings200beta07.epd
LCZero  *************  ID69    :     108       200   54.0      4.1     60.0  openings200beta07.epd

On this issue, Fritz 15 improves by 7 positions and LC0 by 10. The difference is within the error margins, but one can probably say that LC0 scales on this topic at least as well as Fritz 15, and for very long analysis it might be better to use LC0 than Fritz 15.
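The "within error margins" remark can be made concrete with a rough binomial estimate. The Python sketch below treats each position as an independent pass/fail trial. The helper names are illustrative. Because the two runs reuse the same 200 positions, they are probably positively correlated, so `gain_se` (which treats them as independent) likely overstates the noise on an improvement.

```python
import math


def score_se(solved: int, total: int) -> float:
    """Binomial standard error, in positions, of a test-suite score."""
    p = solved / total
    return math.sqrt(total * p * (1.0 - p))


def gain_se(before: int, after: int, total: int) -> float:
    """Standard error of the change between two runs, if they were independent.

    Runs on the same positions are likely correlated, so read this as an
    upper-end estimate rather than an exact error bar.
    """
    return math.hypot(score_se(before, total), score_se(after, total))


for name, before, after in [("Fritz 15", 102, 109), ("LCZero ID69", 98, 108)]:
    print(f"{name}: +{after - before} positions, "
          f"single-run SE ~{score_se(before, 200):.1f}, "
          f"gain SE up to ~{gain_se(before, after, 200):.1f}")
```

On 200 positions, a single score near 50% has a standard error of about 7 positions, well above the 3-position gap between the two engines' improvements.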

Laskos · Post by **Laskos** » Mon Apr 02, 2018 2:28 pm

Laskos wrote: I tested the latest network, ID69, this morning and compared it to the 2-day-older network ID56; the latest performs significantly better in my opening positional suite.

[Search parameters: MaxDepth=99   MaxTime=20.0   DepthDelta=2   MinDepth=7   MinTime=0.1] 

Engine                         : Correct  TotalPos  Corr%  AveT(s)  MaxT(s)  TestFile 
      
Komodo 10.2 64-bit             :     145       200   72.5      2.0     20.0  openings200beta07.epd 
Houdini 5.01 Pro x64           :     144       200   72.0      2.4     20.0  openings200beta07.epd    
Stockfish 8 64 BMI2            :     141       200   70.5      2.0     20.0  openings200beta07.epd 
Houdini 5.01 Pro x64 Tactical  :     139       200   69.5      2.3     20.0  openings200beta07.epd      
Deep Shredder 13 x64           :     128       200   64.0      2.7     20.0  openings200beta07.epd    
Houdini 4 Pro x64              :     126       200   63.0      1.8     20.0  openings200beta07.epd    
Andscacs 0.88n                 :     123       200   61.5      2.4     20.0  openings200beta07.epd 
Houdini 4 Pro x64 Tactical     :     120       200   60.0      1.6     20.0  openings200beta07.epd 
Nirvanachess 2.3               :     119       200   59.5      1.8     20.0  openings200beta07.epd 
Fire 5 x64                     :     110       200   55.0      3.0     20.0  openings200beta07.epd    
Texel 1.06 64-bit              :     110       200   55.0      1.6     20.0  openings200beta07.epd    
Fritz 15       (3227)          :     102       200   51.0      1.9     20.0  openings200beta07.epd  

LCZero  *************  ID69    :      98       200   49.0      2.7     20.0  openings200beta07.epd 
  
Fruit 2.1      (2685)          :      91       200   45.5      1.5     20.0  openings200beta07.epd  
Sjaak II 1.3.1 (2194)          :      75       200   37.5      4.0     20.0  openings200beta07.epd    
BikJump v2.01  (2098)          :      74       200   37.0      1.6     20.0  openings200beta07.epd

Maximum time was 20s/position.
LC0 already seems close to very strong engines in this opening suite. At this pace of advancement in positional understanding, I am very curious to see how it develops.

I have found something interesting about the STS suite (1500 positions), and confirmed the scaling behavior. The STS suite always seemed to me over-analyzed by the Rybka (1.0b?) engine and maybe some other engines. In my opening positional test suite (openings200beta07.epd, 200 positions), I used engines only to eliminate tactical positions and to check that engines vary in their move selection. The positions and solutions were selected mostly according to large, mostly human game databases (often restricted to FIDE Elo above 2200 or so) and their outcomes.

In my opening suite, LC0 (ID69) came in at 20s/position significantly above Fruit 2.1 and close to Fritz 15:

[Search parameters: MaxDepth=99   MaxTime=20.0   DepthDelta=2   MinDepth=7   MinTime=0.1] 

Engine                         : Correct  TotalPos  Corr%  AveT(s)  MaxT(s)  TestFile 
      
Komodo 10.2 64-bit             :     145       200   72.5      2.0     20.0  openings200beta07.epd 
Houdini 5.01 Pro x64           :     144       200   72.0      2.4     20.0  openings200beta07.epd    
Stockfish 8 64 BMI2            :     141       200   70.5      2.0     20.0  openings200beta07.epd 
Houdini 5.01 Pro x64 Tactical  :     139       200   69.5      2.3     20.0  openings200beta07.epd      
Deep Shredder 13 x64           :     128       200   64.0      2.7     20.0  openings200beta07.epd    
Houdini 4 Pro x64              :     126       200   63.0      1.8     20.0  openings200beta07.epd    
Andscacs 0.88n                 :     123       200   61.5      2.4     20.0  openings200beta07.epd 
Houdini 4 Pro x64 Tactical     :     120       200   60.0      1.6     20.0  openings200beta07.epd 
Nirvanachess 2.3               :     119       200   59.5      1.8     20.0  openings200beta07.epd 
Fire 5 x64                     :     110       200   55.0      3.0     20.0  openings200beta07.epd    
Texel 1.06 64-bit              :     110       200   55.0      1.6     20.0  openings200beta07.epd    
Fritz 15       (3227 CCRL)     :     102       200   51.0      1.9     20.0  openings200beta07.epd  

LCZero  *************  ID69    :      98       200   49.0      2.7     20.0  openings200beta07.epd 
  
Fruit 2.1      (2685 CCRL)     :      91       200   45.5      1.5     20.0  openings200beta07.epd  
Sjaak II 1.3.1 (2194 CCRL)     :      75       200   37.5      4.0     20.0  openings200beta07.epd    
BikJump v2.01  (2098 CCRL)     :      74       200   37.0      1.6     20.0  openings200beta07.epd

I filtered the 1500 STS positions for positions with 28-32 men, i.e. opening and early middlegame positions. There are 209 of them. On this suite of 209 positions I expected results similar to those with my suite, but it is not so. LC0 comes in at the level of BikJump v2.01, about 2100 CCRL Elo, significantly below Fruit 2.1 (about 2700 CCRL Elo):

STS (209 positions)

5s/position

[Search parameters: MaxDepth=99   MaxTime=5.0   DepthDelta=2   MinDepth=7   MinTime=0.1] 
Fruit 2.1    (2685) :    score=145/209 [averages on correct positions: depth=5.2 time=0.33 nodes=704167]
BikJump 2.01 (2098) :    score=113/209 [averages on correct positions: depth=4.9 time=0.71 nodes=1710508]
LC0 ID 69 *******   :    score=107/209 [averages on correct positions: depth=14.7 time=0.70 nodes=867]

20s/position

[Search parameters: MaxDepth=99   MaxTime=20.0   DepthDelta=2   MinDepth=7   MinTime=0.1] 
Fruit 2.1    (2685) :    score=164/209 [averages on correct positions: depth=6.0 time=1.52 nodes=3162284]
LC0 ID 69 *******   :    score=128/209 [averages on correct positions: depth=15.0 time=1.82 nodes=2253]
BikJump 2.01 (2098) :    score=126/209 [averages on correct positions: depth=5.7 time=2.53 nodes=6359373]

Remark about the scaling: from 5s to 20s, Fruit 2.1 improves by 19 points, LC0 by 21 points and BikJump by 13 points. Since on this suite LC0 should be compared to BikJump (similar performance), the scaling of LC0 is significantly better than that of BikJump.

It is possible that many STS solutions are derived from engine analysis within the same paradigm of PST + material eval and alpha-beta search, so the standard engines might converge artificially on the solutions. That would partly explain why LC0's performance on my suite (much less engine-analyzed) is so different from its performance on STS.
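The 28-32 men filter applied to the STS positions is easy to reproduce. Here is a minimal Python sketch, assuming the suite is a plain EPD file with one position per line. The function names are illustrative, not taken from any existing tool. "Men" here counts every piece and pawn of both colours, kings included, so the initial position has 32.

```python
def count_men(epd_line: str) -> int:
    """Count all pieces and pawns (kings included) in the board field of a FEN/EPD line.

    The board field uses letters for pieces, digits for empty squares and '/'
    between ranks, so counting letters counts the men on the board.
    """
    board = epd_line.split()[0]
    return sum(ch.isalpha() for ch in board)


def opening_positions(epd_lines, lo=28, hi=32):
    """Keep non-empty lines whose position has between lo and hi men, inclusive."""
    return [line for line in epd_lines if line.strip() and lo <= count_men(line) <= hi]
```

For example, `opening_positions(Path("sts.epd").read_text().splitlines())` (with `from pathlib import Path`) would return the candidate lines; the file name is a placeholder.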

peter · Post by **peter** » Tue Apr 03, 2018 7:40 am

Laskos wrote:
peter wrote: Hi Robin!
CheckersGuy wrote: That's indeed a very impressive result, but that's probably what neural nets are good at. It's kind of interesting: weaker traditional alpha-beta engines are decent at tactics and suffer from bad positional play, while with Leela0 it's the other way around.
Well, I'll admit that the opening has become better, but I still wouldn't call that good positional play:
[pgn]
[Event "?"]
[Site "?"]
[Date "2018.03.31"]
[Round "?"]
[White "CuckooChess 1.13a9"]
[Black "play.lczero.org"]
[ECO "C50"]
[Result "1-0"]

1. e4 e5 2. Nf3 Nc6 3. Bc4 d6 4. O-O Be7 5. d4 Nxd4 6. Nxd4
exd4 7. Qh5 g6 8. Qd5 Be6 9. Qxb7 Nf6 10. Bxe6 fxe6 11. Rd1
e5 12. Qc6+ Kf7 13. c3 Rb8 14. cxd4 exd4 15. Rxd4 Rb6
16. Qc2 Rf8 17. e5 Nd7 18. e6+ Kxe6 19. Qc4+ Kf6 20. Bh6
Re8 21. Rf4+ Ke5 22. Qe4# 1-0
[/pgn]

CuckooChess was running on a Sony Xperia with 15 seconds per game, and Leela in slow mode. It was before the server was changed to an older computer over the weekend, but after the switch to the latest NN version.
I am not sure I understood, but the hardware was very weak, and LC0 improves greatly with time and hardware. By move 11. Rd1, LC0 had a better position, although the earlier moves were not nice (but not obviously wrong). I tested the latest network, ID69, this morning and compared it to the 2-day-older network ID56; the latest performs significantly better in my opening positional suite.
[Search parameters: MaxDepth=99   MaxTime=20.0   DepthDelta=2   MinDepth=7   MinTime=0.1] 

Engine                         : Correct  TotalPos  Corr%  AveT(s)  MaxT(s)  TestFile 
      
Komodo 10.2 64-bit             :     145       200   72.5      2.0     20.0  openings200beta07.epd 
Houdini 5.01 Pro x64           :     144       200   72.0      2.4     20.0  openings200beta07.epd    
Stockfish 8 64 BMI2            :     141       200   70.5      2.0     20.0  openings200beta07.epd 
Houdini 5.01 Pro x64 Tactical  :     139       200   69.5      2.3     20.0  openings200beta07.epd      
Deep Shredder 13 x64           :     128       200   64.0      2.7     20.0  openings200beta07.epd    
Houdini 4 Pro x64              :     126       200   63.0      1.8     20.0  openings200beta07.epd    
Andscacs 0.88n                 :     123       200   61.5      2.4     20.0  openings200beta07.epd 
Houdini 4 Pro x64 Tactical     :     120       200   60.0      1.6     20.0  openings200beta07.epd 
Nirvanachess 2.3               :     119       200   59.5      1.8     20.0  openings200beta07.epd 
Fire 5 x64                     :     110       200   55.0      3.0     20.0  openings200beta07.epd    
Texel 1.06 64-bit              :     110       200   55.0      1.6     20.0  openings200beta07.epd    
Fritz 15       (3227)          :     102       200   51.0      1.9     20.0  openings200beta07.epd  

LCZero  *************  ID69    :      98       200   49.0      2.7     20.0  openings200beta07.epd 
  
Fruit 2.1      (2685)          :      91       200   45.5      1.5     20.0  openings200beta07.epd  

LCZero  *************  ID56    :      90       200   45.0      1.7     20.0  openings200beta07.epd 
  
Sjaak II 1.3.1 (2194)          :      75       200   37.5      4.0     20.0  openings200beta07.epd    
BikJump v2.01  (2098)          :      74       200   37.0      1.6     20.0  openings200beta07.epd
Maximum time was 20s/position.
LC0 already seems close to very strong engines in this opening suite. At this pace of advancement in positional understanding, I am very curious to see how it develops.

Maybe you should take positions like the one appearing after

1. e4 e5
2. f4 exf4
3. Nf3 Nf6
4. e5 Ng4
5. h3

into your test suite, at least as long as the latest build keeps playing this line as Black after 1.e4 and always comes up with 5...Nxe5?, reliably so in slow mode.
I don't think this is good positional play, but of course you'll say it's simply a tactical blunder.

Edit: What a pity, it now plays 1...e6 after 1.e4. Somebody must have shown my post to Leela.

Evert · Post by **Evert** » Tue Apr 03, 2018 9:21 am

So, here's a question for you.

My son likes to play chess, but I'm not always available to play with him (and I'm much better than he is anyway, so we have to play with some sort of handicap to make it remotely interesting), and his sister isn't always available either (he's again better than she is, although I find her quite good considering she's five). So sometimes he plays the computer. It's an old AMD machine running Linux, which has some training version of Fritz that seems OK for him, and he sometimes plays SjaakII, which has random and static ordering modes to make it weaker. The downside of these programs is still that the errors they make are not human-like, so they're not so good for him to play against and learn from (at this stage).

Would Leela be a more suitable opponent?

GregNeto · Post by **GregNeto** » Tue Apr 03, 2018 11:17 am

Your son can try
http://play.lczero.org/

Slow mode (2000 playouts) should be around human Elo 1700, fast mode (200 playouts) around Elo 1400 when a human thinks without time pressure (all estimates for network 69).

LCZero likes bishops, open lines, advanced pawns and good development. It does not care much about material (giving up a pawn is normal). Its tactical skills are not yet well developed.

Werewolf · Post by **Werewolf** » Tue Apr 03, 2018 11:23 am

GregNeto wrote:Your son can try
http://play.lczero.org/

Slow mode (2000 playouts) should be around human Elo 1700, fast mode (200 playouts) around Elo 1400 when a human thinks without time pressure (all estimates for network 69).

LCZero likes bishops, open lines, advanced pawns and good development. It does not care much about material (giving up a pawn is normal). Its tactical skills are not yet well developed.

From my experiments

http://www.talkchess.com/forum/viewtopic.php?t=66956

I'd have said a little above 1700 Elo.

Vizvezdenec · Post by **Vizvezdenec** » Tue Apr 03, 2018 12:30 pm

Nope.
Not because of playing strength, but because of the extremely low variety of play...
What I mean is this: as a really bad player myself, I lost my first 5-10 games to LCZero, and then I found a variation which in 80% of cases leads to LCZero sacrificing a pawn for a non-existent attack, and in the other 20% to LCZero completely missing tactics and getting mated while taking a rook and pawn for a knight.
So as a training opponent it's really bad: it would only teach your son to find holes in its play, which it repeats about 80% of the time, and wouldn't really improve his play.
Lichess's Stockfish at some skill level always plays differently in the first five moves, while LCZero can repeat 10-15-move variations in every single game, so the former is much more suitable for training.

whereagles · Post by **whereagles** » Tue Apr 03, 2018 8:20 pm

Leela is Black... 2.36% chance to win on a lone king??
