New lc0 tune with kiudee tuner

jjoshua2 · Post by **jjoshua2** » Thu May 14, 2020 4:33 pm

The famous tune from the kiudee tuner (kudos to him) has had much success on many lc0 networks despite being done with only a T58 network. Here were the conditions:

Code:

CPuct / CPuctFactor / CPuctBase / FpuValue / PolicyTemperature using 58613 in time control matches.
Iterations: 2500 (10000 games)
LC0-version: v0.23.1
LC0 options: ID58613, Backend=cudnn/cudnn-fp16, Threads=2
SF options: 6 threads, stockfish_19121008_x64_modern
Hardware: i7-3930K + GTX 1080 for tuning
Nodes/move 1s+0.016s games: 500 ± 50 (SE), 10s+0.16s games: 14k ± 1k (SE)
Time control: Both 1s+0.016s and 10s+0.16s
Book: Chad openings-6ply-1000.pgn (randomly chosen during tuning, sequential for testing)
Tablebases: None during play, 6 piece adjudication

I am redoing the tune using the Leelenstein binary fork, and also tuning the two trade penalty parameters. My conditions:

Code:

30s+0.4s
2 x 2080 super (cudnn-fp16), Threads=3
LS 14.3 vs BrainFish_200420 23 CPU (because it's faster than abrok for me)
6 man TB in search for both (and cutechess adjudication).

So the average TC is about 6x longer and GPU speed is roughly 10x higher? (I benched a maximum of 84k NPS after about 40s; with a 0.4s search I think it's more like 70k at the start, but it is very position dependent and some positions are as low as 35k.) It seems to get around 150k to 300k nodes per move (instead of 500 or 14k in the previous tune).
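The arithmetic behind those rough figures can be sketched as below. This is only a back-of-the-envelope check using the numbers quoted in this thread; the variable names are made up for illustration, and comparing against the mean of the two old base times is an assumption about how the "about 6x" was reached.

```python
# Back-of-the-envelope check of the TC and nodes-per-move figures above.
# All inputs are numbers quoted in the thread; names are illustrative only.

old_base_times = [1.0, 10.0]   # seconds, previous tune (1s+0.016s and 10s+0.16s)
new_base_time = 30.0           # seconds, this tune (30s+0.4s)

# Assumption: "average TC" compares against the mean of the two old base times.
avg_old_base = sum(old_base_times) / len(old_base_times)
tc_ratio = new_base_time / avg_old_base

# Thinking time per move implied by 150k to 300k nodes at roughly 70k NPS.
nps_estimate = 70_000
implied_seconds = [nodes / nps_estimate for nodes in (150_000, 300_000)]

print(f"TC ratio vs average of old TCs: {tc_ratio:.2f}x")
print("Implied seconds per move:", [round(s, 2) for s in implied_seconds])
```

So the 150k to 300k range corresponds to a few seconds of search per move at the estimated speed.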

jjoshua2 · Post by **jjoshua2** » Thu May 14, 2020 4:34 pm

I am doing 6 games per iteration instead of 4 to reduce noise. Currently at iteration 645, which is thus after 3870 games. Note that the red lines mark the current best values and the orange lines are a kind of error bar, but the real error is greater than this.
I tested settings not too far from these for 1000 games in self-play on the same TC and GPUs, and they scored -10 Elo +- 10 Elo, which isn't too surprising because trade penalty is known to reduce Elo in self-play.

The winrate and its error bars are on a scale where -1 means LS wins all games, 0 is a 50% winrate, and 1 means SF wins all games. So a winrate of roughly 59% +-3.5% is expected at the optimum shown here (not the average of all points shown):
(array([-0.18201357]), array([0.07259538]))
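For anyone who wants to redo the conversion, here is a minimal sketch. It assumes the score scale is linear between -1 (LS wins everything) and 1 (SF wins everything), which matches the 59% figure above. The function names are hypothetical, and the Elo conversion uses the standard logistic formula rather than anything taken from the tuner itself.

```python
import math

def score_to_winrate(score: float) -> float:
    """Map a score in [-1, 1] (-1 = LS wins all games) to LS's expected score in [0, 1].

    Assumes a linear scale, as described in the post.
    """
    return 0.5 - score / 2.0

def winrate_to_elo(winrate: float) -> float:
    """Standard logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / winrate - 1.0)

score, score_err = -0.18201357, 0.07259538   # values printed above

winrate = score_to_winrate(score)
winrate_err = score_err / 2.0                # the error scales by the same factor of 1/2

print(f"LS winrate: {winrate:.1%} +- {winrate_err:.1%}")
print(f"Approx. Elo difference: {winrate_to_elo(winrate):.0f}")
```

The error bar is only rescaled here, so it is still the optimistic estimate and the real error is larger.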

jjoshua2 · Post by **jjoshua2** » Thu May 14, 2020 9:29 pm

For ease of copying, after iteration 674 (not much change):

Code:

parameters = ["CPuct", "FpuValue", "PolicyTemperature", "CPuctBase", "TradePenalty", "TradePenalty2", "CPuctFactor"]
([4.223124047255782, 0.6161279812397882, 1.4110734534724663, 25847.809028552532, 0.0006370430590132097, 6.087375777149111, 2.789049799649798], -0.1796340617270061 +- 0.06630009)

Dann Corbit · Post by **Dann Corbit** » Thu May 14, 2020 11:15 pm

Thanks for sharing your work.
It is nice to see a *reason* why parameters should be set to certain values.
Otherwise, it feels like poking something with a stick and hoping that it works.

jjoshua2 · Post by **jjoshua2** » Sat May 16, 2020 9:19 pm

Thanks @Dann Corbit! I also neglected to point out the obvious: the previous tuning was done with a net that is around 4x smaller/faster and thus weaker, yet this tune is still doing a lot more nodes despite the slowdown.

I tested iteration 674 in a 250*2 game gauntlet and it ended up behind the kiudee defaults. Here is iteration 805. It's changing very slowly, but over many iterations it adds up. FPU and CPuct are lower and CPuctFactor and policy temperature are higher now.

Note that if you open the image in a new tab, you will see it is high enough resolution to read all the numbers.

jjoshua2 · Post by **jjoshua2** » Wed May 20, 2020 4:18 pm

At iteration 1185 the red and orange lines look pretty well converged. That doesn't mean they won't change later, but there are finally a lot of dots everywhere, so performance is probably good here even on other computers with a similar setup. CPuctFactor is remarkably close to kiudee's tune. TradePenalty is about 0.00003; it might be better if this were rescaled so that the value would be 3.

corres · Post by **corres** » Thu May 21, 2020 3:06 pm

jjoshua2 wrote: ↑Sat May 16, 2020 9:19 pm ...
Note that if you open the image in a new tab, you will see it is high enough resolution to read all the numbers.

I think it would be simpler if you listed the new, optimized parameters.

jjoshua2 · Post by **jjoshua2** » Sat May 30, 2020 7:30 pm

corres wrote: ↑Thu May 21, 2020 3:06 pm I think it would be simpler if you listed the new, optimized parameters.

Iteration 1755. I switched from 6 to 8 games per iteration at around iteration 1300 because calculating the next iteration takes such a long time now.

I only occasionally take the time to type up the latest recommended settings, but here is what I'm having edosani stream now:

Code:

cpuctbase 2500
tradepenalty 0.00001
tradepenalty2 25.42
cpuctfactor 2.7207
cpuct 3.4234
fpu 0.781
policy temp 1.6908
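If you want to paste these into an engine console or a GUI config, a small sketch like the following turns them into standard UCI `setoption` commands. The option names are taken from the parameter list posted earlier in this thread; whether a given binary exposes exactly these names is an assumption, so check the output of `uci` on your build first. The helper names are hypothetical.

```python
# Settings as listed above, keyed by the parameter names used earlier in the thread.
# Assumption: the engine exposes UCI options with exactly these names.
settings = {
    "CPuctBase": 2500,
    "TradePenalty": 0.00001,
    "TradePenalty2": 25.42,
    "CPuctFactor": 2.7207,
    "CPuct": 3.4234,
    "FpuValue": 0.781,
    "PolicyTemperature": 1.6908,
}

def format_value(value) -> str:
    """Render a number without scientific notation (0.00001, not 1e-05)."""
    if isinstance(value, int):
        return str(value)
    return f"{value:.8f}".rstrip("0").rstrip(".")

def uci_setoption_lines(options: dict) -> list:
    """Build one 'setoption name <id> value <x>' line per option (standard UCI syntax)."""
    return [f"setoption name {name} value {format_value(value)}"
            for name, value in options.items()]

lines = uci_setoption_lines(settings)
print("\n".join(lines))
```

Plain decimal formatting is used because not every option parser is guaranteed to accept values like `1e-05`.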

From testing this a bit on a computer with a significantly worse Leela ratio, it seems close in Elo to the kiudee tune in an SF gauntlet, but much better in the midgame and worse in the endgame.
