Are kiudee parameters the best?

corres · Post by **corres** » Mon Jan 27, 2020 1:02 pm

I made some tests with and without kiudee parameters.
Hardware
AMD Threadripper 16 x 4.0 GHz
2 x RTX 2060 OC (~1 x RTX 2080 Ti OC)
Common parameters of the tests
Chess GUI: DeepFritz 14
Hash (for Stockfish): 2048 MB
TC 1 min + 2 sec/move
Openings: my TestBook50 with alternated colors (2 x 50 = 100 games)
Tablebases: 6 men syzygy + 5 men Nalimov (only for GUI)

Parameters of Leela ver.0.22.0

"def" marked:
threads 4
backend multiplexing
backendoptions (backend=cudnn-fp16,gpu=1),(backend=cudnn-fp16,gpu=2)
NNcacheSize 10000000
MaxPrefetch 64
Others are Leela Defaults

"ccc" marked (kiudee):
threads 4
backend multiplexing
backendoptions (backend=cudnn-fp16,gpu=1),(backend=cudnn-fp16,gpu=2)
NNcacheSize 10000000
CPuct 2.147
CPuctBase 18368
CPuctFactor 2.815
FPUValue 0.443
MaxCollisionEvents 256
MaxPrefetch 64
PolicyTemperature 1.607
Others are Leela Defaults
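
For anyone who wants to try the "ccc" search parameters from a script or a GUI that accepts raw UCI input, here is a minimal sketch that turns them into standard UCI `setoption` commands. The option names and values are copied from the list above; the helper name `to_setoption` is made up for illustration, and the exact spelling of each option should be checked against what your Lc0 version reports in response to `uci`.

```python
# "ccc" (kiudee) search parameters as listed in the post above.
# Threads, backend and cache settings are left out because they are
# hardware-specific.
CCC_OPTIONS = {
    "CPuct": 2.147,
    "CPuctBase": 18368,
    "CPuctFactor": 2.815,
    "FPUValue": 0.443,
    "MaxCollisionEvents": 256,
    "PolicyTemperature": 1.607,
}


def to_setoption(options):
    """Format a dict of options as UCI 'setoption' command lines.

    Hypothetical helper: it only builds strings, it does not talk to an engine.
    """
    return [f"setoption name {name} value {value}" for name, value in options.items()]


for line in to_setoption(CCC_OPTIONS):
    print(line)
```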

Results 1.
ccc62078 : defJ13B.2-188 = 11 : 4 (85 draw) 100 games
def62078 : defJ13B.2-188 = 7 : 5 (88 draw) 100 games
Results 2.
ccc62078 : Stockfish191002 = 9 : 8 (83 draw) 100 games
def62078 : Stockfish191002 = 16 : 9 (75 draw) 100 games
Results 3.
ccc62078 : def62078 = 3 : 2 (95 draw) 100 games

From the above results it is obvious that net 62078 is stronger than J13B.2-188 and Stockfish191002, both with and without the kiudee parameters. I think the Elo of net 62078 is underrated on the list of Leela nets.
On the other hand, the effect of the kiudee parameters is rather ambiguous, so using them as default settings is risky.
I suggest testing the kiudee parameters on your own system before using them for analysis or engine-engine matches.
Note
You can download the test games from
wikisend.com
File ID 266998
Password leela

Laskos · Post by **Laskos** » Mon Jan 27, 2020 4:51 pm

It's good that you opened this thread. Yes, the Kiudee settings are good over a wide range of conditions. I will present some current results with one of the latest T59 nets on my overclocked RTX 2070 GPU. With Lc0 0.23.2, these 128x10b nets hit the ceiling of about 100k NPS of the engine's (pseudo) MCTS search. Smaller nets like 64x6 or 48x5 run at basically the same speed because of this ceiling. Since the most important quantity for deriving LTC behavior is the number of nodes per move, I used the T59 run, which is quite strong by now.

First, a bit about your result: you probably use "too good, too balanced" openings, which translate into a very high draw rate. Combined with the small samples (100 games), your results are hardly conclusive. Using unbalanced openings (played pair-wise, each opening with both colors, to diminish the noise) and more games (500 per match), I got reasonable draw rates and small error margins. The smaller-than-naive error margins come from the 5-nomial variance for unbalanced openings, discussed here:
https://www.chessprogramming.org/index. ... l_Analysis
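
To put a number on "hardly conclusive", here is a minimal sketch that computes the Elo difference and a naive (trinomial) 1 SD error for the five 100-game matches reported in the first post. It assumes the usual logistic Elo formula and treats every game as independent, so it is the naive estimate, not the 5-nomial one mentioned above. The function names are illustrative only.

```python
import math


def elo_from_score(score):
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def trinomial_elo(wins, losses, draws):
    """Return (Elo difference, naive 1 SD error in Elo) from W/L/D counts."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + losses * score ** 2
           + draws * (0.5 - score) ** 2) / n
    sd_score = math.sqrt(var / n)
    # Delta method: slope of the Elo curve at the observed score.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return elo_from_score(score), slope * sd_score


# (wins, losses, draws) for the first-named side, copied from the first post.
MATCHES = {
    "ccc62078 vs defJ13B.2-188": (11, 4, 85),
    "def62078 vs defJ13B.2-188": (7, 5, 88),
    "ccc62078 vs Stockfish191002": (9, 8, 83),
    "def62078 vs Stockfish191002": (16, 9, 75),
    "ccc62078 vs def62078": (3, 2, 95),
}

results = {name: trinomial_elo(*wld) for name, wld in MATCHES.items()}
for name, (elo, sd) in results.items():
    print(f"{name}: {elo:+.1f} Elo, 1 SD = {sd:.1f}")
```

Under this naive model, none of the five results is more than two standard deviations away from zero, which is the sense in which 100 games with such a high draw rate cannot separate the settings.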

I also compute the 5-nomial Normalized Elo, as simple Elo differences tend to compress with increasing time control. The Normalized Elo is described here:
http://hardy.uhasselt.be/Toga/normalized_elo.pdf

I first played matches at the ultra-fast time control of 6s + 0.1s to see whether the Kiudee settings give an advantage. The matches were both self-play and against different opponents. I tried T40, SV, LS, T59 and T60 nets with and without the Kiudee settings, with SF in the mix as an opponent too. In these ultra-fast games, I got a bonus of about 45-50 Elo in self-play and about 40 Elo against a different opponent, so self-play inflates the Kiudee bonus by about 20%. The bonus is maybe a tiny bit larger for T59 and T60 nets, probably because they were trained with different settings (noise etc.). It is worth noting that it is important to test at equal time controls, not at equal nodes, even in self-play, because the Kiudee settings can affect the NPS behavior.

To check the time control behavior, I used three time controls: 6s + 0.1s, corresponding to about 20k npm on average, 30s + 0.3s, corresponding to about 100k npm and 120s + 1.2s, corresponding to about 400k npm on average. I used a late T59 net in self-games Kiudee versus default. Here are the results in matches of 500 games each:

Kiudee settings:
CPuct=2.147
FpuValue=0.443
PolicyTemperature=1.607
CPuctBase=18368
CPuctFactor=2.815

TC: 6s + 0.1s  (about 20k nodes per move)
Score of T_59_Kiudee vs T_59: 147 - 77 - 276 [0.570]
Elo difference: 49.0 +/- 8.2, LOS: 100.0 %, DrawRatio: 55.2 %

Normalized Elo: 0.270

TC: 30s + 0.3s (about 100k nodes per move)
Score of T_59_Kiudee vs T_59: 125 - 57 - 318 [0.568]
Elo difference: 47.5 +/- 7.2, LOS: 100.0 %, DrawRatio: 63.6 %

Normalized Elo: 0.299

TC: 120s + 1.2s (about 400k nodes per move)
Score of T_59_Kiudee vs T_59: 121 - 66 - 313 [0.555]
Elo difference: 38.4 +/- 7.1, LOS: 100.0 %, DrawRatio: 62.6 %

Normalized Elo: 0.245

The Elo error margins shown are 1 SD 5-nomial.
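
As a cross-check of how the two reported quantities relate, here is a small sketch. It assumes that Normalized Elo is the score surplus divided by the effective per-game standard deviation, i.e. (score - 0.5) / sigma, and that the 1 SD Elo margin follows from sigma / sqrt(games) via the slope of the logistic Elo curve. Both are assumptions about the definitions used, not something stated in the post. The inputs are the three reported results; the function name is illustrative.

```python
import math

GAMES = 500

# (score, normalized Elo, reported 1 SD Elo margin) from the three matches above.
REPORTED = [
    (0.570, 0.270, 8.2),
    (0.568, 0.299, 7.2),
    (0.555, 0.245, 7.1),
]


def elo_sd_from_normalized(score, norm_elo, games):
    """Recover the 1 SD Elo margin implied by a score and its normalized Elo."""
    # Assumed definition: norm_elo = (score - 0.5) / sigma_game.
    sigma_game = (score - 0.5) / norm_elo
    sd_score = sigma_game / math.sqrt(games)
    # Delta method: slope of the logistic Elo curve at the observed score.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return slope * sd_score


implied = [elo_sd_from_normalized(s, t, GAMES) for s, t, _ in REPORTED]
for (s, t, margin), value in zip(REPORTED, implied):
    print(f"score {s:.3f}, normalized Elo {t:.3f}: "
          f"implied 1 SD = {value:.2f}, reported = {margin}")
```

Under these assumptions the implied margins agree with the reported ones to within rounding, which suggests the Normalized Elo and the error margins were derived from the same 5-nomial variance.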

The scaling of the Kiudee settings is given by the 5-nomial Normalized Elo. Although the differences are within the 95% confidence interval, the Kiudee bonus seems to decrease a bit, with some 80% confidence, going from 100k npm to 400k npm. It decreases only slightly, and 400k npm on my RTX 2070 with T60 net is already a slow rapid, not far from real LTC.

To check whether that small decrease has a simple cure: one of the very few facts I knew about the settings is that a somewhat larger CPuct is recommended for LTC. So a 500-game match at 120s + 1.2s is now running with the Kiudee settings, but with CPuct = 2.600 instead of CPuct = 2.147.
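
To give a feel for what changing CPuct does at different node counts, here is a rough sketch. It assumes the AlphaZero-style growth rule, in which the effective exploration constant is CPuct + CPuctFactor * ln((N + CPuctBase) / CPuctBase) for a node with N visits. I believe Lc0 uses this form, but treat it as an assumption and check the source of your version. The function name is made up for illustration.

```python
import math


def effective_cpuct(visits, cpuct, cpuct_base, cpuct_factor):
    """Exploration constant at a node with the given visit count.

    Assumed formula (AlphaZero-style logarithmic growth), not verified
    against any particular Lc0 release.
    """
    return cpuct + cpuct_factor * math.log((visits + cpuct_base) / cpuct_base)


BASE, FACTOR = 18368, 2.815  # Kiudee values from the post

table = {}
for visits in (20_000, 100_000, 400_000):  # npm figures from the post
    kiudee = effective_cpuct(visits, 2.147, BASE, FACTOR)
    modified = effective_cpuct(visits, 2.600, BASE, FACTOR)
    table[visits] = (kiudee, modified)
    print(f"{visits:>7} visits: CPuct=2.147 -> {kiudee:.2f}, CPuct=2.600 -> {modified:.2f}")
```

Under this formula, raising CPuct shifts the whole curve up by a constant, while the growth with node count comes only from CPuctBase and CPuctFactor.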

It's too early, with one more day to go, and things might change a bit, but here is a provisional result:

TC: 120s + 1.2s (about 400k nodes per move)
Score of T_59_Kiudee_mod vs T_59: 51 - 30 - 129 [0.550]
Elo difference: 34.9 +/- 29.1, LOS: 99.0 %, DrawRatio: 61.4 %

210 of 500 games finished.
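
For reference, the provisional numbers can be reproduced from the W/L/D counts alone. The sketch below assumes the logistic Elo formula, a naive trinomial variance with a 95% (z = 1.96) margin, and the common LOS approximation based only on wins and losses. That the +/- 29.1 here is a 95% trinomial margin, unlike the 1 SD 5-nomial margins of the finished matches, is my reading of the numbers and not something the post states.

```python
import math


def match_stats(wins, losses, draws, z=1.96):
    """Return (Elo difference, +/- margin in Elo, LOS) from W/L/D counts."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    # Naive trinomial per-game variance (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + losses * score ** 2
           + draws * (0.5 - score) ** 2) / n
    sd_score = math.sqrt(var / n)
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    margin = z * slope * sd_score
    # Likelihood of superiority, normal approximation on decisive games.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo, margin, los


elo, margin, los = match_stats(51, 30, 129)
print(f"Elo difference: {elo:.1f} +/- {margin:.1f}, LOS: {100.0 * los:.1f} %")
```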

My impression is that the Kiudee settings were obtained by tuning all the parameters together (CLOP-like). If CPuct is not orthogonal to the other parameters, a simple attempt like this one, just increasing CPuct for longer TC, won't give the desired result. If one knows how the other parameters relate to CPuct at longer TC, one might try to tune fewer parameters with longer TC games. The number of games necessary for tuning explodes exponentially with the number of parameters, so having 2-3 instead of 5 would help greatly. Also, if one knows roughly how these 2-3 parameters relate to one another at longer TC (more npm), then one can do a simple, almost manual exploration using longer TC games.
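
A back-of-the-envelope illustration of the exponential growth mentioned above, assuming an exhaustive grid with the same number of candidate values per parameter. This is a deliberate simplification: CLOP-like tuners do not enumerate a grid, but the size of the search space scales the same way. The function name and the choice of 3 values per parameter are arbitrary.

```python
def grid_points(levels, n_params):
    """Number of settings in a full grid with `levels` values per parameter."""
    return levels ** n_params


LEVELS = 3  # arbitrary illustrative choice: low / middle / high

counts = {n: grid_points(LEVELS, n) for n in (2, 3, 5)}
for n, count in counts.items():
    print(f"{n} parameters: {count} combinations to compare")
```

With these assumptions, going from 5 parameters to 2 shrinks the grid from 243 combinations to 9.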

jp · Post by **jp** » Mon Jan 27, 2020 5:04 pm

Laskos wrote: ↑Mon Jan 27, 2020 4:51 pm The smaller than naive error margins come from 5-nomial variance for unbalanced openings, discussed here:
https://www.chessprogramming.org/index. ... l_Analysis

However one may show that under reasonable elo models the trinomial model is not correct in case games are played in pairs with reversed colors ...

Is this written down somewhere?

Laskos · Post by **Laskos** » Mon Jan 27, 2020 5:06 pm

jp wrote: ↑Mon Jan 27, 2020 5:04 pm
Laskos wrote: ↑Mon Jan 27, 2020 4:51 pm The smaller than naive error margins come from 5-nomial variance for unbalanced openings, discussed here:
https://www.chessprogramming.org/index. ... l_Analysis

However one may show that under reasonable elo models the trinomial model is not correct in case games are played in pairs with reversed colors ...
Is this written down somewhere?

On this forum some time ago, an entire thread or two. I have no time now, but it's easy to search by keywords or in the link above.

Hugo · Post by **Hugo** » Mon Jan 27, 2020 8:01 pm

corres wrote: ↑Mon Jan 27, 2020 1:02 pm I made some tests with and without kiudee parameters.

From the above results it is obvious that net 62078 is stronger than J13B.2-188 and Stockfish191002, both with and without the kiudee parameters. I think the Elo of net 62078 is underrated on the list of Leela nets.

I have tested net 62078 vs Stockfish 11.

Lc0 on an RTX 2060 with default settings, Stockfish on 12 CPUs with contempt=0.
Time control was 10 min + 2 sec,
using a Noomen test suite with reversed colors.
Stockfish seems to be about 14 Elo ahead in that test:

nn 62078, Blitz 10m+2s  2020

                                
1   Stockfish 11 64 BMI2-12cpu  +28/-20/=152 52.00%  104.0/200
2   Lc0,v0.23.2+git.c8d9095     +20/-28/=152 48.00%   96.0/200

C.K.

mwyoung · Post by **mwyoung** » Mon Jan 27, 2020 8:22 pm

Laskos wrote: ↑Mon Jan 27, 2020 4:51 pm ...
My impression is that the Kiudee settings were obtained by tuning all the parameters together (CLOP-like). If CPuct is not orthogonal to the other parameters, a simple attempt like this one, just increasing CPuct for longer TC, won't give the desired result. If one knows how the other parameters relate to CPuct at longer TC, one might try to tune fewer parameters with longer TC games. The number of games necessary for tuning explodes exponentially with the number of parameters, so having 2-3 instead of 5 would help greatly. Also, if one knows roughly how these 2-3 parameters relate to one another at longer TC (more npm), then one can do a simple, almost manual exploration using longer TC games.

If you want to tune yourself, the main setting to start with, imo, is the policy temperature. Go from there to settings like CPuct. These widen or narrow the search. I would not get caught up in any type of node counting. The reason the Kiudee settings and others work is that they allow Lc0 to search deeper with the same number of nodes. The trick is to do this deeper search without missing tactics. That is what you are tuning.
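
To illustrate the "widens or narrows" point, here is a minimal sketch of what a policy temperature does to move priors. It assumes the temperature acts as a softmax temperature, which for already-normalized priors is equivalent to raising each prior to the power 1/T and renormalizing. The prior values are invented for illustration, and 2.2 is just an arbitrary larger temperature for comparison.

```python
def apply_policy_temperature(priors, temperature):
    """Rescale normalized move priors with a softmax-style temperature.

    T > 1 flattens the distribution (wider search), T < 1 sharpens it.
    """
    scaled = [p ** (1.0 / temperature) for p in priors]
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical priors for four candidate moves (not from any real net).
PRIORS = [0.70, 0.20, 0.07, 0.03]

rescaled = {t: apply_policy_temperature(PRIORS, t) for t in (1.0, 1.607, 2.2)}
for t, probs in rescaled.items():
    print(f"T = {t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

The higher the temperature, the more of the prior mass moves from the top move to the alternatives, so a lower temperature concentrates the same number of nodes on fewer lines.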

Ovyron · Post by **Ovyron** » Mon Jan 27, 2020 8:59 pm

Laskos wrote: ↑Mon Jan 27, 2020 4:51 pm To check the time control behavior, I used three time controls: 6s + 0.1s, corresponding to about 20k npm on average, 30s + 0.3s, corresponding to about 100k npm and 120s + 1.2s, corresponding to about 400k npm on average. I used a late T59 net in self-games Kiudee versus default. Here are the results in matches of 500 games each:

I like this way of talking about average nodes per move. With so many GPUs and time controls, it makes sense to just report the npm. Thanks to this, I now know that I could probably reproduce your 6s + 0.1s results by making my CPU Leela play at 20 minutes/move. It's the first time I'm able to make such a comparison.

Laskos · Post by **Laskos** » Mon Jan 27, 2020 9:13 pm

Ovyron wrote: ↑Mon Jan 27, 2020 8:59 pm
Laskos wrote: ↑Mon Jan 27, 2020 4:51 pm To check the time control behavior, I used three time controls: 6s + 0.1s, corresponding to about 20k npm on average, 30s + 0.3s, corresponding to about 100k npm and 120s + 1.2s, corresponding to about 400k npm on average. I used a late T59 net in self-games Kiudee versus default. Here are the results in matches of 500 games each:
I like this way of talking about average nodes per move. With so many GPUs and time controls, it makes sense to just report the npm. Thanks to this, I now know that I could probably reproduce your 6s + 0.1s results by making my CPU Leela play at 20 minutes/move. It's the first time I'm able to make such a comparison.

You probably mean 20 seconds per move, right? Or does your full CPU churn out only about 15 nps with the fast T59 net? My phone is much faster.
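
The arithmetic behind this exchange, as a small sketch: the speed needed to reach a given node count per move is simply nodes divided by thinking time. The 20k npm figure and the two candidate times per move come from the posts above; the function name is illustrative.

```python
def required_nps(nodes_per_move, seconds_per_move):
    """Nodes per second needed to reach a node budget in the given time."""
    return nodes_per_move / seconds_per_move


NPM = 20_000  # about what 6s + 0.1s gave on the RTX 2070, per the post

nps_at_20_minutes = required_nps(NPM, 20 * 60)
nps_at_20_seconds = required_nps(NPM, 20)
print(f"20 minutes/move -> {nps_at_20_minutes:.1f} nps")
print(f"20 seconds/move -> {nps_at_20_seconds:.0f} nps")
```

So 20 minutes per move would correspond to a speed of under 20 nps, while 20 seconds per move corresponds to 1000 nps.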

corres · Post by **corres** » Mon Jan 27, 2020 10:07 pm

mwyoung wrote: ↑Mon Jan 27, 2020 8:22 pm ...
If you want to tune yourself, the main setting to start with, imo, is the policy temperature. Go from there to settings like CPuct. These widen or narrow the search. I would not get caught up in any type of node counting. The reason the Kiudee settings and others work is that they allow Lc0 to search deeper with the same number of nodes. The trick is to do this deeper search without missing tactics. That is what you are tuning.

The main aim of the kiudee settings is to narrow and deepen Leela's search. This obviously helps in finding positions with a higher evaluation, which arise from unbalanced opening positions and/or tactical opportunities. So the effectiveness of the kiudee settings depends not only on Leela's speed but also on the type of game.

mwyoung · Post by **mwyoung** » Mon Jan 27, 2020 10:16 pm

corres wrote: ↑Mon Jan 27, 2020 10:07 pm
mwyoung wrote: ↑Mon Jan 27, 2020 8:22 pm ...
If you want to tune yourself, the main setting to start with, imo, is the policy temperature. Go from there to settings like CPuct. These widen or narrow the search. I would not get caught up in any type of node counting. The reason the Kiudee settings and others work is that they allow Lc0 to search deeper with the same number of nodes. The trick is to do this deeper search without missing tactics. That is what you are tuning.
The main aim of the kiudee settings is to narrow and deepen Leela's search. This obviously helps in finding positions with a higher evaluation, which arise from unbalanced opening positions and/or tactical opportunities. So the effectiveness of the kiudee settings depends not only on Leela's speed but also on the type of game.

I agree, tuning the NN settings needs to be done for your own system setup and needs. But there are good general settings that are better than the defaults. I always suggest starting with the policy temperature. You can tune very well with just this one setting.
