Better settings, best net, best backend on RTX GPU to LTC



Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos


Post by Laskos »

Hardware:
4 core i7 fast CPU
RTX 2070 OC-ed GPU



1/
I was on holiday (unknowingly, in coronavirus-hit areas) and left my PC on a long run to check for a possible improvement to the Kiudee settings at LTC. I seem to get a positive result by changing this:

CPuct=1.90
instead of the Kiudee value CPuct=2.147 (v0.23.2 engine)

The Kiudee settings are abbreviated as "K", the new settings as "KL".

This is what I got in a longish test (almost one week) from unbalanced openings:


TC: 120s + 1.2s

   # PLAYER     : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next)
   1 T_59_KL    :  24.68  18.15     370.5     700    52.9      93    
   2 T_59_K     :   5.41  18.12     354.5     700    50.6      72    
   3 SF_11      :   0.00   ----     675.0    1400    48.2     ---    

White advantage = 154.72 +/- 6.57
Draw rate (equal opponents) = 64.66 % +/- 1.78
Here Lc0 with T_59 (a 128x10 small net) used on average about 400k nodes per move, which is pretty much a long TC for larger nets. Using the pentanomial variance, the result is an advantage of about 19 +/- 10 Elo points (1 SD) for the "KL" settings over the "K" settings at LTC.
LOS = 97%
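For readers who want to check that LOS figure: under a normal approximation, an estimate of 19 +/- 10 Elo (1 SD) gives LOS = Phi(19/10). The sketch below assumes that approximation; the SD of 10 is taken from the pentanomial estimate above rather than computed here, and the helper name is illustrative only.

```python
import math

def los_from_elo(elo_diff: float, sd: float) -> float:
    """Likelihood of superiority under a normal approximation:
    P(true difference > 0) when the estimate is elo_diff +/- sd (1 SD)."""
    z = elo_diff / sd
    # Standard normal CDF written with math.erf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Rating gap between the two Lc0 settings in the Ordo table above
gap = 24.68 - 5.41
print(f"rating gap: {gap:.1f} Elo")
print(f"LOS for 19 +/- 10: {los_from_elo(19.0, 10.0):.1%}")
```

It prints a gap of about 19.3 Elo and an LOS of about 97%, in line with the figure quoted.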

The result is fairly conclusive. At fast TC, the "KL" and "K" settings perform the same (within 1 SD error margins, even in 1000 games). So I guess that setting CPuct=1.90 in the Kiudee settings is generally better and scales well to LTC. The improvement at LTC seems even larger in self-games with the same net, some 25-30 Elo points.


2/
The v0.24 engine has two fast backends for my hardware, CUDA and DX12. For 256x20 nets CUDA is faster, but for the SV 384x30 nets DX12 is faster. I also checked the strength at 30s + 0.3s:


TC: 30s + 0.3s

Score of SV_384x30_2880_DX vs SV_384x30_2880_CUDA: 116 - 88 - 196  [0.535] 400
Elo difference: 24.36 +/- 24.32
Finished match
Pentanomial LOS = 99%, DX12 backend is better with this large net.
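The Elo figure in a cutechess-cli line like the one above is the logistic transform of the score s: Elo = -400 * log10(1/s - 1). A minimal sketch, assuming a trinomial model (independent games, draws worth half a point) with a 95% normal interval; the function names are hypothetical.

```python
import math

def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_error(wins: int, losses: int, draws: int, z: float = 1.959964):
    """Elo estimate and approximate 95% half-width from a trinomial model."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    margin = z * math.sqrt(var / n)
    lo = elo_from_score(score - margin)
    hi = elo_from_score(score + margin)
    return elo_from_score(score), (hi - lo) / 2.0

elo, half_width = elo_with_error(116, 88, 196)
print(f"Elo difference: {elo:.2f} +/- {half_width:.2f}")
```

For 116 - 88 - 196 this gives about 24.36 Elo with a margin of roughly +/- 24.4, close to the +/- 24.32 printed above; the small gap presumably comes from implementation details. Because games are treated as independent, this does not use the opening pairing that the pentanomial LOS relies on.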

I used some scaling results with 10x nodes and found that the SV 384x30-t60-28XX.pb nets scale best and probably surpass the strongest net so far, SV 256x20-t40-1541.pb, at LTC (above 100k large-net nodes).

A sanity test was performed at 15min + 9s (about 200k nodes per move for the large net, and about 600k nodes per move for the 1541 net):


Score of SV_384x30_2880 vs SV_256x20_1541: 7 - 3 - 10 [0.600]
Elo difference: 70.4 +/- 111.6, LOS: 89.7 %, DrawRatio: 50.0 %

20 of 20 games finished.
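The LOS printed by cutechess-cli matches the common decisive-games formula LOS = 1/2 * (1 + erf((W - L) / sqrt(2 * (W + L)))). The sketch below (helper name hypothetical) reproduces the 89.7% above from 7 wins and 3 losses. It ignores draws and the pairing of games, so it need not agree with a pentanomial LOS.

```python
import math

def los_wins_losses(wins: int, losses: int) -> float:
    """LOS from decisive games only (draws ignored), normal approximation."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

print(f"SV_384x30_2880 vs SV_256x20_1541: LOS = {los_wins_losses(7, 3):.1%}")
```

The same formula also gives the 99.2% (115 - 81) and 91.0% (4 - 1) LOS values reported later in this thread.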


Using the pentanomial variance, LOS = 97%. In fact, SV 384x30_2880 seems to smash the 1541 net (+4 -0 =6 pairwise) at this longer TC on an RTX 2070 GPU. Being only at the second LR drop (0.01), these large SV nets will be hard for T60 nets to beat at LTC on RTX GPUs in the foreseeable future. Sergio is doing a very good job.


3/
Now I am running a sanity check against SF_11 using the SV 384x30_2880 net with the "KL" LTC settings, the DX12 backend and the v0.24rc1 engine, with games at 15m + 9s. I expect Lc0 to win, but let's see how heavily it wins with these optimized settings and what is probably the best net at LTC.


4/
An observation: results on the tactical Arasan 21 test suite at, say, 30s/position correlate very well with the strength of the nets at LTC, and they seem to show that at LTC the SV 384x30-t60-28XX.pb nets are clearly the best, followed by the latest T60 nets and the SV 1541 net (these two are almost equal at LTC).
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Better settings, best net, best backend on RTX GPU to LTC

Post by Dann Corbit »

I would like to get a screen shot of your complete UCI settings or your parameter disk file, if you use that instead.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Better settings, best net, best backend on RTX GPU to LTC

Post by Laskos »

Dann Corbit wrote: Wed Feb 26, 2020 5:06 pm I would like to get a screen shot of your complete UCI settings or your parameter disk file, if you use that instead.
There is not much to add to the v0.24rc1 defaults, which incorporate the Kiudee settings as defaults (and add some new ones). Only these parameters override the defaults, with what seem to be the best settings at LTC:


[
  {
    "name" : "WeightsFile",
    "value" : "./384x30-t60-2880.pb.gz"
  },
  {
    "name" : "CPuct",
    "value" : "1.900"
  },
  {
    "name" : "CPuctAtRoot",
    "value" : "1.900"
  },
  {
    "name" : "NNCacheSize",
    "value" : 1000000
  }
]
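If your GUI does not read a parameter file like this, the same overrides can be sent to the engine as standard UCI `setoption name <id> value <x>` commands, typically after `uci` and before the first `isready`. A minimal sketch, assuming the engine accepts these option names exactly as written in the file; the helper name is hypothetical.

```python
import json
from typing import List

# The overrides from the post, wrapped in a JSON array
SETTINGS_JSON = """
[
  {"name": "WeightsFile", "value": "./384x30-t60-2880.pb.gz"},
  {"name": "CPuct", "value": "1.900"},
  {"name": "CPuctAtRoot", "value": "1.900"},
  {"name": "NNCacheSize", "value": 1000000}
]
"""

def to_uci_commands(settings_json: str) -> List[str]:
    """Turn {"name": ..., "value": ...} entries into UCI 'setoption' lines."""
    options = json.loads(settings_json)
    return [f"setoption name {opt['name']} value {opt['value']}" for opt in options]

for command in to_uci_commands(SETTINGS_JSON):
    print(command)
```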

The best nets at LTC on an RTX GPU are probably the SV 384x30 distilled nets:
https://www.comp.nus.edu.sg/~sergio-v/t60/384x30/
The latest is usually the best or close.

The latest Lc0 engine is here; it now automatically selects cudnn-fp16 for RTX GPUs with the NVIDIA-CUDA engine, or dx12 for all GPUs with the DX12 engine:
https://github.com/LeelaChessZero/lc0/releases

With an RTX GPU it's probably best to use the CUDA engine for T40- and T60-sized nets, and DX12 for SV 384x30 and larger nets. DX12 scales well speed-wise with the size of the net, and it surpasses CUDA with large nets even on RTX GPUs.
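The rule of thumb above can be written as a tiny helper. This is a sketch only: treating a net as "large" when it has at least 384 filters or 30 blocks is an assumption read off the post, the function names are hypothetical, and the backend strings are the ones named above. As reported later in this thread, the crossover may be hardware-dependent, so it is worth measuring nps on your own GPU.

```python
from typing import Tuple

def parse_net_size(tag: str) -> Tuple[int, int]:
    """Parse a size tag such as '384x30' into (filters, blocks)."""
    filters, blocks = tag.lower().split("x")
    return int(filters), int(blocks)

def suggest_backend(filters: int, blocks: int) -> str:
    """Rule of thumb for RTX GPUs (hypothetical helper):
    cudnn-fp16 for T40/T60-sized nets, dx12 for 384x30 and larger."""
    if filters >= 384 or blocks >= 30:
        return "dx12"
    return "cudnn-fp16"

for tag in ["256x20", "384x30"]:
    print(tag, "->", suggest_backend(*parse_net_size(tag)))
```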
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Better settings, best net, best backend on RTX GPU to LTC

Post by Dann Corbit »

Thank you. I know that you have put a great deal of effort into this analysis.
mbabigian
Posts: 204
Joined: Tue Oct 15, 2013 2:34 am
Location: US
Full name: Mike Babigian

Re: Better settings, best net, best backend on RTX GPU to LTC

Post by mbabigian »

Great info. I look forward to trying the 1.9 CPuct.

I do have a completely different experience on the DX12 backend however. I run an ASUS Strix 2080TI OC (watercooled). And the DX12 backend is nearly 25% slower. It also goes nuts if I set a batch size of 512 like I run under cuda. It likewise goes nuts if I increase the NN Cache size over your setting to 2000000 or more. None of that happens with the cudnn-fp16 backend. I'm also using Sergio's 2880 net, but no matter what I do, DX12 is slower even if I lower nncache and batch size to match dx12's stable settings.
“Censorship is telling a man he can't have a steak just because a baby can't chew it.” ― Mark Twain
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Better settings, best net, best backend on RTX GPU to LTC

Post by Laskos »

mbabigian wrote: Wed Feb 26, 2020 8:35 pm Great info. I look forward to trying the 1.9 CPuct.

I do have a completely different experience on the DX12 backend however. I run an ASUS Strix 2080TI OC (watercooled). And the DX12 backend is nearly 25% slower. It also goes nuts if I set a batch size of 512 like I run under cuda. It likewise goes nuts if I increase the NN Cache size over your setting to 2000000 or more. None of that happens with the cudnn-fp16 backend. I'm also using Sergio's 2880 net, but no matter what I do, DX12 is slower even if I lower nncache and batch size to match dx12's stable settings.
Good to know, so it might be hardware-dependent. My results with the SV 384x30 nets are consistent; here is the earlier one with the SV 1538 net:

http://talkchess.com/forum3/viewtopic.php?f=2&t=73045

Score of SV_384x30_1538_DX vs SV_384x30_1538_CUDA: 115 - 81 - 204 [0.542]
Elo difference: 29.6 +/- 23.8, LOS: 99.2 %, DrawRatio: 51.0 %

400 of 400 games finished.

My GPU clocks oscillate between 1800 and 1900 MHz instead of the stock 1410 MHz, at about 65°C.

Yes, I also tried to increase the batch size to 512, but DX backend seems buggy in this respect. I haven't tried very large nncaches.


For now, the sanity check at LTC against SF_11 with SV_384x30_2880_DX (CPuct = 1.900) is going well, but after very, very few (9) games:

Score of SV_384x30_2880 vs SF_11: 4 - 1 - 4 [0.667]
Elo difference: 120.4 +/- 194.2, LOS: 91.0 %, DrawRatio: 44.4 %

9 of 20 games finished.

If it keeps up that pace, the result will be significantly better than the results of the SV 1541 and T60 nets against SF_11. But it's only a sanity check, not something to give high confidence.
mbabigian
Posts: 204
Joined: Tue Oct 15, 2013 2:34 am
Location: US
Full name: Mike Babigian

Re: Better settings, best net, best backend on RTX GPU to LTC

Post by mbabigian »

I have had an infinite analysis running for the last several hours (more than 3 hrs) using your 1.9 CPuct value. Thanks to water cooling, my temps are crazy low.


Since I never do less than 1 million nodes per position, your scaling data and LTC info has been very helpful. Keep up the great work.
Mike
P.S. My used system memory is due to 24 Komodo 13.3 instances doing something else at the same time. Lc0 is only consuming 22GB so far.
ankan
Posts: 77
Joined: Sun Apr 21, 2013 3:29 pm
Full name: Ankan Banerjee

Re: Better settings, best net, best backend on RTX GPU to LTC

Post by ankan »

mbabigian wrote: Wed Feb 26, 2020 8:35 pm Great info. I look forward to trying the 1.9 CPuct.

I do have a completely different experience on the DX12 backend however. I run an ASUS Strix 2080TI OC (watercooled). And the DX12 backend is nearly 25% slower. It also goes nuts if I set a batch size of 512 like I run under cuda. It likewise goes nuts if I increase the NN Cache size over your setting to 2000000 or more. None of that happens with the cudnn-fp16 backend. I'm also using Sergio's 2880 net, but no matter what I do, DX12 is slower even if I lower nncache and batch size to match dx12's stable settings.
What version of Windows are you using? And what version of Nvidia drivers? The dx backend needs Windows 10 version 1903 or later and the very latest Nvidia graphics drivers.
mbabigian
Posts: 204
Joined: Tue Oct 15, 2013 2:34 am
Location: US
Full name: Mike Babigian

Re: Better settings, best net, best backend on RTX GPU to LTC

Post by mbabigian »

What version of Windows are you using? And what version of Nvidia drivers? The dx backend needs windows 10 version 1903 or later, and the very latest Nvidia graphics drivers.
I'm running Win 10 Pro 1909 and Nvidia 442.19 driver released 2/3/2020. This new computer hasn't even been operational for 30 days yet.
ankan
Posts: 77
Joined: Sun Apr 21, 2013 3:29 pm
Full name: Ankan Banerjee

Re: Better settings, best net, best backend on RTX GPU to LTC

Post by ankan »

Laskos wrote: Wed Feb 26, 2020 9:24 pm Yes, I also tried to increase the batch size to 512, but DX backend seems buggy in this respect. I haven't tried very large nncaches.
Max supported batch size for dx backend is 256. However you should be able to set any nncache size.
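A script or GUI wrapper could guard against the instability reported above by capping the batch size before sending it. A minimal sketch: the 256 limit is the one stated in this post, `MinibatchSize` is assumed to be the relevant Lc0 option name, and the helper is hypothetical.

```python
DX12_MAX_BATCH = 256  # maximum batch size for the dx backend, as stated above

def clamp_minibatch(backend: str, requested: int) -> int:
    """Return a batch size the chosen backend supports.
    Only the dx12 limit from this thread is encoded; other backends pass through."""
    if backend == "dx12" and requested > DX12_MAX_BATCH:
        return DX12_MAX_BATCH
    return requested

print(f"setoption name MinibatchSize value {clamp_minibatch('dx12', 512)}")
```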