How to run rtx 2080ti for leela optimally?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: Harvey Williamson, bob, hgm

h1a8
Posts: 441
Joined: Fri Jun 04, 2010 5:23 am

How to run rtx 2080ti for leela optimally?

Post by h1a8 » Mon Apr 20, 2020 10:42 pm

I have an RTX 2080 Ti installed alongside an i7-9700K and 16 GB of RAM.
But I'm only getting 20 knps in the Lc0 0.24 benchmark, while I see others getting over 35 knps.
What must I do to get full speed?
Step by step, please, since I'm a noob.


Also, what are the optimal settings for Leela? The default settings use 2 threads instead of 8. Is having 2 threads better than more?

Nay Lin Tun
Posts: 633
Joined: Mon Jan 16, 2012 5:34 am

Re: How to run rtx 2080ti for leela optimally?

Post by Nay Lin Tun » Tue Apr 21, 2020 3:34 am

Assuming you have the same network, are you using the fp16 backend?

Yes, 2 threads is enough (don't use 8 threads).

If you still need help, you'd be better off asking in the Leela Discord's help channel.
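If you want to force the backend and thread count rather than rely on auto-detection, they can be set from the command line when benchmarking. A minimal sketch, assuming an Lc0 v0.24-era build where the --backend and --threads flags exist (check ./lc0 --help on your build):

```shell
# Force the fp16 backend and 2 search threads for a quick speed check.
# Flag names are from Lc0 v0.24-era builds; verify with ./lc0 --help.
./lc0 benchmark --backend=cudnn-fp16 --threads=2
```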

MOBMAT
Posts: 327
Joined: Sat Feb 04, 2017 10:57 pm
Location: USA

Re: How to run rtx 2080ti for leela optimally?

Post by MOBMAT » Tue Apr 21, 2020 3:39 am

I second the suggestion to see whether fp16 (instead of plain CUDA) works for you. I can't use it on my GPU.

Also, check out this Leela page:

https://lczero.org/play/flags/

I use Lc0 with Arena and I get better than 20K with a 1080.
i7-6700K @ 4.00Ghz (using 6 threads), EGTBs on PCI SSD
Benchmark: Stockfish 11 64 bmi2 (nps): 2067669

crem
Posts: 162
Joined: Wed May 23, 2018 7:29 pm

Re: How to run rtx 2080ti for leela optimally?

Post by crem » Tue Apr 21, 2020 8:06 am

Since about six months ago, Lc0 automatically detects when to use fp16.

I believe you are just using a different network size: 20,000 nps is normal for 320x24 nets; it's probably 256x20 that gets 35,000.

You can see others' speeds at http://lc0.org/benchmark, and network examples for different sizes at http://lc0.org/bestnet
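The size difference alone roughly accounts for the gap: per-node cost of these residual nets scales roughly with blocks times filters squared (the "320x24" naming is filters x blocks). A back-of-the-envelope sketch, with the caveat that this scaling law is a crude approximation, not an exact model:

```shell
# Rough relative per-node cost of a 320x24 net vs a 256x20 net,
# assuming cost ~ blocks * filters^2 (a crude approximation).
awk 'BEGIN { printf "%.2f\n", (320*320*24) / (256*256*20) }'
# prints 1.88
```

So a 256x20 net is roughly 1.9x cheaper per node, which makes ~20 knps on a 320x24 net versus ~35 knps on a 256x20 net entirely plausible on the same GPU.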

IanKennedy
Posts: 27
Joined: Sun Feb 04, 2018 11:38 am
Location: UK

Re: How to run rtx 2080ti for leela optimally?

Post by IanKennedy » Tue Apr 21, 2020 8:22 am

My current roundrobin is showing an average of 31.5knps with 384x30-t40-2036.pb using cudnn-fp16 on a 2080ti.

h1a8
Posts: 441
Joined: Fri Jun 04, 2010 5:23 am

Re: How to run rtx 2080ti for leela optimally?

Post by h1a8 » Tue Apr 21, 2020 9:44 am

IanKennedy wrote:
Tue Apr 21, 2020 8:22 am
My current roundrobin is showing an average of 31.5knps with 384x30-t40-2036.pb using cudnn-fp16 on a 2080ti.
What chess GUI are you using?
Do I need to install something from Nvidia to enable CUDA? And your weights file is 2036.pb?

h1a8
Posts: 441
Joined: Fri Jun 04, 2010 5:23 am

Re: How to run rtx 2080ti for leela optimally?

Post by h1a8 » Tue Apr 21, 2020 9:46 am

crem wrote:
Tue Apr 21, 2020 8:06 am
Since about six months ago, Lc0 automatically detects when to use fp16.

I believe you are just using a different network size: 20,000 nps is normal for 320x24 nets; it's probably 256x20 that gets 35,000.

You can see others' speeds at http://lc0.org/benchmark, and network examples for different sizes at http://lc0.org/bestnet
Thanks, but I'm using 256x20. Do I need to download and install something from Nvidia to enable CUDA? I'm using the Fritz 17 GUI.

brianr
Posts: 451
Joined: Thu Mar 09, 2006 2:01 pm

Re: How to run rtx 2080ti for leela optimally?

Post by brianr » Tue Apr 21, 2020 12:17 pm

Everything needed should be part of the releases download.
Here is a link to the CUDA version that includes the CUDA files.
There is a separate package labeled cuda-nodll if you already have the big files and only want the engine update.

https://github.com/LeelaChessZero/lc0/r ... a-cuda.zip

Speed is typically tested from the command prompt, not with a GUI.
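From a command prompt in the Lc0 folder, that test looks like the following; the weights filename here is only a placeholder for whatever network file you downloaded:

```shell
# Run Lc0's built-in benchmark against a specific network file.
# "mynetwork.pb.gz" is a placeholder -- substitute your downloaded net.
# The final "Benchmark final time ... nodes per second" line is the
# figure people quote as knps.
./lc0 benchmark --weights=mynetwork.pb.gz
```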

zullil
Posts: 6442
Joined: Mon Jan 08, 2007 11:31 pm
Location: PA USA
Full name: Louis Zulli

Re: How to run rtx 2080ti for leela optimally?

Post by zullil » Tue Apr 21, 2020 12:36 pm

h1a8 wrote:
Tue Apr 21, 2020 9:46 am
crem wrote:
Tue Apr 21, 2020 8:06 am
Since about six months ago, Lc0 automatically detects when to use fp16.

I believe you are just using a different network size: 20,000 nps is normal for 320x24 nets; it's probably 256x20 that gets 35,000.

You can see others' speeds at http://lc0.org/benchmark, and network examples for different sizes at http://lc0.org/bestnet
Thanks, but I'm using 256x20. Do I need to download and install something from Nvidia to enable CUDA? I'm using the Fritz 17 GUI.
Here's what I get using network 42850:

Code: Select all

$ ./lc0 benchmark
       _
|   _ | |
|_ |_ |_| v0.24.1+git.unknown built Mar 28 2020
Found pb network file: ./network42850
Creating backend [cudnn-auto]...
Switching to [cudnn-fp16]...
CUDA Runtime version: 10.1.0
Cudnn version: 7.6.2
Latest version of CUDA supported by the driver: 10.1.0
GPU: GeForce RTX 2080 Ti
GPU memory: 10.7241 Gb
GPU clock frequency: 1635 MHz
GPU compute capability: 7.5
Benchmark time 22ms, 2 nodes, 285 nps, move e2e4
Benchmark time 25ms, 3 nodes, 272 nps, move e2e4
Benchmark time 29ms, 4 nodes, 285 nps, move e2e4
Benchmark time 32ms, 8 nodes, 444 nps, move e2e4
Benchmark time 42ms, 13 nodes, 464 nps, move e2e4
Benchmark time 46ms, 17 nodes, 548 nps, move e2e4
Benchmark time 56ms, 27 nodes, 658 nps, move e2e4
Benchmark time 60ms, 37 nodes, 822 nps, move e2e4
Benchmark time 64ms, 42 nodes, 857 nps, move e2e4
Benchmark time 71ms, 64 nodes, 1142 nps, move e2e4
Benchmark time 78ms, 93 nodes, 1476 nps, move e2e4
Benchmark time 83ms, 109 nodes, 1602 nps, move e2e4
Benchmark time 87ms, 127 nodes, 1763 nps, move e2e4
Benchmark time 94ms, 165 nodes, 2088 nps, move e2e4
Benchmark time 102ms, 206 nodes, 2367 nps, move e2e4
Benchmark time 106ms, 223 nodes, 2450 nps, move e2e4
Benchmark time 109ms, 251 nodes, 2670 nps, move e2e4
Benchmark time 118ms, 320 nodes, 3106 nps, move e2e4
Benchmark time 126ms, 379 nodes, 3414 nps, move e2e4
Benchmark time 133ms, 423 nodes, 3554 nps, move e2e4
Benchmark time 141ms, 459 nodes, 3614 nps, move e2e4
Benchmark time 147ms, 462 nodes, 3500 nps, move e2e4
Benchmark time 151ms, 464 nodes, 3411 nps, move e2e4
Benchmark time 157ms, 469 nodes, 3302 nps, move e2e4
Benchmark time 162ms, 491 nodes, 3340 nps, move e2e4
Benchmark time 164ms, 500 nodes, 3333 nps, move e2e4
Benchmark time 167ms, 516 nodes, 3394 nps, move e2e4
Benchmark time 171ms, 577 nodes, 3675 nps, move e2e4
Benchmark time 173ms, 604 nodes, 3798 nps, move e2e4
Benchmark time 177ms, 651 nodes, 4018 nps, move e2e4
Benchmark time 185ms, 867 nodes, 5100 nps, move e2e4
Benchmark time 192ms, 1069 nodes, 6039 nps, move e2e4
Benchmark time 200ms, 1275 nodes, 6854 nps, move e2e4
Benchmark time 205ms, 1408 nodes, 7371 nps, move e2e4
Benchmark time 211ms, 1532 nodes, 7816 nps, move e2e4
Benchmark time 226ms, 1935 nodes, 9127 nps, move e2e4
Benchmark time 236ms, 2183 nodes, 9877 nps, move e2e4
Benchmark time 249ms, 2487 nodes, 10628 nps, move e2e4
Benchmark time 274ms, 3120 nodes, 12046 nps, move e2e4
Benchmark time 290ms, 3602 nodes, 13098 nps, move e2e4
Benchmark time 301ms, 3869 nodes, 13527 nps, move e2e4
Benchmark time 303ms, 3970 nodes, 13737 nps, move e2e4
Benchmark time 314ms, 4205 nodes, 14063 nps, move e2e4
Benchmark time 329ms, 4617 nodes, 14703 nps, move e2e4
Benchmark time 340ms, 4877 nodes, 15006 nps, move e2e4
Benchmark time 353ms, 5252 nodes, 15492 nps, move e2e4
Benchmark time 398ms, 6694 nodes, 17432 nps, move e2e4
Benchmark time 432ms, 7740 nodes, 18516 nps, move e2e4
Benchmark time 454ms, 8422 nodes, 19184 nps, move e2e4
Benchmark time 487ms, 9545 nodes, 20222 nps, move e2e4
Benchmark time 497ms, 9947 nodes, 20594 nps, move e2e4
Benchmark time 534ms, 11121 nodes, 21427 nps, move e2e4
Benchmark time 629ms, 14609 nodes, 23754 nps, move e2e4
Benchmark time 652ms, 15377 nodes, 24139 nps, move e2e4
Benchmark time 665ms, 15777 nodes, 24235 nps, move e2e4
Benchmark time 676ms, 16180 nodes, 24478 nps, move e2e4
Benchmark time 710ms, 17413 nodes, 25054 nps, move e2e4
Benchmark time 732ms, 18230 nodes, 25425 nps, move e2e4
Benchmark time 756ms, 19038 nodes, 25692 nps, move e2e4
Benchmark time 778ms, 19876 nodes, 26015 nps, move e2e4
Benchmark time 801ms, 20695 nodes, 26296 nps, move e2e4
Benchmark time 835ms, 21850 nodes, 26646 nps, move e2e4
Benchmark time 891ms, 23563 nodes, 26898 nps, move e2e4
Benchmark time 914ms, 24415 nodes, 27157 nps, move e2e4
Benchmark time 1425ms, 45943 nodes, 32583 nps, move e2e4
Benchmark time 2403ms, 88718 nodes, 37136 nps, move e2e4
Benchmark time 3067ms, 119041 nodes, 39004 nps, move e2e4
Benchmark time 3233ms, 126527 nodes, 39318 nps, move e2e4
Benchmark time 3726ms, 147936 nodes, 39853 nps, move e2e4
Benchmark time 3952ms, 158912 nodes, 40363 nps, move e2e4
Benchmark time 4214ms, 170660 nodes, 40643 nps, move e2e4
Benchmark time 5384ms, 224840 nodes, 41877 nps, move e2e4
Benchmark time 6042ms, 257175 nodes, 42670 nps, move e2e4
Benchmark time 10000ms, 445571 nodes, 44624 nps, move e2e4
bestmove e2e4
Benchmark final time 10.012s calculating 44554.8 nodes per second.

IanKennedy
Posts: 27
Joined: Sun Feb 04, 2018 11:38 am
Location: UK

Re: How to run rtx 2080ti for leela optimally?

Post by IanKennedy » Tue Apr 21, 2020 12:54 pm

h1a8 wrote:
Tue Apr 21, 2020 9:44 am
IanKennedy wrote:
Tue Apr 21, 2020 8:22 am
My current roundrobin is showing an average of 31.5knps with 384x30-t40-2036.pb using cudnn-fp16 on a 2080ti.
What chess GUI are you using?
Do I need to install something from Nvidia to enable CUDA? And your weights file is 2036.pb?
I'm running Banksia GUI on Ubuntu 18.04. It gives a nice stats summary of average nps and depth for each engine in a tournament. The network is one of the 'Sergio' creations, but check the 'best net' page cited above.
