how good is a GeForce GTX 1060 6GB for Leela ?

Posted: Mon Apr 30, 2018 5:29 am
by JJJ
It's all in the title: I'd like to know how Leela will perform on average with it.

From the starting position I get an average of 2K nodes per second with it. Is that good?

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Posted: Mon Apr 30, 2018 6:36 am
by Albert Silver
JJJ wrote:It's all in the title: I'd like to know how Leela will perform on average with it.

From the starting position I get an average of 2K nodes per second with it. Is that good?
Be sure to run the full-tune on it. Then run it with the start position until ply 26 and see your average ply depth.

Even with an old i5-2500K and a GTX 1060 I get about 2250 NPS in the benchmark I described. This is good for ~2900 CCRL.

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Posted: Mon Apr 30, 2018 8:33 am
by JJJ
What do you mean by the full tune?
From the starting position, my Leela needed 1 min 26 s to reach depth 26. I don't know whether that is normal for my card or below par.

Anyway, my Leela is winning against Hakkapeliitta. So that's nice already.

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Posted: Mon Apr 30, 2018 8:48 am
by AdminX
JJJ wrote:What do you mean by the full tune?
From the starting position, my Leela needed 1 min 26 s to reach depth 26. I don't know whether that is normal for my card or below par.

Anyway, my Leela is winning against Hakkapeliitta. So that's nice already.
PS: Make sure you have the correct --gpu # if you have more than one GPU.

Code:

lczero --gpu 0 --tune-only --full-tuner 

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Posted: Mon Apr 30, 2018 8:50 am
by Guenther
JJJ wrote:What do you mean by the full tune?
From the starting position, my Leela needed 1 min 26 s to reach depth 26. I don't know whether that is normal for my card or below par.

Anyway, my Leela is winning against Hakkapeliitta. So that's nice already.
On first start, LCZero automatically tunes the settings to use with your GPU. This is, of course, only a standard tuning.
By doing a full tuning you can get a speed increase of up to perhaps 150-200% in some cases.
Delete the automatically created file named leelaz_opencl_tuning and start the process as described below.

Run something like this (adapt names/files) from the command line, or add it to a batch file on Windows.
This might run for some time, and you can see how each new best setting gets more GFLOPS out of your card.

Code:

lczero07.exe --tune-only --full-tuner -w ID222

Example for my very weak GPU, which must now be retuned for the new NN size (NN ID222 is now at 15*192):

Code:

C:\Engines\UCIPG\LCZero_07ID222>lczero07.exe --tune-only --full-tuner -w ID222
Using 2 thread(s).
Detecting residual layers...v2...192 channels...15 blocks.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 9.1.75
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:     0
Device name:   GeForce GT 710
Device type:   GPU
Device vendor: NVIDIA Corporation
Device driver: 388.13
Device speed:  954 MHz
Device cores:  1 CU
Device score:  1112
Selected platform: NVIDIA CUDA
Selected device: GeForce GT 710
with OpenCL 1.2 capability.

Started OpenCL SGEMM tuner.
RNG seed: 0xec7a02 (thread: 701728073)
Will try 5279 valid configurations.
(1/5279) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.8444 ms (22.4 GFLOPS)
(18/5279) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.8431 ms (22.4 GFLOPS)
(93/5279) KWG=16 KWI=2 MDIMA=16 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.6607 ms (28.6 GFLOPS)
(96/5279) KWG=32 KWI=2 MDIMA=32 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.5157 ms (36.6 GFLOPS)
(119/5279) KWG=16 KWI=8 MDIMA=16 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.5133 ms (36.8 GFLOPS)
(133/5279) KWG=32 KWI=8 MDIMA=16 MDIMC=8 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.5128 ms (36.8 GFLOPS)
(145/5279) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=1 0.4856 ms (38.9 GFLOPS)
(171/5279) KWG=16 KWI=8 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=8 VWN=1 0.4844 ms (39.0 GFLOPS)
(481/5279) KWG=16 KWI=2 MDIMA=16 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=1 STRN=0 VWM=4 VWN=1 0.4836 ms (39.0 GFLOPS)
(610/5279) KWG=16 KWI=2 MDIMA=16 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=1 STRN=0 VWM=4 VWN=2 0.4068 ms (46.4 GFLOPS)
(1555/5279) KWG=16 KWI=8 MDIMA=16 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=1 VWN=2 0.3585 ms (52.6 GFLOPS)
(1577/5279) KWG=16 KWI=2 MDIMA=32 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=2 VWN=2 0.3254 ms (58.0 GFLOPS)
(2547/5279) KWG=16 KWI=8 MDIMA=32 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=1 STRN=1 VWM=2 VWN=2 0.2867 ms (65.8 GFLOPS)
...
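
The retune steps described in this post (delete the tuning file, then launch the full tuner) could be scripted roughly as below. This is only a sketch under assumptions: the helper names are hypothetical, the leelaz_opencl_tuning file is assumed to sit in the engine's working directory, and the flags are just the ones quoted in this thread (--tune-only, --full-tuner, -w, and optionally --gpu).

```python
import subprocess
from pathlib import Path

# File name taken from the post above; its location is an assumption.
TUNING_FILE = "leelaz_opencl_tuning"


def delete_tuning_file(workdir):
    """Remove the cached tuning file if present. Returns True if it was deleted."""
    path = Path(workdir) / TUNING_FILE
    if path.exists():
        path.unlink()
        return True
    return False


def build_tuner_command(exe, weights, gpu=None):
    """Build the full-tuner command line using only flags quoted in the thread."""
    cmd = [exe, "--tune-only", "--full-tuner", "-w", weights]
    if gpu is not None:
        # Only needed when more than one GPU is installed.
        cmd += ["--gpu", str(gpu)]
    return cmd


def retune(exe, weights, workdir=".", gpu=None):
    """Delete the old tuning file, then run the full tuner (this can take a while)."""
    delete_tuning_file(workdir)
    # Pass an absolute path for exe if it is not on PATH.
    subprocess.run(build_tuner_command(exe, weights, gpu), cwd=workdir, check=True)


if __name__ == "__main__":
    # Print the command only; call retune(...) to actually start tuning.
    print(" ".join(build_tuner_command("lczero07.exe", "ID222")))
```

Adapt the executable and weights names to your own files, as with the command line above.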

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Posted: Mon Apr 30, 2018 8:55 am
by shrapnel
JJJ wrote:Anyway, my Leela is winning against Hakkapeliitta. So that's nice already.
Good to know. Glad that at least some chess engines have started to use the power of GPUs.
If I get a dual 1080 Ti system, can it beat the latest Stockfish/Komodo?

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Posted: Mon Apr 30, 2018 9:26 am
by nabildanial
shrapnel wrote:
JJJ wrote:Anyway, my Leela is winning against Hakkapeliitta. So that's nice already.
Good to know. Glad that at least some chess engines have started to use the power of GPUs.
If I get a dual 1080 Ti system, can it beat the latest Stockfish/Komodo?
It doesn't support multiple GPUs, at least not yet.

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Posted: Mon Apr 30, 2018 11:33 am
by Nay Lin Tun
I get around 2200 NPS with my 1060. For comparison, you can look at the Ipman benchmark.

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Posted: Mon Apr 30, 2018 3:05 pm
by shrapnel
nabildanial wrote:It doesn't support multiple GPU, at least not yet.
OK, thanks.
Which would be better for chess, an NVIDIA Titan Xp or the GeForce GTX 1080 Ti?

Re: how good is a GeForce GTX 1060 6GB for Leela ?

Posted: Mon Apr 30, 2018 6:15 pm
by Albert Silver
Guenther wrote:
JJJ wrote:What do you mean by the full tune?
From the starting position, my Leela needed 1 min 26 s to reach depth 26. I don't know whether that is normal for my card or below par.

Anyway, my Leela is winning against Hakkapeliitta. So that's nice already.
On first start, LCZero automatically tunes the settings to use with your GPU. This is, of course, only a standard tuning.
By doing a full tuning you can get a speed increase of up to perhaps 150-200% in some cases.
Delete the automatically created file named leelaz_opencl_tuning and start the process as described below.

Run something like this (adapt names/files) from the command line, or add it to a batch file on Windows.
This might run for some time, and you can see how each new best setting gets more GFLOPS out of your card.

Code:

lczero07.exe --tune-only --full-tuner -w ID222

Example for my very weak GPU, which must now be retuned for the new NN size (NN ID222 is now at 15*192):

Code:

C:\Engines\UCIPG\LCZero_07ID222>lczero07.exe --tune-only --full-tuner -w ID222
Using 2 thread(s).
Detecting residual layers...v2...192 channels...15 blocks.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 9.1.75
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:     0
Device name:   GeForce GT 710
Device type:   GPU
Device vendor: NVIDIA Corporation
Device driver: 388.13
Device speed:  954 MHz
Device cores:  1 CU
Device score:  1112
Selected platform: NVIDIA CUDA
Selected device: GeForce GT 710
with OpenCL 1.2 capability.

Started OpenCL SGEMM tuner.
RNG seed: 0xec7a02 (thread: 701728073)
Will try 5279 valid configurations.
(1/5279) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 ST
...
Yes, here is what I got on my laptop:

Code: Select all

C:\Users\Albert\Chess\Leela Zero\GPU>lczero.exe -t3 -w weights.txt --full-tuner
Using 3 thread(s).
Detecting residual layers...v2...192 channels...15 blocks.
Initializing OpenCL.
Device name:   Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Device type:   CPU
Device vendor: Intel(R) Corporation
Device driver: 7.6.0.611
Device speed:  2600 MHz
Device cores:  8 CU
Device score:  521
Platform version: OpenCL 1.2 CUDA 9.1.84
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:     2
Device name:   GeForce GTX 980M
Device type:   GPU
Device vendor: NVIDIA Corporation
Device driver: 391.35
Device speed:  1126 MHz
Device cores:  12 CU
Device score:  1112
Selected platform: NVIDIA CUDA
Selected device: GeForce GTX 980M
with OpenCL 1.2 capability.

Started OpenCL SGEMM tuner.
RNG seed: 0x65d47141 (thread: 2783254248)
Will try 5117 valid configurations.
(1/5117) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.1067 ms (177.0 GFLOPS)
(6/5117) KWG=32 KWI=2 MDIMA=8 MDIMC=16 MWG=32 NDIMB=8 NDIMC=16 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0946 ms (199.5 GFLOPS)
(9/5117) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0894 ms (211.1 GFLOPS)
(79/5117) KWG=16 KWI=8 MDIMA=32 MDIMC=32 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0651 ms (289.9 GFLOPS)
(566/5117) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=1 STRN=0 VWM=2 VWN=2 0.0594 ms (317.6 GFLOPS)
(853/5117) KWG=16 KWI=2 MDIMA=8 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=1 VWM=2 VWN=2 0.0571 ms (330.4 GFLOPS)
(1276/5117) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=32 NDIMB=16 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0551 ms (342.8 GFLOPS)
(1278/5117) KWG=32 KWI=2 MDIMA=16 MDIMC=8 MWG=32 NDIMB=16 NDIMC=16 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0530 ms (356.2 GFLOPS)
(1306/5117) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0501 ms (377.0 GFLOPS)
(1348/5117) KWG=16 KWI=2 MDIMA=32 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.0484 ms (390.2 GFLOPS)
(1404/5117) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=16 NDIMC=16 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=4 VWN=1 0.0444 ms (424.8 GFLOPS)
(1504/5117) KWG=16 KWI=8 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=4 VWN=2 0.0424 ms (444.7 GFLOPS)
(1837/5117) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=1 STRN=0 VWM=4 VWN=2 0.0421 ms (447.9 GFLOPS)
(3906/5117) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=1 0.0399 ms (473.1 GFLOPS)
(3921/5117) KWG=16 KWI=2 MDIMA=32 MDIMC=32 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=1 0.0374 ms (504.1 GFLOPS)
(3942/5117) KWG=32 KWI=8 MDIMA=16 MDIMC=32 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=1 0.0348 ms (542.6 GFLOPS)
(4400/5117) KWG=32 KWI=8 MDIMA=8 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=1 STRN=0 VWM=2 VWN=2 0.0332 ms (568.9 GFLOPS)
Wavefront/Warp size: 32
Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 64
BLAS Core: Haswell
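
To put a number on how much the full tuner finds, the progress lines in a log like the one above can be parsed and the best configuration compared with the first one tried. The following is a rough sketch, not part of LCZero: the function names and the regular expression are my own assumptions about the line format seen in this thread, and the first configuration tried is not necessarily what the standard tuning would have picked, so the percentage is only indicative.

```python
import re

# Matches progress lines such as:
# (1/5117) KWG=32 ... VWN=1 0.1067 ms (177.0 GFLOPS)
LINE_RE = re.compile(
    r"^\((\d+)/(\d+)\)\s+(.*?)\s+([\d.]+) ms \(([\d.]+) GFLOPS\)$"
)


def parse_tuner_log(text):
    """Return one dict per progress line; all other lines are ignored."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        index, total, params, ms, gflops = match.groups()
        results.append({
            "index": int(index),
            "total": int(total),
            # KEY=VALUE pairs, e.g. {"KWG": "32", "KWI": "2", ...}
            "config": dict(p.split("=", 1) for p in params.split()),
            "ms": float(ms),
            "gflops": float(gflops),
        })
    return results


def summarize(results):
    """Return (first, best, percentage gain of best over first)."""
    first = results[0]
    best = max(results, key=lambda r: r["gflops"])
    gain_pct = (best["gflops"] - first["gflops"]) / first["gflops"] * 100.0
    return first, best, gain_pct


# First and last progress lines copied from the GTX 980M log above.
SAMPLE = """\
Will try 5117 valid configurations.
(1/5117) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.1067 ms (177.0 GFLOPS)
(4400/5117) KWG=32 KWI=8 MDIMA=8 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=1 STRN=0 VWM=2 VWN=2 0.0332 ms (568.9 GFLOPS)
"""

if __name__ == "__main__":
    first, best, gain = summarize(parse_tuner_log(SAMPLE))
    print(f"first: {first['gflops']} GFLOPS, best: {best['gflops']} GFLOPS, gain: {gain:.1f}%")
```

Note that the parser expects each progress line on a single line; console output that was wrapped at 80 columns needs to be rejoined first.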