Lc0 CUDA 10.2 compile (0.25.1) is faster


Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Lc0 CUDA 10.2 compile (0.25.1) is faster

Post by Eduard »

Lc0 0.25.1 CUDA 10.2 compile is fast.

You can download this Lc0 binary for Windows 10 on Discord under test-discuss (posted May 1).

Or here, with all the required Nvidia 10.2 drivers (340 MB):

https://filehorst.de/d/dyhbuIiz

Short test of the CUDA 10.2 compile under Windows 10 on my slow GTX 1050 Ti, after 60 s:
Sergio 3200: new 950 nps (old 650 nps)
Sergio 1810: new 3000 nps (old 2100 nps)
701820: new 15 knps (old 14 knps)
Fat Fritz 1.1: new 2900 nps (old 2500 nps)

old = Lc0 0.25.1 with old Nvidia drivers; new = Lc0 0.25.1 CUDA 10.2 compile with Nvidia 10.2 drivers.
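
To put the figures above in relative terms, here is a minimal Python sketch that computes the percentage gain per net. The numbers are copied from the test results; the function name `speedup_percent` is just illustrative, and the 14 and 15 figures for net 701820 are assumed to mean thousands of nodes per second.

```python
# Figures copied from the 60 s test above (GTX 1050 Ti), as (old, new) in nps.
# Assumption: the 701820 values are in thousands of nodes per second.
results = {
    "Sergio 3200": (650, 950),
    "Sergio 1810": (2100, 3000),
    "701820": (14_000, 15_000),
    "Fat Fritz 1.1": (2500, 2900),
}


def speedup_percent(old_nps, new_nps):
    """Relative gain of the new build over the old one, in percent."""
    return (new_nps - old_nps) / old_nps * 100.0


for net, (old, new) in results.items():
    print(f"{net}: {old} -> {new} nps ({speedup_percent(old, new):+.1f}%)")
```

On these four nets the gain works out to roughly 7% to 46%, with the larger relative gains on the two Sergio nets.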

Great work, thanks!
MMarco
Posts: 195
Joined: Sun Apr 12, 2020 1:09 am
Full name: Marc-O Moisan-Plante

Re: Lc0 CUDA 10.2 compile (0.25.1) is faster

Post by MMarco »

Thank you for the info.

Unfortunately, something is wrong with the new driver and compile combination on my GTX 1660 Ti: with 20x256 nets the speed dropped drastically, from 9000 nps to a mere 1000 nps. Does anyone have the same problem?
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Lc0 CUDA 10.2 compile (0.25.1) is faster

Post by corres »

If I remember correctly, CUDA 10.2 needs the VS 2019 compiler and a newer NVIDIA GPU driver.
Last edited by corres on Mon May 04, 2020 8:17 am, edited 1 time in total.
Werner
Posts: 2871
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: Lc0 CUDA 10.2 compile (0.25.1) is faster

Post by Werner »

MMarco wrote: Mon May 04, 2020 7:09 am Thank you for the info.

Unfortunately, something is wrong with the new driver and compile combination on my GTX 1660 Ti: with 20x256 nets the speed dropped drastically, from 9000 nps to a mere 1000 nps. Does anyone have the same problem?
For GTX 16..., set `--backend-opts=custom_winograd=false` when using the official v0.25 build.
Werner
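
As a rough illustration of where that option goes, the sketch below only assembles a command line and does not run anything. The `--backend-opts=custom_winograd=false` flag and the `cudnn-fp16` backend name come from this thread; the helper name `lc0_benchmark_args`, the `lc0` executable path, and the use of the `benchmark` mode with `--backend` are assumptions for illustration, so check them against your own build.

```python
def lc0_benchmark_args(lc0_path="lc0", backend="cudnn-fp16", custom_winograd=False):
    """Build an argument list for an lc0 benchmark run (hypothetical helper).

    Only the custom_winograd backend option is taken from the thread; the
    executable path, benchmark mode and --backend flag are assumptions.
    """
    flag = "true" if custom_winograd else "false"
    return [
        lc0_path,
        "benchmark",
        f"--backend={backend}",
        f"--backend-opts=custom_winograd={flag}",
    ]


# Print the command line so it can be copied into a terminal or a GUI engine setup.
print(" ".join(lc0_benchmark_args()))
```

Comparing the reported nps with the option on and off, on the same net, should show whether the setting is what causes the slowdown on a given card.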
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Lc0 CUDA 10.2 compile (0.25.1) is faster

Post by corres »

corres wrote: Mon May 04, 2020 7:50 am If I remember correctly, CUDA 10.2 needs the VS 2019 compiler and a newer NVIDIA GPU driver.
NVIDIA driver version 445.87 for 64-bit Windows 10 and 64-bit Windows 7 works with CUDA 10.2, but some users have reported issues with it.
Does anyone have experience using it with Leela?
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lc0 CUDA 10.2 compile (0.25.1) is faster

Post by Laskos »

corres wrote: Mon May 04, 2020 9:27 am
NVIDIA driver version 445.87 for 64-bit Windows 10 and 64-bit Windows 7 works with CUDA 10.2, but some users have reported issues with it.
Does anyone have experience using it with Leela?
I am using it and it seems fine, both with this compile and with the older v0.25.1 compile. However, I am not getting any speed gain with this CUDA 10.2 build. Maybe only fp32 gets big gains.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Lc0 CUDA 10.2 compile (0.25.1) is faster

Post by corres »

Laskos wrote: Mon May 04, 2020 9:44 am
I am using it and it seems fine, both with this compile and with the older v0.25.1 compile. However, I am not getting any speed gain with this CUDA 10.2 build. Maybe only fp32 gets big gains.
Thanks for the info, but if I am right, you use the DX12 backend and not the cudnn-fp16 backend.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lc0 CUDA 10.2 compile (0.25.1) is faster

Post by Laskos »

corres wrote: Mon May 04, 2020 10:39 am
Thanks for the info, but if I am right, you use the DX12 backend and not the cudnn-fp16 backend.
No, I benchmarked that T...160 20x256 JHorthos net using the cudnn-fp16 backend. DX12 is good with larger nets, but here the issue was this CUDA 10.2 compile and its libraries.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Lc0 CUDA 10.2 compile (0.25.1) is faster

Post by corres »

Laskos wrote: Mon May 04, 2020 10:50 am
No, I benchmarked that T...160 20x256 JHorthos net using the cudnn-fp16 backend. DX12 is good with larger nets, but here the issue was this CUDA 10.2 compile and its libraries.
I see.
pohl4711
Posts: 2433
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: Lc0 CUDA 10.2 compile (0.25.1) is faster

Post by pohl4711 »

Laskos wrote: Mon May 04, 2020 9:44 am

I am using it and it seems fine, both with this compile and with the older v0.25.1 compile. However, I am not getting any speed gain with this CUDA 10.2 build. Maybe only fp32 gets big gains.
Same here on my RTX 2060 (mobile). No speed gain compared to the official 0.25.1 release with cudnn-fp16.