Dramatic drop in performance with LC0 after updating Cuda drivers

Posted: Wed Jan 09, 2019 10:50 am
by M ANSARI
I was on the Nvidia website and saw that they had CUDA drivers version 10, so I figured I would upgrade my CUDA drivers since I had about 1700 CUDA cores. Unfortunately, after upgrading, my knps went from about 14 to 1. I then tried to uninstall the drivers and was surprised to see there were around 8 different parts to uninstall, but that does not seem to have helped. Does anyone know the best way to set up CUDA drivers for LC0? Does LC0 need some older CUDA drivers to work well?

Re: Dramatic drop in performance with LC0 after updating Cuda drivers

Posted: Wed Jan 09, 2019 5:38 pm
by Leo
That's one of the big problems with this new engine. Not many people are used to using GPU cards for chess engines. There is a lot of learning needed and trial and error.

Re: Dramatic drop in performance with LC0 after updating Cuda drivers

Posted: Wed Jan 09, 2019 7:16 pm
by yanquis1972
From what I'm reading on the lc0 site (it's been a while since I installed it, and there was a lot of general confusion then), you don't need to install anything except the basic Nvidia drivers. The necessary CUDA files come packaged with the client/UCI download. Maybe your drivers aren't up to date? I assume you'd get the message below if you open lc0.exe and type 'go':
CUDA driver version is insufficient for CUDA runtime version

You need 411.31 or newer. Download latest from nvidia's website.
(from https://github.com/LeelaChessZero/lc0/w ... ng-Started)

The drivers mentioned can be downloaded here: https://www.nvidia.com/Download/index.aspx?lang=en-us
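
The "411.31 or newer" requirement can be checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and it assumes the `nvidia-smi` tool that ships with the Nvidia driver is on the PATH.

```python
import subprocess

# Minimum driver version quoted above for lc0's CUDA build.
MIN_DRIVER = (411, 31)


def parse_version(text):
    """Turn a driver string such as '417.35' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_sufficient(version_text, minimum=MIN_DRIVER):
    """Return True if the driver version is at least the minimum."""
    return parse_version(version_text) >= minimum


def installed_driver_version():
    """Ask nvidia-smi for the driver version (assumes it is on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    # One line per GPU; the driver version is the same for all of them.
    return result.stdout.splitlines()[0].strip()
```

Calling `driver_is_sufficient(installed_driver_version())` on the affected machine should show whether the installed driver is too old for the CUDA runtime that lc0 ships with.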

Re: Dramatic drop in performance with LC0 after updating Cuda drivers

Posted: Wed Jan 09, 2019 9:05 pm
by M ANSARI
Actually I did download what were the latest CUDA drivers from Nvidia … I think it was version 10. But somehow installing those drivers installed something like 8 different components, including PhysX (or something like that) and a bunch of other stuff that I have no clue about. Most likely these new drivers overwrote some stuff in LC0 and now LC0 cannot work well with them. I think there is probably a lot of low-hanging fruit when it comes to improving the performance of LC0 just by tuning the engine to use its available hardware more efficiently. I will uninstall LC0 and reinstall it to see if things get cleared up.

Re: Dramatic drop in performance with LC0 after updating Cuda drivers

Posted: Wed Jan 09, 2019 10:03 pm
by corres
If you have an NVIDIA graphics card, you need to install only the latest NVIDIA driver.
When you install the driver, you should choose the graphics driver and the PhysX driver only.
Then you get cublas64_100.dll and cudart64_100.dll from the CUDA installer with the help of the 7z extractor.
You can find cudnn64_7.dll in the cudnn-10.0-windows10-x64-v7.4.2.24.zip file.
It is practical to put the three .dll files, lc0_xx.exe, and the .pb net file into a common directory.
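
A quick way to confirm that the three library files really are sitting in the lc0 directory is sketched below. The function name and the example folder are hypothetical; only the three DLL names come from this thread.

```python
from pathlib import Path

# Library files named in this thread for the CUDA 10 build of lc0.
REQUIRED_DLLS = ("cublas64_100.dll", "cudart64_100.dll", "cudnn64_7.dll")


def missing_files(folder, required=REQUIRED_DLLS):
    """Return the required file names that are not present in folder."""
    folder = Path(folder)
    # Compare in lower case because Windows file names are case-insensitive.
    present = {p.name.lower() for p in folder.iterdir() if p.is_file()}
    return [name for name in required if name.lower() not in present]
```

For example, `missing_files(r"C:\lc0")` (a made-up path) returns an empty list when all three files sit next to the lc0 executable.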

Re: Dramatic drop in performance with LC0 after updating Cuda drivers

Posted: Wed Jan 09, 2019 10:07 pm
by Albert Silver
There is absolutely no need to do any of this. The latest CUDA versions of lc0 come packaged with the DLLs. Just download and unzip, and make sure you have the most recent Nvidia drivers. That is it.

Re: Dramatic drop in performance with LC0 after updating Cuda drivers

Posted: Wed Jan 09, 2019 10:14 pm
by corres
But if he wants to try other drivers, he should follow what I wrote above.

Re: Dramatic drop in performance with LC0 after updating Cuda drivers

Posted: Sat Jan 12, 2019 2:45 pm
by Albert Silver
No, that is incorrect. He can use any driver that supports CUDA 10.

Re: Dramatic drop in performance with LC0 after updating Cuda drivers

Posted: Sat Jan 12, 2019 9:42 pm
by corres
What is incorrect about that?
To work, Leela needs only cublas64_100.dll, cudart64_100.dll, cudnn64_7.dll, and the .pb net file,
and not "any" drivers.
These .dll files are provided separately for 64-bit Windows 10 and for 64-bit Windows 7.

Re: Dramatic drop in performance with LC0 after updating Cuda drivers

Posted: Sun Jan 13, 2019 5:05 am
by ankan
I think there is some confusion.

The 'cudart64_100.dll, cublas64_100.dll, cudnn64_7.dll' files are not part of the driver: they are the CUDA runtime, cuBLAS, and cuDNN library files.
Very early builds of lc0 didn't come with them, so they had to be downloaded separately from Nvidia's website (the first two are part of the CUDA SDK, and the third is part of the cuDNN SDK).
All recent lc0 packages already include these files, e.g. the "lc0-v0.20.1-windows-cuda.zip" package that can be downloaded from here:
https://github.com/LeelaChessZero/lc0/releases

As Albert said, all you need is a recent nvidia display driver, which can be downloaded from here:
https://www.nvidia.in/Download/index.aspx?lang=en-in
(When you install from the above link, you don't need to check any optional components; you just need the display driver.)

The other thing you need to run lc0 is the network file - which can be downloaded from here:
http://testserver.lczero.org/networks/

It's possible that you were previously using a smaller network from the T35 run (IDs 35xxx-36xxx, e.g. 36089), which would run much faster than the full-sized networks from other runs, and that the dramatic drop you are seeing is because you are now running a bigger network. At long time controls or on faster hardware, the latest T30 networks should be the strongest (IDs 30xxx-32xxx, e.g. 32574).
Note that networks from the T40 run (IDs 40xxx onwards) are still relatively weak, as the run has just started.
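
To see which run a given network file belongs to, the ID ranges above can be turned into a small lookup. This is a sketch only: the function name is invented, and the ranges are just the ones listed in this post, not an official lc0 mapping.

```python
def training_run(network_id):
    """Map a network ID to its training run, using the ranges from this post."""
    if 30000 <= network_id <= 32999:
        return "T30"  # full-sized networks, strongest at long time controls
    if 35000 <= network_id <= 36999:
        return "T35"  # smaller networks, much faster in knps
    if network_id >= 40000:
        return "T40"  # run has only just started, still weak
    return None  # ID not covered by the ranges given here


print(training_run(36089))
print(training_run(32574))
```

If the network used before the driver update maps to T35 and the current one to T30, the difference in network size alone could account for a large drop in knps.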