Checking the backends with the new lc0 binary


mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Checking the backends with the new lc0 binary

Post by mwyoung »

mwyoung wrote: Fri Oct 02, 2020 1:00 am
AdminX wrote: Thu Oct 01, 2020 10:14 pm
Laskos wrote: Thu Oct 01, 2020 1:50 pm
Note the excellent result of the DX12 backend, which by NPS seems vastly superior to the other two. A glitch occurred with this command line:

lc0_v263rc1.exe benchmark --backend=cudnn-fp16 --minibatch-size=240

which sometimes exits with this error message:

Position: 1/34 rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Unhandled exception in worker thread: CUDA error: an illegal memory access was encountered (c:\projects\lc0\src\neural\cuda\network_cudnn.cc:789)
Unhandled exception in worker thread:
I found it works better with this format:

lc0_v263rc1.exe benchmark --minibatch-size=240 --backend=cudnn-fp16

As you can see, all I did was swap the order of the two arguments.
No, there is an issue with 0.26.3-rc1. I tested it the day it came out in a 200-game blitz match. It was faster, but it also crashed about 23 times in 200 games, causing a big loss in the match. I hope this will be corrected in rc2.
Rc2 is now ready for download. The issue is reportedly fixed, and rc2 should also run faster.
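One way to put a number on an intermittent crash like this is to repeat the same command and count the failed runs; 23 crashes in 200 games is an 11.5% failure rate. The sketch below is a hypothetical harness, not part of lc0. The function name `count_failures` is made up, and it assumes a crash shows up either as a non-zero exit code or as the text "Unhandled exception" in the output, which may not hold for every build.

```python
import subprocess

# Command lines as posted in this thread; change the binary name to match your build.
ORIGINAL_ORDER = ["lc0_v263rc1.exe", "benchmark", "--backend=cudnn-fp16", "--minibatch-size=240"]
SWAPPED_ORDER = ["lc0_v263rc1.exe", "benchmark", "--minibatch-size=240", "--backend=cudnn-fp16"]


def count_failures(cmd, runs, marker="Unhandled exception"):
    """Run cmd `runs` times and return how many runs look like crashes.

    Assumption: a crash is a non-zero exit code or `marker` in the output.
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
        output = result.stdout + result.stderr
        if result.returncode != 0 or marker in output:
            failures += 1
    return failures
```

Calling `count_failures(ORIGINAL_ORDER, runs=20)` and `count_failures(SWAPPED_ORDER, runs=20)` from the folder containing the binary would show whether the argument order really changes the failure rate or whether the crash is simply intermittent.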
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
AdminX
Posts: 6340
Joined: Mon Mar 13, 2006 2:34 pm
Location: Acworth, GA

Re: Checking the backends with the new lc0 binary

Post by AdminX »

Laskos wrote: Sat Oct 03, 2020 7:00 pm
AdminX wrote: Thu Oct 01, 2020 10:55 pm I was able to replicate your results on 2070 Super


DX12 

lc0.exe benchmark --minibatch-size=240 --threads=2 --backend-opts=gpu=0

===========================
Total time (ms) : 341461
Nodes searched  : 3506458
Nodes/second    : 10269


Cudnn-fp16 

lc0.exe benchmark --minibatch-size=240 --threads=2 --backend-opts=gpu=0

===========================
Total time (ms) : 341514
Nodes searched  : 2762948
Nodes/second    : 8090

Someone directed me to this test version with CUDA 11.1 and cuDNN 8.0.4:
https://appveyorcidatav2.blob.core.wind ... a-cuda.zip
and replace lc0 with this one:
https://appveyorcidatav2.blob.core.wind ... ld/lc0.exe

I am getting very much improved results (50%+ faster) for cudnn-fp16 and cuda-fp16:

cudnn-fp16
Total time (ms) : 341515
Nodes searched : 3547152
Nodes/second : 10386

cuda-fp16
Total time (ms) : 341370
Nodes searched : 3630548
Nodes/second : 10635

dx12
Total time (ms) : 341409
Nodes searched : 3077528
Nodes/second : 9014

Cuda-fp16 now seems even faster than cudnn-fp16, and both are above DX12.
The links are not working; the files may no longer be there.
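The three results quoted above can be cross-checked with a few lines of arithmetic. This is only a sketch: the figures are copied from the post, and the helper names `nodes_per_second` and `relative_speed` are made up for illustration. Recomputing nodes divided by seconds agrees with each reported Nodes/second value to within about one node per second.

```python
# Figures copied from the three results quoted above:
# (total time in ms, nodes searched, reported nodes/second).
RESULTS = {
    "cudnn-fp16": (341515, 3547152, 10386),
    "cuda-fp16": (341370, 3630548, 10635),
    "dx12": (341409, 3077528, 9014),
}


def nodes_per_second(total_ms, nodes):
    """Recompute NPS from the raw totals."""
    return nodes / (total_ms / 1000.0)


def relative_speed(results, baseline):
    """Return each backend's reported NPS divided by the baseline's."""
    base_nps = results[baseline][2]
    return {name: nps / base_nps for name, (_, _, nps) in results.items()}


for name, (total_ms, nodes, reported) in RESULTS.items():
    recomputed = nodes_per_second(total_ms, nodes)
    print(f"{name:11s} recomputed {recomputed:9.1f}  reported {reported}")

for name, ratio in relative_speed(RESULTS, "dx12").items():
    print(f"{name:11s} {ratio:.2f}x dx12")
```

On these figures cuda-fp16 comes out about 18% ahead of dx12 and cudnn-fp16 about 15% ahead.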
"Good decisions come from experience, and experience comes from bad decisions."
__________________________________________________________________
Ted Summers
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Checking the backends with the new lc0 binary

Post by Laskos »

AdminX wrote: Sat Oct 03, 2020 9:15 pm
Laskos wrote: Sat Oct 03, 2020 7:00 pm
AdminX wrote: Thu Oct 01, 2020 10:55 pm I was able to replicate your results on 2070 Super


DX12 

lc0.exe benchmark --minibatch-size=240 --threads=2 --backend-opts=gpu=0

===========================
Total time (ms) : 341461
Nodes searched  : 3506458
Nodes/second    : 10269


Cudnn-fp16 

lc0.exe benchmark --minibatch-size=240 --threads=2 --backend-opts=gpu=0

===========================
Total time (ms) : 341514
Nodes searched  : 2762948
Nodes/second    : 8090

Someone directed me to this test version with CUDA 11.1 and cuDNN 8.0.4:
https://appveyorcidatav2.blob.core.wind ... a-cuda.zip
and replace lc0 with this one:
https://appveyorcidatav2.blob.core.wind ... ld/lc0.exe

I am getting very much improved results (50%+ faster) for cudnn-fp16 and cuda-fp16:

cudnn-fp16
Total time (ms) : 341515
Nodes searched : 3547152
Nodes/second : 10386

cuda-fp16
Total time (ms) : 341370
Nodes searched : 3630548
Nodes/second : 10635

dx12
Total time (ms) : 341409
Nodes searched : 3077528
Nodes/second : 9014

Cuda-fp16 now seems even faster than cudnn-fp16, and both are above DX12.
The links are not working; the files may no longer be there.
Try these:

https://appveyorcidatav2.blob.core.wind ... 3A24Z&sp=r

https://appveyorcidatav2.blob.core.wind ... 3A28Z&sp=r
sarona
Posts: 122
Joined: Tue Oct 29, 2019 4:14 pm
Location: Canada
Full name: Ron Doughie

Re: Checking the backends with the new lc0 binary

Post by sarona »

I still cannot get them, either.


<Error>
<Code>AuthenticationFailed</Code>
<Message>
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:48280c10-101e-0054-34c2-997f87000000 Time:2020-10-03T20:22:43.9719130Z
</Message>
<AuthenticationErrorDetail>
Signature not valid in the specified time frame: Start [Sat, 03 Oct 2020 19:23:24 GMT] - Expiry [Sat, 03 Oct 2020 19:29:24 GMT] - Current [Sat, 03 Oct 2020 20:22:43 GMT]
</AuthenticationErrorDetail>
</Error>
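The error detail explains the failure: the link's signature was valid only for a six-minute window (19:23:24 to 19:29:24 GMT), and the request arrived at 20:22:43, roughly 53 minutes after expiry. A minimal sketch of that check follows. It assumes the detail text always has the shape shown above, and the function names `parse_window` and `seconds_past_expiry` are hypothetical, not part of any Azure tooling.

```python
import re
from email.utils import parsedate_to_datetime

# Detail text copied from the error message above.
DETAIL = (
    "Signature not valid in the specified time frame: "
    "Start [Sat, 03 Oct 2020 19:23:24 GMT] - "
    "Expiry [Sat, 03 Oct 2020 19:29:24 GMT] - "
    "Current [Sat, 03 Oct 2020 20:22:43 GMT]"
)


def parse_window(detail):
    """Extract the Start, Expiry and Current timestamps from the error detail."""
    stamps = {}
    for label, value in re.findall(r"(Start|Expiry|Current) \[([^\]]+)\]", detail):
        stamps[label] = parsedate_to_datetime(value)
    return stamps


def seconds_past_expiry(detail):
    """Positive if the request arrived after the expiry time."""
    stamps = parse_window(detail)
    return (stamps["Current"] - stamps["Expiry"]).total_seconds()


stamps = parse_window(DETAIL)
print("window length:", stamps["Expiry"] - stamps["Start"])
print("seconds past expiry:", seconds_past_expiry(DETAIL))
```

A positive result means the link has simply timed out, so retrying the same URL will not help; a freshly generated link is needed.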
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Checking the backends with the new lc0 binary

Post by mwyoung »

mwyoung wrote: Sat Oct 03, 2020 9:05 pm
mwyoung wrote: Fri Oct 02, 2020 1:00 am
AdminX wrote: Thu Oct 01, 2020 10:14 pm
Laskos wrote: Thu Oct 01, 2020 1:50 pm
Note the excellent result of the DX12 backend, which by NPS seems vastly superior to the other two. A glitch occurred with this command line:

lc0_v263rc1.exe benchmark --backend=cudnn-fp16 --minibatch-size=240

which sometimes exits with this error message:

Position: 1/34 rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Unhandled exception in worker thread: CUDA error: an illegal memory access was encountered (c:\projects\lc0\src\neural\cuda\network_cudnn.cc:789)
Unhandled exception in worker thread:
I found it works better with this format:

lc0_v263rc1.exe benchmark --minibatch-size=240 --backend=cudnn-fp16

As you can see, all I did was swap the order of the two arguments.
No, there is an issue with 0.26.3-rc1. I tested it the day it came out in a 200-game blitz match. It was faster, but it also crashed about 23 times in 200 games, causing a big loss in the match. I hope this will be corrected in rc2.
Rc2 is now ready for download. The issue is reportedly fixed, and rc2 should also run faster.
All seems to be working fine with 0.26.3-rc2, and I am now getting much higher speed with CUDA 11.1 on the big 384x30 nets. The game average right now is 38.2 knps on a 2080 Ti with default settings.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Checking the backends with the new lc0 binary

Post by corres »

I also got only error messages from there.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Checking the backends with the new lc0 binary

Post by Laskos »

corres wrote: Sun Oct 04, 2020 10:39 am
I also got only error messages from there.
The links I have no longer seem to work, and I am unable to upload these large files.

Get the CUDA backend with all DLLs from the new official 0.26.3-rc2 release. It is the fastest, faster than the cuDNN backend in all my tests with different nets.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Checking the backends with the new lc0 binary

Post by corres »

Laskos wrote: Sun Oct 04, 2020 11:10 am
corres wrote: Sun Oct 04, 2020 10:39 am
I also got only error messages from there.
The links I have no longer seem to work, and I am unable to upload these large files.

Get the CUDA backend with all DLLs from the new official 0.26.3-rc2 release. It is the fastest, faster than the cuDNN backend in all my tests with different nets.
Thanks.
AdminX
Posts: 6340
Joined: Mon Mar 13, 2006 2:34 pm
Location: Acworth, GA

Re: Checking the backends with the new lc0 binary

Post by AdminX »

Laskos wrote: Sun Oct 04, 2020 11:10 am
corres wrote: Sun Oct 04, 2020 10:39 am
I also got only error messages from there.
The links I have no longer seem to work, and I am unable to upload these large files.

Get the CUDA backend with all DLLs from the new official 0.26.3-rc2 release. It is the fastest, faster than the cuDNN backend in all my tests with different nets.
Or grab this one: https://ci.appveyor.com/project/LeelaCh ... /artifacts
"Good decisions come from experience, and experience comes from bad decisions."
__________________________________________________________________
Ted Summers
sarona
Posts: 122
Joined: Tue Oct 29, 2019 4:14 pm
Location: Canada
Full name: Ron Doughie

Re: Checking the backends with the new lc0 binary

Post by sarona »

Thanks very much, Kai and Ted.

I downloaded (from AppVeyor) and quickly tried both the CUDA and cuDNN binaries of v0.26.3-rc2 (3787), using the 384x30-t60-4619.pb (Sergio) net and an RTX 2080 GPU. I am using the CUDA 11.1 Toolkit and the v8.0.4.30 cuDNN libraries.

Image

Will try the DX12 binary tonight.