Lc0 v0.24 dev DX backend for AMD Radeon GPU


Damir
Posts: 2801
Joined: Mon Feb 11, 2008 3:53 pm
Location: Denmark
Full name: Damir Desevac

Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by Damir »

Hi, the official Lc0 release with the DX backend and LogitQ is not available yet, but you can test them with this version:
https://gofile.io/?c=Ntb3Xv
http://www.filedropper.com/lc0-test-windows-gpu-dx22

https://www.chess2u.com/t13501-lc0-v0-2 ... -gpu#92392

For those who cannot download from the two links above, here is an alternative link:

https://www.dropbox.com/s/66fojbhpagc5j ... 2.zip?dl=0

requirements:
OS: updated Windows 10
AMD drivers: latest >= 20.1

DX is 3-4x faster than OpenCL but slower than cudnn-fp16, so it is recommended only for AMD Radeon GPUs. You can still test it on an Nvidia GPU, though.
Here are some benchmarks:
Code:
Nvidia RTX 2060
===============
Network          Net-Size  OpenCL  cudnn-fp32  dx-fp32  cudnn-fp16  dx-fp16
---------------------------------------------------------------------------
T59 59611        128x10     11624       35470    31088       88009    51326
T30 32390        256x20      2225        4972     6459       15137    17391
T40 42850        256x20      1371        4571     318*       12780    14198
T60 61996        320x24      1555        3421     3317        8418     7669
SV-big-t40-1705  384x30       856        1854     104*        4955     4799
SV-huge-50       512x40       361         802      29*        2241     121*

* - poor performance due to a driver bug (hopefully will be fixed soon).


Nvidia RTX Titan
================
Network          Net-Size  cudnn-fp32  dx-fp32  cudnn-fp16  dx-fp16
-------------------------------------------------------------------
T59 59611        128x10         70266    50523      123414    73675
T30 32390        256x20         14438    14173       40452    42276
T40 42850        256x20         12932    13113       36753    38388
T60 61996        320x24          8137     7505       17472    18578
SV-big-t40-1705  384x30          4143     3933       11054    13086
SV-huge-50       512x40          1942     1918        4844     7015
Code:
AMD RX 5700XT (Navi)
====================
Network          Net-Size  OpenCL  dx-fp32  dx-fp16
---------------------------------------------------
T59 59611        128x10     12095    37845    55888
T30 32390        256x20      1505     5814    10198
T40 42850        256x20       900     4666     8041
T60 61996        320x24      1774     2874     5183
SV-big-t40-1705  384x30         .     1479     2801
SV-huge-50       512x40         .        *     1080


AMD RX Vega VII
===============
Network          Net-Size  OpenCL  dx-fp32  dx-fp16
---------------------------------------------------
T59 59611        128x10     14889    46099    59721
T30 32390        256x20      1525     8055    12821
T40 42850        256x20      2754     6168     9156
T60 61996        320x24      2490     3722     6078
SV-big-t40-1705  384x30      1426     2129     3254
SV-huge-50       512x40         .      852     1323
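
As a quick cross-check of the speed-up figures, the RX 5700XT rows that have an OpenCL result can be turned into ratios. This is only a sketch: the numbers are copied from the RX 5700XT table above, while the dictionary layout and the function name `speedups` are made up for illustration and are not part of Lc0.

```python
# Values copied from the AMD RX 5700XT table above:
# network -> (OpenCL, dx-fp32, dx-fp16).
# Rows without an OpenCL result are left out.
RX_5700XT = {
    "T59 59611": (12095, 37845, 55888),
    "T30 32390": (1505, 5814, 10198),
    "T40 42850": (900, 4666, 8041),
    "T60 61996": (1774, 2874, 5183),
}


def speedups(table):
    """Return {network: (dx-fp32 / OpenCL, dx-fp16 / OpenCL)}."""
    return {
        net: (fp32 / opencl, fp16 / opencl)
        for net, (opencl, fp32, fp16) in table.items()
    }


if __name__ == "__main__":
    for net, (r32, r16) in speedups(RX_5700XT).items():
        print(f"{net:<10} dx-fp32 {r32:.2f}x   dx-fp16 {r16:.2f}x")
```

On this card the ratio depends strongly on the net, ranging from about 1.6x (T60, dx-fp32) to about 8.9x (T40, dx-fp16).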
Jhoravi
Posts: 291
Joined: Wed May 08, 2013 6:49 am

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by Jhoravi »

Thanks because my GPU happens to be AMD Radeon. BTW what does DX mean?
crem
Posts: 177
Joined: Wed May 23, 2018 9:29 pm

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by crem »

Jhoravi wrote: Tue Feb 11, 2020 2:35 am Thanks because my GPU happens to be AMD Radeon. BTW what does DX mean?
It's DirectX12.
It's good that you noticed, it was somehow missed. I guess we'll rename it to directx or maybe even directx12 before the release.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by Laskos »

crem wrote: Tue Feb 11, 2020 12:02 pm
It's DirectX12.
It's good that you noticed, it was somehow missed. I guess we'll rename it to directx or maybe even directx12 before the release.
Very interesting. On my RTX 2070 this backend is only 10% slower with T40 nets than using cudnn-fp16 backend, only 2% slower with T60 nets, and about 15% faster than cudnn-fp16 with the huge SV net 512x40b.
Geonerd
Posts: 79
Joined: Fri Mar 10, 2017 1:44 am

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by Geonerd »

Fantastic news!
Collingwood
Posts: 89
Joined: Sat Nov 09, 2019 3:24 pm
Full name: .

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by Collingwood »

Does this mean there's an Lc0 version for any GPU you have?
M ANSARI
Posts: 3707
Joined: Thu Mar 16, 2006 7:10 pm

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by M ANSARI »

Laskos wrote: Wed Feb 12, 2020 11:25 pm
Very interesting. On my RTX 2070 this backend is only 10% slower with T40 nets than using cudnn-fp16 backend, only 2% slower with T60 nets, and about 15% faster than cudnn-fp16 with the huge SV net 512x40b.
Hmmm ... if that is true maybe that means that Lc0 cudnn-fp16 backend needs to be updated for the RTX cards!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by Laskos »

M ANSARI wrote: Thu Feb 13, 2020 7:12 am
Hmmm ... if that is true maybe that means that Lc0 cudnn-fp16 backend needs to be updated for the RTX cards!
It seems so. As a sanity check, I tested the strength at 30s + 0.3s with the 384x30b SV net 1538, one of the strongest nets for LTC on a strong GPU.

Score of LargeNet_1538_dx vs LargeNet_1538_cudnn: 115 - 81 - 204 [0.542]
Elo difference: 29.6 +/- 23.8, LOS: 99.2 %, DrawRatio: 51.0 %

400 of 400 games finished.

Normalized Elo (pentanomial): 0.170 +/- 0.050 (1 SD)

DX performs better strength-wise too on these larger nets than cuDNN on an RTX 2070 GPU.
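
For readers who want to see where the numbers in the match summary come from, here is a minimal sketch that recomputes them from the win/loss/draw counts. It assumes the usual logistic Elo model and a normal approximation for LOS that ignores draws. The function name `match_stats` is hypothetical and this is not the code of the tool that printed the result. The error margin and the pentanomial figure are not reproduced.

```python
import math


def match_stats(wins, losses, draws):
    """Summarise a match result from the first engine's point of view."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Logistic Elo model: an expected score s corresponds to a rating gap
    # of -400 * log10(1/s - 1).
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    # Likelihood of superiority, using decisive games only
    # (normal approximation to the win/loss difference).
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return {
        "score": score,
        "elo": elo,
        "los": los,
        "draw_ratio": draws / games,
    }


if __name__ == "__main__":
    stats = match_stats(wins=115, losses=81, draws=204)
    print(f"score      {stats['score']:.3f}")
    print(f"Elo        {stats['elo']:+.1f}")
    print(f"LOS        {100 * stats['los']:.1f} %")
    print(f"draw ratio {100 * stats['draw_ratio']:.1f} %")
```

With 115 wins, 81 losses and 204 draws this gives roughly 29.6 Elo, 99.2 % LOS and a 51.0 % draw ratio, in line with the output quoted above.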
ankan
Posts: 77
Joined: Sun Apr 21, 2013 3:29 pm
Full name: Ankan Banerjee

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by ankan »

Thanks Laskos for testing. It's good to know that the speed increase translates to improvement in playing strength.
The dx backend (that defaults to fp16 precision) uses a different algorithm for convolution (winograd) that scales better with bigger networks compared to what cudnn-fp16 uses (implicit_gemm). We will be adding that path to cudnn-fp16 backend too.
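
To illustrate why a Winograd-style convolution can need less arithmetic, here is the textbook one-dimensional F(2,3) case: two outputs of a 3-tap filter with 4 multiplications instead of 6. This is a toy sketch in plain Python, not the code of the dx or cudnn backends, and the function names are invented for the example.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter, computed directly (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """The same two outputs via Winograd F(2,3) (4 multiplications)."""
    # Filter transform: depends only on the weights, so in a neural net
    # it can be computed once and reused for every input tile.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]
    # The four element-wise products.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3
    # Output transform: additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


if __name__ == "__main__":
    tile = [1.0, 2.0, 3.0, 4.0]
    weights = [0.5, -1.0, 2.0]
    print(direct_f23(tile, weights))
    print(winograd_f23(tile, weights))
```

The two-dimensional analogue for 3x3 filters, F(2x2, 3x3), needs 16 multiplications per tile instead of 36. Which tile size the backends actually use is not stated in this thread.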
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lc0 v0.24 dev DX backend for AMD Radeon GPU

Post by Laskos »

ankan wrote: Thu Feb 13, 2020 12:48 pm Thanks Laskos for testing. It's good to know that the speed increase translates to improvement in playing strength.
The dx backend (that defaults to fp16 precision) uses a different algorithm for convolution (winograd) that scales better with bigger networks compared to what cudnn-fp16 uses (implicit_gemm). We will be adding that path to cudnn-fp16 backend too.
Thanks for the info. I thought the cuDNN backend also used fast Winograd convolutions; at least that was the talk more than a year ago, and in fact I used that 10-12 years ago in image processing.