Code: Select all
with cudnn 7.3 and 411.63 driver available at nvidia.com
minibatch-size=512, network id: 11250, go nodes 1000000
            fp32   fp16
Titan V:    13295  29379
RTX 2080Ti: 12208  32472
Thank you!

jkiliani wrote: ↑Thu Sep 20, 2018 10:10 pm
Ankan posted Lc0 benchmarks for the RTX 2080 Ti on Leela Discord today, since nondisclosure clauses regarding benchmarks of those are no longer in force now that the hardware is released. So, the (top) RTX card actually outperforms a Titan V for Lc0 when using fp16. Ankan will also post some benchmarks for the RTX 2080 soon.

Code: Select all
with cudnn 7.3 and 411.63 driver available at nvidia.com
minibatch-size=512, network id: 11250, go nodes 1000000
            fp32   fp16
Titan V:    13295  29379
RTX 2080Ti: 12208  32472
Small update, Ankan added the 2080 to his benchmarks:
Code: Select all
with cudnn 7.3 and 411.63 driver available at nvidia.com
minibatch-size=512, network id: 11250, go nodes 1000000
            fp32   fp16
Titan V:    13295  29379
RTX 2080:    9708  26678
RTX 2080Ti: 12208  32472
Thank you very much for the explanation!

jkiliani wrote: ↑Fri Sep 21, 2018 8:27 am
Small update, Ankan added the 2080 to his benchmarks. About fp32 and fp16: this is the calculation precision of the neural network inference. fp32 refers to 32-bit floats, fp16 to 16-bit floats. It has been experimentally confirmed that the reduced floating-point accuracy of 16-bit NN inference does not significantly reduce playing strength for Lc0. However, there is not much point with GTX 10xx GPUs, since those are not optimised for fp16. The RTX cards, on the other hand, are; in their case fp16 gains a large amount of speed, as can be seen from these benchmarks.

Code: Select all
with cudnn 7.3 and 411.63 driver available at nvidia.com
minibatch-size=512, network id: 11250, go nodes 1000000
            fp32   fp16
Titan V:    13295  29379
RTX 2080:    9708  26678
RTX 2080Ti: 12208  32472

As for how to use it, you initialise Lc0 with "backend=cudnn-fp16" instead of "backend=cudnn".
Presumably, if we keep going down in precision, there will be a penalty at some point, as the weights won't be as precise.

jkiliani wrote: ↑Fri Sep 21, 2018 8:27 am
Small update, Ankan added the 2080 to his benchmarks. About fp32 and fp16: this is the calculation precision of the neural network inference. fp32 refers to 32-bit floats, fp16 to 16-bit floats. It has been experimentally confirmed that the reduced floating-point accuracy of 16-bit NN inference does not significantly reduce playing strength for Lc0. However, there is not much point with GTX 10xx GPUs, since those are not optimised for fp16. The RTX cards, on the other hand, are; in their case fp16 gains a large amount of speed, as can be seen from these benchmarks.

Code: Select all
with cudnn 7.3 and 411.63 driver available at nvidia.com
minibatch-size=512, network id: 11250, go nodes 1000000
            fp32   fp16
Titan V:    13295  29379
RTX 2080:    9708  26678
RTX 2080Ti: 12208  32472

As for how to use it, you initialise Lc0 with "backend=cudnn-fp16" instead of "backend=cudnn".
It is just another UCI option:
Code: Select all
0.343: < option name Network weights file path type string default <autodiscover>
0.343: < option name Number of worker threads type spin default 2 min 1 max 128
0.343: < option name NNCache size type spin default 200000 min 0 max 999999999
0.343: < option name NN backend to use type combo default cudnn var cudnn var cudnn-fp16 var check var random var multiplexing
0.343: < option name NN backend parameters type string default
0.343: < option name Scale thinking time type string default 2.400000
0.343: < option name Move time overhead in milliseconds type spin default 100 min 0 max 10000
0.343: < option name Time weight curve peak ply type string default 26.200001
0.343: < option name Time weight curve width left of peak type string default 82.000000
0.343: < option name Time weight curve width right of peak type string default 74.000000
0.343: < option name List of Syzygy tablebase directories type string default
0.343: < option name Ponder type check default false
0.343: < option name Minibatch size for NN inference type spin default 256 min 1 max 1024
0.343: < option name Max prefetch nodes, per NN call type spin default 32 min 0 max 1024
0.343: < option name Cpuct MCTS option type string default 3.400000
0.343: < option name Initial temperature type string default 0.000000
0.343: < option name Moves with temperature decay type spin default 0 min 0 max 100
0.343: < option name Add Dirichlet noise at root node type check default false
0.343: < option name Display verbose move stats type check default false
0.343: < option name Aversion to search if change unlikely type string default 1.330000
0.343: < option name First Play Urgency Reduction type string default 0.900000
0.343: < option name Length of history to include in cache type spin default 1 min 0 max 7
0.343: < option name Policy softmax temperature type string default 2.200000
0.343: < option name Allowed node collisions, per batch type spin default 32 min 0 max 1024
0.343: < option name Out-of-order cache backpropagation type check default false
0.343: < option name Ignore alternatives to checkmate type check default false
0.343: < option name Configuration file path type string default lc0.config
0.343: < option name Do debug logging into file type string default
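Since the backend is exposed as an ordinary UCI option (the "NN backend to use" combo in the list above), a GUI or driver script just sends a setoption command during the handshake. A minimal sketch, assuming an lc0 binary on PATH and that option name (which may differ between Lc0 versions):

```python
# Sketch: the UCI commands a driver would send to select the fp16 backend.
# The option name "NN backend to use" is taken from the option list above.

def uci_commands(backend="cudnn-fp16"):
    """Build the UCI handshake that switches Lc0 to the given backend."""
    return [
        "uci",
        f"setoption name NN backend to use value {backend}",
        "isready",
    ]

print("\n".join(uci_commands()))

# To actually drive Lc0 (requires the binary on PATH):
# import subprocess
# subprocess.run(["lc0"], text=True,
#                input="\n".join(uci_commands()) + "\nquit\n")
```

The same mechanism works for any of the options listed, e.g. the minibatch size used in the benchmarks.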
The penalty for going from fp32 to fp16 was found to be small enough to be easily compensated by the speed increase. There were also some preliminary experiments with int8 inference, which the RTX cards support as well, but at least there the accuracy loss was severe enough to cost considerable strength. For fp16 we're fine (it has actually been used both for TCEC bonus games and CCCC).
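The precision gap between the formats is easy to see with the standard library: Python's struct module can round-trip a value through IEEE-754 half and single precision. This is a toy illustration of the formats' resolution, not Lc0's actual inference path:

```python
# Toy illustration of fp16 vs fp32 rounding error on a single weight value.
# Not Lc0's inference path; just the resolution of the two formats.
import struct

def roundtrip(fmt, x):
    """Round x through an IEEE-754 format: 'e' = half, 'f' = single."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

w = 0.123456789  # a typical-magnitude NN weight
err16 = abs(roundtrip('e', w) - w) / w
err32 = abs(roundtrip('f', w) - w) / w
print(f"fp16 relative error: {err16:.2e}")  # on the order of 1e-4
print(f"fp32 relative error: {err32:.2e}")  # on the order of 1e-8
```

fp16 keeps about 11 significant bits versus fp32's 24, which is evidently still enough for NN inference here, while int8 drops well below that.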
Code: Select all
            fp32   fp16
GTX 1080Ti:  8996      -
Titan V:    13295  29379
RTX 2080:    9708  26678
RTX 2080Ti: 12208  32472
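For reference, the fp16/fp32 ratios implied by that table work out to roughly 2.2x on the Titan V and about 2.7x on both RTX cards:

```python
# fp16 speedup factors implied by the benchmark numbers quoted above
bench = {  # gpu: (fp32 nps, fp16 nps)
    "Titan V":    (13295, 29379),
    "RTX 2080":   (9708, 26678),
    "RTX 2080Ti": (12208, 32472),
}
for gpu, (fp32, fp16) in bench.items():
    print(f"{gpu}: {fp16 / fp32:.2f}x")
```

Note that the 2080 Ti trails the Titan V at fp32 but overtakes it at fp16, which is the mode that matters for Lc0 on these cards.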