
Re: How good is the RTX 2080 Ti for Leela?

Posted: Sun Sep 16, 2018 5:46 am
by ankan
It should be very similar to a Titan V for lc0.

It has tensor cores enabled, and its peak fp16 tensor math throughput is almost exactly the same as a Titan V's (114 Tflops vs 110 Tflops):
https://www.anandtech.com/show/13282/nv ... eep-dive/6

I have one, but I can't post any benchmarks before reviews are out :)

Note that right now lc0 can't make use of int8 (or int4) math, but Google did it with A0 on their TPUs, so it's something the lc0 team wants to try in the future. If successful, we hope to get another 2x speedup.
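
To make the int8 idea concrete, here's a minimal, hypothetical sketch (not lc0 code; the type and function names are invented) of symmetric per-tensor int8 quantization, which would be the first step before the matrix math itself can run in 8-bit:

Code:

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// One scale per tensor: dequantize with w ~ values[i] * scale.
struct QuantizedTensor {
    std::vector<int8_t> values;
    float scale;
};

QuantizedTensor quantize_int8(const std::vector<float>& weights) {
    // Symmetric quantization: map [-max|w|, +max|w|] onto [-127, 127].
    float max_abs = 0.0f;
    for (float w : weights) max_abs = std::max(max_abs, std::fabs(w));

    QuantizedTensor q;
    q.scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
    q.values.reserve(weights.size());
    for (float w : weights)
        q.values.push_back(static_cast<int8_t>(std::lround(w / q.scale)));
    return q;
}

The hoped-for speedup comes from the hardware side: Turing's tensor cores run int8 matrix math at roughly twice their fp16 rate, provided the network still plays well at the reduced precision.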

Re: How good is the RTX 2080 Ti for Leela?

Posted: Mon Sep 17, 2018 6:05 am
by Werewolf
ankan wrote: Sun Sep 16, 2018 5:46 am It should be very similar to a Titan V for lc0.

It has tensor cores enabled, and its peak fp16 tensor math throughput is almost exactly the same as a Titan V's (114 Tflops vs 110 Tflops):
https://www.anandtech.com/show/13282/nv ... eep-dive/6

I have one, but I can't post any benchmarks before reviews are out :)

Note that right now lc0 can't make use of int8 (or int4) math, but Google did it with A0 on their TPUs, so it's something the lc0 team wants to try in the future. If successful, we hope to get another 2x speedup.
I'm not accusing you of lying, but why would Nvidia cripple the CUDA cores on the 2080 Ti for FP16 (presumably to protect Quadro) and then allow the tensor cores to run at full speed?

In a week or two, Lc0's speed on this card will finally be revealed. I hope you're right.

Re: How good is the RTX 2080 Ti for Leela?

Posted: Mon Sep 17, 2018 6:25 am
by ankan
Werewolf wrote: Mon Sep 17, 2018 6:05 am I'm not accusing you of lying, but why would Nvidia cripple the CUDA cores on the 2080 Ti for FP16 (presumably to protect Quadro) and then allow the tensor cores to run at full speed?

In a week or two, Lc0's speed on this card will finally be revealed. I hope you're right.
I don't know where people got the rumors that Nvidia crippled non-tensor fp16 math on the 2080 Ti.

See pages 8 and 9 of this document for the full specs:
https://www.nvidia.com/content/dam/en-z ... epaper.pdf

The only thing that is different from Quadro is "Peak FP16 Tensor TFLOPS with FP32 Accumulate" which lc0 doesn't use.

Milos has no idea what he is talking about...

Re: How good is the RTX 2080 Ti for Leela?

Posted: Mon Sep 17, 2018 6:37 am
by Werewolf
ankan wrote: Mon Sep 17, 2018 6:25 am
I don't know where people got the rumors that Nvidia crippled non-tensor fp16 math on the 2080 Ti.

Milos has no idea what he is talking about...
But it's not from Milos, it's from Wikipedia:

https://en.wikipedia.org/wiki/List_of_N ... _20_series

Unless you're saying the wiki page is wrong, the CUDA cores are crippled for FP16. On top of that, there's also the debate about whether Lc0 can use tensor cores, but that's not something I know about.

Re: How good is the RTX 2080 Ti for Leela?

Posted: Mon Sep 17, 2018 9:26 am
by Error323
Werewolf wrote: Mon Sep 17, 2018 6:37 am
ankan wrote: Mon Sep 17, 2018 6:25 am
I don't know where people got the rumors that Nvidia crippled non-tensor fp16 math on the 2080 Ti.

Milos has no idea what he is talking about...
But it's not from Milos, it's from Wikipedia:

https://en.wikipedia.org/wiki/List_of_N ... _20_series

Unless you're saying the wiki page is wrong, the CUDA cores are crippled for FP16. On top of that, there's also the debate about whether Lc0 can use tensor cores, but that's not something I know about.
It's not the CUDA cores that will be doing the FP16 computations, but the tensor cores. They are specifically designed for neural network inference, since that's what the new ray-tracing technique relies on to run in real time. Fortunately for us, those cores are a perfect fit for Lc0, as we use a very similar neural network architecture for chess (convolutional layers).

Also, you should listen to ankan, he's got that 2080 for a reason ;) And he wrote our cudnn backend!
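
For the curious, here is a rough standalone sketch (assuming cuDNN 7.x; this is not our actual backend code, just an illustration) of how a cuDNN-based backend asks for tensor-core math on an fp16 convolution:

Code:

// Build with: nvcc example.cu -lcudnn
#include <cudnn.h>
#include <cstdio>

int main() {
    cudnnHandle_t handle;
    cudnnCreate(&handle);

    // Convolution descriptor: pad 1, stride 1, dilation 1 (as you'd use for
    // the 3x3 filters in Leela's residual tower), with fp16 compute type.
    cudnnConvolutionDescriptor_t conv;
    cudnnCreateConvolutionDescriptor(&conv);
    cudnnSetConvolution2dDescriptor(conv, 1, 1, 1, 1, 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_HALF);

    // Ask cuDNN to use tensor-core ("tensor op") math; on GPUs without
    // tensor cores it simply falls back to ordinary CUDA-core kernels.
    cudnnStatus_t s = cudnnSetConvolutionMathType(conv, CUDNN_TENSOR_OP_MATH);
    printf("tensor-op math: %s\n",
           s == CUDNN_STATUS_SUCCESS ? "requested" : "not available");

    cudnnDestroyConvolutionDescriptor(conv);
    cudnnDestroy(handle);
    return 0;
}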

Re: How good is the RTX 2080 Ti for Leela?

Posted: Mon Sep 17, 2018 10:40 am
by Werewolf
Error323 wrote: Mon Sep 17, 2018 9:26 am
It's not the CUDA cores that will be doing the FP16 computations, but the tensor cores. They are specifically designed for neural network inference, since that's what the new ray-tracing technique relies on to run in real time. Fortunately for us, those cores are a perfect fit for Lc0, as we use a very similar neural network architecture for chess (convolutional layers).

Also, you should listen to ankan, he's got that 2080 for a reason ;) And he wrote our cudnn backend!
Well if that's correct it's great news for everyone, I'm not complaining!
However, there do seem to be some differences in the CUDA cores between Quadro and GeForce.

Re: How good is the RTX 2080 Ti for Leela?

Posted: Mon Sep 17, 2018 12:25 pm
by Werewolf
Werewolf wrote: Mon Sep 17, 2018 10:40 am
However, there do seem to be some differences in the CUDA cores between Quadro and GeForce.
Forget that comment - I see why it's wrong now.

What's FP16 accumulate?

Re: How good is the RTX 2080 Ti for Leela?

Posted: Mon Sep 17, 2018 5:29 pm
by ankan
Werewolf wrote: Mon Sep 17, 2018 12:25 pm
Werewolf wrote: Mon Sep 17, 2018 10:40 am
However, there do seem to be some differences in the CUDA cores between Quadro and GeForce.
Forget that comment - I see why it's wrong now.

What's FP16 accumulate?
Tensor cores perform small matrix multiply-and-accumulate operations. See https://devblogs.nvidia.com/programming ... es-cuda-9/ for more details.
They support two modes: either you do everything in fp16, or you do the multiplies in fp16 and the accumulation in fp32. From the whitepaper it seems that for the gaming cards (RTX 20xx), the performance of the fp32-accumulate mode has been cut in half compared to the Quadro cards. AFAIK, the 32-bit accumulation mode is more useful for training. For inference, doing everything in fp16 is generally sufficient (and that's what we use for lc0).
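
To make the two modes concrete, here is a minimal CUDA sketch using the WMMA API from that article (illustrative only, not lc0 code). Each kernel has one warp compute a single 16x16x16 tile, so launch it as kernel<<<1, 32>>>(...); the only difference between the two modes is the element type of the accumulator fragment:

Code:

// Requires a GPU with tensor cores (compile with e.g. -arch=sm_70).
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// fp16 multiply, fp16 accumulate -- the mode lc0's fp16 path relies on.
__global__ void mma_fp16_acc(const half* a, const half* b, half* d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, half> acc;
    wmma::fill_fragment(acc, __float2half(0.0f));
    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(acc, fa, fb, acc);   // acc = a * b + acc
    wmma::store_matrix_sync(d, acc, 16, wmma::mem_row_major);
}

// fp16 multiply, fp32 accumulate -- the "FP16 Tensor TFLOPS with FP32
// Accumulate" line in the whitepaper; this is the mode that runs at half
// rate on the GeForce cards.
__global__ void mma_fp32_acc(const half* a, const half* b, float* d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(acc, fa, fb, acc);
    wmma::store_matrix_sync(d, acc, 16, wmma::mem_row_major);
}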

Re: How good is the RTX 2080 Ti for Leela?

Posted: Thu Sep 20, 2018 10:10 pm
by jkiliani
Ankan posted Lc0 benchmarks for the RTX 2080 Ti on the Leela Discord today, since the nondisclosure clauses covering benchmarks are no longer in force now that the hardware has been released:

Code:

with cudnn 7.3 and 411.63 driver available at nvidia.com 
minibatch-size=512, network id: 11250, go nodes 1000000

             fp32    fp16    
Titan V:     13295   29379
RTX 2080Ti:  12208   32472
So, the (top) RTX card actually outperforms a Titan V for Lc0 when using fp16. Ankan will also post some benchmarks for the RTX 2080 soon.

Re: How good is the RTX 2080 Ti for Leela?

Posted: Thu Sep 20, 2018 10:29 pm
by Robert Pope
How does that compare to a 1080 or 1080 Ti?