This is not really correct.

cucumber wrote: ↑Sun Oct 21, 2018 7:57 pm
Tensor cores as is are just matmul ASICs. Matrix multiplication makes up a large part of convolutions, but it is totally possible to get even more application specific, should you want another large jump up. Currently, matmul ASICs are limited by data movement, which puts an upper bound on latency, which is a big Leela killer. That's probably what will be benefiting Leela the most. Whether or not that will be enough to fix Leela in calculation heavy endgames is open for debate.
Tensor cores perform a direct 4x4 matrix multiply-accumulate.
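As a rough illustration (a NumPy sketch, not actual GPU code), the primitive a tensor core implements is a fused multiply-accumulate D = A*B + C on small fixed-size matrix tiles, with half-precision inputs accumulated at higher precision:

```python
import numpy as np

# Sketch of the tensor-core primitive: D = A @ B + C on 4x4 tiles.
# Inputs in FP16, accumulation in FP32 (as on Volta-class hardware).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)).astype(np.float16)
B = rng.standard_normal((4, 4)).astype(np.float16)
C = rng.standard_normal((4, 4)).astype(np.float32)  # accumulator tile

D = A.astype(np.float32) @ B.astype(np.float32) + C
print(D.shape)  # (4, 4)
```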
In most cases, CNN inference uses 3x3 convolutions realized with the Winograd algorithm, and that implementation is faster on regular CUDA cores than a direct 4x4 matrix multiplication on Tensor cores. In addition, you waste a full 4x4 Tensor core on a single 3x3 convolution, which means roughly 30% efficiency. So much lower efficiency, combined with the slower operation, means Tensor cores yield very little additional benefit over CUDA cores alone.