The net is J92-190 (30 blocks x 384 filters), so on my GPU the default backend is cuDNN FP16 with custom_winograd=true. The benchmarks are here:
cuDNN FP16 (default)
lc0_v263rc1.exe benchmark --minibatch-size=240
Total time (ms) : 342097
Nodes searched : 2372484
Nodes/second : 6935
CUDA FP16
lc0_v263rc1.exe benchmark --backend=cuda-fp16 --minibatch-size=240
Total time (ms) : 342239
Nodes searched : 2122476
Nodes/second : 6202
DX12
lc0_v263rc1_dx.exe benchmark --minibatch-size=240
Total time (ms) : 341409
Nodes searched : 3077528
Nodes/second : 9014
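As a quick sanity check on the three runs above, the reported NPS is simply nodes searched divided by elapsed seconds. The short Python sketch below recomputes it from the figures in this post and prints how DX12 compares with each backend. The dictionary and function names are my own, not anything from lc0.

```python
# Benchmark figures copied from the three runs above: (total time in ms, nodes searched).
runs = {
    "cuDNN FP16": (342097, 2372484),
    "CUDA FP16": (342239, 2122476),
    "DX12": (341409, 3077528),
}

def nodes_per_second(total_ms, nodes):
    """Average speed over the whole run: nodes divided by elapsed seconds."""
    return nodes * 1000.0 / total_ms

nps = {name: nodes_per_second(ms, nodes) for name, (ms, nodes) in runs.items()}

for name, value in nps.items():
    # A ratio above 1 means DX12 searched more nodes per second than this backend.
    print(f"{name:11s} {value:8.0f} NPS   DX12 is {nps['DX12'] / value:.2f}x")
```

All three runs lasted almost exactly the same time (about 342 s), so the node counts are directly comparable too.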
The DX12 backend result is worth noting: by NPS it is about 30% faster than cuDNN FP16 and about 45% faster than CUDA FP16. A glitch occurred with this command line:
lc0_v263rc1.exe benchmark --backend=cudnn-fp16 --minibatch-size=240
which sometimes exits with this error message:
Position: 1/34 rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Unhandled exception in worker thread: CUDA error: an illegal memory access was encountered (c:\projects\lc0\src\neural\cuda\network_cudnn.cc:789)
Unhandled exception in worker thread:
=============================
To check playing strength as well:
300 games at 15s + 0.25s, RR
Rank  Name         Elo  +/-  Games  Score   Draw
   1  DX12          26   20    200  53.8%  55.5%
   2  cuda_fp16     -3   20    200  49.5%  53.0%
   3  cudnn_fp16   -23   21    200  46.8%  50.5%
Finished match
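
For anyone who wants to relate the Score column to the Elo column: assuming the usual logistic Elo model, the Elo difference follows from the score fraction as -400 * log10(1/score - 1). The sketch below applies that to the scores in the table. The function name is my own, and since the table's percentages are rounded to one decimal, the recomputed values can differ from the Elo column by about a point.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction (0 < score < 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Score column from the table above, as fractions.
scores = {"DX12": 0.538, "cuda_fp16": 0.495, "cudnn_fp16": 0.468}

for name, s in scores.items():
    print(f"{name:11s} {elo_from_score(s):+6.1f}")
```

Note that the +/- margins of about 20 Elo are nearly as large as the measured differences, so at 200 games per engine the ranking is only suggestive.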