AB search with NN on GPU...
Posted: Thu Aug 13, 2020 10:11 am
Maybe this is of interest to someone: I tried, but could not figure out a way
to do it.
https://eta-chess.app26.de/
In short, the host-device latencies, aka kernel-launch overhead, are currently
in the range of 5 microseconds up to 100s of microseconds, so you end up with
at most ~200K kernel calls per second. This is primarily not caused by the PCIe
connection (maybe 10s of ns?) but (speculation) by the embedded CPU controller
on the GPU which launches the kernels. So you need to couple tasks into batches
to be executed in one run, which does not conform well with the serial nature
of AlphaBeta. Maybe upcoming architectures will have lower latencies, dunno.
Another path could be to drop the search part completely, encode everything in
another kind of mega NN structure, and perform only a depth-1 search for
evaluation, maybe with multiple kinds of NNs as a replacement for the ID loop...
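A minimal sketch of that depth-1 idea, assuming the NN eval can score all successor positions in one batched call. `gen_moves`, `apply`, and `evaluate_batch` are toy placeholders standing in for a real move generator and NN inference, just to show the shape of it:

```python
# Depth-1 "search": generate all successors of the root, evaluate them
# all in ONE batched call (standing in for a single NN/GPU launch),
# then pick the best-scoring move. No recursion, no per-node launches.

def gen_moves(pos):
    # placeholder move generator: pretend every position has three moves
    return ["a", "b", "c"]

def apply(pos, move):
    # placeholder: successor state is just the concatenation
    return pos + move

def evaluate_batch(positions):
    # stand-in for one batched NN inference on the GPU;
    # a toy score so the sketch actually runs
    return [len(p) + (1 if p.endswith("b") else 0) for p in positions]

def depth1_best_move(pos):
    moves = gen_moves(pos)
    children = [apply(pos, m) for m in moves]
    scores = evaluate_batch(children)  # one launch instead of len(moves)
    best = max(range(len(moves)), key=lambda i: scores[i])
    return moves[best]

print(depth1_best_move("root"))  # -> "b"
```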
--
Srdja