I took this Saturday to translate all the algorithms we talked about in the CPU thread to CUDA. Now I am very interested in the performance you get.
If you read this, please run this CUDA executable and post your results here: https://github.com/Gigantua/Chess_Moveg ... ompare.exe
NVIDIA GPUs only for now.
Results:
NVIDIA GeForce RTX 3080
Fancy Magic: 6.82 GigaQueens/s
QBB Algo: 58.02 GigaQueens/s
Bob Lookup: 0.77 GigaQueens/s
Kogge Stone: 40.43 GigaQueens/s
Hyperbola Qsc: 17.43 GigaQueens/s
Switch Lookup: 4.40 GigaQueens/s
Slide Arithm: 18.38 GigaQueens/s
Pext Lookup: 16.85 GigaQueens/s
SISSY Lookup: 8.08 GigaQueens/s
Hypercube Alg: 1.31 GigaQueens/s
As everyone can see, moving from the usual x64 hardware to a different architecture really changes the performance rankings we are used to. Emulated PEXT (16.85 GigaQueens/s) is faster than Fancy Magic (6.82 GigaQueens/s)! I did not expect this.
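For readers who have not followed the CPU thread: a PEXT lookup uses `pext(occupancy, mask)` as the index into a per-square attack table. GPUs have no PEXT instruction, so it has to be emulated in software. Below is a minimal sketch of the classic bit-by-bit emulation, assuming 64-bit bitboards. `HD` and `pext_emulated` are hypothetical names, and the emulation in the linked repository may be implemented differently.

```cpp
#include <cstdint>

// Lets the same function compile as host C++ and as CUDA device code.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical software PEXT: gathers the bits of `src` selected by `mask`
// into the low bits of the result, keeping their relative order.
HD inline std::uint64_t pext_emulated(std::uint64_t src, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t bit = 1; mask != 0; bit <<= 1) {
        if (src & mask & (0 - mask))  // is the lowest selected bit set in src?
            result |= bit;            // then set the next output bit
        mask &= mask - 1;             // drop the lowest selected bit
    }
    return result;
}
```

The loop runs once per set bit of `mask` (at most 12 for a rook mask), so it needs no memory accesses beyond the final attack table load.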
Overall performance is much, much higher than the CPU maximum of 10 Gigalookups/s with 32 threads on a Ryzen 3 CPU: the best GPU result (QBB, 58.02 GigaQueens/s) is almost 6 times that.
I am looking forward to the discussion. All algorithms that access memory seem to be much slower than the ones that need no lookup tables at all.
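To illustrate what "no lookup tables" means, here is a sketch of queen attacks computed with a Kogge-Stone occluded fill. It uses only shifts, ANDs and ORs, so the only memory traffic is reading the input and writing the result. The code assumes the common a1 = bit 0, h8 = bit 63 layout. `HD`, `ray_attacks`, `queen_attacks` and `queen_kernel` are hypothetical names, and the Kogge-Stone kernel in the linked repository may differ in its details.

```cpp
#include <cstdint>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

using u64 = std::uint64_t;

// Shift towards h8 for s > 0, towards a1 for s < 0.
HD inline u64 shift(u64 b, int s) { return s > 0 ? b << s : b >> -s; }

// Kogge-Stone occluded fill in one direction, followed by one more step.
// `wrap` holds the squares a single step in this direction may land on
// (excludes the file that a wrap-around would reach).
HD inline u64 ray_attacks(u64 slider, u64 empty, int s, u64 wrap) {
    u64 gen = slider;
    u64 pro = empty & wrap;                  // squares the ray may pass through
    gen |= pro & shift(gen, s);
    pro &= shift(pro, s);
    gen |= pro & shift(gen, 2 * s);
    pro &= shift(pro, 2 * s);
    gen |= pro & shift(gen, 4 * s);          // gen now covers up to 7 steps
    return shift(gen, s) & wrap;             // include the first blocker
}

HD inline u64 queen_attacks(int square, u64 occupied) {
    constexpr u64 notA = 0xFEFEFEFEFEFEFEFEULL;
    constexpr u64 notH = 0x7F7F7F7F7F7F7F7FULL;
    constexpr u64 all = ~0ULL;
    const u64 q = 1ULL << square;
    const u64 e = ~occupied;
    return ray_attacks(q, e, 8, all)  | ray_attacks(q, e, -8, all)    // N, S
         | ray_attacks(q, e, 1, notA) | ray_attacks(q, e, -1, notH)   // E, W
         | ray_attacks(q, e, 9, notA) | ray_attacks(q, e, 7, notH)    // NE, NW
         | ray_attacks(q, e, -7, notA) | ray_attacks(q, e, -9, notH); // SE, SW
}

#ifdef __CUDACC__
// Hypothetical benchmark-style kernel: one queen lookup per thread.
__global__ void queen_kernel(const u64* occ, const int* sq, u64* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = queen_attacks(sq[i], occ[i]);
}
#endif
```

Every thread does the same fixed sequence of integer operations with no divergent loads, which is plausibly why approaches like this scale so well on a GPU.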
Source code: https://github.com/Gigantua/Chess_Movegen
Winners: QBB, Kogge Stone
Losers: Fancy Magic, Bob Lookup
CPU discussion: http://www.talkchess.com/forum3/posting ... 7&p=917782