It uses a lookup table of just 768 slots in 6kb, which already makes it attractive, since the transposition table has to compete for cache with the lookup tables of fancy magic, which need 694kb.
In essence it is an optimized Galois field move generator with support for normal modern CPUs (so everything from the past decade).
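The 768 slots in 6kb match what the assembly in this post suggests: three tables (`msk`, `sql`, `sqr`), each indexed by `square << 5`, i.e. one 32-byte row of four 64-bit lanes per square. A minimal sketch of such a layout follows. The struct names `Row` and `SliderTables` and the helper `row_offset` are hypothetical, and the actual table contents are not published.

```cpp
#include <cstddef>
#include <cstdint>

// One 256-bit row: four 64-bit lanes, presumably one per line through the
// square (rank, file, diagonal, anti-diagonal). This is an assumption.
struct alignas(32) Row {
    std::uint64_t lane[4];
};

// Hypothetical layout: three tables with 64 rows each, named after the
// symbols that appear in the assembly (msk, sql, sqr).
struct SliderTables {
    Row msk[64];
    Row sql[64];
    Row sqr[64];
};

// Byte offset of a square's row inside one table; mirrors `shl rax, 5`.
constexpr std::size_t row_offset(int sq) {
    return static_cast<std::size_t>(sq) << 5;
}

static_assert(sizeof(Row) == 32, "one ymm register per row");
static_assert(sizeof(SliderTables) / sizeof(std::uint64_t) == 768, "768 slots");
static_assert(sizeof(SliderTables) == 6 * 1024, "6kb in total");
```

With this layout a lookup touches three 32-byte rows per query, so the whole working set fits comfortably in L1 cache.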
Assembly for normal CPU:
mov r8, qword ptr [rip + Chess_Lookup::Classified::msk@GOTPCREL]
movsxd rax, edi
vmovdqa ymm3, ymmword ptr [rip + .LCPI0_0] # ymm3 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]
vmovdqa ymm5, ymmword ptr [rip + .LCPI0_1] # ymm5 = [0,128,64,192,32,160,96,224,16,144,80,208,48,176,112,240,0,128,64,192,32,160,96,224,16,144,80,208,48,176,112,240]
vmovdqa ymm6, ymmword ptr [rip + .LCPI0_2] # ymm6 = [0,8,4,12,2,10,6,14,1,9,5,13,3,11,7,15,0,8,4,12,2,10,6,14,1,9,5,13,3,11,7,15]
mov rcx, qword ptr [rip + Chess_Lookup::Classified::sql@GOTPCREL]
vmovdqa ymm7, ymmword ptr [rip + .LCPI0_3] # ymm7 = [7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8]
vmovq xmm1, rsi
mov rdx, qword ptr [rip + Chess_Lookup::Classified::sqr@GOTPCREL]
shl rax, 5
vpbroadcastq ymm1, xmm1
vmovdqa ymm0, ymmword ptr [r8 + rax]
vpand ymm1, ymm0, ymm1
vpsubq ymm2, ymm1, ymmword ptr [rcx + rax]
vpand ymm4, ymm1, ymm3
vpsrlw ymm1, ymm1, 4
vpand ymm1, ymm1, ymm3
vpshufb ymm4, ymm5, ymm4
vpshufb ymm1, ymm6, ymm1
vpshufb ymm4, ymm4, ymm7
vpshufb ymm1, ymm1, ymm7
vpor ymm1, ymm1, ymm4
vpsubq ymm1, ymm1, ymmword ptr [rdx + rax]
vpand ymm4, ymm1, ymm3
vpsrlw ymm1, ymm1, 4
vpand ymm1, ymm1, ymm3
vpshufb ymm4, ymm5, ymm4
vpshufb ymm1, ymm6, ymm1
vpshufb ymm3, ymm4, ymm7
vpshufb ymm1, ymm1, ymm7
vpor ymm1, ymm1, ymm3
vpxor ymm1, ymm2, ymm1
vpand ymm0, ymm1, ymm0
vextracti128 xmm1, ymm0, 1
vpor xmm0, xmm0, xmm1
vpshufd xmm1, xmm0, 238 # xmm1 = xmm0[2,3,2,3]
vpor xmm0, xmm0, xmm1
vmovq rax, xmm0
vzeroupper
ret
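Read lane by lane, the listing above appears to follow a subtract-and-reverse pattern similar to Hyperbola Quintessence, but with a full 64-bit bit reversal instead of a byte swap. Below is a scalar C++ sketch of that reading. The nibble tables are copied from the constants shown in the listing. The function names and the assumption that `sql` holds the square bit and `sqr` its bit-reversed counterpart are mine, not confirmed by the author.

```cpp
#include <cstdint>

// Nibble lookup tables as they appear in ymm5 and ymm6 of the listing.
static const std::uint8_t kRevLowNibble[16]  = {0,128,64,192,32,160,96,224,
                                                16,144,80,208,48,176,112,240};
static const std::uint8_t kRevHighNibble[16] = {0,8,4,12,2,10,6,14,
                                                1,9,5,13,3,11,7,15};

// Reverse all 64 bits: reverse each byte through the nibble tables, then
// reverse the byte order (the job of the ymm7 shuffle in the listing).
inline std::uint64_t reverse_bits(std::uint64_t x) {
    std::uint64_t r = 0;
    for (int i = 0; i < 8; ++i) {
        std::uint8_t b = static_cast<std::uint8_t>(x >> (8 * i));
        std::uint8_t rb = static_cast<std::uint8_t>(
            kRevLowNibble[b & 15] | kRevHighNibble[b >> 4]);
        r |= static_cast<std::uint64_t>(rb) << (8 * (7 - i));
    }
    return r;
}

// Attacks along one line through `sq`. `mask` is assumed to contain the
// squares of that line without `sq` itself.
inline std::uint64_t line_attacks(std::uint64_t occ, int sq, std::uint64_t mask) {
    const std::uint64_t bit = 1ULL << sq;        // assumed content of sql
    const std::uint64_t rbit = reverse_bits(bit); // assumed content of sqr
    const std::uint64_t o = occ & mask;
    const std::uint64_t fwd = o - bit;                                // toward higher bits
    const std::uint64_t rev = reverse_bits(reverse_bits(o) - rbit);  // toward lower bits
    return (fwd ^ rev) & mask;
}

// The listing processes four lanes at once and ORs them together at the end.
inline std::uint64_t four_lane_attacks(std::uint64_t occ, int sq,
                                       const std::uint64_t masks[4]) {
    std::uint64_t result = 0;
    for (int lane = 0; lane < 4; ++lane)
        result |= line_attacks(occ, sq, masks[lane]);
    return result;
}
```

For example, a slider on d1 (square 3) with blockers on b1 and g1 and the first-rank mask `0xF7` yields `0x76`, i.e. b1, c1, e1, f1 and g1.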
Assembly for a CPU with AVX512 and GFNI:
mov r8, qword ptr [rip + Chess_Lookup::Classified::msk@GOTPCREL]
movsxd rax, edi
vpbroadcastq ymm3, qword ptr [rip + .LCPI0_0] # ymm3 = [9241421688590303745,9241421688590303745,9241421688590303745,9241421688590303745]
mov rcx, qword ptr [rip + Chess_Lookup::Classified::sql@GOTPCREL]
mov rdx, qword ptr [rip + Chess_Lookup::Classified::sqr@GOTPCREL]
vpbroadcastq ymm1, rsi
shl rax, 5
vmovdqa ymm0, ymmword ptr [r8 + rax]
vpand ymm1, ymm0, ymm1
vpsubq ymm2, ymm1, ymmword ptr [rcx + rax]
vgf2p8affineqb ymm1, ymm1, ymm3, 0
vpsubq ymm1, ymm1, ymmword ptr [rdx + rax]
vgf2p8affineqb ymm1, ymm1, ymm3, 0
vpternlogq ymm1, ymm0, ymm2, 72
vextracti128 xmm0, ymm1, 1
vpor xmm0, xmm1, xmm0
vpshufd xmm1, xmm0, 238 # xmm1 = xmm0[2,3,2,3]
vpor xmm0, xmm0, xmm1
vmovq rax, xmm0
vzeroupper
ret
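Two instructions carry most of the savings in this second listing. `vgf2p8affineqb` with the matrix constant 9241421688590303745 (`0x8040201008040201`) reverses the bits inside every byte, and `vpternlogq` with immediate 72 computes `b & (a ^ c)` in one step. The scalar emulation below is a sketch written from the documented instruction semantics so the constants can be checked without GFNI hardware. The function names are hypothetical.

```cpp
#include <cstdint>

// Parity (XOR of all bits) of a byte.
inline std::uint8_t parity8(std::uint8_t v) {
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1;
}

// Emulates GF2P8AFFINEQB for a single byte: result bit i is the parity of
// (matrix byte [7 - i] AND x), XORed with bit i of the immediate.
inline std::uint8_t affine_byte(std::uint64_t matrix, std::uint8_t x, std::uint8_t imm) {
    std::uint8_t r = 0;
    for (int i = 0; i < 8; ++i) {
        std::uint8_t row = static_cast<std::uint8_t>(matrix >> (8 * (7 - i)));
        std::uint8_t bit = parity8(static_cast<std::uint8_t>(row & x)) ^ ((imm >> i) & 1);
        r |= static_cast<std::uint8_t>(bit << i);
    }
    return r;
}

// The instruction transforms every byte of a qword independently,
// so the byte order itself is left untouched.
inline std::uint64_t affine_qword(std::uint64_t matrix, std::uint64_t x, std::uint8_t imm) {
    std::uint64_t r = 0;
    for (int i = 0; i < 8; ++i) {
        std::uint8_t b = static_cast<std::uint8_t>(x >> (8 * i));
        r |= static_cast<std::uint64_t>(affine_byte(matrix, b, imm)) << (8 * i);
    }
    return r;
}

// Emulates VPTERNLOGQ on one qword: for each bit position the three input
// bits (a = destination, b = first source, c = second source) form an index
// into the 8-bit truth table `imm`.
inline std::uint64_t ternlog(std::uint64_t a, std::uint64_t b, std::uint64_t c,
                             std::uint8_t imm) {
    std::uint64_t r = 0;
    for (int i = 0; i < 64; ++i) {
        unsigned idx = static_cast<unsigned>(
            (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1));
        r |= static_cast<std::uint64_t>((imm >> idx) & 1) << i;
    }
    return r;
}
```

Note that the affine instruction works per byte, whereas the AVX2 listing also reverses the byte order through the `ymm7` shuffle. The `vpternlogq` replaces the separate `vpxor` and `vpand` at the end of the AVX2 listing.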
Here is the result with the znver3 version:
--------------------------------------------------------------------------------------------------------------------------------
AMD Ryzen 9 5950X 16-Core Processor
Million Lookups/s Random Squares, Random Occupation/s:
Name | Performance [MQueens/s] | Tablesize | Dependencies | Template | Author | Reference
SBAMG o^(o-3cbn) | 236.461030 | 576 [4kb] | countl_zero, bswap | yes | Syed Fahad | http://www.talkchess.com/forum3/viewtopic.php?t=59845
SBAMG Inline | 135.169323 | 0 [0kb] | countl_zero, bswap | yes | Syed Fahad and Daniel Inführ | http://www.talkchess.com/forum3/viewtopic.php?t=59845
GaloisField - AVX512 | 5.964714 | 0 [0kb] | AVX512F_GFNI | no | Daniel Inführ (dangi12012) | http://www.talkchess.com/forum3/viewtop ... =7&t=81335
Hyperbola Quintessence o^(o-2r) | 293.663045 | 256 [2kb] | bswap | no | Ryan Mack | https://www.chessprogramming.org/Hyperbola_Quintessence
Hyperbola Quintessence Inline | 97.483496 | 0 [0kb] | bswap | yes | Ryan Mack | https://www.chessprogramming.org/Hyperbola_Quintessence
Genetic 8 Ray | 44.142536 | 0 [0kb] | bswap | no | Daniel Inführ (dangi12012) | Abstract C++ Syntax Tree Sifter (c) Daniel Infuehr
Bitrotation | 41.177125 | 0 [0kb] | ReverseBits | no | Daniel Inführ (dangi12012) | http://www.talkchess.com/forum3/viewtop ... 8&start=20
Unpublished | 524.436100 | 768 [6kb] | AVX2 | no | Daniel Inführ (dangi12012) | Unpublished
Binary Neural Network | 50.935576 | 5852 [45kb] | pdep_u64, AVX2 | no | Daniel Inführ (dangi12012) | http://www.talkchess.com/forum3/viewtop ... =7&t=79332
Exploding Bitboards | 67.071720 | 768 [6kb] | imul64 | no | Harald Lüßen | http://www.open-aurec.com/wbforum/viewt ... 3&start=80
Reference (Switch Lookup) | 36.798630 | 0 [0kb] | none | yes | Daniel Inführ (dangi12012) | http://www.talkchess.com/forum3/viewtop ... so#p907362
AVX Branchless Shift | 197.760820 | 0 [0kb] | AVX2 | no | Daniel Inführ (dangi12012) | http://www.talkchess.com/forum3/viewtop ... 5&start=60
Pext Emulated | 67.034859 | 107904 [843kb] | none | no | Zach Wegner | https://randombit.net/bitbashing/posts/ ... tions.html
Dumb7 Fill | 69.428542 | 0 [0kb] | none | no | Gunnar Andersson | https://www.chessprogramming.org/Dumb7Fill
Kogge-Stone | 113.308413 | 0 [0kb] | none | no | Peter M. Kogge, Harold S. Stone | https://www.chessprogramming.org/Kogge-Stone_Algorithm
Rotated Bitboards | 46.859935 | 1848 [14kb] | none | no | Robert Hyatt | https://www.chessprogramming.org/Rotated_Bitboards
QBBEngine | 177.567695 | 0 [0kb] | countr_zero, countl_zero | yes | Fabio Gobbato | https://www.chessprogramming.org/QBBEngine
QBBEngine - Shifted Mask | 169.989730 | 0 [0kb] | countr_zero, countl_zero | no | Fabio Gobbato | http://www.talkchess.com/forum3/viewtop ... 90#p924623
Classical Bob-Mike | 207.630418 | 1024 [8kb] | countr_zero, countl_zero | yes | Robert Hyatt and Michael Sherwin | https://www.chessprogramming.org/Classical_Approach
Advanced Bob-Mike | 213.972092 | 520 [4kb] | countr_zero, countl_zero | no | Michael Sherwin and Daniel Inführ | http://www.talkchess.com/forum3/viewtop ... 50#p924653
Leorik | 226.305699 | 128 [1kb] | countl_zero | no | Thomas Jahn (lithander) | https://github.com/lithander/MinimalChessEngine
Leorik Inline | 114.931146 | 0 [0kb] | countl_zero | no | Thomas Jahn (lithander) | https://github.com/lithander/MinimalChessEngine
Obstruction Difference | 241.274217 | 768 [6kb] | countl_zero | no | Michael Hoffmann | http://www.talkchess.com/forum3/viewtopic.php?t=29087
Obstruction Difference Inline | 93.470013 | 0 [0kb] | countl_zero | yes | Michael Hoffmann | http://www.talkchess.com/forum3/viewtopic.php?t=29087
Genetic Obstruction Difference | 272.860354 | 384 [3kb] | countl_zero | no | Daniel Inführ and Michael Hoffmann | http://www.talkchess.com/forum3/viewtop ... =7&t=79701
Genetic Obstruction Difference V2 | 314.512388 | 768 [6kb] | countl_zero | no | Daniel Inführ | http://www.talkchess.com/forum3/viewtop ... =7&t=79701
Slide Arithmetic | 254.999044 | 256 [2kb] | bzhi_u64, blsmsk_u64 | no | Jakob Progsch and Daniel Inführ | http://www.talkchess.com/forum3/viewtop ... hm#p914767
Slide Arithmetic Inline | 111.597532 | 0 [0kb] | bzhi_u64, blsmsk_u64 | no | Jakob Progsch and Daniel Inführ | http://www.talkchess.com/forum3/viewtop ... Arithm#p91
Kindergarten | 448.205051 | 16640 [130kb] | imul64 | no | Urban Koistinen | https://www.chessprogramming.org/Kindergarten_Bitboards
SISSY Bitboards | 208.441390 | 180416 [1409kb] | none | no | Michael Sherwin | http://www.talkchess.com/forum3/viewtop ... =7&t=73083
Kindergarten Super SISSY Bitboards | 500.132535 | 16640 [130kb] | imul64 | no | Michael Sherwin | https://www.talkchess.com/forum3/viewto ... 4&start=30
Fancy Magic BB - Variable shift | 400.688650 | 93376 [729kb] | imul64 | yes | Pradu Kannan | https://www.chessprogramming.org/Magic_Bitboards#Fancy
FoldingHash - 4x fancy magic | 218.656741 | 6468 [50kb] | none | no | Daniel Inführ | tbd
Plain Magic BB | 496.907579 | 295168 [2306kb] | imul64 | no | Lasse Hansen | https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift | 511.605299 | 88891 [694kb] | imul64 | no | Onno Garms and Volker Annuss | https://www.chessprogramming.org/Magic_ ... hift_Fancy
Pext constexpr | 797.212944 | 107904 [843kb] | pext_u64 | yes | Zach Wegner | https://www.chessprogramming.org/BMI2#PEXTBitboards
HyperCube | 69.920951 | 107680 [841kb] | none | yes | Daniel Inführ (dangi12012) |
--------------------------------------------------------------------------------------------------------------------------------
I will probably publish once I can verify it on a Zen4 processor. The AVX512 version looks much juicier in terms of performance: it replaces 20 (!) instructions with a single one that takes 0.5 cycles to execute. So even with the overhead of those 20 additional instructions, the AVX2 version reaches the performance shown above.