mar wrote: ↑Wed Mar 17, 2021 2:31 am
I don't know about PDEP/PEXT but from the docs it seems you actually want pext for compression and pdep for decompression?
Nice idea - the point is the trainer compresses FENs this way in memory I suppose.
Where do you handle (piece) color? I guess you could easily encode say black pieces in a similar way of what you already do.
Good points. I forgot color and had pdep/pext inverted. For color, we just need cnt(occupied) bits to store pext(white, occupied) (and deduce black as occupied & ~white).
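A minimal sketch of that color scheme in C, using loop-based stand-ins for the pext/pdep instructions so it runs on any CPU (function names are mine, not from any engine):

```c
#include <stdint.h>

/* Portable loop-based stand-ins for the BMI2 pext/pdep instructions. */
static uint64_t sw_pext(uint64_t v, uint64_t m) {
    uint64_t r = 0, bit = 1;
    for (; m; m &= m - 1, bit <<= 1)    /* walk set mask bits, low to high */
        if (v & m & -m)
            r |= bit;
    return r;
}
static uint64_t sw_pdep(uint64_t v, uint64_t m) {
    uint64_t r = 0, bit = 1;
    for (; m; m &= m - 1, bit <<= 1)    /* scatter v's low bits onto mask  */
        if (v & bit)
            r |= m & -m;
    return r;
}

/* Compress: store popcount(occupied) color bits (white = 1), in occupancy order. */
static uint64_t pack_white(uint64_t occupied, uint64_t white) {
    return sw_pext(white, occupied);
}
/* Decompress: scatter the color bits back onto the occupied squares;
   black is then simply occupied & ~white. */
static uint64_t unpack_white(uint64_t occupied, uint64_t colors) {
    return sw_pdep(colors, occupied);
}
```

For the starting position, occupied has 32 set bits, so the color information costs exactly 32 bits here rather than a full 64-bit bitboard.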
AndrewGrant wrote: ↑Wed Mar 17, 2021 2:39 am
Not nearly as good as yours, although trivial to decompress and matches my internal use during training. Load times are pretty fast, to the point where speeding it up is a non-factor for me, but I imagine your scheme would be faster, trading off disk reads for decompression.
Occupancy : 8 bytes
Evaluation : 2 bytes
Turn : 1 byte
White King Sq : 1 byte
Black King Sq : 1 byte
Piece Count : 1 byte
Packed Pieces : 1 byte per two pieces (16 bytes max, 12 average)
= Average of 26 bytes, max of 30 bytes.
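The layout above can be sketched as a packed C struct; the field names and the 4-bit piece codes are my assumptions, not AndrewGrant's actual code. The 14 fixed bytes plus 12/16 packed bytes give the 26-byte average and 30-byte maximum:

```c
#include <stdint.h>

/* Hypothetical encoding of the record described above. */
#pragma pack(push, 1)
typedef struct {
    uint64_t occupancy;      /* 8 bytes: one bit per occupied square */
    int16_t  evaluation;     /* 2 bytes                              */
    uint8_t  turn;           /* 1 byte                               */
    uint8_t  white_king_sq;  /* 1 byte                               */
    uint8_t  black_king_sq;  /* 1 byte                               */
    uint8_t  piece_count;    /* 1 byte                               */
    uint8_t  packed[16];     /* 4-bit piece codes, two per byte      */
} Record;                    /* 14 fixed bytes + up to 16 packed     */
#pragma pack(pop)

/* Pack two 4-bit piece codes per byte, in occupancy (LSB-first) order.
   Returns the number of bytes used: 12 for an average 24-piece
   position, 16 for a full 32-piece board. */
static int pack_pieces(const uint8_t *codes, int n, uint8_t *out) {
    int bytes = (n + 1) / 2;
    for (int i = 0; i < bytes; i++)
        out[i] = (uint8_t)(codes[2 * i] |
                           (2 * i + 1 < n ? codes[2 * i + 1] << 4 : 0));
    return bytes;
}
```

Decompression just walks the set bits of `occupancy` and peels one nibble per occupied square, which is why this scheme is trivial to decode without pext/pdep.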
Guess I'll add that I have Ryzens, so PEXT/PDEP are kinda no-gos
This is already better, and simpler, than the one from NNUE folks. And yes, pext/pdep are a massive pain to compute in software. Given that Intel is being obsoleted by AMD, and x86 is also being obsoleted by ARM, building the future on pext/pdep is probably not such a good idea…
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
I believe Uri Blass wrote a FEN compressor that squeezes a FEN below 165 bits
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Dann Corbit wrote: ↑Wed Mar 17, 2021 5:58 am
I believe Uri Blass wrote a FEN compressor that squeezes a FEN below 165 bits
I ran the experiment with my pext algorithm, playing 100 games (demolito vs. itself at depth=8) and computing the average compressed size over all encountered positions. I ran it twice, with and without adjudication (no adjudication is favorable, of course, as games end in long sequences of positions that contain little information with few pieces):
* no adjudication: avg = 138 bits
* resign (3 moves 700bp) + draw (8 moves 10cp): avg = 142 bits
Beat that
PS: In fact, it's even 10 bits less than stated above once you remove the fullMove counter, which is totally useless (it's just an annotation and contains zero position information relevant to tuning an eval, for example).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
This is already better, and simpler, than the one from NNUE folks. And yes, pext/pdep are a massive pain to compute in software. Given that Intel is being obsoleted by AMD, and x86 is also being obsoleted by ARM, building the future on pext/pdep is probably not such a good idea…
Really? I looked at that file, but it's a few thousand lines and it was not immediately clear where to look for the compression scheme.
Do you know offhand if the Stockfish trainer continually reloads the data back into memory, or if it is a one-time process?
Really? I looked at that file, but it's a few thousand lines and it was not immediately clear where to look for the compression scheme.
Do you know offhand if the Stockfish trainer continually reloads the data back into memory, or if it is a one-time process?
It continually loads the data from disk. Each batch is thrown away after being used.
Instruction   Zen2                                     Zen3
PDEP/PEXT     ~300 cycle latency, one per ~250 cycles  3 cycle latency, one per clock
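Given that Zen2/Zen3 gap, a common pattern is to dispatch at build time between the hardware instruction and a software fallback. A sketch, where the bit-by-bit loop is the standard emulation and `SLOW_PEXT` is a hypothetical flag one would define when targeting pre-Zen3 AMD chips:

```c
#include <stdint.h>

/* Bit-by-bit PEXT fallback: one iteration per set mask bit. Slow, but
   far better than a microcoded ~300-cycle hardware instruction is bad. */
static uint64_t pext_soft(uint64_t value, uint64_t mask) {
    uint64_t result = 0, bit = 1;
    for (; mask; mask &= mask - 1, bit <<= 1)
        if (value & mask & -mask)
            result |= bit;
    return result;
}

/* __BMI2__ is only defined when the compiler is allowed to emit BMI2
   (e.g. -mbmi2), so the intrinsic path never breaks a generic build. */
#if defined(__BMI2__) && !defined(SLOW_PEXT)
  #include <immintrin.h>
  #define PEXT(v, m) _pext_u64((v), (m))
#else
  #define PEXT(v, m) pext_soft((v), (m))
#endif
```

A runtime CPUID check is the other option, at the cost of an indirect call or branch per lookup.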
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.