FEN compression

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

AndrewGrant
Posts: 1756
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: FEN compression

Post by AndrewGrant »

lucasart wrote: Tue Apr 13, 2021 2:07 am Are you using this for tuning an HCE (Ethereal) or NN eval (Halogen) ?

Btw, I think you can save 24-bits, without complicating code or using PEXT/PDEP:

Code: Select all

Occupancy     : 8-bytes
Evaluation    : 2-bytes
Turn          : 1-bytes
Packed Pieces : 1-bytes per two pieces (16 max, 12 average)
The point is that a nibble (4-bits) can encode (piece,color) for any piece including the king (6 pieces * 2 colors = 12 values). So you don't need white/black king squares in separate fields. ALso, you don't need to store the piece count, because popcnt(occupancy) tells you how many packed pieces are in the array (for variable length encoding, where you store (popcnt(occupancy)+1)/2 elements instead of 16).
I thought I responded to you, but did not ...

Yes I need to remove the wkingsq and bkingsq. At first there was a sort of reason, but now not so much. Free savings there. And you are right about the popcnt, thats more savings. Will be updating my code shortly. With file sizes so large (100GB for one dataset I have), every literal bit counts. 4GB of savings for my dataset right there. Should improve load times slightly.

This was for Ethereal NNUE / Halogen NN.
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: FEN compression

Post by hgm »

The 'Turn' byte seems superfluous too. I did not follow the entire discussion, but with 4 bits per piece you have enough piece-type codes to reserve two codes for the King, and use a different code for the King of the side to move. And you could have two codes for Rook to distinguish Rooks that can castle from those that cannot. Like

0 = Pawn
1 = Knight
2 = Bishop
3 = Rook that cannot castle
4 = Queen
5 = King of the side to move
6 = King of the other side
7 = Rook that can castle or Pawn that can be e.p. captured

This doesn't even exhaust all encoding possibilities, because the Pawn code could be taken to mean different things on 1st and 8th rank.

Note that saving the occupied bitboard is basically the first step towards Huffman encoding, combined with re-ordering the bits. Progressing along those lines, you could follow it by bytes that pack 2 bits per occupied square (max 8, avg 6), to indicate their color, and whether they are Pawn or not. These could then be followed by bytes that pack 2 bits per non-Pawn, to distinguish the piece types N, B, R and K/Q. Finally you would need a few bits to indicate which of the K/Q is the King.
Nevsor
Posts: 3
Joined: Sun Aug 15, 2021 8:35 pm
Full name: Sven Stegemann

Re: FEN compression

Post by Nevsor »

lucasart wrote: Thu Mar 18, 2021 5:36 am I think this algorithm is provably optimal, at least when the metric is the worst case. The proof is basically obvious by just following each step of the algorithm.
I am pretty sure there are better algorithms for the worst case. If we just look at the piece placements for now, the pext format takes

7*cnt(kings) + 7*cnt(queens) + 6*cnt(rooks) + 5*cnt(bishops) + 4*cnt(knights) + 3*cnt(pawns) + 1*cnt(empty)

bits, which for the starting position results in 168 bits.

With a huffman coding you can for example achieve

6*cnt(kings) + 6*cnt(queens) + 5*cnt(rooks) + 5*cnt(bishops) + 5*cnt(knights) + 3*cnt(pawns) + 1*cnt(empty)

bits, which in the starting position results in 164 bits. In the worst case the difference is even greater, because queening a pawn results in a greater increase in bits. Keep in mind that you have to capture a piece before pawns can promote. So for example you can have white's a pawn capture the black's b pawn and then both sides can promote one pawn, which will increase the number of bits by 2*6-3*3+1=4 for huffman coding and by 2*7-3*3+1=6 for pext encoding.

That said, I really like the idea of decompressing multiple pieces in parallel with a single instruction instead of decoding them square by square. If you are willing to slightly complicate the algorithm you can optimize the pext format even further:

There can never be any pawn in the first or last rank => save pext(pawns, occupied & ~rank1 & ~rank8) instead of pext(pawns, occupied). This will save 16 bit in the starting position and is probably most effective in the opening.

There will always be exactly one king of each color => save tzcnt(pext(white_king, occupied)) and tzcnt(pext(black_king, occupied)) and remove pext(queens, remaining). This will take ceil(log2(cnt(occupied)-1)) bits per king, but will save 12+cnt(queens) bits in the following pext compressions.

There will be at most one ep square => save tzcnt(pext(epSquare, candidates)) instead of pext(epSquare, candidates). This will take ceil(log2(cnt(candidates))) bits instead of cnt(candidates) bits. This optimization is only helpful, if cnt(candidates) is greater than 2, which should be quite rare.
lucasart wrote: Wed Mar 17, 2021 1:35 am Is this useful? Absolutely not. But it's fun!
I guess really useful is a format like Stockfish's binpack, which saves compressed moves instead of compressed positions for related positions (positions from the same game). I am not aware, of an application for saving a collection of unrelated FENs, were one of the "established" formats like .bin or .binpack is too slow or needs a better compression ratio, but I'd really like to see how low we can get the bitcount of the average case with this format.
Nevsor
Posts: 3
Joined: Sun Aug 15, 2021 8:35 pm
Full name: Sven Stegemann

Re: FEN compression

Post by Nevsor »

You could also try adding an additional bit that indicates that no casting is possible at all. There are probably many positions were for example both kings have castled or moved and were there are still multiple rooks left. You would save cnt(rooks)-1 bits in those positions, but you would add one bit to positions were castling is still possible. In positions without rooks you don't need to save this bit.
tromp
Posts: 35
Joined: Sun May 02, 2021 10:23 pm
Full name: John Tromp

Re: FEN compression

Post by tromp »

AndrewGrant wrote: Tue Jun 22, 2021 1:39 pm
tromp wrote: Tue Jun 22, 2021 1:30 pm I'm working on a FEN compressor that maps any FEN describing a reachable position into a non-negative number under N=8726713169886222032347729969256422370854716254 (~8.7E45), which amounts to ~19.1 bytes.
Position information includes side to move, en-passant status, castling rights, but no 50-move rule or evaluation information.
This would be of interest. Keep us posted if you can.
As you probably noticed already, the FEN compressor is available at https://github.com/tromp/ChessPositionRanking with the following constraints:
* It doesn't reproduce the last two FEN fields.
* If used to compress multiple FENs in one batch, then they need to be in order of their rank.
* It's slow: on the order of half an hour per batch, both for compression and decompression.

If there's sufficient interest, I could add a program to my repo that does the FEN sorting by rank,
which should be relatively fast.