Binary FEN

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Binary FEN - using permutation

Post by hgm »

But the real problem is representing positions after 'excessive' promotion. A single PxP capture is enough to unblock the path to promotion of 3 Pawns. So in principle 12 Pawns could promote without any of the primordial pieces disappearing. (All 16 could promote at the expense of 8 primordial pieces, but that gives far less material compositions, although it could give more of any single piece type, like 20 Knights.)

If it was just a matter of encoding positions with not too many promotions, you could exploit reachability of Pawn structures to compress positions into fewer bits. E.g. use 0 for empty, 10c for Pawns (c = color bit), 11xxc for N, B, R, Q (xx determining the type), and start with two groups of 6 bits for the square numbers of the Kings (which are then skipped in the raster scan).

Then make use of the fact that Pawns cannot occur on 1st and 8th rank, by encoding these ranks as leading squares in any file, and letting codes 10nnn encode a pair of opprositely colored Pawns along that file in the natural order (which can occur in 15 conformations). The nnn encode 3 bits of the conformation number, while the 4th bit would be implied by whether the code occurred as 1st or 2nd item on the encoding of the file.

The initial position would then need 8x5 (for P-pairs) + 12 (Kings) + 14x5 (NBRQ) + 32 (empty) = 154 bits. Capture of a piece by a Pawn turns P-pair + piece (5+5) into 2 lone P + empty (2x3+1), i.e. reduces the bit requirement. Similarly capturing a Pawn with a piece turns P-pair (5) into Pawn + empty (3+1), also a reduction. Finally breaking up 2 pairs by PxP turns 2 P-pair (2x5) into 3 lone P + empty (3x3+1), which takes the same number of bits. So as long as there are no promotions before any additional pieces are captured, 154 bits would be enough.

This does not account for stm, castling and e.p. rights.
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Binary FEN - using permutation

Post by stegemma »

hgm wrote:But the real problem is representing positions after 'excessive' promotion. A single PxP capture is enough to unblock the path to promotion of 3 Pawns. So in principle 12 Pawns could promote without any of the primordial pieces disappearing. (All 16 could promote at the expense of 8 primordial pieces, but that gives far less material compositions, although it could give more of any single piece type, like 20 Knights.)

If it was just a matter of encoding positions with not too many promotions, you could exploit reachability of Pawn structures to compress positions into fewer bits. E.g. use 0 for empty, 10c for Pawns (c = color bit), 11xxc for N, B, R, Q (xx determining the type), and start with two groups of 6 bits for the square numbers of the Kings (which are then skipped in the raster scan).

Then make use of the fact that Pawns cannot occur on 1st and 8th rank, by encoding these ranks as leading squares in any file, and letting codes 10nnn encode a pair of opprositely colored Pawns along that file in the natural order (which can occur in 15 conformations). The nnn encode 3 bits of the conformation number, while the 4th bit would be implied by whether the code occurred as 1st or 2nd item on the encoding of the file.

The initial position would then need 8x5 (for P-pairs) + 12 (Kings) + 14x5 (NBRQ) + 32 (empty) = 154 bits. Capture of a piece by a Pawn turns P-pair + piece (5+5) into 2 lone P + empty (2x3+1), i.e. reduces the bit requirement. Similarly capturing a Pawn with a piece turns P-pair (5) into Pawn + empty (3+1), also a reduction. Finally breaking up 2 pairs by PxP turns 2 P-pair (2x5) into 3 lone P + empty (3x3+1), which takes the same number of bits. So as long as there are no promotions before any additional pieces are captured, 154 bits would be enough.

This does not account for stm, castling and e.p. rights.
If we try encoding using permutation, the promoted pawns could be encoded using a starting piece types set or something similar. For sample, we could encode:

(K)QQRRBBPPPPPP

with an extra byte:

00: no pieces of new type
01: one piece same type
10: one piece of new type
11: two pieces of new type

(01)[11][11][11][00]{110}

The piece types set is always in the same order, so we have only to change type when the couples of bit changes. When we finally reach the pawn type, we only have to store the real count (3 bits). Bits for king can be omitted. of course we can get similar result using 3 bit per any piece type but the king.

This just to say that we don't care about promotions, just we have to add some bit.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
User avatar
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Binary FEN - using permutation

Post by hgm »

If I understand you correctly, this would require an enormous number of extra bits in the worst case, where no pieces are captured, and 12 Pawns promote to R, B, or N: you would need 12 extra 01 sequences to signal the presence of the super-numerary pieces, i.e. 24 extra bits. Note that the number of permutations also increases if the number of pieces of each type gets more equal. E.g. if each side has 2Q, 3R, 2B, 3N and 3P.

To handle this case with a Huffman-type code, would e more efficient. Unfortunately there are 5 pieces of 2 colors, so if empty squares remain a single 0 bit, you only 3 of the 5 types can get a 4-bit code (occupied, color, and 2 type bits), while two types have to be assigned 5-bit codes. Because any of the piece types could become the most-abundant one, (and actually very abundant), this gives a quite poor worst case.

One way to avoid that is to use a flexibleencoding, where depending on the case you assign the long codes to the least-abundant pieces, and add some extra bits to identify the case. With 2 extra bits you can choose between 4 different encodings, e.g. those that assign long codes either to Q+R, B+N, B+P or N+P. It turns out there will never have to be more than 10 pieces with long codes in that case (while otherwise that number could run upto 20 if there have only been PxP captures).
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Binary FEN - using permutation

Post by stegemma »

hgm wrote:If I understand you correctly, this would require an enormous number of extra bits in the worst case, where no pieces are captured, and 12 Pawns promote to R, B, or N: you would need 12 extra 01 sequences to signal the presence of the super-numerary pieces, i.e. 24 extra bits. Note that the number of permutations also increases if the number of pieces of each type gets more equal. E.g. if each side has 2Q, 3R, 2B, 3N and 3P.

To handle this case with a Huffman-type code, would e more efficient. Unfortunately there are 5 pieces of 2 colors, so if empty squares remain a single 0 bit, you only 3 of the 5 types can get a 4-bit code (occupied, color, and 2 type bits), while two types have to be assigned 5-bit codes. Because any of the piece types could become the most-abundant one, (and actually very abundant), this gives a quite poor worst case.

One way to avoid that is to use a flexibleencoding, where depending on the case you assign the long codes to the least-abundant pieces, and add some extra bits to identify the case. With 2 extra bits you can choose between 4 different encodings, e.g. those that assign long codes either to Q+R, B+N, B+P or N+P. It turns out there will never have to be more than 10 pieces with long codes in that case (while otherwise that number could run upto 20 if there have only been PxP captures).
As said, this is just an idea that could be used partially in some situation, it is not an efficient encoding on its own. In any case of no-promotions, is not so bad, for sample. Maybe it could becomes an efficient encoding but it would requires a lot of works. In Italy we say "impara l'arte e mettila da parte" ;) (google translator say: "learn the art and put it aside") that means that most you learn and most you have instruments in the future, maybe even if you learn something that seems useless, at the moment. I apply this concept to chess programming too: any strange new idea I would try to explore even if it leads to nothing... and this is why my engines play so strangely ;)

PS: if you have a fixed set of elements and the permutations number are closest to a power of two then indexing the permutations itself it is the most efficient encoding... i think that it could be proven mathematically ;)
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Binary FEN

Post by stegemma »

jwes wrote:There has been discussion of using FEN to store positions in databases. It seems to me that a binary format could be more space efficient and at least as easy to parse. The format I devised is:
64 bit bitmap where each 1 bit represents an occupied square or the ep square if there is one.
32 4 bit chunks which define the contents of each occupied square in order.
The 16 possible 4 bit chunks are:

Code: Select all

0 WP                           8 BP
1 WN                           9 BN
2 WB                          10 BB
3 WR that can castle          11 BR that can castle
4 WR that cannot castle       12 BR that cannot castle
5 WQ                          13 BQ
6 WK wtm                      14 BK
7 WK btm                      15 ep square
This would allow any legal position to be encoded in 24 bytes unless repetition count, 50 move count, or move no. are needed. This is less than half the average epd length and perhaps 25% more than the theroretical minimum.
Maybe we have to take a step back and think about "why" we need a binary format for FEN. the primary purpose is to exchange positions or to save them to files, for using in books, databases and so on. FEN and PGN are widely used and accepted as standard so a binary format should be the same. I and other we have lose ourself in searching for the most compact way but this is not the requirement, I think, and it is not needed to define a standard. Despite from that, maybe if we would define a binary standard we should define a fixed length one. FEN strings are long and not fixed but they are human readable. A binary format will not be human readable but in some situation is better if it is fixed length. For sample, you can define a fixed array of fixed length strings in C, very easily, and access any one string very fast. The same if the binary FENs were saved to a file, because you can access the Nth FEN just seeking the file pointer where needed.

If we want to define a binary format, it should be smart enough to accept other chess variants and maybe other games too. The format itself should have an "header", that defines the format used, various lines (the binary FENs) and maybe a "footer", so that we can save directly to a file or use it in memory. To provide compactness, then it would be enough to compress with zip or other compression schemes, that's over the needing of the format.

Of course I already have some idea on how to implement this... but maybe is better to know other opinions specially if this idea is interesting or not (for me "yes").
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
whereagles
Posts: 565
Joined: Thu Nov 13, 2014 12:03 pm

Re: Binary FEN

Post by whereagles »

FEN is a very human-friendly notation.

Binary is nice. To reduce size one can alternatively run usual Huffman encoding over the standard FEN.
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Binary FEN

Post by stegemma »

whereagles wrote:FEN is a very human-friendly notation.

Binary is nice. To reduce size one can alternatively run usual Huffman encoding over the standard FEN.
Right, so maybe the only reason to use a binary format is for speed.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com