Binary FEN

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Binary FEN

Post by jwes »

There has been discussion of using FEN to store positions in databases. It seems to me that a binary format could be more space efficient and at least as easy to parse. The format I devised is:
64 bit bitmap where each 1 bit represents an occupied square or the ep square if there is one.
32 4 bit chunks which define the contents of each occupied square in order.
The 16 possible 4 bit chunks are:

Code: Select all

0 WP                           8 BP
1 WN                           9 BN
2 WB                          10 BB
3 WR that can castle          11 BR that can castle
4 WR that cannot castle       12 BR that cannot castle
5 WQ                          13 BQ
6 WK wtm                      14 BK
7 WK btm                      15 ep square
This would allow any legal position to be encoded in 24 bytes unless repetition count, 50 move count, or move no. are needed. This is less than half the average epd length and perhaps 25% more than the theroretical minimum.
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Binary FEN

Post by Dann Corbit »

192 bits is a pretty good savings.

Alternative approaches:
https://groups.google.com/forum/#!topic ... mvI0ePH2kI

https://codegolf.stackexchange.com/ques ... ompression

http://stackoverflow.com/questions/1831 ... out-a-game

http://mathforum.org/kb/message.jspa?messageID=364071

http://www.doiserbia.nb.rs/img/doi/0354 ... 00011V.pdf

There was a big discussion on CCC some time ago.
I seem to recall that Uri Blass had the smallest representation around 160 bits, but I might not be remembering correctly.
User avatar
hgm
Posts: 27795
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Binary FEN

Post by hgm »

Variable-length (Huffman) encoding usually gives a very compact representation. Applied to Chess this would encode empty squares by 1 bit (0), Pawns by 3 (100 and 110), N, B, R by 5 (101xx, 111xx, xx != 11) and K, Q by 6 bits (10111x and 11111x). For a fully populated board this would require 164 bits, and then you can add some for castling and e.p. rights, and side-to-move. (E.g. 4 castling, 1 stm and 3 for the e.p. file.)

Positions with excessive promotion would require more bits, but usually you would not want to store such positions.

This does not exploit the facts that you can only have a single King each, and that Pawns cannot be on 1st and 8th rank. As the Kings would be encoded by 6 bit, you might as well encode the squares they are on, which also requires 6 bits. The encoding of the remaining 62 squares then would not have to distinguish between K and Q, so that the Q code could be 5 bits, saving you 2 bits. Pawn codes on the back ranks could be used to indicate Rooks that can castle, so no separate castling-rights bits are needed.
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Binary FEN

Post by stegemma »

jwes wrote:There has been discussion of using FEN to store positions in databases. It seems to me that a binary format could be more space efficient and at least as easy to parse. The format I devised is:
64 bit bitmap where each 1 bit represents an occupied square or the ep square if there is one.
32 4 bit chunks which define the contents of each occupied square in order.
The 16 possible 4 bit chunks are:

Code: Select all

0 WP                           8 BP
1 WN                           9 BN
2 WB                          10 BB
3 WR that can castle          11 BR that can castle
4 WR that cannot castle       12 BR that cannot castle
5 WQ                          13 BQ
6 WK wtm                      14 BK
7 WK btm                      15 ep square
This would allow any legal position to be encoded in 24 bytes unless repetition count, 50 move count, or move no. are needed. This is less than half the average epd length and perhaps 25% more than the theroretical minimum.
For standard chess, it could be a little better:

0: empty square
1: WP
2: WN
3: WB
4: WR
5: WQ
6: WK
7: enp square
8: free for future use
9: BP
A: BN
B: BB
C: BR
D: BQ
E: BK
F: bit

To signal a never moved rook or a never moved king, just put the WP/BP on the same square! Because a pawn never could be in the 1/8 row, it can easily replaced by the rook or the king, knowing that castling is possible for those piece.

The F:bit code should be mixed with empty square code. Anytime you find 0/F you know that this is an empty square but you shift an unsigned 32 bit integer and add the bit 0/1. This way, you have a free 32 bit flags (and more) still using 4 bit x 64 squares. Those bits could be used to store repetition count and other information.

Of course it could be done even better, this is just an hint for a possible binary format in fixed length, not so hard to encode.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Forty bytes

Post by sje »

Forty bytes:

Code: Select all

//// Symbolic: A portable multithreaded bitboard chessplaying program
//
// Copyright (C) 2015 by S. J. Edwards  All rights reserved

#ifndef IncludedCompact
#define IncludedCompact

class CpFSS
{
public:
  CpFSS(void) {Reset();}
  ~CpFSS(void) {}

  void Reset(void);

  void LoadFromFSS(const FSS& fss);
  void StoreToFSS(FSS& fss) const;

  Color GetGood(void) const {return (Color) good;}
  Color GetEvil(void) const {return (Color) evil;}
  Cabs  GetCabs(void) const {return (Cabs)  cabs;}
  Sq    GetEpSq(void) const {return (Sq)    epsq;}
  ui    GetHmvc(void) const {return (ui)    hmvc;}
  ui    GetFmvn(void) const {return (ui)    fmvn;}

  void PutGood(const Color color) {good = (si8)  color;}
  void PutEvil(const Color color) {evil = (si8)  color;}
  void PutCabs(const Cabs value)  {cabs = (Cabs) value;}
  void PutEpSq(const Sq sq)       {epsq = (Sq)   sq;}
  void PutHmvc(const ui value)    {hmvc = (ui16) value;}
  void PutFmvn(const ui value)    {fmvn = (ui16) value;}

  bool Read(std::ifstream * const ifsptr);
  bool Write(std::ofstream * const ofsptr) const;

private:
  friend class CpFEN;

  si8  good; // Active color
  si8  evil; // Passive color
  ui8  cabs; // Castling availability bits
  si8  epsq; // En passant target square (may be nil)
  ui16 hmvc; // Half move counter
  ui16 fmvn; // Full move number
};

#define CpBoardLen (SqLen / 2)

class CpBoard
{
public:
  CpBoard(void) {Reset();}
  ~CpBoard(void) {}

  void Reset(void);

  void LoadFromBoard(const Board& board);
  void StoreToBoard(Board& board) const;

  Man GetMan(const Sq sq) const;
  void PutMan(const Sq sq, const Man man);

  bool Read(std::ifstream * const ifsptr);
  bool Write(std::ofstream * const ofsptr) const;

private:
  friend class CpFEN;

  void ResetSq(const Sq sq) {PutMan(sq, ManVacant);}

  ui8 bvec[CpBoardLen];
};

class CpFEN
{
public:
  CpFEN(void) {Reset();}
  ~CpFEN(void) {}

  void Reset(void);

  void LoadFromPosition(const Position& position);
  void StoreToPosition(Position& position) const;

  bool Read(std::ifstream * const ifsptr);
  bool Write(std::ofstream * const ofsptr) const;

private:
  friend class PerftRec;

  CpFSS   cpfss;
  CpBoard cpboard;
};

#endif
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: Binary FEN

Post by mvk »

hgm wrote:Pawn codes on the back ranks could be used to indicate Rooks that can castle, so no separate castling-rights bits are needed.
Chopping of the last queen bit is a clever trick!

You can make each castling bit optional, based on the condition that the corresponding king and rook are on the right square. Similar idea for the en passant status.

And if it is for storing in a sorted file, you can use 8+8 bits to signal which files and ranks have differences with the previous board, and then only encode the squares on the intersections of those. You can still have fast access by skipping this step at the beginning of every block. This gets you down to 75 bits or lower on average.
[Account deleted]
ibid
Posts: 89
Joined: Mon Jun 13, 2011 12:09 pm

Re: Binary FEN

Post by ibid »

jwes wrote:There has been discussion of using FEN to store positions in databases. It seems to me that a binary format could be more space efficient and at least as easy to parse. The format I devised is:
64 bit bitmap where each 1 bit represents an occupied square or the ep square if there is one.
32 4 bit chunks which define the contents of each occupied square in order.
The 16 possible 4 bit chunks are:

Code: Select all

0 WP                           8 BP
1 WN                           9 BN
2 WB                          10 BB
3 WR that can castle          11 BR that can castle
4 WR that cannot castle       12 BR that cannot castle
5 WQ                          13 BQ
6 WK wtm                      14 BK
7 WK btm                      15 ep square
This would allow any legal position to be encoded in 24 bytes unless repetition count, 50 move count, or move no. are needed. This is less than half the average epd length and perhaps 25% more than the theroretical minimum.
I use a similar scheme for learning databases and the unique position generator used for deep perfts. I use the regular 12 pieces, plus pawn-that-can-be-captured-ep, rook-that-can-castle, and king-whose-side-is-to-move.

Speed is an issue generating these things though and it has crossed my mind that the PEXP/PDEP insrtructions from recent intels might be able to produce a 192 bit position much faster, assuming you have appropriate bitboards handy (I haven't really thought it through in detail, I don't own such a cpu.)

For example, store the 64 occupied bitboard. Now create four 32-bit "reduced bitboards" by using PEXP, occupied, and various other bitboards.

The first one would probably be white_occupied. Next one take all rooks and pawns (of both colors), OR them, and use PEXP. Third could be rooks, bishops and queens. Fourth queens and knights. So if you were to take corresponding bits from each of the "reduced bitboards" you would end up with a table something like:

Code: Select all

0000  black king
0001  black knight
0010  black bishop
0011  black queen
0100  black pawn
0101
0110  black rook
0111
1000  white king
1001  white knight
1010  white bishop
1011  white queen
1100  white pawn
1101
1110  white rook
1111
Hope I didn't screw that up, did it off the top of my head.

You could then manually set the last bit for rooks which can castle and a pawn that can be captured ep (which would use the unused codes above). Still needed is a quick way to represent which color is to move.
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Binary FEN

Post by jwes »

stegemma wrote:
jwes wrote:There has been discussion of using FEN to store positions in databases. It seems to me that a binary format could be more space efficient and at least as easy to parse. The format I devised is:
64 bit bitmap where each 1 bit represents an occupied square or the ep square if there is one.
32 4 bit chunks which define the contents of each occupied square in order.
The 16 possible 4 bit chunks are:

Code: Select all

0 WP                           8 BP
1 WN                           9 BN
2 WB                          10 BB
3 WR that can castle          11 BR that can castle
4 WR that cannot castle       12 BR that cannot castle
5 WQ                          13 BQ
6 WK wtm                      14 BK
7 WK btm                      15 ep square
This would allow any legal position to be encoded in 24 bytes unless repetition count, 50 move count, or move no. are needed. This is less than half the average epd length and perhaps 25% more than the theroretical minimum.
For standard chess, it could be a little better:

0: empty square
1: WP
2: WN
3: WB
4: WR
5: WQ
6: WK
7: enp square
8: free for future use
9: BP
A: BN
B: BB
C: BR
D: BQ
E: BK
F: bit

To signal a never moved rook or a never moved king, just put the WP/BP on the same square! Because a pawn never could be in the 1/8 row, it can easily replaced by the rook or the king, knowing that castling is possible for those piece.
This is clever.
The F:bit code should be mixed with empty square code. Anytime you find 0/F you know that this is an empty square but you shift an unsigned 32 bit integer and add the bit 0/1. This way, you have a free 32 bit flags (and more) still using 4 bit x 64 squares. Those bits could be used to store repetition count and other information.
I don't see this. My idea was to use 4 bit x 32 pieces.
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Binary FEN

Post by stegemma »

jwes wrote:[...]
I don't see this. My idea was to use 4 bit x 32 pieces.
In effect you use (4x32 + 64) bits = 192 bits (still better than 256, of course).
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Binary FEN - using permutation

Post by stegemma »

An idea to explore could be to use a permutation index, to select the position. We can use the information about starting piece and build a string of piece types with their quantity:

Code: Select all

K 1
Q 1
R 2
B 2
N 2
k 1
q 1
r 2
b 2
n 2
_ 32
Now we can compute the permutation that could occurs within the full set:

Code: Select all


n = 1+1+2+...+2+32 = 64 (of course!)
p = 64!/(1!*1!*2!*....*2!*32!) = 1.495E48

This is the number of all pseudo-legal positions when no captures has occurred. Any position can be identified by its index in the set of permutations and this index requires:

Code: Select all

bits = log2(p)+1 = 161
This means that we can store such kind of positions with no more than 161 bits. Of course there must be a function that converts the FEN in the index in the permutation set and a sorting scheme should be defined (we cannot store such a huge set!). With captures, the number of spaces raises and the number of permutations diminishes, the same occurs with promotions (that requires captures).

And now? This is just an idea that maybe can be used as part of a more smart and compact system.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com