mar wrote: ↑Wed Mar 17, 2021 2:31 am
I don't know about PDEP/PEXT but from the docs it seems you actually want pext for compression and pdep for decompression?
Nice idea - the point is the trainer compresses FENs this way in memory I suppose.
Where do you handle (piece) color? I guess you could easily encode say black pieces in a similar way of what you already do.
Good points. I forgot color and had pdep/pext inverted. For color, we just need cnt(occupied) bits to store pext(white, occupied) (and deduce black as occupied & ~white).
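A minimal sketch of that color scheme in C, using loop-based stand-ins for the pext/pdep instructions so it runs on any CPU (function names are mine, not from any engine):

```c
#include <stdint.h>

/* Portable loop-based stand-ins for the BMI2 pext/pdep instructions. */
static uint64_t sw_pext(uint64_t v, uint64_t m) {
    uint64_t r = 0, bit = 1;
    for (; m; m &= m - 1, bit <<= 1)    /* walk set mask bits, low to high */
        if (v & m & -m)
            r |= bit;
    return r;
}
static uint64_t sw_pdep(uint64_t v, uint64_t m) {
    uint64_t r = 0, bit = 1;
    for (; m; m &= m - 1, bit <<= 1)    /* scatter v's low bits onto mask  */
        if (v & bit)
            r |= m & -m;
    return r;
}

/* Compress: store popcount(occupied) color bits (white = 1), in occupancy order. */
static uint64_t pack_white(uint64_t occupied, uint64_t white) {
    return sw_pext(white, occupied);
}
/* Decompress: scatter the color bits back onto the occupied squares;
   black is then simply occupied & ~white. */
static uint64_t unpack_white(uint64_t occupied, uint64_t colors) {
    return sw_pdep(colors, occupied);
}
```

For the starting position, occupied has 32 set bits, so the color information costs exactly 32 bits here rather than a full 64-bit bitboard.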
AndrewGrant wrote: ↑Wed Mar 17, 2021 2:39 am
Not nearly as good as yours, although trivial to decompress and matches my internal use during training. Load times are pretty fast, to the point where speeding it up is a non-factor for me, but I imagine your scheme would be faster, trading off disk reads for decompression.
Occupancy : 8 bytes
Evaluation : 2 bytes
Turn : 1 byte
White King Sq : 1 byte
Black King Sq : 1 byte
Piece Count : 1 byte
Packed Pieces : 1 byte per two pieces (16 bytes max, 12 average)
= Average of 26 bytes, max of 30 bytes.
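The layout above can be sketched as a packed C struct; the field names and the 4-bit piece codes are my assumptions, not AndrewGrant's actual code. The 14 fixed bytes plus 12/16 packed bytes give the 26-byte average and 30-byte maximum:

```c
#include <stdint.h>

/* Hypothetical encoding of the record described above. */
#pragma pack(push, 1)
typedef struct {
    uint64_t occupancy;      /* 8 bytes: one bit per occupied square */
    int16_t  evaluation;     /* 2 bytes                              */
    uint8_t  turn;           /* 1 byte                               */
    uint8_t  white_king_sq;  /* 1 byte                               */
    uint8_t  black_king_sq;  /* 1 byte                               */
    uint8_t  piece_count;    /* 1 byte                               */
    uint8_t  packed[16];     /* 4-bit piece codes, two per byte      */
} Record;                    /* 14 fixed bytes + up to 16 packed     */
#pragma pack(pop)

/* Pack two 4-bit piece codes per byte, in occupancy (LSB-first) order.
   Returns the number of bytes used: 12 for an average 24-piece
   position, 16 for a full 32-piece board. */
static int pack_pieces(const uint8_t *codes, int n, uint8_t *out) {
    int bytes = (n + 1) / 2;
    for (int i = 0; i < bytes; i++)
        out[i] = (uint8_t)(codes[2 * i] |
                           (2 * i + 1 < n ? codes[2 * i + 1] << 4 : 0));
    return bytes;
}
```

Decompression just walks the set bits of `occupancy` and peels one nibble per occupied square, which is why this scheme is trivial to decode without pext/pdep.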
Guess I'll add that I have Ryzens, so PEXT/PDEP are kinda no-gos
This is already better, and simpler, than the one from NNUE folks. And yes, pext/pdep are a massive pain to compute in software. Given that Intel is being obsoleted by AMD, and x86 is also being obsoleted by ARM, building the future on pext/pdep is probably not such a good idea…
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
I believe Uri Blass wrote a FEN compressor that squeezes a FEN below 165 bits
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Dann Corbit wrote: ↑Wed Mar 17, 2021 5:58 am
I believe Uri Blass wrote a FEN compressor that squeezes a FEN below 165 bits
I ran the experiment with my pext algorithm, playing 100 games (demolito vs. itself at depth=8) and computing the average compressed size over all encountered positions. I ran it twice, with and without adjudication (no adjudication is favorable, of course, as games end in long sequences of positions that contain little information with few pieces):
* no adjudication: avg = 138 bits
* resign (3 moves 700bp) + draw (8 moves 10cp): avg = 142 bits
Beat that
PS: In fact, it's even 10 bits less than stated above once you remove the fullMove counter, which is totally useless (it's just an annotation and contains zero position information relevant to tuning an eval, for example).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
This is already better, and simpler, than the one from NNUE folks. And yes, pext/pdep are a massive pain to compute in software. Given that Intel is being obsoleted by AMD, and x86 is also being obsoleted by ARM, building the future on pext/pdep is probably not such a good idea…
Really? I looked at that file, but it's a few thousand lines and it was not immediately clear where to look for the compression scheme.
Do you know offhand if the Stockfish trainer continually reloads the data back into memory, or if it is a one-time process?
Really? I looked at that file, but it's a few thousand lines and it was not immediately clear where to look for the compression scheme.
Do you know offhand if the Stockfish trainer continually reloads the data back into memory, or if it is a one-time process?
It continually loads the data from disk. Each batch is thrown away after being used.
Instruction   Zen2                                     Zen3
PDEP/PEXT     ~300 cycle latency, one per ~250 cycles  3 cycle latency, one per clock
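Given that Zen2/Zen3 gap, a common pattern is to dispatch at build time between the hardware instruction and a software fallback. A sketch, where the bit-by-bit loop is the standard emulation and `SLOW_PEXT` is a hypothetical flag one would define when targeting pre-Zen3 AMD chips:

```c
#include <stdint.h>

/* Bit-by-bit PEXT fallback: one iteration per set mask bit. Slow, but
   far better than a microcoded ~300-cycle hardware instruction is bad. */
static uint64_t pext_soft(uint64_t value, uint64_t mask) {
    uint64_t result = 0, bit = 1;
    for (; mask; mask &= mask - 1, bit <<= 1)
        if (value & mask & -mask)
            result |= bit;
    return result;
}

/* __BMI2__ is only defined when the compiler is allowed to emit BMI2
   (e.g. -mbmi2), so the intrinsic path never breaks a generic build. */
#if defined(__BMI2__) && !defined(SLOW_PEXT)
  #include <immintrin.h>
  #define PEXT(v, m) _pext_u64((v), (m))
#else
  #define PEXT(v, m) pext_soft((v), (m))
#endif
```

A runtime CPUID check is the other option, at the cost of an indirect call or branch per lookup.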
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.