Best way to store games

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Best way to store games

Post by hgm »

Pio wrote: Fri Nov 06, 2020 10:23 pmIf you want a failsafe simple way of storing a chess move in a byte is to use an existing correct legal move generator and save the index. To retrieve the move later you can generate all the legal moves from the position with the same generator and pick the move stored at the stored index position.
Note that this will be about 100 times slower, as you have to generate all moves before you can know what the move number means (and then perform that move). While in the other encoding you simply perform the indicated move.

Code: Select all

pieceNr = pieces[moveCode];
step = steps[moveCode];
fromSqr = location[pieceNr];
toSqr = fromSqr + step & 0x77;
board[toSqr] = board[fromSqr];
board[fromSqr] = 0;
location[pieceNr] = toSqr;
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Best way to store games

Post by Pio »

hgm wrote: Fri Nov 06, 2020 11:23 pm
Pio wrote: Fri Nov 06, 2020 10:23 pmIf you want a failsafe simple way of storing a chess move in a byte is to use an existing correct legal move generator and save the index. To retrieve the move later you can generate all the legal moves from the position with the same generator and pick the move stored at the stored index position.
Note that this will be about 100 times slower, as you have to generate all moves before you can know what the move number means (and then perform that move). While in the other encoding you simply perform the indicated move.

Code: Select all

pieceNr = pieces[moveCode];
step = steps[moveCode];
fromSqr = location[pieceNr];
toSqr = fromSqr + step & 0x77;
board[toSqr] = board[fromSqr];
board[fromSqr] = 0;
location[pieceNr] = toSqr;
I know you are right 😀. The thing is that he wants to make a database and then there is no need for performance. It does not matter for the user if it takes an additional microsecond or two. It might however be important to store some puzzles with nine queens on the board.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Best way to store games

Post by Pio »

Peperoni wrote: Fri Nov 06, 2020 11:01 pm Wow that's brilliant :D
Thank you! It is actually very stupid and slow 😀 but I like it. My engine is extremely slow. It might be good for you however.

Another nice property with my board representation is that for a database you could use the the first 64 bits that indicates the occupancies as the first column (make it clustered) in your composite primary key for the position. In that way you will be able to query similar positions with the same occupancies very quickly. If you want to query similar positions with maybe one piece less or one more you will need to make a union all with 64 queries where each query has one bit flipped. You can use sp_executesql (if you use sql server) to generate your dynamic sql with the 64 union all selects.

Actually you will need 65 selects since you also want to show the unperturbed position.

Good luck!
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Best way to store games

Post by Pio »

Pio wrote: Sat Nov 07, 2020 12:30 am
Peperoni wrote: Fri Nov 06, 2020 11:01 pm Wow that's brilliant :D
Thank you! It is actually very stupid and slow 😀 but I like it. My engine is extremely slow. It might be good for you however.

Another nice property with my board representation is that for a database you could use the the first 64 bits that indicates the occupancies as the first column (make it clustered) in your composite primary key for the position. In that way you will be able to query similar positions with the same occupancies very quickly. If you want to query similar positions with maybe one piece less or one more you will need to make a union all with 64 queries where each query has one bit flipped. You can use sp_executesql (if you use sql server) to generate your dynamic sql with the 64 union all selects.

Actually you will need 65 selects since you also want to show the unperturbed position.

Good luck!
I am stupid, you can make the query for similar positions better. You won’t need dynamic sql to generate the 65 numbers (64 perturbed numbers + 1 original) you just need a function that takes a bigint as argument and return all the union all’s. you should define a user defined table type so you can send in all the 65 numbers to other functions where you do filtering and sorting. Define all your table valued functions as inline table valued functions so the query optimiser can do its job. when you query your positions you won’t need to do union all’s you just put an in-statement in the where-clause with the 65 numbers that you return from your first function.
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Best way to store games

Post by Dann Corbit »

Peperoni wrote: Fri Nov 06, 2020 11:01 pm Wow that's brilliant :D
That technique was invented by an Amiga chess program in the 1980's
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Best way to store games

Post by Pio »

Dann Corbit wrote: Sat Nov 07, 2020 4:26 am
Peperoni wrote: Fri Nov 06, 2020 11:01 pm Wow that's brilliant :D
That technique was invented by an Amiga chess program in the 1980's
Had no idea about it. Can you please send a link. Thanks!
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Best way to store games

Post by hgm »

Pio wrote: Sat Nov 07, 2020 12:17 amI know you are right 😀. The thing is that he wants to make a database and then there is no need for performance. It does not matter for the user if it takes an additional microsecond or two. It might however be important to store some puzzles with nine queens on the board.
You obviously have no experience with databases. To search a given position you have to run through hundreds of millions, or even billions of positions. So the 'additional few microseconds' will in practice mean the difference between getting the response in a second or having to wait an hour for it.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Best way to store games

Post by Pio »

hgm wrote: Sat Nov 07, 2020 8:56 am
Pio wrote: Sat Nov 07, 2020 12:17 amI know you are right 😀. The thing is that he wants to make a database and then there is no need for performance. It does not matter for the user if it takes an additional microsecond or two. It might however be important to store some puzzles with nine queens on the board.
You obviously have no experience with databases. To search a given position you have to run through hundreds of millions, or even billions of positions. So the 'additional few microseconds' will in practice mean the difference between getting the response in a second or having to wait an hour for it.
I have 10+ experience of databases containing tables with a couple of billion rows. I am not saying I am an expert but I usually help others far more experienced database guys than me when there are problems with performance or other problems.

What has search of a position in a database anything (or much) to do with the board representation except that a smaller board representation is better?

You are not always right (even though you usually are right). It would be fun to see that you could accept that. It is hard for me too since I am a Korinthenkacker
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Best way to store games

Post by hgm »

Well, I was specifically talking about databases of chess games, without making that explicit, and I am sorry if this caused any misunderstanding. But let me remind you of the OP's original question:
Peperoni wrote: Fri Nov 06, 2020 10:07 amI want to do a custom database of games played by engines.
I am wondering how to store efficently these games and what would be the fastest way to find the games which match a given FEN.
So (1), he wants to efficiently store these games. I proposed a method that would store the moves, and requires 1 byte per move. You propose to store every individual position in the games in a format that takes 24 bytes per position. Which is indeed rather compact as representations for arbitrary positions go, but still 24 times less compact than storing the moves. That is a pretty large factor. Storing complete position descriptions is not a competitive method when the positions you want to store are so similar asadjacent positions in a chess game.

Then the OP asks for (2) the fastest way to search a given position. When storing moves (which can be considered position differences) there is an unavoidable overhead of converting these moves back to positions. But anything you process has to be brought into memory first, and reading from disk is easily 1000 times slower than memory-based operations. Storing full positions drives up this overhead by a factor 24. And having to run a move generator to make a list of moves just for decoding the move takes about 100 times as long as the actual application of the move itself to the previous position. (Probably still faster than reading uncompressed positions from disk, though.)

The remark that 1 or 2 microseconds more doesn't matter for an operation that has to be performed a billion times (e.g. 20M games x 50 moves each) is just not a very smart one. 2us x 1 billion = 2000 sec. You really think the OP doesn't mind whether his search query will be serviced in 10 sec or in 2000 sec? Speed is every bit as important in (chess) databases as in engines.

I never claimed I am always right. It seems to me I am right in this case, though. Or at least that your remark that it doesn't matter is very wrong. Using move generation for decoding moves, or storing positions rather than moves is just not good advice to someone who asks for efficiency and speed.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Best way to store games

Post by Pio »

hgm wrote: Sat Nov 07, 2020 10:29 am Well, I was specifically talking about databases of chess games, without making that explicit, and I am sorry if this caused any misunderstanding. But let me remind you of the OP's original question:
Peperoni wrote: Fri Nov 06, 2020 10:07 amI want to do a custom database of games played by engines.
I am wondering how to store efficently these games and what would be the fastest way to find the games which match a given FEN.
So (1), he wants to efficiently store these games. I proposed a method that would store the moves, and requires 1 byte per move. You propose to store every individual position in the games in a format that takes 24 bytes per position. Which is indeed rather compact as representations for arbitrary positions go, but still 24 times less compact than storing the moves. That is a pretty large factor. Storing complete position descriptions is not a competitive method when the positions you want to store are so similar asadjacent positions in a chess game.

Then the OP asks for (2) the fastest way to search a given position. When storing moves (which can be considered position differences) there is an unavoidable overhead of converting these moves back to positions. But anything you process has to be brought into memory first, and reading from disk is easily 1000 times slower than memory-based operations. Storing full positions drives up this overhead by a factor 24. And having to run a move generator to make a list of moves just for decoding the move takes about 100 times as long as the actual application of the move itself to the previous position. (Probably still faster than reading uncompressed positions from disk, though.)

The remark that 1 or 2 microseconds more doesn't matter for an operation that has to be performed a billion times (e.g. 20M games x 50 moves each) is just not a very smart one. 2us x 1 billion = 2000 sec. You really think the OP doesn't mind whether his search query will be serviced in 10 sec or in 2000 sec? Speed is every bit as important in (chess) databases as in engines.

I never claimed I am always right. It seems to me I am right in this case, though. Or at least that your remark that it doesn't matter is very wrong.
Hi HGM!

You don’t have to feel sorry, I don’t get easily upset. I like to read your posts and won’t be offended that you stated that I obviously don’t have any database experience when you in fact obviously meant that I don’t have any chess database experience 🥰.

According to the problem statement
I want to do a custom database of games played by engines.
I am wondering how to store efficently these games and what would be the fastest way to find the games which match a given FEN.
he has two objectives. The first is to store the games efficiently and the second is to find the games that match a particular fen quickly. Your method wins the first objective with compactness with a factor 24 (but less if you consider other information about the position and more if you consider the indexing) and mine wins the second objective to find a particular fen by finding it exponentially faster than you (in his case probably tens of thousands of times faster).

My way has some other nice properties in that he can 1) use normal databases with build in transaction handling, backups, indexing... and 2) he can use the database for puzzles as well (even though it was not stated in the problem description) where there is no path from the start to the puzzle available (but obviously you thought of doing retrograde analysis to the start 😀) and 3) it is colour agnostic.

He can choose whatever he wants. I know what I would choose.