Well, I was specifically talking about databases of chess games, without making that explicit, and I am sorry if this caused any misunderstanding. But let me remind you of the OP's original question:
Peperoni wrote (Fri Nov 06, 2020 10:07 am):
I want to do a custom database of games played by engines.
I am wondering how to store efficiently these games and what would be the fastest way to find the games which match a given FEN.
So (1), he wants to efficiently store these games. I proposed a method that stores the moves and requires 1 byte per move. You propose to store every individual position in the games, in a format that takes 24 bytes per position. That is indeed rather compact as representations of arbitrary positions go, but it is still 24 times larger than storing the moves, which is a pretty large factor. Storing complete position descriptions is not a competitive method when the positions you want to store are as similar as adjacent positions in a chess game.
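To put that factor in absolute terms, here is a quick back-of-the-envelope calculation. It assumes the example size I use later in this post (20M games of 50 moves each) and ignores game headers and any index; it is only the raw move/position payload.

```python
# Back-of-the-envelope storage comparison: 1 byte per move vs 24 bytes per position.
# Assumed database size: 20M games x 50 moves (the example figure from this post).
GAMES = 20_000_000
MOVES_PER_GAME = 50
BYTES_PER_MOVE = 1        # move-based encoding
BYTES_PER_POSITION = 24   # compact full-position encoding

def storage_bytes(games, items_per_game, bytes_per_item):
    """Total bytes if every game stores items_per_game items of bytes_per_item bytes."""
    return games * items_per_game * bytes_per_item

moves_total = storage_bytes(GAMES, MOVES_PER_GAME, BYTES_PER_MOVE)
positions_total = storage_bytes(GAMES, MOVES_PER_GAME, BYTES_PER_POSITION)

print(f"moves:     {moves_total / 1e9:.0f} GB")
print(f"positions: {positions_total / 1e9:.0f} GB")
print(f"ratio:     {positions_total // moves_total}x")
```

That comes to about 1 GB for the moves against 24 GB for the positions, and all of it has to come off the disk for every query that scans the whole database.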
Then the OP asks for (2) the fastest way to search for a given position. When storing moves (which can be considered position differences), there is an unavoidable overhead of converting these moves back into positions. But anything you process has to be brought into memory first, and reading from disk is easily 1000 times slower than memory-based operations. Storing full positions drives up this I/O overhead by a factor of 24. And having to run a move generator to build a list of moves, just to decode a single move, takes about 100 times as long as applying that move to the previous position. (Probably still faster than reading uncompressed positions from disk, though.)
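To illustrate why applying a stored move is so cheap compared to generating moves, here is a minimal sketch of a search by replay. It assumes, for simplicity, that moves are stored as hypothetical (from, to) square pairs rather than in the 1-byte format I proposed, and it compares piece placement only: castling, en passant, promotion, side to move and the other FEN fields are deliberately ignored. The names `apply_move` and `games_reaching` are made up for this example.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical move format: (from_square, to_square), squares 0..63 with a1 = 0, h8 = 63.
Move = Tuple[int, int]

# Mailbox board in the initial position; "." marks an empty square.
START = list("RNBQKBNR" + "P" * 8 + "." * 32 + "p" * 8 + "rnbqkbnr")

def apply_move(board: List[str], move: Move) -> None:
    """Apply a move in place with two array writes; no move generator involved.
    Simplification: castling, en passant and promotion are NOT handled."""
    frm, to = move
    board[to] = board[frm]
    board[frm] = "."

def games_reaching(games: Iterable[Sequence[Move]], target: Sequence[str]) -> List[int]:
    """Return indices of games whose piece placement matches target at some point."""
    target = list(target)
    hits = []
    for idx, moves in enumerate(games):
        board = START.copy()          # every game is replayed from the start position
        if board == target:
            hits.append(idx)
            continue
        for move in moves:
            apply_move(board, move)   # decoding a move is just applying the difference
            if board == target:
                hits.append(idx)
                break
    return hits
```

For example, with `games = [[(12, 28), (52, 36), (6, 21)], [(11, 27), (51, 35)]]` (1.e4 e5 2.Nf3 and 1.d4 d5), searching for the position after 1.e4 e5 returns `[0]`. A real implementation would also need the special moves and the full FEN state, but the per-move cost stays at a few memory writes, whereas decoding a move-generator index costs a full generation per move.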
The remark that 1 or 2 microseconds more doesn't matter, for an operation that has to be performed a billion times (e.g. 20M games x 50 moves each), is just not a very smart one: 2 µs x 1 billion = 2000 s. Do you really think the OP doesn't care whether his search query is answered in 10 s or in 2000 s? Speed is every bit as important in (chess) databases as in engines.
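The same arithmetic, spelled out with integer nanoseconds so there is no floating-point rounding. The per-move budget is not a measured number; it is simply what follows if the whole query over 1 billion moves is to finish in 10 s.

```python
# Time arithmetic for a full scan, in integer nanoseconds to avoid float rounding.
GAMES = 20_000_000
MOVES_PER_GAME = 50
EXTRA_NS_PER_MOVE = 2_000          # 2 microseconds of extra work per decoded move
TARGET_QUERY_NS = 10 * 10**9       # a 10 s query time

moves_to_decode = GAMES * MOVES_PER_GAME                        # 1 billion moves
extra_seconds = moves_to_decode * EXTRA_NS_PER_MOVE // 10**9    # total extra time in seconds
budget_ns_per_move = TARGET_QUERY_NS // moves_to_decode         # allowed time per move for 10 s

print(f"moves to decode:       {moves_to_decode:,}")
print(f"extra time:            {extra_seconds} s")
print(f"budget for 10 s query: {budget_ns_per_move} ns per move")
```

This reports 2000 s of extra time, and a budget of only 10 ns per move for a 10 s query, which leaves no room at all for a 2 µs decoding step.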
I never claimed to always be right. It seems to me I am right in this case, though, or at least that your remark that it doesn't matter is very wrong. Using move generation to decode moves, or storing positions rather than moves, is just not good advice to someone who asks for efficiency and speed.