phhnguyen wrote: ↑Mon Jul 18, 2022 7:58 am
Do you mean you still use multi-files instead of one file for your database? Is it your own binary database?
Yes, this is still a dedicated open-source binary database format.
The main objectives are:
1) maximum speed for finding an exact position
2) minimum compressed size
Finding an exact position is the most common and most critical operation. When studying an opening, the results have to arrive in less than a second.
Having the smallest compressed size makes a difference in terms of bandwidth when downloading.
As for general-purpose databases, RocksDB is great.
It is not really a database: it has no search functions.
But it has Bloom filters, LRU caches, lz4/zstd compression and automatic compaction built in.
However, when searching for a position, the ability to reorder the games makes the difference. When a SCID database is compacted, it also reorders the games optimally and becomes over 6x faster than RocksDB.
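To illustrate why game order can matter at all, here is a toy sketch. It is not SCID's actual ordering or search algorithm, which is not described here. It assumes games are stored as tuples of SAN moves sorted lexicographically, ignores transpositions, and uses a hypothetical helper named prefix_range. Under those assumptions, all games that follow a given line sit next to each other, so two binary searches find them without scanning the whole database.

```python
def prefix_range(sorted_games, prefix):
    """Return (start, end) so that sorted_games[start:end] are exactly the
    games whose move list begins with `prefix`.

    Assumes `sorted_games` is a list of tuples of SAN strings, sorted
    lexicographically. Hypothetical helper for illustration only.
    """
    n = len(prefix)

    def first_index(pred):
        # Classic binary search for the first game whose first n moves
        # satisfy `pred`; valid because truncation preserves sort order.
        lo, hi = 0, len(sorted_games)
        while lo < hi:
            mid = (lo + hi) // 2
            if pred(sorted_games[mid][:n]):
                hi = mid
            else:
                lo = mid + 1
        return lo

    start = first_index(lambda head: head >= prefix)
    end = first_index(lambda head: head > prefix)
    return start, end


games = sorted([
    ("e4", "e5", "Nf3", "Nf6"),
    ("d4", "d5", "c4"),
    ("e4", "e5", "f4"),
    ("e4", "c5", "Nf3"),
    ("e4", "e5", "Nf3", "Nc6"),
])

start, end = prefix_range(games, ("e4", "e5", "Nf3"))
matching = games[start:end]  # contiguous block, found in O(log N) steps
```

With unsorted games the same query has to look at every game, which is the kind of difference a good ordering can make.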
Its compression is again fantastic and super fast, but there is a catch. Compressing the entire database into a single file (RocksDB uses an entire directory for its files) no longer reduces its size much. It varies a lot, but the compressed RocksDB database ends up ~20% larger than the same database compressed in SCID5 format.
I also checked your SQLite database structure. Is it complete? That is, does importing a PGN file with many tag pairs, comments, NAGs and variations, and then exporting it back to PGN, produce a file identical to the original?
Adding the "TimeControl" tag to the Game record is a very interesting idea. It is fundamental information in my opinion. Unfortunately, I believe it is only present in lichess PGNs.
Adding a "Ply" column to the "Comment" table, on the other hand, is a mistake in my opinion. For example, lichess adds a %clk comment to every move, so you end up storing all those consecutive ply numbers even though it is not necessary.
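A small sketch of what I mean, assuming lichess-style movetext where every move carries a { [%clk h:mm:ss] } comment with whole seconds. The function name clocks_in_seconds and the sample movetext are made up for the example. When every ply has a comment, the position in the list already tells you the ply, so a stored ply number per comment adds nothing.

```python
import re

# Matches lichess-style clock comments such as "{ [%clk 0:03:00] }".
# Assumption: whole seconds only, no fractional part.
CLK = re.compile(r"\{\s*\[%clk\s+(\d+):(\d{2}):(\d{2})\]\s*\}")


def clocks_in_seconds(movetext):
    """Return the remaining clock time after each ply, in seconds."""
    return [
        int(h) * 3600 + int(m) * 60 + int(s)
        for h, m, s in CLK.findall(movetext)
    ]


# Made-up sample in the shape lichess exports use.
movetext = (
    "1. e4 { [%clk 0:03:00] } 1... e5 { [%clk 0:03:00] } "
    "2. Nf3 { [%clk 0:02:58] } 2... Nc6 { [%clk 0:02:57] }"
)

clocks = clocks_in_seconds(movetext)

# The ply is implicit: entry i belongs to ply i + 1.
implied_plies = [i + 1 for i in range(len(clocks))]
```

An explicit ply only carries information when comments are sparse, which is not the case for %clk.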