Although these are not overly important limits, it would still be useful to remove them.
Fulvio wrote: Sun Oct 17, 2021 1:18 pm
10) The most important limitation, in my opinion, is that neither the tags nor the comments are compressed in any way. It is now common to have games where the clock and possibly the evaluation of each position are stored as comments, and compressing them would save a lot of space.
The NameBase has a couple of useful properties:
- the IDs are sequentially increasing values starting at 0
- the tag values cannot contain the null char ('\0').
The simplest idea is therefore to store only the tag values, sorted by ID, as C strings (terminated by a null char '\0').
By preceding each section (player names, events, sites, rounds) with 8 bytes giving the total size of that section, it is possible to concatenate them.
The whole thing can then be compressed with zstd.
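A minimal sketch of the section layout described above, before compression. The function name appendSection is hypothetical, and writing the 8-byte size in host byte order is my assumption; a real format would have to fix the endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: appends one section (e.g. all player names, ordered
// by ID) to `out` as an 8-byte size followed by null-terminated strings.
void appendSection(std::vector<char>& out, const std::vector<std::string>& names) {
    std::uint64_t size = 0;
    for (const auto& name : names) {
        // Tag values must not contain '\0', or the terminator would be ambiguous.
        assert(name.find('\0') == std::string::npos);
        size += name.size() + 1;  // +1 for the terminator
    }

    // Size prefix, host byte order (assumption, see above).
    char prefix[sizeof size];
    std::memcpy(prefix, &size, sizeof size);
    out.insert(out.end(), prefix, prefix + sizeof size);

    // The position of a string inside the section is its ID, so no ID is stored.
    for (const auto& name : names) {
        out.insert(out.end(), name.begin(), name.end());
        out.push_back('\0');
    }
}
```

Calling it four times on the same buffer (players, events, sites, rounds) gives the concatenated data that would then be passed to zstd.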
In summary, decoding would become:
- read the .sn4 file into memory
- ZSTD_getFrameContentSize -> total size of all uncompressed sections
- ZSTD_decompress -> decompress all sections
- Convert the first 8 bytes to uint64_t -> size of the next section
- Read the strings and load them into memory (currently a std::vector<std::unique_ptr<const char[]>> is used)
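The last two steps could look roughly like the sketch below (C++17). It assumes the buffer has already been decompressed with ZSTD_decompress, which is left out here to keep the example free of external dependencies. The names NameList and readSection are hypothetical, and the size prefix is assumed to be in host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

using NameList = std::vector<std::unique_ptr<const char[]>>;

// Hypothetical helper: reads the section starting at `pos` in the
// decompressed buffer and advances `pos` to the start of the next section.
NameList readSection(const std::vector<char>& buf, std::size_t& pos) {
    std::uint64_t size = 0;
    if (pos > buf.size() || buf.size() - pos < sizeof size)
        throw std::runtime_error("truncated size prefix");
    std::memcpy(&size, buf.data() + pos, sizeof size);
    pos += sizeof size;

    if (size > buf.size() - pos)
        throw std::runtime_error("truncated section");
    const char* it = buf.data() + pos;
    const char* end = it + size;
    if (size > 0 && end[-1] != '\0')
        throw std::runtime_error("last string is not terminated");

    NameList names;
    while (it < end) {
        // Safe: the section is known to end with '\0'.
        const std::size_t len = std::strlen(it);
        auto copy = std::make_unique<char[]>(len + 1);
        std::memcpy(copy.get(), it, len + 1);
        names.push_back(std::move(copy));  // index in `names` is the ID
        it += len + 1;
    }
    assert(it == end);

    pos += size;
    return names;
}
```

Since each section starts with its own size, the start offsets of all four sections can be found without scanning the strings, which is what would make loading them in parallel possible.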
Benefits:
- no limit on length
- probably a smaller size
- the four sections can possibly be loaded in parallel
- an easier format to decode
Disadvantages:
- dependence on the zstd library. Managing dependencies in C++ is always a nightmare. Simply linking dynamically against the system's library could lead to incompatibility problems: for example, a database created with a newer version might not be readable by older versions.
- reading the file requires double the RAM.
Another interesting idea, since the NameBase is always rewritten in full anyway, would be to remove the .sn4 file and append the NameBase to the end of the .si4 file.
Benefits:
- the database would consist of only 2 files
- there can be no incompatibility problems between the .si4 file and the values it refers to in the .sn4 file
Disadvantages:
- when adding games, the old NameBase is overwritten and remains only in memory. If an error or a power failure occurs, the database is irremediably damaged (with two separate files, only the newly added names are lost).