Open Chess Game Database Standard

Post by phhnguyen » Fri Dec 24, 2021 5:10 am

Position searching: 6th attempt

So far we have made several attempts to use hash keys to match positions. They work well. However, there are still some problems that we want to address in this attempt.

Hash key
We match a given position against records in the database by using its hash key. One of the problems is how to get the hash key for a given position. @Dann asked us to create a SQL user-defined function that computes hash keys from FENs. Unfortunately, SQLite doesn't support user-defined functions written in SQL; a custom function has to be registered by the host application through SQLite's API. Thus we can't create a function that runs in any SQLite GUI/browser. Instead, users need to run our code and use the functions we provide to create hash keys.
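To illustrate the limitation, here is a minimal sketch of how an application can register its own function on a SQLite connection, using Python's standard sqlite3 module. The function name fen_hash, the table layout, and the BLAKE2-based hash are all assumptions for illustration; this is not the hash key function the project actually uses.

```python
import hashlib
import sqlite3


def fen_hash(fen: str) -> int:
    """Stand-in 64-bit hash of a FEN string (NOT the project's real hash key)."""
    digest = hashlib.blake2b(fen.encode("ascii"), digest_size=8).digest()
    # SQLite integers are signed 64-bit, so interpret the 8 bytes as signed.
    return int.from_bytes(digest, "big", signed=True)


conn = sqlite3.connect(":memory:")
# The function is registered on this connection only. Another program that
# opens the same database file will not know about it.
conn.create_function("fen_hash", 1, fen_hash)

# Hypothetical table, just enough to run a query.
conn.execute("CREATE TABLE positions (hash_key INTEGER, game_id INTEGER)")
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
conn.execute("INSERT INTO positions VALUES (?, ?)", (fen_hash(start), 1))

# Inside our own program the function can be called from SQL.
rows = conn.execute(
    "SELECT game_id FROM positions WHERE hash_key = fen_hash(?)", (start,)
).fetchall()
print(rows)  # [(1,)]
```

The query works here only because our own process registered fen_hash first, which is why a generic GUI/browser can't run it.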

One potential solution to that issue is to save FENs alongside hash keys, or in place of them.

Using FENs has some issues, as below:

Size: A FEN string has an average length of 55 bytes, so it takes much more space than an 8-byte hash key (nearly 7 times larger). While we may ignore the database size on hard disks (it is large but still fine with modern hardware), FENs may create a big issue by eating too much memory and taking too much time when the database is created. We will say more about this in the next part.

Standardized for searching: Two identical positions may still have different FENs because they have different halfmove clocks and fullmove numbers. Even though we could use SQL statements with LIKE to match those FENs, the performance of LIKE is much worse than full-string matching. To solve the problem, we standardize all FENs to have a halfmove clock of 0 and a fullmove number of 1.
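The standardization step can be sketched as below. The function name normalize_fen is hypothetical; the sketch only assumes that a FEN consists of space-separated fields with the halfmove clock and fullmove number in the last two positions.

```python
def normalize_fen(fen: str) -> str:
    """Return the FEN with halfmove clock 0 and fullmove number 1."""
    fields = fen.split()
    if len(fields) < 4:
        raise ValueError(f"not enough FEN fields: {fen!r}")
    # Keep piece placement, side to move, castling rights and en passant
    # square; overwrite (or add) the two counters.
    return " ".join(fields[:4] + ["0", "1"])


# The same position reached at move 1 and again after 1.Nf3 Nf6 2.Ng1 Ng8.
a = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
b = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 4 3"
print(normalize_fen(a) == normalize_fen(b))  # True
```

After this step, the two strings can be compared with plain equality instead of LIKE.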

Memory pressure
In the last attempt, we kept all information in memory until the end. That is 200 million hash keys and 3.45 million game IDs, which take (200 x 8 + 3.45 x 4) million bytes, about 1.6 GB in total. That size itself is not so huge for my computer with 16 GB of RAM. However, the compiler adds some management overhead to each item, so the real memory use should be significantly bigger. To make the SQL engine work fast, we use a single transaction for the whole process. That makes SQLite keep the whole database in memory. In total, all of this takes a huge amount of memory. Some of my tries in previous attempts crashed because they ran out of memory, and we succeeded only after several tries at reducing memory use. On the last try, my computer reported that the app was using over 14 GB of RAM.

Even though the program runs well in the end, it is actually very close to the limit. The app may crash again if we add more data to each record and/or work with a larger number of games.

1. Building the database

Adding FEN strings is a challenge since it may push memory usage over the limit. If we keep all those strings in memory, with 200 million positions and an average length of 55 bytes, they will take about 11 GB. My old computer has only 16 GB of RAM. That additional amount will surely cause trouble.
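The raw sizes quoted in this post can be checked with a few lines of arithmetic. The counts and byte sizes come from the text; the variable names are only for illustration, and the result ignores any per-item overhead from containers or from SQLite itself.

```python
POSITIONS = 200_000_000   # hash keys / FEN strings
GAMES = 3_450_000         # game IDs
HASH_KEY_BYTES = 8
GAME_ID_BYTES = 4
AVG_FEN_BYTES = 55

# Raw payload only, without container overhead.
keys_and_ids = POSITIONS * HASH_KEY_BYTES + GAMES * GAME_ID_BYTES
fens = POSITIONS * AVG_FEN_BYTES

print(f"hash keys + game IDs: {keys_and_ids / 1e9:.2f} GB")  # 1.61 GB
print(f"FEN strings:          {fens / 1e9:.2f} GB")          # 11.00 GB
print(f"FEN vs hash key:      {AVG_FEN_BYTES / HASH_KEY_BYTES:.3f}x")  # 6.875x
```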

1st try
This was the simple, straightforward approach: keep all FEN strings in memory. When running, the system quickly became very laggy and then almost stopped responding. The speed slowed to a crawl. I canceled it since I couldn't wait any longer. FAILED

2nd try
Instead of keeping FEN strings in memory, we wrote them into the database immediately. The problem was that the UPDATE statements would take too much time because they scan all hash keys to find the records to update. In this try, we didn't update by hash keys but by record IDs instead, in the hope that the SQL engine could find the records to update quickly without scanning all items. Unfortunately, it didn't work as we wanted; the speed was too slow to accept. FAILED
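The difference between the two kinds of UPDATE can be inspected with SQLite's EXPLAIN QUERY PLAN. The sketch below uses a hypothetical table with an integer primary key and an unindexed hash key column; the real schema may differ, and the plan says nothing about the actual running time measured in this try.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: id is the record ID, hash_key has no index.
conn.execute(
    "CREATE TABLE positions (id INTEGER PRIMARY KEY, hash_key INTEGER, fen TEXT)"
)


def plan(sql: str, params: tuple) -> str:
    """Return SQLite's query plan description for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)  # 4th column is the detail text


by_hash = plan("UPDATE positions SET fen = ? WHERE hash_key = ?", ("fen", 1))
by_id = plan("UPDATE positions SET fen = ? WHERE id = ?", ("fen", 1))

print(by_hash)  # a SCAN of the table (exact wording depends on SQLite version)
print(by_id)    # a SEARCH using the integer primary key
```

Even with a primary key lookup, each UPDATE still rewrites a row that grows by the length of the FEN, so a good plan alone does not guarantee acceptable speed.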

3rd try
We stored FEN strings in a temporary file and then read it back later to write them into the database together with hash keys and game IDs. To know which FEN belongs to which hash key, we stored the hash key with each FEN. Furthermore, we stored the first game ID that comes with that FEN in the same file, so it doesn't need to be kept in memory either.
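A minimal sketch of this two-pass idea is below. The record layout (8-byte hash key, 4-byte first game ID, 2-byte FEN length, then the FEN text), the table layout, and the sample hash values are all assumptions for illustration, not the format the program actually uses.

```python
import sqlite3
import struct
import tempfile

# Assumed record header: signed 64-bit hash key, 32-bit game ID, 16-bit FEN length.
HEADER = struct.Struct("<qIH")


def write_record(f, hash_key: int, game_id: int, fen: str) -> None:
    """Append one (hash key, first game ID, FEN) record to the file."""
    data = fen.encode("ascii")
    f.write(HEADER.pack(hash_key, game_id, len(data)))
    f.write(data)


def read_records(f):
    """Yield records one by one, so the file is never loaded in full."""
    while True:
        header = f.read(HEADER.size)
        if len(header) < HEADER.size:
            return
        hash_key, game_id, length = HEADER.unpack(header)
        yield hash_key, game_id, f.read(length).decode("ascii")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions (hash_key INTEGER, first_game_id INTEGER, fen TEXT)"
)

with tempfile.TemporaryFile() as tmp:
    # Pass 1: while parsing games, append records to disk instead of RAM.
    # The hash values here are made up.
    write_record(tmp, 0x1234, 1, "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
    write_record(tmp, -42, 7, "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1")

    # Pass 2: rewind and stream the records into the database in one transaction.
    tmp.seek(0)
    with conn:
        conn.executemany("INSERT INTO positions VALUES (?, ?, ?)", read_records(tmp))

count = conn.execute("SELECT COUNT(*) FROM positions").fetchone()[0]
print(count)  # 2
```

Because read_records is a generator, memory use in the second pass stays at roughly one record at a time, no matter how large the temporary file is.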

The stats of this try are as below: