Position searching: 4th attempt
Attempt 4th - A
This attempt is just a simple improvement of the previous one (the attempt 3rd), to speed up the search by hash keys. In the previous attempt, the search may take average 2.6 minutes for the first ply. The idea for this attempt is: if SQLite searches so slowly because of performing directly on the hard disk, we may improve the speed by making it perform in memory only.
There are two solutions to make SQLite perform in memory. The first one is to create a virtual hard disk and copy the database into it. We won’t use it since it requires users to learn and do some extra and boring things. The second one is to create an in-memory database and then load the physical/real one from the hard disk. We are going to implement this one.
The stats are as below:
Code: Select all
Loaded into memory, elapsed: 286132 ms, 04:46
SQLite database file ':memory:' opened successfully
ply: 1, #total queries: 1000, elapsed: 1982008 ms, 33:02, time per query: 1982 ms, total results: 1138017428, results/query: 1138017
ply: 2, #total queries: 1000, elapsed: 668297 ms, 11:08, time per query: 668 ms, total results: 359616424, results/query: 359616
ply: 3, #total queries: 1000, elapsed: 394425 ms, 06:34, time per query: 394 ms, total results: 199687182, results/query: 199687
ply: 4, #total queries: 1000, elapsed: 86780 ms, 01:26, time per query: 86 ms, total results: 37938700, results/query: 37938
ply: 5, #total queries: 1000, elapsed: 15266 ms, 00:15, time per query: 15 ms, total results: 5548104, results/query: 5548
ply: 6, #total queries: 1000, elapsed: 16501 ms, 00:16, time per query: 16 ms, total results: 6513774, results/query: 6513
ply: 7, #total queries: 1000, elapsed: 7095 ms, 00:07, time per query: 7 ms, total results: 2447268, results/query: 2447
ply: 8, #total queries: 1000, elapsed: 14105 ms, 00:14, time per query: 14 ms, total results: 5305266, results/query: 5305
ply: 9, #total queries: 1000, elapsed: 6703 ms, 00:06, time per query: 6 ms, total results: 2412860, results/query: 2412
ply: 10, #total queries: 1000, elapsed: 5785 ms, 00:05, time per query: 5 ms, total results: 2077934, results/query: 2077
ply: 11, #total queries: 1000, elapsed: 1323 ms, 00:01, time per query: 1 ms, total results: 421201, results/query: 421
ply: 12, #total queries: 1000, elapsed: 854 ms, 00:00, time per query: 0 ms, total results: 283617, results/query: 283
ply: 13, #total queries: 1000, elapsed: 654 ms, 00:00, time per query: 0 ms, total results: 239088, results/query: 239
ply: 14, #total queries: 1000, elapsed: 354 ms, 00:00, time per query: 0 ms, total results: 145977, results/query: 145
ply: 15, #total queries: 1000, elapsed: 37 ms, 00:00, time per query: 0 ms, total results: 10677, results/query: 10
ply: 16, #total queries: 1000, elapsed: 92 ms, 00:00, time per query: 0 ms, total results: 36655, results/query: 36
ply: 17, #total queries: 1000, elapsed: 36 ms, 00:00, time per query: 0 ms, total results: 11374, results/query: 11
ply: 18, #total queries: 1000, elapsed: 37 ms, 00:00, time per query: 0 ms, total results: 8662, results/query: 8
ply: 19, #total queries: 1000, elapsed: 31 ms, 00:00, time per query: 0 ms, total results: 9538, results/query: 9
ply: 20, #total queries: 1000, elapsed: 24 ms, 00:00, time per query: 0 ms, total results: 4397, results/query: 4
ply: 21, #total queries: 1000, elapsed: 31 ms, 00:00, time per query: 0 ms, total results: 7290, results/query: 7
ply: 22, #total queries: 1000, elapsed: 30 ms, 00:00, time per query: 0 ms, total results: 4971, results/query: 4
ply: 23, #total queries: 1000, elapsed: 9 ms, 00:00, time per query: 0 ms, total results: 977, results/query: 0
ply: 24, #total queries: 1000, elapsed: 11 ms, 00:00, time per query: 0 ms, total results: 728, results/query: 0
ply: 25, #total queries: 1000, elapsed: 13 ms, 00:00, time per query: 0 ms, total results: 762, results/query: 0
ply: 26, #total queries: 1000, elapsed: 14 ms, 00:00, time per query: 0 ms, total results: 748, results/query: 0
ply: 27, #total queries: 1000, elapsed: 22 ms, 00:00, time per query: 0 ms, total results: 772, results/query: 0
ply: 28, #total queries: 1000, elapsed: 7 ms, 00:00, time per query: 0 ms, total results: 765, results/query: 0
ply: 29, #total queries: 1000, elapsed: 16 ms, 00:00, time per query: 0 ms, total results: 751, results/query: 0
The load and convert the whole database into an in-memory one took about 5 minutes, quite long. However, the search is so fast. At ply 1, a query needs only 2 seconds instead of 2.6 minutes as the previous attempt. From ply 2, the time is much faster.
Attempt 4th - B
I don’t like to spend 5 minutes waiting for loading a database. I don’t like either loading over 13 GB of data into my computer’s memory. My old computer has only 16 GB RAM, loading that size makes the whole system almost runs out of memory, slow down significantly. Furthermore, even that database (3.45 million games) is runnable on my computer, any bigger one, say, 7 million games, may overflow/stop my computer.
Thus I try another way. The idea is simple: drop/omit the index since it is too large and check if the speed without the indexes is acceptable. The database without indexing (for hash keys) is 8.87 GB, a reduction of 5 GB. The size of the database now is fine for my computer.
Below are the new stats:
Code: Select all
Loaded into memory, elapsed: 133295 ms, 02:13
ply: 1, #total queries: 1000, elapsed: 12514400 ms, 3:28:34, time per query: 12514 ms, total results: 1138017428, results/query: 1138017
ply: 2, #total queries: 1000, elapsed: 12721859 ms, 3:32:01, time per query: 12721 ms, total results: 359616424, results/query: 359616
ply: 3, #total queries: 1000, elapsed: 12378915 ms, 3:26:18, time per query: 12378 ms, total results: 199687182, results/query: 199687
ply: 4, #total queries: 1000, elapsed: 11123044 ms, 3:05:23, time per query: 11123 ms, total results: 37938700, results/query: 37938
ply: 5, #total queries: 1000, elapsed: 11782850 ms, 3:16:22, time per query: 11782 ms, total results: 5548104, results/query: 5548
The loading time was cut more than a half - a good result. Each query took only 12 seconds - another good result compared with 2.6 minutes. Just a bit different is that the query time is almost the same by plies (not reduced as previous attempts), perhaps caused by scanning all records.
Code has been pushed into a new branch named “searchwithhashkeysinmem”.