Position searching: 3rd attempt
Some people have mentioned using hash keys as a solution for position searching. This time that idea should be given a try.
Creating database
Hash keys of all chess positions will be stored in a new table HashPositions with their game IDs and plies.
Code: Select all
CREATE TABLE HashPositions (ID INTEGER PRIMARY KEY, GameID INTEGER, AtPly INTEGER, HashKey INTEGER)
The stats when creating database as the below:
Code: Select all
Thread count: 4
#games: 228742, #errors: 0, #hashKeys: 17353710, elapsed: 56401ms 00:56, speed: 4055 games/s
#games: 444564, #errors: 0, #hashKeys: 33670584, elapsed: 107830ms 01:47, speed: 4122 games/s
#games: 660467, #errors: 0, #hashKeys: 50006190, elapsed: 160091ms 02:40, speed: 4125 games/s
#games: 876236, #errors: 0, #hashKeys: 66314397, elapsed: 212072ms 03:32, speed: 4131 games/s
#games: 1084961, #errors: 0, #hashKeys: 82413379, elapsed: 263623ms 04:23, speed: 4115 games/s
#games: 1282580, #errors: 0, #hashKeys: 98234700, elapsed: 312420ms 05:12, speed: 4105 games/s
#games: 1481295, #errors: 0, #hashKeys: 114086982, elapsed: 362654ms 06:02, speed: 4084 games/s
#games: 1682015, #errors: 0, #hashKeys: 129877265, elapsed: 409788ms 06:49, speed: 4104 games/s
#games: 1878876, #errors: 0, #hashKeys: 145665804, elapsed: 457185ms 07:37, speed: 4109 games/s
#games: 2070568, #errors: 0, #hashKeys: 161468086, elapsed: 504432ms 08:24, speed: 4104 games/s
#games: 2263597, #errors: 0, #hashKeys: 177174148, elapsed: 552428ms 09:12, speed: 4097 games/s
#games: 2454031, #errors: 0, #hashKeys: 192886927, elapsed: 602850ms 10:02, speed: 4070 games/s
#games: 2642990, #errors: 0, #hashKeys: 208655313, elapsed: 652779ms 10:52, speed: 4048 games/s
#games: 2831375, #errors: 0, #hashKeys: 224407558, elapsed: 704457ms 11:44, speed: 4019 games/s
#games: 2994165, #errors: 231, #hashKeys: 239157678, elapsed: 755334ms 12:35, speed: 3964 games/s
#games: 3141649, #errors: 308, #hashKeys: 252700404, elapsed: 799287ms 13:19, speed: 3930 games/s
#games: 3297553, #errors: 370, #hashKeys: 265944625, elapsed: 842684ms 14:02, speed: 3913 games/s
#games: 3456399, #errors: 372, #hashKeys: 277802645, elapsed: 883179ms 14:43, speed: 3913 games/s
Completed!
There are a total of over 277 million hash keys (equal to the total of all moves). Perhaps, it is too large for a table! They make the database file size increases from 2.08 to 8.87 GB, 4.3 times larger.
There is an interesting issue. My computer has 4 main (physical) and 4 virtual (hyper-threading) cores. In the previous attempt, the creating speed was increased clearly when I change from using 4 to 8 threads. However, this time, the speed is decreased more than 2 times. I am not sure the reason yet since the code is almost the same (with the previous attempt), no more computing. Perhaps it was caused by overflowing SQLite.
Searching
Below is the statement to search games with a hash key of the given position:
Code: Select all
SELECT g.ID, g.Round, Date, w.Name White, WhiteElo, b.Name Black, BlackElo, Result, Timer, ECO, PlyCount, FEN, Moves, Hash
FROM Games g
INNER JOIN Players w ON WhiteID = w.ID
INNER JOIN Players b ON BlackID = b.ID
INNER JOIN HashPositions h ON g.ID = h.GameID
WHERE h.HashKey = ?
I have run the app with the benchmark function, waited for a very long time, and see… nothing! After verifying, I realized the query took too much time. Below are the stats with the number of tests reduced a lot:
Code: Select all
ply: 1, #total queries: 20, elapsed: 4489259 ms, 1:14:49, time per query: 224462 ms, total results: 69014820, results/query: 3450741
ply: 2, #total queries: 20, elapsed: 3874209 ms, 1:04:34, time per query: 193710 ms, total results: 22869993, results/query: 1143499
ply: 3, #total queries: 20, elapsed: 3775149 ms, 1:02:55, time per query: 188757 ms, total results: 8906460, results/query: 445323
ply: 4, #total queries: 20, elapsed: 3766566 ms, 1:02:46, time per query: 188328 ms, total results: 5169930, results/query: 258496
ply: 5, #total queries: 20, elapsed: 3448294 ms, 57:28, time per query: 172414 ms, total results: 788888, results/query: 39444
ply: 6, #total queries: 20, elapsed: 3320534 ms, 55:20, time per query: 166026 ms, total results: 96279, results/query: 4813
ply: 7, #total queries: 20, elapsed: 3395092 ms, 56:35, time per query: 169754 ms, total results: 150744, results/query: 7537
I stopped the test here since I run out of the patient. A query may take 224 seconds (3.7 minutes). That speed is painfully slow! I think the reasons it took too much time are:
- it has to scan all hash keys (277 million)
- SQLite performs the search on the hard disk instead of in memory
I have tried to create the index for the field HashKey. The process took over 10 minutes and the file size increased to 13.74 GB.
The search stats are the below:
Code: Select all
ply: 1, #total queries: 1000, elapsed: 156168774 ms, 43:22:48, time per query: 156168 ms, total results: 1137429724, results/query: 1137429
ply: 2, #total queries: 1000, elapsed: 88930814 ms, 24:42:10, time per query: 88930 ms, total results: 359591200, results/query: 359591
ply: 3, #total queries: 1000, elapsed: 77368156 ms, 21:29:28, time per query: 77368 ms, total results: 199681330, results/query: 199681
ply: 4, #total queries: 1000, elapsed: 7232008 ms, 2:00:32, time per query: 7232 ms, total results: 37934732, results/query: 37934
ply: 5, #total queries: 1000, elapsed: 46667 ms, 00:46, time per query: 46 ms, total results: 5547110, results/query: 5547
ply: 6, #total queries: 1000, elapsed: 87649 ms, 01:27, time per query: 87 ms, total results: 6512286, results/query: 6512
ply: 7, #total queries: 1000, elapsed: 28355 ms, 00:28, time per query: 28 ms, total results: 2446515, results/query: 2446
ply: 8, #total queries: 1000, elapsed: 262181 ms, 04:22, time per query: 262 ms, total results: 5303523, results/query: 5303
ply: 9, #total queries: 1000, elapsed: 18622 ms, 00:18, time per query: 18 ms, total results: 2411908, results/query: 2411
ply: 10, #total queries: 1000, elapsed: 13884 ms, 00:13, time per query: 13 ms, total results: 2077677, results/query: 2077
ply: 11, #total queries: 1000, elapsed: 3335 ms, 00:03, time per query: 3 ms, total results: 420394, results/query: 420
ply: 12, #total queries: 1000, elapsed: 2306 ms, 00:02, time per query: 2 ms, total results: 283109, results/query: 283
ply: 13, #total queries: 1000, elapsed: 2114 ms, 00:02, time per query: 2 ms, total results: 238831, results/query: 238
ply: 14, #total queries: 1000, elapsed: 1498 ms, 00:01, time per query: 1 ms, total results: 145742, results/query: 145
ply: 15, #total queries: 1000, elapsed: 236 ms, 00:00, time per query: 0 ms, total results: 10677, results/query: 10
ply: 16, #total queries: 1000, elapsed: 838 ms, 00:00, time per query: 0 ms, total results: 36655, results/query: 36
ply: 17, #total queries: 1000, elapsed: 143 ms, 00:00, time per query: 0 ms, total results: 11374, results/query: 11
ply: 18, #total queries: 1000, elapsed: 111 ms, 00:00, time per query: 0 ms, total results: 8662, results/query: 8
ply: 19, #total queries: 1000, elapsed: 174 ms, 00:00, time per query: 0 ms, total results: 9301, results/query: 9
ply: 20, #total queries: 1000, elapsed: 142 ms, 00:00, time per query: 0 ms, total results: 4397, results/query: 4
ply: 21, #total queries: 1000, elapsed: 246 ms, 00:00, time per query: 0 ms, total results: 7290, results/query: 7
ply: 22, #total queries: 1000, elapsed: 94 ms, 00:00, time per query: 0 ms, total results: 4971, results/query: 4
ply: 23, #total queries: 1000, elapsed: 66 ms, 00:00, time per query: 0 ms, total results: 977, results/query: 0
ply: 24, #total queries: 1000, elapsed: 86 ms, 00:00, time per query: 0 ms, total results: 728, results/query: 0
ply: 25, #total queries: 1000, elapsed: 30 ms, 00:00, time per query: 0 ms, total results: 762, results/query: 0
ply: 26, #total queries: 1000, elapsed: 36 ms, 00:00, time per query: 0 ms, total results: 748, results/query: 0
ply: 27, #total queries: 1000, elapsed: 85 ms, 00:00, time per query: 0 ms, total results: 772, results/query: 0
ply: 28, #total queries: 1000, elapsed: 55 ms, 00:00, time per query: 0 ms, total results: 765, results/query: 0
ply: 29, #total queries: 1000, elapsed: 53 ms, 00:00, time per query: 0 ms, total results: 751, results/query: 0
I have run my computer near 4 days for the above stats. We may have some conclusions:
- The average search time per query is shorter than ones without index and reduce quicker by plies
- From 1-4 plies, the search time is slow, but from ply 5, they are so quick
Compare with the previous attempt:
- For the first two plies, the numbers of results are quite closed, the previous ones are a little bit larger (it is a myth because theory they should be smaller)
- From 3rd plies, the new attempt has more results than the previous ones and the gaps become significantly larger by plies
Code has been pushed with a new branch called "searchwithhashkeys".