Open Chess Game Database Standard

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Position searching: 6th attempt

Post by phhnguyen »

Position searching: 6th attempt

So far we have done some attempts to use hash keys to match positions. They work so well. However, there are still some problems we want to address and solve in this attempt.

Hash key
We match a given position with records in the database by using the hash key. One of the problems is how to get the hash key for a given position. @Dann asked us to create a SQL user-defined function to create hash keys from FENs. Unfortunately, SQLite doesn’t support user-defined functions. Thus we can’t create that function to run in any SQLite GUI/browser. Instead, we need to run our code, use our provided functions to create hash keys.

One of the potential solutions for that issue is to save FENs with or replace hash keys.

Using FENs may have some issues as the below:
  • Size: A FEN string with an average length of 55 bytes takes much more space than a hash key of 8 only (near 7 times larger). When we may ignore the problem with the database size in hard disks (it is large but still fine with modern hardware) it may create a big issue by eating too much memory, taking too much time when creating. We will mention more in the next part
  • Standardized for searching: Two completely the same positions may still have different FENs because of having different half move clocks and full move numbers. Even we may use SQK statements with LIKE to match those FENs, the performance of LIKE is much worse than full-string marching. To solve the problem, we standardize all FENs to have half move clocks as 0 and full move numbers as 1

Memory pressure
In the last attempt, we keep all information in memory till the end. They are 200 million hash keys, 3.45 million game IDs, all take 200 x 8 + 3.45 x 4, about 1.6 GB. That size itself is not so huge for my computer with 16 GB RAM. However, the compiler has added some space to each item to manage them, thus the real memory should be significantly bigger. To make the SQL engine works fast, we use a transaction for the whole process. That makes SQLite keep the whole database in memory. In total all take a huge amount of memory. Some of my tries in some previous attempts crashed because of running out of memory and we were successful only after some tries to reduce taken memory. On the last try, my computer informed the app using over 14 GB RAM.

Even though in the end the program can run well, it is actually so close to some boundaries. The app may be crashed again if we add more data to each record and/or work with larger numbers of games.


1. Building the database

Adding FEN strings is a challenge since it may push the memory usage over the limit. If we keep all those strings in memory, with 200 million positions and an average length of 55, they will take about 11 GB. My old computer has only 16 GB of RAM. That additional amount will surely bring trouble.

1st try
This was simple, straightforward, to keep all FEN strings in memory. When running, the system quickly became so laggy and then almost stopped responding. The speed became crawling. I canceled since couldn’t wait for anymore. FAILED

2nd try
Instead of keeping FEN strings in memory, we wrote them down immediately into the database. The problem was that the UPDATE statements would take too much time due to scanning all hash keys to find updating records. In this attempt, we didn’t update by hash keys but their IDs instead, in the hope the SQL engine could find updating locations quickly without scanning all items. Unfortunately, it didn’t work as we want, the speed was too bad to accept. FAILED

3rd try
We stored FEN strings into a temporary file and then read it back to write down into the database later with hash keys and game IDs. To know which FEN belongs to which hash key we stored that hash key with that FEN. Furthermore, we stored the first game ID that comes with that FEN, saving it from memory.

The stats of this trial are as the below:

Code: Select all

Completed writing hash keys, total: 199217076, one: 188479599, blob: 10737476, FEN average length: 55 max: 79 min: 27, elapsed: 3013648 ms 50:13, speed: 66104 hash keys/s
#games: 3456399, #errors: 0, #hashKeys: 199217076, #posCnt: 548690734, hashCnt: 0, hashHit: 78579956, elapsed: 4228185ms 1:10:28, speed: 817 games/s
The process of writing down hash keys took 50:13, 7.5 times longer than Attempt 5th (vs 06:41). That process took most of the extra time, causing the total time of creating the database to be 1:10:28, almost 3 times longer (vs previous one of 25:10). The database size is increased significantly to 18.41 GB, 2.6 times larger (vs the previous one of 6.95 GB).

BTW, all work as we plan. SUCCEED.

2. Searching
FEN index
In Attempt 5th, the time to search a hash key is fast enough even the app may have to scan the whole database (without index). However, in this Attempt, the search time for a FEN may take too much time, especially for non-hit ones, over 5 minutes on my computer. Perhaps, that is caused by matching strings taking more time and the whole database is much larger to load and process, compared with the previous ones.

To speed up we create an index for the column FEN. The process took almost 1 hour and increased the database size to 31.46 GB (an increase of 13 GB just for FEN indexes). All make the current database size 4.5 times larger than the previous one.

Code: Select all

Created Index, elapsed: 3552358 ms, 59:12

Searching
In general, searching is quite similar to Attempt 5th except we can query directly with the FEN of the given position instead of firstly converting it into a hash key. We still need to standardize the FEN string first by replacing the half move clocks and full move numbers to 0 and 1, respectively.

The stats are as the below:

Code: Select all

ply: 1, #total queries: 1000, elapsed: 2592555 ms, 43:12, time per query: 2592 ms, total results: 1137427224, results/query: 1137427
ply: 2, #total queries: 1000, elapsed: 1637633 ms, 27:17, time per query: 1637 ms, total results: 359585256, results/query: 359585
ply: 3, #total queries: 1000, elapsed: 1170815 ms, 19:30, time per query: 1170 ms, total results: 199675249, results/query: 199675
ply: 4, #total queries: 1000, elapsed: 619721 ms, 10:19, time per query: 619 ms, total results: 15892107, results/query: 15892
ply: 5, #total queries: 1000, elapsed: 755671 ms, 12:35, time per query: 755 ms, total results: 5547110, results/query: 5547
ply: 6, #total queries: 1000, elapsed: 746903 ms, 12:26, time per query: 746 ms, total results: 6512038, results/query: 6512
ply: 7, #total queries: 1000, elapsed: 743398 ms, 12:23, time per query: 743 ms, total results: 2446515, results/query: 2446
ply: 8, #total queries: 1000, elapsed: 737885 ms, 12:17, time per query: 737 ms, total results: 5303250, results/query: 5303
ply: 9, #total queries: 1000, elapsed: 710761 ms, 11:50, time per query: 710 ms, total results: 2411668, results/query: 2411
ply: 10, #total queries: 1000, elapsed: 763072 ms, 12:43, time per query: 763 ms, total results: 2076649, results/query: 2076
ply: 11, #total queries: 1000, elapsed: 744002 ms, 12:24, time per query: 744 ms, total results: 420394, results/query: 420
The searching speeds are comparable with ones of Attempt 5, though they are more consistent with high plies (when having more no-hit) dude to index.

3. Quick conclusions for this attempt
Using FENs for position matching is workable. It can help to work with the database using normal SQLite GUIs/Browsers (doesn't require any special code/app). However, it comes with a high price: requires more time to build, more space for the database. IMO, it seems to be not worth that trade-off.

We may not use FENs but this Attempt gave us some more knowledge/understanding as well as some new techniques to use later.


4. Source code
All code of this Attempt has been pushed already into a new branch “searchwithhashkeysblob2”.
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
ndbd
Posts: 9
Joined: Mon Jan 11, 2021 11:15 am
Full name: Roger S.

Re: Open Chess Game Database Standard

Post by ndbd »

Just a quick question: if you hash positions it won't be possible to search for pawn structures like in chessbase, right? Or unless you hash for each position the plan structure as a separate hash value?! How is it done in chessbase I wonder?

Same as positions with 'joker' pieces etc. These are quite useful features.
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

ndbd wrote: Fri Dec 24, 2021 9:31 am Just a quick question: if you hash positions it won't be possible to search for pawn structures like in chessbase, right? Or unless you hash for each position the plan structure as a separate hash value?!
No, we can't at the moment. Wait for other attempts.
ndbd wrote: Fri Dec 24, 2021 9:31 am How is it done in chessbase I wonder?

Same as positions with 'joker' pieces etc. These are quite useful features.
No idea. I think that is their business secret, they never share.
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
Harald
Posts: 318
Joined: Thu Mar 09, 2006 1:07 am

Re: Open Chess Game Database Standard

Post by Harald »

Could you try to use 32 byte blobs for a position in the position table.
That is 4 bit for each square, enough for all black and white pieces.
Of course the poasition that you search should also be transformed that way
to allow an easy comparison.
I would also create a table with the material signature for black and white
Just two integers, fixed bit positions for the piece numbers.
Then create a third table with the n:m relation of the signatures and positions.
With these tables every position related search should be possible and fast.
You could later create more similar extra tables if needed.
The search for a position first looks for the material signature and then joins the
related positions for a final comparison. For more general searches you could
also use just parts of the signature or parts of the positions.
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Open Chess Game Database Standard

Post by Sopel »

Queries for positional features can't really be done any better than O(n) (you can optimize for specific cases but not in general), and for that there's scoutfish/SCID. I don't think it can be done efficiently in sql, all implementations I know either utilize a compressed binary position format or don't store the positions at all (checked on the fly). One way to allow a good subset of positional queries within SQL could be to store positions as fixed width 64-char array (+ special bits), each for one square, and query with LIKE, but it would be unbearably slow since LIKE can't benefit from indexes (an alternative would be to have 64 byte columns, index them, and form queries based on logical relations between them. But I can't imagine it being good either).

Harald's idea is good, and could work in tandem with using LIKE to narrow down the possibilities, but it still accelarates only a subset of interesting positional queries. And sure, we can store hash for position, hash for pawn structure, hash for material, hash for bishops, hash for knights, hash for mobility, hash for double attacks.. but where does it stop?
To make the SQL engine works fast, we use a transaction for the whole process. That makes SQLite keep the whole database in memory.
This is unacceptable. The big advantage of using SQL is that you have a DBMS that can work very well with large on-disk databases. This completely kills the point of this project I think. Transactions are fine, but they should be done in such a way to never exceed a specified amount of memory.
To speed up we create an index for the column FEN.
yea, fens will be bad for three reasons: 1. they are large and not fixed width, 2. they are not unique (though for most purposes the minimal form is used, no one is preventing someone from using `1111` in place of `4` in a fen), 3. indexes will be large because they need to store a copy of the values. I suggest staying with 64 bit hashes (which will be fine for most use cases), in light of the lack of possibility of handling more complex positional queries in an efficient way.
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

Harald wrote: Fri Dec 24, 2021 1:45 pm Could you try to use 32 byte blobs for a position in the position table.
That is 4 bit for each square, enough for all black and white pieces.
Of course the poasition that you search should also be transformed that way
to allow an easy comparison.
I would also create a table with the material signature for black and white
Just two integers, fixed bit positions for the piece numbers.
Then create a third table with the n:m relation of the signatures and positions.
With these tables every position related search should be possible and fast.
You could later create more similar extra tables if needed.
The search for a position first looks for the material signature and then joins the
related positions for a final comparison. For more general searches you could
also use just parts of the signature or parts of the positions.
Yes if this standard should support queries like: "All positions where a pawn is on this square" then we need something like a 32 byte encoding. I think this is already a hard limit since smaller board representations need a decompression step.

But the cost is there:
32 byte per position.

When already using the compressed blob ID per position this might not even add that much in terms of % size.
Then you could query all transpositions that lead to a commonly known pawn configuration etc.

On the memory vs non memory db - I think thats best left to the enduser. Since this is just a setting in the connectionstring - it can be chosen by anyone for what fits best.
This standard should describe the table layout and give some code to fill the db (which we already have)
Performance will be best with in memory db and real actual banking systems use that all the time. (just make sure you use ecc ram)

Merry christmas!
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Thanks all for the ideas and suggestions.

I am going to complete a new attempt with the first implementation of approximately position searching. We will see how good/bad is it :)

Merry Christmas!
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Position searching: Xmas (7th) attempt

Post by phhnguyen »

Position searching: Xmas (7th) attempt

In this attempt, we will implement the ability of approximately position searching. This kind of search can answer a huge range of questions such as:

Find all positions/games that:
- White has 3 Queens
- have White Pawns in d4, e5, f4, g4, and Black King in d7
- have Black King in middle squares (d4, d5, e4, e5)

Clearly, to answer those questions we should store all positions and enough information with many aspects for each of them. From some previous attempts, we have stored all possible chess positions in the form of records in the database with their hash keys (there are over 200 million records for our test database). Now we need to add “enough” information for each record. How much is enough? After considering for a while, reviewing many more possible questions, we concluded we should store full board information for each position. In other words, we should use a board representation.

On Attempt 6th we have stored FENs for positions/moves in a database. Theoretically, a FEN is a good board representation. It has enough information to answer any question about positions/position matching or searching. However, using FENs have some big drawbacks as the following:

- Matching strings is slow because strings normally have many bytes, much more than integers, and because of loading huge data from storage
- FEN strings don’t have fixed sizes, which may affect the creating and searching speeds
- Hardly use directly. Basically, we need to convert a FEN into an internal board, using some callback functions, then check conditions with that board to return the results. That process may take too much time

IMO, FEN strings are not the solution. Some similar forms such as using board diagrams look like the same and are discarded quickly. We have considered several other forms such as using the list of pieces, mailbox… but then discarded them all, mostly because they are hard to compare, find pieces’ locations.

In the end, we decided to use another and popular one: bitboard. It has some advantages as the below:
- Being widely used for chess. All necessary techniques are mature and many developers have experience using them
- They are sets of integers that are more compact and faster to process than strings
- They have a fixed space, much better for speeds both creating and searching, compared with unfixed lengths of FENs
- Can process directly without converting (into internal boards)
- SQLite can work directly with them using normal comparisons and bitwise operators. That can speed up some searching, or using directly in common SQL GUIs/browsers

We use 8 bitboards to present a (standard) chess position. They include 6 bitboards for 6 piece types (a bitboard for each piece type), one for all pieces of White, one for Black. There is a redundancy here since we may omit one of the last two bitboards, but for being easier and speedy we use both of them. Another integer is used to store information about en-passant and castle rights.

Each bitboard is a 64-bit integer, 8 bytes. The extra integer is 4 bytes. All take 68 bytes, larger than FENs with an average length of 55 bytes.


1. Building the database
The chessboard in our database builder uses the mailbox representation thus it doesn’t have bitboards. We are going to create them (bitboards) when needed. It should be slower than typical chess engines because we didn’t optimize the code and did not update it on the fly.

Eight bitboards and the extra integer will be stored in a table as independent columns with straightforward names as White, Black, King, Queen, Rook, Bishop, Knight, Pawn, and Prop. The reason we don’t use blob here is that blobs are slower, almost impossible for such large numbers of positions (we have mentioned in a previous attempt), and harder to be queried directly.

Image
The table BitboardPositions with 8 bitboards columns

Similar to Attempt 6th, when creating the database we stored all bitboards with their hash keys in a temporary file first and at the end read them back to store in the database. When writing down to a database instead of one, it has 3 transactions to avoid SQLite keeping too much data in memory. As the result, the building process took very moderate memory for the whole time.

We may store hash keys with bitboards. However, bitboards themselves can be used for a position exactly matching. To save some space (8 bytes per position) we didn’t store them (hash keys).

Just a few more words about hash keys vs bitboards: Bitboards can be used to compare, similar to hash keys. However, when a hash key is a one-way solution when from the chessboard we can get the hash key but we can’t do it in a reverse way, from a hash key to the chessboard. In contrast, bitboards can do both ways with chessboards.

The stats are as the below:

Code: Select all

Completed writing bitboards, total: 199217076, one: 188479599, blob: 10737476, elapsed: 3449820ms 57:29, speed: 57747 hash keys/s
#games: 3456399, #hashKeys: 199217076, #posCnt: 548690734, elapsed: 4721277ms 1:18:41, speed: 732 games/s
In this attempt, some building parameters are quite similar to Attempt 6th (the one stores FENs into databases). Compared with Attempt 5th, the process of writing down hash keys took 57:29, 7.5 times longer than Attempt 5th (vs 06:41). That process took most of the extra time, causing the total time of creating the database to be 1:18:41, 3 times longer (vs previous one of 25:10).

The database size is increased significantly to 17.6 GB, 2.5 times larger than Attempt 5th (vs 6.95 GB). However, the size is surprisingly a bit smaller than Attempt 6th (18.41 GB). Perhaps, that is caused by the way SQLite stores integers. It has only one type of integer and stores as many bytes as it needs, not using fix the length, thus some 64-bit integers use fewer than 8 bytes.

All work well as we have planned. SUCCEED.


2. Searching

a. Matching exactly
We can use bitboards for position matching exactly. Instead of comparing only one 64-bit integer, now we need to compare 7 integers of 64-bit. We guess that should be slower.

The SQL statement is below. Note that we don’t need to compare the bitboard for Black because the rest is enough:

Code: Select all

SELECT GameID, MultiGameIDs FROM BitboardPositions WHERE White=? AND King=? AND Queen=? AND Rook=? AND Bishop=? AND Knight=? AND Pawn=? AND Prop=?
The stats are as the below:

Code: Select all

ply: 1, #total queries: 1000, elapsed: 2472795 ms, 41:12, time per query: 2472 ms, total results: 1137427224, results/query: 1137427
ply: 2, #total queries: 1000, elapsed: 1439571 ms, 23:59, time per query: 1439 ms, total results: 359585256, results/query: 359585
ply: 3, #total queries: 1000, elapsed: 1252114 ms, 20:52, time per query: 1252 ms, total results: 199675249, results/query: 199675
ply: 4, #total queries: 1000, elapsed: 946437 ms, 15:46, time per query: 946 ms, total results: 37933525, results/query: 37933
ply: 5, #total queries: 1000, elapsed: 792190 ms, 13:12, time per query: 792 ms, total results: 5547110, results/query: 5547
ply: 6, #total queries: 1000, elapsed: 770812 ms, 12:50, time per query: 770 ms, total results: 6512038, results/query: 6512
ply: 7, #total queries: 1000, elapsed: 740613 ms, 12:20, time per query: 740 ms, total results: 2446515, results/query: 2446
ply: 8, #total queries: 1000, elapsed: 787913 ms, 13:07, time per query: 787 ms, total results: 5303250, results/query: 5303

It was a surprise when the search was actually a bit faster than Attempt 5th (both attempts didn’t use any SQL index). Other parameters such as the number of results… are quite similar.

b. Approximately searching
To prepare for that kind of searching we create some callback functions to work with bitboards. They are simple, straightforward, taken from Chess Programming Wiki (CPW).
- popCount (population count) to count the number of bit 1 in a given bitboard
- firstOne (the longer name in CPW is bitScanForward) to get the bit-index of the least significant 1 bit in a given bitboard

There are some other functions for working with bitboards such as bitScanForwardWithReset, get bitboard for a position, convert bitboards back to a chessboard…

Basically, to get all results, the database has to scan all records in the table BitboardPositions to perform bitwise operators. For our test database, that table has over 200 million records. Scanning them all may take a while.

b.1. Find all games having 3 White Queens

White Queen is detected by applying an operator AND between bitboards White and Queen, then call the function popCount to get the total of bit 1s:

Code: Select all

popCount(White & Queen)=3
The query to answer the question is the bellow:

Code: Select all

SELECT * FROM BitboardPositions WHERE popCount(White & Queen)=3
The stats are as the below:

Code: Select all

Finding 3 White Queens DONE. Elapsed: 1491844 ms, 24:51, total results: 246, time per results: 6064 ms

There is a total of 246 satisfying games, the program needs about 6 seconds to find a game (from the database of 3.45 million games). That is much faster than we expected (when starting this project, we expected to wait for hours for that kind of task).

Bellow is a PGN of the first game - the (our) first-ever game found by that kind of searching. You can see at the move 23 the board has 3 White Queens!

[pgn]
[Event "Moscow"]
[White "Grigoriev, Nikolai"]
[Black "Alekhine, A."]
[Date "1915.01.01"]
[Result "1-0"]

1.e4 e6 2.d4 d5 3.Nc3 Nf6 4.Bg5 Bb4 5.e5 h6 6.exf6 hxg5 7.fxg7 Rg8 8.h4
gxh4 9.Qg4 Be7 10.g3 c5 11.gxh4 cxd4 12.h5 dxc3 13.h6 cxb2 14.Rb1 Qa5+ 15.
Ke2 Qxa2 16.h7 Qxb1 17.hxg8=Q+ Kd7 18.Qxf7 Qxc2+ 19.Kf3 Nc6 20.Qgxe6+ Kc7
21.Qf4+ Kb6 22.Qee3+ Bc5 23.g8=Q b1=Q 24.Rh6 1-0
[/pgn]

b.2. Find all positions with White Pawns in d4, e5, f4, g4, and Black King in b7

We create a bitboard (called MaskPawns) with bit 1 at d4, e5, g4 to mask and check Pawns. The conditions are as the below, using bitwise operators:

Code: Select all

(White & Pawn & MaskPawns) = MaskPawns
We need another bitboard (MaskKing) with b7 to check black King and the condition is a bit simpler:

Code: Select all

(Black & King) = MaskKing
The SQL statement is as the below:

Code: Select all

SELECT * FROM BitboardPositions WHERE (White & Pawn & ?)=? AND (Black & King)=?
We limit to extract the first 1000 results from the table BitboardPositions (just to save time, I don’t want to wait too long ;) )

Code: Select all

Finding 4 Pawns & King in specific squares DONE. Elapsed: 2618837 ms, 43:38, total results: 1037, time per results: 2525 ms
There are 1037 satisfying games for that first 1000 matches with an average of only 1 second to find a game.

Bellow is a PGN of one game from the results. You can see at move 26 the board has 4 White Pawns and the Black King are in the wanted locations (d4, e5, f4, g4, and b7, respectively).

[pgn]
[Event "Rhode Island"]
[White "Fredenburgh"]
[Black "Barry"]
[Date "1965.01.01"]
[Result "1-0"]

1.e4 c6 2.d4 d5 3.e5 Bf5 4.Ne2 e6 5.Ng3 Bg6 6.Bd3 Bxd3 7.Qxd3 h5 8.Qb3 Qc7
9.Nc3 Nd7 10.Bg5 h4 11.Nge2 b6 12.O-O-O Rh5 13.Bf4 Ne7 14.g4 Rh8 15.h3
O-O-O 16.Kb1 Qb7 17.Ng1 Ng6 18.Bh2 Ne7 19.f4 Nb8 20.Nf3 Qd7 21.Ng5 Ng8 22.
Rhe1 Re8 23.Rd2 Nh6 24.Nd1 Be7 25.Nf3 Bd8 26.Qd3 Kb7 27.Ne3 g6 28.Bg1 Na6
29.c3 c5 30.Bf2 cxd4 31.Qxd4 Qe7 32.Bxh4 Qc5 33.Bxd8 Rxd8 34.Ng5 Qxd4 35.
cxd4 Nb8 36.Rc2 Rc8 37.Kc1 Rxc2+ 38.Nxc2 Nc6 39.Re3 Rg8 40.Kd2 Rc8 41.Rc3
Rg8 42.Kd3 Rf8 43.a3 a6 44.Ne3 Ne7 45.Ke2 Rc8 46.Nh7 Rh8 47.Ng5 Rc8 48.Nd1
Rh8 49.Nf2 Nc6 50.Kd3 Nd8 51.Nh1 Nc6 52.Ng3 Ne7 53.Kd2 Rg8 54.Ke3 Rh8 55.
Kf3 Nc6 56.Rd3 Nd8 57.h4 Ng8 58.h5 Nh6 59.Rd1 Rg8 60.Rh1 Nc6 61.Rd1 Kc7
62.Nh7 gxh5 63.gxh5 Kd8 64.Ng5 Ke7 65.Rd2 Rc8 66.Nh7 Rg8 67.Nf6 Rg7 68.f5
Nxd4+ 69.Kf4 Ndxf5 70.Nxf5+ Nxf5 71.Nxd5+ exd5 72.Kxf5 Rh7 73.Rh2 b5 74.h6
a5 75.b3 Ke8 76.e6 Ke7 77.exf7 Kxf7 78.Ke5 Kg8 79.Rh5 1-0

[/pgn]


3. Quick conclusions for this attempt
We stored a full set of bitboards as a board representation for each chess position. That guarantees all necessary information is always available for any kind of position searching questions. A bitboard is a ready and easy form to process and suitable for bitwise operators of SQL engines.

By having bitboards, bitwise operators, and some of our provided callback functions for bitboards, an advance/experience user can query almost all types of position searching, including very complicated ones.

The trade-off of using bitboards:
- The database size and building time grow for a few times larger
- Not easy to use for average users
- Harder to apply for chess variants that have more than 64 cells

So far we have just created only a few tests, just to be sure both exactly matching and approximately searching can work well with them and with very reasonable times. More tests/questions will be added later.

Of course, our program is just the core/the basement/libraries (low levels) of chess position searching. For being used by average users, chess GUIs/tools still need to build their own user interface (upper levels), say, some dialog boxes or a query system, to help users to work with that.


4. Source code
All code of this Attempt has been pushed already into a new branch “searchwithbitboards”.
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Open Chess Game Database Standard

Post by Sopel »

Looks ok, I'm happy this is workable. One disadvantage is that bitwise operations prevent the index from being used, so all positional queries will result in full table scan. For your small database it's still in the order of seconds, but it's not really acceptable. I'd suggest adding a full position hash with an index for exact position queries to be fast at least. Also, some space optimizations can be made. I propose the following schema

Code: Select all

ID : uint64_t
Hash : uint64_t (with an index to support fast exact position queries)
White : uint64_t
Black : uint64_t
Queens : uint64_t
Rooks : uint64_t
Bishops : uint64_t
Knights : uint64_t
Pawns : uint64_t
GameID : uint32_t
Ep+Castling : uint8_t
WhiteKingSquare : uint8_t
BlackKingSquare : uint8_t
total 79 bytes, will be rounded either to 80 depending on implementation.

Alternatively to save space (16 bytes/20%) with a slight performance cost

Code: Select all

ID : uint64_t
Hash : uint64_t (with an index to support fast exact position queries)
White : uint64_t
RookLike : uint64_t
BishopLike : uint64_t
KNP0 : uint64_t
KNP1 : uint64_t
GameID : uint32_t
Ep+Castling : uint8_t
where the following bitboards can be retrieved in this way:

Code: Select all

White = White
Black = ~White & (RookLike | BishopLike | KNP0 | KNP1)
Queens = RookLike & BishopLike
Rooks = RookLike & ~BishopLike
Bishops = ~RookLike & BishopLike
Kings = KNP0 & KNP1
Knights = KNP0 & ~KNP1
Pawns = ~KNP0 & KNP1
(I still see multi game ids blob, is that to stay?)
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Sopel wrote: Sun Dec 26, 2021 1:03 pm Looks ok, I'm happy this is workable. One disadvantage is that bitwise operations prevent the index from being used, so all positional queries will result in full table scan. For your small database it's still in the order of seconds, but it's not really acceptable.
You so hurry, aren’t you? ;)

“not really acceptable” compared to what?

AFAIK, SCIDvsPC/CQL has to scan all games/records too and it may take a night (I have read that, @stevenaaus may correct me) to return results, not seconds/minutes as our tests.

Our solution is using bitboards, a kind of pre-computing thus it can help a lot to speed up searching, much faster than anything without such information.

Furthermore, SQLite can create the index for any column you want and that can speed up a lot. You worried about matching exactly using only bitboards is slower than using hash keys. However, if you make an index for even one bitboard column, say, White column, the speed should be almost the same with using hash keys + index.

I believe the bitboards+indexes can be comparable with any binary one on position searching. We surely win in flexibility, say, users can query almost no limited kinds of position-searching.

Sopel wrote: Sun Dec 26, 2021 1:03 pm I'd suggest adding a full position hash with an index for exact position queries to be fast at least. Also, some space optimizations can be made. I propose the following schema

Code: Select all

ID : uint64_t
Hash : uint64_t (with an index to support fast exact position queries)
White : uint64_t
Black : uint64_t
Queens : uint64_t
Rooks : uint64_t
Bishops : uint64_t
Knights : uint64_t
Pawns : uint64_t
GameID : uint32_t
Ep+Castling : uint8_t
WhiteKingSquare : uint8_t
BlackKingSquare : uint8_t
total 79 bytes, will be rounded either to 80 depending on implementation.

Alternatively to save space (16 bytes/20%) with a slight performance cost

Code: Select all

ID : uint64_t
Hash : uint64_t (with an index to support fast exact position queries)
White : uint64_t
RookLike : uint64_t
BishopLike : uint64_t
KNP0 : uint64_t
KNP1 : uint64_t
GameID : uint32_t
Ep+Castling : uint8_t
where the following bitboards can be retrieved in this way:

Code: Select all

White = White
Black = ~White & (RookLike | BishopLike | KNP0 | KNP1)
Queens = RookLike & BishopLike
Rooks = RookLike & ~BishopLike
Bishops = ~RookLike & BishopLike
Kings = KNP0 & KNP1
Knights = KNP0 & ~KNP1
Pawns = ~KNP0 & KNP1
(I still see multi game ids blob, is that to stay?)
Your suggestions are useful, I will try some ideas. Thanks!
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager