Open Chess Game Database Standard

Discussion of chess software programming and technical issues.

Moderator: Ras

amanjpro
Posts: 883
Joined: Sat Mar 13, 2021 1:47 am
Full name: Amanj Sherwany

Re: Open Chess Game Database Standard

Post by amanjpro »

phhnguyen wrote: Thu Dec 23, 2021 3:07 am
Sopel wrote: Wed Dec 22, 2021 12:52 pm
phhnguyen wrote: Tue Dec 21, 2021 10:49 pm
Sopel wrote: Tue Dec 21, 2021 2:23 pm How do you query the games in which a given position occured after a specific date with the current setup?
Dates are stored as TEXT, you may compare directly or use some build-in functions such as date(), strftime()...

For example (the last line):

Code: Select all

SELECT g.ID, g.Round, Date, w.Name White, WhiteElo, b.Name Black, BlackElo, Result, Timer, ECO, PlyCount, FEN, Moves
FROM Games g
INNER JOIN Players w ON WhiteID = w.ID
INNER JOIN Players b ON BlackID = b.ID 
WHERE inResultSet(g.ID)
AND Date > "2012-10-18"
This doesn't look right. Where's the comparison to the provided fen?
At the moment the database doesn’t store FENs but hash keys thus you can’t query using FENs directly. You need to convert the given FEN to a hash key first. You can find the function makehash in our code to do that task. Or you can use any function from the 3rd parties since we are using hash keys as Polyglot (opening books) ones.

Once you have a hash key you can start querying. The table HashPositions contains the info about hash keys and their corespondent game IDs. With game IDs you can get the game’s information (such as players, results, dates, moves…) from the table Games.

With the database of Attempt 4th, you may need only one query for all game information, using a SELECT-JOIN statement. With the database of Attempt 5th, you need two queries, the first one to get the game ID list (for the given hash key), the second one to query Games (for game information) with that list of game IDs.

There is a function named benchMatchingMoves in the code to measure the speed of querying positions. You may use it as a simple example of how to query chess positions.
That is the point. Imagine in attempt 5, you want to search for all games that has a position after d4-d5, and is played by grand masters. You get 40000 games. Now you send the second query containing 40000 game IDs, and unfortunately there is no easy way to do that.

While in attempt 4 (with pagination), you send one query, the db sends back one page (pagination is like limit, but you also get a handle to ask for the next pages). And that would be the end of it.

Databases are excellent tools for searching indexed data, lets use it and off load as much task as possible to them
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

amanjpro wrote: Thu Dec 23, 2021 3:18 am
That is the point. Imagine in attempt 5, you want to search for all games that has a position after d4-d5, and is played by grand masters. You get 40000 games. Now you send the second query containing 40000 game IDs,
We have some real stats already. With our test database of 3.45 million games (the real one from the Rebel website), the largest number of games belonging to a position is 1.6 million (much bigger than your number ;) ). In Attempt 5th, we store them as an array of integers, 4 bytes each, thus the total size is 6.4 MB, just the size of a small image. Even you create a database 10 bigger and the size becomes 64 MB, that is almost nothing (to load and process) with modern computers!

Please note that there are only a few positions that have that number. Other ones have much smaller numbers. Almost all (96%) have only one game ID.
amanjpro wrote: Thu Dec 23, 2021 3:18 am and unfortunately there is no easy way to do that.
Nothing easy. However, in that case, we found and implemented an already reasonable/runnable solution. Please read more on the post about Attempt 5th.
amanjpro wrote: Thu Dec 23, 2021 3:18 am While in attempt 4 (with pagination), you send one query, the db sends back one page (pagination is like limit, but you also get a handle to ask for the next pages). And that would be the end of it.

Databases are excellent tools for searching indexed data, lets use it and off load as much task as possible to them
Let me quote myself from a previous post if you are still confused about the results of Attempt 5th. This one was run smoothly on my old 5-year-old computer and compared with Attempt 4th:
phhnguyen wrote: Sat Dec 18, 2021 12:29 am just look again into the end result: the database size took only half size and the search speed is 50 times faster. Even each test queries twice, loading blobs, scanning all results but takes (average) under 3 seconds - very fast for a database of over 3 million games.

Why do you have to worry? What do you expect more? ;)
If we all are happy with Attempt 4th, life is so easy! A small problem is that the results of Attempt 5th are… more attractive in some aspects, IMHO ;)

BTW, we didn’t stick on, select anything. We have been pushing all boundaries we could for understanding. Considering, selecting… will be done much later when we have more information!

All have been just starting…
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
amanjpro
Posts: 883
Joined: Sat Mar 13, 2021 1:47 am
Full name: Amanj Sherwany

Re: Open Chess Game Database Standard

Post by amanjpro »

phhnguyen wrote: Thu Dec 23, 2021 5:31 am
amanjpro wrote: Thu Dec 23, 2021 3:18 am
That is the point. Imagine in attempt 5, you want to search for all games that has a position after d4-d5, and is played by grand masters. You get 40000 games. Now you send the second query containing 40000 game IDs,
We have some real stats already. With our test database of 3.45 million games (the real one from the Rebel website), the largest number of games belonging to a position is 1.6 million (much bigger than your number ;) ). In Attempt 5th we store them as an array of integers, 4 bytes each, thus the total size is 6.4 MB, just the size of a small image. Even you create a database 10 bigger and the size becomes 64 MB, that is almost nothing (to load and process) with modern computers!

Please note that there are only a few positions that have that number. Other ones have much smaller numbers. Almost all (96%) have only one game ID.
amanjpro wrote: Thu Dec 23, 2021 3:18 am and unfortunately there is no easy way to do that.
Nothing easy. However, in that case, we found and implemented an already reasonable/runnable solution. Please read more on the post about Attempt 5th.
amanjpro wrote: Thu Dec 23, 2021 3:18 am While in attempt 4 (with pagination), you send one query, the db sends back one page (pagination is like limit, but you also get a handle to ask for the next pages). And that would be the end of it.

Databases are excellent tools for searching indexed data, lets use it and off load as much task as possible to them
Let me quote myself from a previous post if you are still confused about the results of Attempt 5th. This one was run smoothly on my old 5-year-old computer and compared with Attempt 4th:
phhnguyen wrote: Sat Dec 18, 2021 12:29 am just look again into the end result: the database size took only half size and the search speed is 50 times faster. Even each test queries twice, loading blobs, scanning all results but takes (average) under 3 seconds - very fast for a database of over 3 million games.

Why do you have to worry? What do you expect more? ;)
If we all are happy with Attempt 4th, life is so easy! The problem is that the results of Attempt 5th are… more extractive in some aspects, IMHO ;)

BTW, we didn’t stick on, select anything. We have been pushing all boundaries we could for understanding. Considering, selecting… will be done much later when we have more information!

All have been just starting…
I actually am following the development of the attempts. But I believe in attempt 4 most of the energy was wasted on creating the resultset rather than executing the query plan. With pagination we can remove this bottleneck. If you still have your attempt 4 available, can you please test it with pagination, it is really really worth it
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

amanjpro wrote: Thu Dec 23, 2021 5:45 am
I actually am following the development of the attempts. But I believe in attempt 4 most of the energy was wasted on creating the resultset rather than executing the query plan. With pagination we can remove this bottleneck. If you still have your attempt 4 available, can you please test it with pagination, it is really really worth it
You and @Sopel have been insisting on pagination (especially Sopel, repeated in several posts). Frankly speaking, I don't know how to apply it to SQLite and I don't know either if it can solve the bottleneck.

If you and/or Sopel could, please help to implement and test instead. This is an open-source, open standard, feel free to push your results and code. Any effort is more than welcome!
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
amanjpro
Posts: 883
Joined: Sat Mar 13, 2021 1:47 am
Full name: Amanj Sherwany

Re: Open Chess Game Database Standard

Post by amanjpro »

phhnguyen wrote: Thu Dec 23, 2021 6:10 am
amanjpro wrote: Thu Dec 23, 2021 5:45 am
I actually am following the development of the attempts. But I believe in attempt 4 most of the energy was wasted on creating the resultset rather than executing the query plan. With pagination we can remove this bottleneck. If you still have your attempt 4 available, can you please test it with pagination, it is really really worth it
You and @Sopel have been insisting on pagination (especially Sopel, repeated in several posts). Frankly speaking, I don't know how to apply it to SQLite and I don't know either if it can solve the bottleneck.

If you and/or Sopel could, please help to implement and test instead. This is an open-source, open standard, feel free to push your results and code. Any effort is more than welcome!
I'll submit a PR if it is ok
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Open Chess Game Database Standard

Post by Sopel »

phhnguyen wrote: Thu Dec 23, 2021 3:07 am
Sopel wrote: Wed Dec 22, 2021 12:52 pm
phhnguyen wrote: Tue Dec 21, 2021 10:49 pm
Sopel wrote: Tue Dec 21, 2021 2:23 pm How do you query the games in which a given position occured after a specific date with the current setup?
Dates are stored as TEXT, you may compare directly or use some build-in functions such as date(), strftime()...

For example (the last line):

Code: Select all

SELECT g.ID, g.Round, Date, w.Name White, WhiteElo, b.Name Black, BlackElo, Result, Timer, ECO, PlyCount, FEN, Moves
FROM Games g
INNER JOIN Players w ON WhiteID = w.ID
INNER JOIN Players b ON BlackID = b.ID 
WHERE inResultSet(g.ID)
AND Date > "2012-10-18"
This doesn't look right. Where's the comparison to the provided fen?
At the moment the database doesn’t store FENs but hash keys thus you can’t query using FENs directly. You need to convert the given FEN to a hash key first. You can find the function makehash in our code to do that task. Or you can use any function from the 3rd parties since we are using hash keys as Polyglot (opening books) ones.

Once you have a hash key you can start querying. The table HashPositions contains the info about hash keys and their corespondent game IDs. With game IDs you can get the game’s information (such as players, results, dates, moves…) from the table Games.

With the database of Attempt 4th, you may need only one query for all game information, using a SELECT-JOIN statement. With the database of Attempt 5th, you need two queries, the first one to get the game ID list (for the given hash key), the second one to query Games (for game information) with that list of game IDs.
You see my point now I hope. Queries like GROUP BY date for example become O(n*k) where n is the number of bins and k is the number of occurences of a given position (instead of O(k) like in a normal DB) (Unless you want to do this logic in the application, in which case what's the purpose of the SQL database? Let's just do everything in the application like the other existing software does.).

Querying all games given a set of ids is also not quite trivial to do either (required for above), utilizing https://dev.mysql.com/doc/refman/8.0/en ... ind-in-set (I can already see that 10GB query string). Does it work with things other than strings? I don't know.

Please note that there are only a few positions that have that number. Other ones have much smaller numbers. Almost all (96%) have only one game ID.
Please note that position's interstingness is directly proportional to the number of occurences.


This won't EVER be as fast or space efficient as SCID or chess_pos_db are for example for the tasks they can do. I'd much rather see interesting features than trying to enforce crippling limitations in pursue of an impossible goal.


re pagination: https://stackoverflow.com/a/3799223/3763139
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

phhnguyen wrote: Thu Dec 23, 2021 6:10 am
amanjpro wrote: Thu Dec 23, 2021 5:45 am
I actually am following the development of the attempts. But I believe in attempt 4 most of the energy was wasted on creating the resultset rather than executing the query plan. With pagination we can remove this bottleneck. If you still have your attempt 4 available, can you please test it with pagination, it is really really worth it
You and @Sopel have been insisting on pagination (especially Sopel, repeated in several posts). Frankly speaking, I don't know how to apply it to SQLite and I don't know either if it can solve the bottleneck.

If you and/or Sopel could, please help to implement and test instead. This is an open-source, open standard, feel free to push your results and code. Any effort is more than welcome!
Sopel is not interested in providing any solutions as you keep seeing he keeps moving his goalpoast. He wants you to do the work and then complain about something else. A constructive community will ignore such people to keep pushing forward.
Amanjpro on the other hand seems like a genuine constructive person - since he will develop himself and contribute something. And in the above 2 posts you see the exact difference night and day.

You phhnguyen have provided a very good standard solution that is fast enough and flexible enough to accomodate everyones needs. Especially because its sql if you need another format you can always export it to csv with a simple query.

And sqlite dbs are also a well defined format. So you can copy them from windows to android to linux and back to some raspberry pi and it will all work and at the end of it you are not forced to one particular language. You can use python, java, c#, c++, c, etc..

This is truly a one size fits all solution.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
op12no2
Posts: 547
Joined: Tue Feb 04, 2014 12:25 pm
Location: Gower, Wales
Full name: Colin Jenkins

Re: Open Chess Game Database Standard

Post by op12no2 »

dangi12012 wrote: Thu Dec 23, 2021 10:26 pm
Sopel is not interested in providing any solutions as you keep seeing he keeps moving his goalpoast. He wants you to do the work and then complain about something else. A constructive community will ignore such people to keep pushing forward.
FFS you really are a twat.
amanjpro
Posts: 883
Joined: Sat Mar 13, 2021 1:47 am
Full name: Amanj Sherwany

Re: Open Chess Game Database Standard

Post by amanjpro »

dangi12012 wrote: Thu Dec 23, 2021 10:26 pm
phhnguyen wrote: Thu Dec 23, 2021 6:10 am
amanjpro wrote: Thu Dec 23, 2021 5:45 am
I actually am following the development of the attempts. But I believe in attempt 4 most of the energy was wasted on creating the resultset rather than executing the query plan. With pagination we can remove this bottleneck. If you still have your attempt 4 available, can you please test it with pagination, it is really really worth it
You and @Sopel have been insisting on pagination (especially Sopel, repeated in several posts). Frankly speaking, I don't know how to apply it to SQLite and I don't know either if it can solve the bottleneck.

If you and/or Sopel could, please help to implement and test instead. This is an open-source, open standard, feel free to push your results and code. Any effort is more than welcome!
Sopel is not interested in providing any solutions as you keep seeing he keeps moving his goalpoast. He wants you to do the work and then complain about something else. A constructive community will ignore such people to keep pushing forward.
Amanjpro on the other hand seems like a genuine constructive person - since he will develop himself and contribute something. And in the above 2 posts you see the exact difference night and day.

You phhnguyen have provided a very good standard solution that is fast enough and flexible enough to accomodate everyones needs. Especially because its sql if you need another format you can always export it to csv with a simple query.

And sqlite dbs are also a well defined format. So you can copy them from windows to android to linux and back to some raspberry pi and it will all work and at the end of it you are not forced to one particular language. You can use python, java, c#, c++, c, etc..

This is truly a one size fits all solution.
Can you please calm down. You are the only one in this thread that derails the conversations. Move your hatred towards Sopel to some other thread. Respect phhnguyen's efforts please
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Please be polite and patient with each other. Please don’t make personal attacks or blame anyone. We all are adults, not kids!

BTW, as the author/creator of the topic, I have been always treating you all with the highest equal respect.
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager