Open Chess Game Database Standard

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Open Chess Game Database Standard

Post by mvanthoor »

dangi12012 wrote: Mon Dec 20, 2021 4:44 pm This scales really well accross multiple cores :)
Very good. In that case I don't really care if a solution such as this is slower than either SCID or Chessbase. Even if SCID and/or ChesssBase are 10x faster (on a single core) than this SQL solution (on a single core), the number of cores will win out in the end. My next computer is going to have at least 16 cores and 32 threads, so I'd welcome a multi-threaded solution.

At the moment I don't have a lot of time to spend on chess programming, but when I start on my own GUI (just because I want to, same as with the engine), it'll probably implement either this, or an SQL solution as well.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
emadsen
Posts: 440
Joined: Thu Apr 26, 2012 1:51 am
Location: Oak Park, IL, USA
Full name: Erik Madsen

Re: Open Chess Game Database Standard

Post by emadsen »

mvanthoor wrote: Mon Dec 20, 2021 3:22 pm It would be great if SQL can somehow be parallelized when searching and storing. The main problem in ChessBase (at least my old version 12 from 2012) is that it is completely single-threaded. I don't know if this is still the case with ChessBase 16, but if so, buying a computer with more cores is completely useless.
I don't have any insight into if / how the new Hiarcs software (Hiarcs Chess Explorer Pro) and underlying SQLite database leverage multiple CPUs. However...

The Hiarcs development team has indicated their new software enables import of ChessBase databases. And once imported into Hiarc's format (that uses SQLite internally), searches execute faster than in ChessBase. Testers have indicated satisfaction with the beta software. Those of us who purchase it will find out for ourselves at the end of the month (I hope).

I mention Hiarcs in this thread because, in addition to the OP's efforts, the performance of HCE Pro may settle the question about whether SQL technology can perform adequately for large (multi-million game) databases.
Erik Madsen | My C# chess engine: https://www.madchess.net
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Fulvio wrote: Fri Dec 17, 2021 12:06 pm Maybe to have something more generic you could try to keep the game data separate from the tables used for position searches.
If I read the code correctly, at the moment you don't even memorize all the game tags?
You are right. The current database stores some basic/important PGN tags but not all.
Fulvio wrote: Fri Dec 17, 2021 12:06 pm The basic tables in my opinion are:

Code: Select all

PLAYERS
------
PlayerID
Surname
Name
Birth_date
FIDE_ID
The FIDE ID and date of birth are important because often the names of the players are incorrect or incomplete (Magnus, C).
You need those to be able to find all the games of a player (in SCID these data are stored in the "spelling" file).
On one hand, I agreed with you about the structure of table Players. Having more fields is better than having fewer. Some important info may lose if we combine fields.

On the other hand, it is a dilemma for having such extra fields because they should be entered from some specific sources but hardly or cannot automatically be entered from game collections. Database creators may have to spend a lot of labor to match, fix missing/conflicts between games data and players' data. The data may be fine for some top players, but for the rest as well as for engines games we don’t have enough data to fill them. Most of them may be empty forever.

I will consider those extra fields. One of the strong points of SQL databases (compared with binary ones) is that they are easy to add, modify, remove fields. Some changes in data fields/structures are just a matter of some commands/minutes.
Fulvio wrote: Fri Dec 17, 2021 12:06 pm

Code: Select all

GAMES
-------
GameID
Date
White_playerID
Black_playerID
MoveText
Current our Games table has more fields than that. It has all 7-basic PGN tags and plus some more:

Code: Select all

Games
-------
EventID
SiteID
Date
Round
WhiteID
WhiteElo
BlackID
BlackElo
Result
Timer
ECO
PlyCount
FEN
Moves
Again, adding more fields is not a problem for SQL databases. Even I may implement a feature of adding fields dynamically, on the fly. It means when creating, the database may have all PGN tags found from PGN files.
Fulvio wrote: Fri Dec 17, 2021 12:06 pm
Here what can be debated is how to memorize the moves.
Maybe just use the PGN notation, perhaps with some enhancements, such as the uci notation instead of SAN.

Code: Select all

GAME_TAGS
----------
Tag_name
Tag_value
GameID
Fulvio wrote: Fri Dec 17, 2021 12:06 pm
I believe this is it all for a standard database, although I may have forgotten something.
Tables for the position search can then be added separately.
I prefer moves in coordinate/UCI notation. However, there is no decision yet nor enough considerations/tests.
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

ndbd wrote: Sun Dec 19, 2021 10:37 pm I am lurking here and reading this thread with interest. First I want to say that I think BanksiaGUI is a great program and obviously Pham is a talented programmer. But:
Please note that the main purpose of this project is a standard to share game databases between programs and can be used directly for some programs. IMHO, it has reached about 90% of the purpose (to share and to be able to be used directly in some programs).

I have been working on other targets: pushing all boundaries to see how far we can go.
ndbd wrote: Sun Dec 19, 2021 10:37 pm
There are two existing widely used database formats, CBH/CBF/CBV (Chessbase) and SCID.
Those formats are not the answer. Some are proprietary of some companies and/or have some strict licenses which could not be supported by other programs (Banksia GUI can't use SCID code since it is still a close source).

All those formats have some disadvantages I have mentioned in the project's GitHub repository: hard to support other chess variants; hard to convert to other formats; fixed data structures (hard to modify or add new fields); supported by very specific software...
ndbd wrote: Sun Dec 19, 2021 10:37 pm All the issues that are talked about in this thread have been mostly solved already with these formats. Moreover it is not difficult to enhance these formats.
If ChessBase can do something with their software doesn’t mean other software can do it, our knowledge, our understanding can get benefits from it since ChessBase won’t share anything. I knew someone did reverse engineering (it is a kind of hack IMO) to get Chessbase database formats but they got nothing about position-searching because it is their secret algorithm.

The situation is better for SCID since it is open source but it is still a problem I will mention below.
ndbd wrote: Sun Dec 19, 2021 10:37 pm
They are, after all, essentially binary database tables with custom index informations, ideally suited for chess.
So far, based on my tests, attempts, I can confirm SQL can do somewhat the same levels for chess databases. Even binary databases can be a bit faster and a few times smaller (not big advantages with modern hardware) but SQL has many stronger points too.

What the missing (and we have been working on) is position-searching. However, that feature (of SCID) didn't come with the original design but come as an extra work of a 3rd party (by the authors of Chess Query Language, for only one but all folks of SCID). If one day someone develops that feature for SQL databases, we will have it too. Otherwise, we have been moving anyway...
ndbd wrote: Sun Dec 19, 2021 10:37 pm The original intention here was to figure out if SQL or SQLlite creates an _easy_ and straight-forward way of maintaining, storing and accessing game information, and all that without required space not going rampant. With several tries required by a skilled programmer, it is clear that it's not straight-forward and so easy, and that thought must be put into everything.

The fact that Bill Forster (Tarrasch GUI, has a blog post about that) didn't get it working quickly and opted for an in-memory approach supports the above. Indeed, it is very likely that Shane Hudson also considered SQL when designing SCID.
That just confirmed it is a big challenge!

IMHO, it is not right if we all give up easily without reviewing and trying after a such long period (all those databases are over 20 years old).

If we don't try we don't know.
ndbd wrote: Sun Dec 19, 2021 10:37 pm There is a vast material out there in CBH/CBF/CBV and SCID, and it is most easy for users if they can keep their data in the format. It also makes it easy for them to transition from an existing program to another.

If the time to figure out how to get a clean fast and nice SQL implementation would have been spent to document the SCID file-format and implement it, a fast solution might already be in production. There are some issues with SCID4, like the hard-coded index value which limits databases to maximum of 2^32 games, but these things can be easily extended. It is also not difficult to imagine more tables (more indices) if one wants to index certain searches more efficiently. Even though position search is so fast in SCID, that it might not be required.
For clearing: developing, fixing SCID… are the tasks of their developers such as Fulvio, stevenaaus..., not me (who even is not the user nor understanding their structures and codes). I knew they have been working hard for 20 years to improve and they are responsible for requirements from users.

Frankly speaking, if SCID, Chessbase's software could easily share databases (not via PGN) I wouldn't bored to think about a new database format. Many people have been waiting for that feature for 20 years too. Should we wait for more time?
ndbd wrote: Sun Dec 19, 2021 10:37 pm
As for CBH/CBF/CBV, there is quite a degree of information out there, for example here github.com/antoyo/uncbv or even here on talkchess.
Those databases/files are proprietary of Chessbase. Since Chessbase doesn't want to share why do we insist? You can't get their position-searching anyway.
ndbd wrote: Sun Dec 19, 2021 10:37 pm Again, I'd love to see database support coming to BanksiaGUI, but I think you guys make everything more difficult than it has to be. Also there are some things that I think are currently not addressed, like searching for comments within games...
No, I did not want to add nor face difficulties. Simply old ways won't work for me.
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

emadsen wrote: Mon Dec 20, 2021 3:26 am ...
I'd like to point out the new version of Hiarcs Chess Explorer uses SQLite. See "HCE NextGen will internally use Sqlite database" stated by a developer on the Hiarcs team.
It is nice to see commercial software supporting SQLite databases!

From what I have read, HCE will use the SQLite database internally only. It means (guessing) it will write down databases in their own binary formats. They need converting for any in-out of the database. Extra steps! I guess they do that just for protecting their database structures as well as the content itself.

Actually, there are some better ways to protect data without extra steps of converting. SQLite has some extensions for that tasks, using public key tech. Almost noway can break it.

BTW, just hope they won’t be too strict with all kinds of data and can share basic ones with other software, say, via this standard ;)
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Open Chess Game Database Standard

Post by Sopel »

How do you query the games in which a given position occured after a specific date with the current setup?
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Sopel wrote: Tue Dec 21, 2021 2:23 pm How do you query the games in which a given position occured after a specific date with the current setup?
Dates are stored as TEXT, you may compare directly or use some build-in functions such as date(), strftime()...

For example (the last line):

Code: Select all

SELECT g.ID, g.Round, Date, w.Name White, WhiteElo, b.Name Black, BlackElo, Result, Timer, ECO, PlyCount, FEN, Moves
FROM Games g
INNER JOIN Players w ON WhiteID = w.ID
INNER JOIN Players b ON BlackID = b.ID 
WHERE inResultSet(g.ID)
AND Date > "2012-10-18"
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Open Chess Game Database Standard

Post by Sopel »

phhnguyen wrote: Tue Dec 21, 2021 10:49 pm
Sopel wrote: Tue Dec 21, 2021 2:23 pm How do you query the games in which a given position occured after a specific date with the current setup?
Dates are stored as TEXT, you may compare directly or use some build-in functions such as date(), strftime()...

For example (the last line):

Code: Select all

SELECT g.ID, g.Round, Date, w.Name White, WhiteElo, b.Name Black, BlackElo, Result, Timer, ECO, PlyCount, FEN, Moves
FROM Games g
INNER JOIN Players w ON WhiteID = w.ID
INNER JOIN Players b ON BlackID = b.ID 
WHERE inResultSet(g.ID)
AND Date > "2012-10-18"
This doesn't look right. Where's the comparison to the provided fen?
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
User avatar
emadsen
Posts: 440
Joined: Thu Apr 26, 2012 1:51 am
Location: Oak Park, IL, USA
Full name: Erik Madsen

Re: Open Chess Game Database Standard

Post by emadsen »

phhnguyen wrote: Tue Dec 21, 2021 5:36 am From what I have read, HCE will use the SQLite database internally only... Actually, there are some better ways to protect data without extra steps of converting. SQLite has some extensions for that tasks, using public key tech.

BTW, just hope they won’t be too strict with all kinds of data and can share basic ones with other software, say, via this standard ;)
We'll find out soon whether the HCE Pro database file is readable by a SQLite GUI. I have my doubts though. The Hiarcs team needs to cover their costs. The performance of very large databases is one of the most important value propositions of their commercial software. So I don't expect the HCE Pro DB file to be readable by anything but HCE Pro.

But we'll find out soon.
Erik Madsen | My C# chess engine: https://www.madchess.net
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Sopel wrote: Wed Dec 22, 2021 12:52 pm
phhnguyen wrote: Tue Dec 21, 2021 10:49 pm
Sopel wrote: Tue Dec 21, 2021 2:23 pm How do you query the games in which a given position occured after a specific date with the current setup?
Dates are stored as TEXT, you may compare directly or use some build-in functions such as date(), strftime()...

For example (the last line):

Code: Select all

SELECT g.ID, g.Round, Date, w.Name White, WhiteElo, b.Name Black, BlackElo, Result, Timer, ECO, PlyCount, FEN, Moves
FROM Games g
INNER JOIN Players w ON WhiteID = w.ID
INNER JOIN Players b ON BlackID = b.ID 
WHERE inResultSet(g.ID)
AND Date > "2012-10-18"
This doesn't look right. Where's the comparison to the provided fen?
At the moment the database doesn’t store FENs but hash keys thus you can’t query using FENs directly. You need to convert the given FEN to a hash key first. You can find the function makehash in our code to do that task. Or you can use any function from the 3rd parties since we are using hash keys as Polyglot (opening books) ones.

Once you have a hash key you can start querying. The table HashPositions contains the info about hash keys and their corespondent game IDs. With game IDs you can get the game’s information (such as players, results, dates, moves…) from the table Games.

With the database of Attempt 4th, you may need only one query for all game information, using a SELECT-JOIN statement. With the database of Attempt 5th, you need two queries, the first one to get the game ID list (for the given hash key), the second one to query Games (for game information) with that list of game IDs.

There is a function named benchMatchingMoves in the code to measure the speed of querying positions. You may use it as a simple example of how to query chess positions.
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager