Open Chess Game Database Standard

Discussion of chess software programming and technical issues.

Moderator: Ras

phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Fulvio wrote: Thu Nov 18, 2021 6:58 am
phhnguyen wrote: Thu Nov 18, 2021 2:11 am As I have posted in some previous pages, matching exactly given positions is just a small part of searching, useful mostly for opening positions but not from middle games. IMHO, it is not hard to implement since we can use some simple techniques such as storing/matching hash keys or even just matching some beginning moves.
Do you already have some benchmarks?
For example, how long does it take in that database to find all the Sicilian games (e4 c5), calculate the statistics, and show the 10 games with a higher elo average that contain that position?
Not yet. Zero implemented (for searching). I am going around just for... chatting ;)

BTW, if you could, give me yours (benchmarks) first, I will try later to compare.
Fulvio wrote: Thu Nov 18, 2021 6:58 am
phhnguyen wrote: Thu Nov 18, 2021 2:11 am The challenge is about searching appropriately, say, finding all positions having 3 Queens or given Pawn structures… Do you have any video showing that kind of searching (approximately)?
I don't have a video, but you can download it and try (in that database there are 108 games where at least one of the players has 3 queens, even one from Nakamura! And there are 3 games where both have 3 queens!).
The problem is I don't know how to use your app properly.
Fulvio wrote: Thu Nov 18, 2021 6:58 am
phhnguyen wrote: Thu Nov 18, 2021 2:11 am On the other hand, searching (PGN) header is the (very) strong point of SQL. You may beat it in some very specific searches but in general, you can’t win an SQL engine about the flexibility, coverage, average speed… :D
I can agree on flexibility and scalability, but based on the benchmarks you posted even that is 20 times slower.
It was the first experiment. We still have a lot of room to improve. The problem is that I have been happy with that speed, so I am in no rush to improve it ;)

Just a reminder that the app can now convert faster than SCID4 while using only a single thread, and I am still waiting for some significantly faster improvements from @dangi12012!

BTW, don't take it seriously, since we may be comparing apples and oranges and our machines are quite different.
Fulvio wrote: Thu Nov 18, 2021 6:58 am (and you won't expect the average user to enter SQL queries, right? You will still need to create a more user friendly interface).
I don't have any problem designing and implementing a user interface for searching. Will do it when needed.

However, SQL itself is an excellent interface. Sometimes I am not sure which is easier for the average user: learning to use a complicated search dialog box, or learning to edit some SQL statements? (They could be given a list of SQL templates to select from and re-edit.)

[Image: Users can query by using SQL statements]
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

stevenaaus wrote: Thu Nov 18, 2021 4:20 am Just seems to me position match is too important to ignore a binary dB...
Yes, that format may exist forever since our computers prefer binary to text ;)

We are working on a new standard for exchanging data between different apps and/or for some specific apps and purposes. It is not meant to (and cannot) destroy other formats.

BTW, SQL can contain binary data, so I believe its databases can do what a binary one can do. Even though they work with humans in text form, SQL databases are actually binary ones. SQLite is open-source and integrated directly into our code. If needed, I don't mind changing its code to reach our goals.
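
As a rough illustration of binary data living in an SQL column, the sketch below packs moves into two bytes each and stores them as a BLOB in SQLite. The encoding (from-square and to-square in 6 bits each, no promotion information) and the table layout are assumptions invented for this example, not the format proposed in this thread.

```python
import sqlite3
import struct

def pack_moves(moves):
    """Pack (from_square, to_square) pairs, squares 0..63, into 2 bytes per move."""
    return b"".join(struct.pack("<H", (frm << 6) | to) for frm, to in moves)

def unpack_moves(blob):
    """Reverse of pack_moves: read 16-bit values and split them into squares."""
    return [((v >> 6) & 63, v & 63) for (v,) in struct.iter_unpack("<H", blob)]

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per game, moves kept in a binary column.
conn.execute("CREATE TABLE Games (ID INTEGER PRIMARY KEY, Moves BLOB)")

# 1. e4 c5 as square indexes (a1 = 0, h8 = 63): e2->e4 is 12->28, c7->c5 is 50->34.
moves = [(12, 28), (50, 34)]
conn.execute("INSERT INTO Games (ID, Moves) VALUES (?, ?)", (1, pack_moves(moves)))

(blob,) = conn.execute("SELECT Moves FROM Games WHERE ID = 1").fetchone()
restored = unpack_moves(blob)
```

The blob comes back as plain bytes, so the application is free to choose any compact move encoding while still using ordinary SQL for the header fields.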
stevenaaus
Posts: 613
Joined: Wed Oct 13, 2010 9:44 am
Location: Australia

Re: Open Chess Game Database Standard

Post by stevenaaus »

We are working on a new standard which is for exchanging data between different apps and/or for some specific apps, purposes, not (and cannot) for destroying other formats
Sure, I appreciate this. Probably an even bigger tangent, but I just remembered the name of the ancient MySQL-flavoured chess project I once looked at.
http://jose-chess.sourceforge.net/index.html
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

phhnguyen wrote: Thu Nov 18, 2021 8:51 am Just remind you that the app now can convert faster than SCID4 when using only one single thread and I have been waiting for some significant-faster improvements from @dangi12012!
Don't wait for me, I think it's fast enough as it is. You already implemented the main points, and multithreading should be reserved for the very, very top level of any algorithm. (I don't want my Math.Sine() to spawn 64 threads, thank you.)
Maybe support importing multiple PGNs at the same time? For example a whole folder, as you are forced to do if you import a lot of the lichess DBs.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

phhnguyen wrote: Thu Nov 18, 2021 9:07 am
stevenaaus wrote: Thu Nov 18, 2021 4:20 am Just seems to me position match is too important to ignore a binary dB...
Yes, that format may exist forever since our computers prefer binary to text ;)

We are working on a new standard which is for exchanging data between different apps and/or for some specific apps, purposes, not (and cannot) for destroying other formats.

BTW, SQL can contain binary data thus I believe its databases can do what a binary one can do. Even working with humans in text forms, SQL databases are actually binary ones. SQLite is open-source, integrated directly into our code. If needed, I don't mind changing its code to reach our goals.
That's 100% right. People seem to forget that SQL binary blobs exist and can be stored like any other column.
Fulvio
Posts: 396
Joined: Fri Aug 12, 2016 8:43 pm

Re: Open Chess Game Database Standard

Post by Fulvio »

phhnguyen wrote: Thu Nov 18, 2021 8:51 am Just remind you that the app now can convert faster than SCID4 when using only one single thread and I have been waiting for some significant-faster improvements from @dangi12012!

BTW, don't take it seriously since we may compare apple-orange and our machines are quite different.
Yes, it seems in fact that you are comparing two different approaches.
When SCID imports a PGN it recreates all the positions it contains and checks the legality of all the moves.
While, taking a quick look at your code, it seems to me that it just puts all the "MoveText section" into the database.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

Fulvio wrote: Thu Nov 18, 2021 10:28 am
phhnguyen wrote: Thu Nov 18, 2021 8:51 am Just remind you that the app now can convert faster than SCID4 when using only one single thread and I have been waiting for some significant-faster improvements from @dangi12012!

BTW, don't take it seriously since we may compare apple-orange and our machines are quite different.
Yes, it seems in fact that you are comparing two different approach.
When SCID imports a PGN it recreates all the positions it contains and checks the legality of all the moves.
While, taking a quick look at your code, it seems to me that it just puts all the "MoveText section" into the database.
Well then we have to work on the standard. I would like this standard to query each and every position + comments or eval tags.
It would be great for a GUI to drop this in so that you get (like on lichess.org) all the games in which this position was reached, and you can filter by Elo etc.

It could be such a great preparation tool: select a player and find all his usual openings where he plays an inaccuracy.
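
A minimal sketch of what such a per-position query could look like, assuming a hypothetical Positions table that stores one hash key per position reached in each game. The table names, the sample games, and the use of a BLAKE2 hash of the FEN (a stand-in for the Zobrist keys chess programs normally use) are all assumptions for illustration, not part of the proposed standard.

```python
import hashlib
import sqlite3

def position_key(fen):
    """Signed 64-bit key from the first four FEN fields (placement, side, castling, ep)."""
    core = " ".join(fen.split()[:4]).encode()
    return int.from_bytes(hashlib.blake2b(core, digest_size=8).digest(), "big", signed=True)

SICILIAN = "rnbqkbnr/pp1ppppp/8/2p5/4P3/8/PPPP1PPP/RNBQKBNR w KQkq c6 0 2"
OPEN_GAME = "rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPP1PPP/RNBQKBNR w KQkq e6 0 2"

conn = sqlite3.connect(":memory:")
# Hypothetical schema: Positions maps a position key to the games that reach it.
conn.executescript("""
CREATE TABLE Games (ID INTEGER PRIMARY KEY, WhiteElo INTEGER, BlackElo INTEGER, Result TEXT);
CREATE TABLE Positions (HashKey INTEGER, GameID INTEGER, Ply INTEGER);
CREATE INDEX PositionsByKey ON Positions (HashKey);
""")
# Made-up sample data.
conn.executemany("INSERT INTO Games VALUES (?, ?, ?, ?)", [
    (1, 2700, 2650, "1-0"),
    (2, 2300, 2250, "0-1"),
    (3, 2800, 2780, "1/2-1/2"),
])
conn.executemany("INSERT INTO Positions VALUES (?, ?, ?)", [
    (position_key(SICILIAN), 1, 2),
    (position_key(SICILIAN), 2, 2),
    (position_key(OPEN_GAME), 3, 2),
])

def games_reaching(conn, fen, min_avg_elo=0, limit=10):
    """Games that reach the position, best average Elo first."""
    return conn.execute("""
        SELECT g.ID, (g.WhiteElo + g.BlackElo) / 2.0 AS AvgElo, g.Result
        FROM Positions p JOIN Games g ON g.ID = p.GameID
        WHERE p.HashKey = ? AND (g.WhiteElo + g.BlackElo) / 2.0 >= ?
        ORDER BY AvgElo DESC LIMIT ?
    """, (position_key(fen), min_avg_elo, limit)).fetchall()

all_sicilian = games_reaching(conn, SICILIAN)
strong_sicilian = games_reaching(conn, SICILIAN, min_avg_elo=2500)
```

A real implementation would also have to handle hash collisions and games that repeat a position, and the Positions table grows by one row per ply, so the storage cost is a real trade-off against query speed.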
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Update:

I continue tweaking the code for performance. The code has been cleaned up again, and some functions have been rewritten to be faster. The names of tables and fields have been changed to be closer to PGN tags, as a part of the standard.

I have implemented some main tricks to reduce the number of SQL statements executed: storing and looking up names/IDs in maps, and managing the IDs myself instead of inserting into/querying the SQL engine.
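
The two tricks above might look roughly like the sketch below, which keeps a name-to-ID map in memory and hands out IDs itself, so that no SELECT is needed per game and all rows go to the database in one batch. This is a Python illustration under assumed table names and made-up input, not the code from the repository.

```python
import sqlite3

class IdCache:
    """Assigns IDs in memory so no SELECT is needed to look a name up."""
    def __init__(self):
        self.ids = {}        # name -> ID
        self.new_rows = []   # (ID, name) rows not yet written to the database

    def get(self, name):
        ident = self.ids.get(name)
        if ident is None:
            ident = len(self.ids) + 1   # we manage the IDs ourselves
            self.ids[name] = ident
            self.new_rows.append((ident, name))
        return ident

conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration only.
conn.executescript("""
CREATE TABLE Players (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Games (ID INTEGER PRIMARY KEY, WhiteID INTEGER, BlackID INTEGER);
""")

parsed_games = [("Alice", "Bob"), ("Bob", "Carol"), ("Alice", "Carol")]  # made-up input

players = IdCache()
game_rows = [(game_id, players.get(white), players.get(black))
             for game_id, (white, black) in enumerate(parsed_games, start=1)]

with conn:  # one transaction for all inserts
    conn.executemany("INSERT INTO Players VALUES (?, ?)", players.new_rows)
    conn.executemany("INSERT INTO Games VALUES (?, ?, ?)", game_rows)

player_count = conn.execute("SELECT COUNT(*) FROM Players").fetchone()[0]
```

Each distinct name costs one dictionary lookup and one batched INSERT, instead of a SELECT followed by an INSERT for every game.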

The result is good. Even though it still uses only a single thread, the time to convert PGN files has dropped by about 30%. It can convert 3.45 million games and write them to a file in under half a minute (30 seconds) on my computer.

The code has been pushed already to the repository.
Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Open Chess Game Database Standard

Post by Dann Corbit »

I had to change the code for Windows. Here is the link:


fseek()/ftell() do not work on Windows for files too big for a 32-bit offset.

I get 140,000 games per second but I have a fast disk.

There is a Visual Studio project and binary in the archive, but it uses /arch:AVX2 on the command line.
If you do not have an advanced CPU, you will want to change the command line option.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

Dann Corbit wrote: Thu Nov 25, 2021 11:20 am I had to change the code for Windows. Here is the link:


fseek()/ftell() do not work for files bigger than 32 bits.

I get 140,000 games per second but I have a fast disk.

There is a visual studio project and binary in the archive, but it uses /arch:AVX2 on the command line.
If you do not have an advanced CPU, you will want to change the command line option.
Use memory mapped IO. All seeking will go away and you can use extremely optimized code like memchr to read line by line!
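
A small sketch of the memory-mapped approach, written in Python for brevity: the file is mapped once and lines are found by searching for the next newline byte, with mmap.find playing the role that memchr would play in C. The sample PGN text and the "count [Event tags" heuristic are made up for this example, and mapping a very large file in one piece assumes a 64-bit process.

```python
import mmap
import os
import tempfile

def iter_lines(path):
    """Yield each line of the file (without the line ending) as bytes."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start, size = 0, len(mm)
        while start < size:
            end = mm.find(b"\n", start)   # plays the role memchr would in C
            if end == -1:                 # last line has no trailing newline
                end = size
            yield mm[start:end].rstrip(b"\r")
            start = end + 1

def count_games(path):
    # Counting [Event tags is a rough proxy for the number of games.
    return sum(1 for line in iter_lines(path) if line.startswith(b"[Event "))

# Made-up sample with mixed Unix and Windows line endings.
sample = b'[Event "A"]\n[Result "1-0"]\n\n1. e4 c5 1-0\n\n[Event "B"]\r\n\r\n1. d4 d5 *'
with tempfile.NamedTemporaryFile(delete=False, suffix=".pgn") as tmp:
    tmp.write(sample)
try:
    games = count_games(tmp.name)
    lines = list(iter_lines(tmp.name))
finally:
    os.remove(tmp.name)
```

There is no explicit seeking anywhere: positions in the file are just offsets into the mapping, and the operating system pages the data in as it is touched.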