Open Chess Game Database Standard

Discussion of chess software programming and technical issues.

Moderator: Ras

dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

Dann Corbit wrote: Fri Dec 10, 2021 5:58 pm It would be nice to have a user defined function that can make the hash from a FEN/EPD.
That way we could do simple lookups like
SELECT gameid from hash where hash = makehash('rnbqkbnr/pppppppp/8/8/1P6/8/P1PPPPPP/RNBQKBNR b KQkq -')
I forgot that sqlite doesnt do functions in sql. It needs you to register a callback. Its good news and bad news.
The bad news is that its not persistent. (very bad news so we suddenly are bound to c++/c# or whatever language the makehash is created. So in a portable sqlite db it would help only for creation. Once its out there you would need the function created in the language that opens the connection.
https://stackoverflow.com/questions/210 ... -in-sqlite

The good news is that you can use it exactly like you wrote it. You just need to register a callback (in c++ code)
So if you already have a fen parser you are good to go.
https://www.sqlite.org/appfunc.html
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

Here is the implementaion for one. just replace [board[k] = ...] with the hash calulation (s, i) which is fen char + square index to the zobrist hashes array and we are good to go :)

Code: Select all

static void Setup(std::string fen)
{
	char s, mv, pos[128], cas[5], enps[3];
	ep = -1; castle = 0; halfmove = 0; fullmove = 1;
	memset(board, '_', sizeof(board));
	int i = 0, k = 0,  col = 0, row = 7;
	if (sscanf(fen.c_str(), "%s %c %s %s %d %d", pos, &mv, cas, enps, &halfmove, &fullmove) < 3) {
		printf("Error parsing fen string %s\n", fen.c_str()); return;
	}
	//Board
	while ((s = pos[i++])) {
		if (s == '/') k = ((k + 7) / 8) * 8; //Round up to next 8 if not on 8
		else if (s >= '1' && s <= '8') k += s - '0';
		else board[k++] = s;
	}
	//Color
	movecolor = mv == 'b' ? -1 : 1; i = 0;
	//Castling
	while ((s = cas[i++])) {
		castle |= s == 'k' ? 1 : s == 'q' ? 2 : s == 'K' ? 4 : s == 'Q' ? 8 : 0;
	}
	//Enpassant
	if (enps[0] >= 'a' && enps[0] <= 'h') ep = (enps[0] - 'a') + (movecolor + 1) ? 24 : 32;
}
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Position searching: 4th attempt

Attempt 4th - A

This attempt is just a simple improvement of the previous one (the attempt 3rd), to speed up the search by hash keys. In the previous attempt, the search may take average 2.6 minutes for the first ply. The idea for this attempt is: if SQLite searches so slowly because of performing directly on the hard disk, we may improve the speed by making it perform in memory only.

There are two solutions to make SQLite perform in memory. The first one is to create a virtual hard disk and copy the database into it. We won’t use it since it requires users to learn and do some extra and boring things. The second one is to create an in-memory database and then load the physical/real one from the hard disk. We are going to implement this one.

The stats are as below:

Code: Select all

Loaded into memory, elapsed: 286132 ms, 04:46
SQLite database file ':memory:' opened successfully
ply: 1, #total queries: 1000, elapsed: 1982008 ms, 33:02, time per query: 1982 ms, total results: 1138017428, results/query: 1138017
ply: 2, #total queries: 1000, elapsed: 668297 ms, 11:08, time per query: 668 ms, total results: 359616424, results/query: 359616
ply: 3, #total queries: 1000, elapsed: 394425 ms, 06:34, time per query: 394 ms, total results: 199687182, results/query: 199687
ply: 4, #total queries: 1000, elapsed: 86780 ms, 01:26, time per query: 86 ms, total results: 37938700, results/query: 37938
ply: 5, #total queries: 1000, elapsed: 15266 ms, 00:15, time per query: 15 ms, total results: 5548104, results/query: 5548
ply: 6, #total queries: 1000, elapsed: 16501 ms, 00:16, time per query: 16 ms, total results: 6513774, results/query: 6513
ply: 7, #total queries: 1000, elapsed: 7095 ms, 00:07, time per query: 7 ms, total results: 2447268, results/query: 2447
ply: 8, #total queries: 1000, elapsed: 14105 ms, 00:14, time per query: 14 ms, total results: 5305266, results/query: 5305
ply: 9, #total queries: 1000, elapsed: 6703 ms, 00:06, time per query: 6 ms, total results: 2412860, results/query: 2412
ply: 10, #total queries: 1000, elapsed: 5785 ms, 00:05, time per query: 5 ms, total results: 2077934, results/query: 2077
ply: 11, #total queries: 1000, elapsed: 1323 ms, 00:01, time per query: 1 ms, total results: 421201, results/query: 421
ply: 12, #total queries: 1000, elapsed: 854 ms, 00:00, time per query: 0 ms, total results: 283617, results/query: 283
ply: 13, #total queries: 1000, elapsed: 654 ms, 00:00, time per query: 0 ms, total results: 239088, results/query: 239
ply: 14, #total queries: 1000, elapsed: 354 ms, 00:00, time per query: 0 ms, total results: 145977, results/query: 145
ply: 15, #total queries: 1000, elapsed: 37 ms, 00:00, time per query: 0 ms, total results: 10677, results/query: 10
ply: 16, #total queries: 1000, elapsed: 92 ms, 00:00, time per query: 0 ms, total results: 36655, results/query: 36
ply: 17, #total queries: 1000, elapsed: 36 ms, 00:00, time per query: 0 ms, total results: 11374, results/query: 11
ply: 18, #total queries: 1000, elapsed: 37 ms, 00:00, time per query: 0 ms, total results: 8662, results/query: 8
ply: 19, #total queries: 1000, elapsed: 31 ms, 00:00, time per query: 0 ms, total results: 9538, results/query: 9
ply: 20, #total queries: 1000, elapsed: 24 ms, 00:00, time per query: 0 ms, total results: 4397, results/query: 4
ply: 21, #total queries: 1000, elapsed: 31 ms, 00:00, time per query: 0 ms, total results: 7290, results/query: 7
ply: 22, #total queries: 1000, elapsed: 30 ms, 00:00, time per query: 0 ms, total results: 4971, results/query: 4
ply: 23, #total queries: 1000, elapsed: 9 ms, 00:00, time per query: 0 ms, total results: 977, results/query: 0
ply: 24, #total queries: 1000, elapsed: 11 ms, 00:00, time per query: 0 ms, total results: 728, results/query: 0
ply: 25, #total queries: 1000, elapsed: 13 ms, 00:00, time per query: 0 ms, total results: 762, results/query: 0
ply: 26, #total queries: 1000, elapsed: 14 ms, 00:00, time per query: 0 ms, total results: 748, results/query: 0
ply: 27, #total queries: 1000, elapsed: 22 ms, 00:00, time per query: 0 ms, total results: 772, results/query: 0
ply: 28, #total queries: 1000, elapsed: 7 ms, 00:00, time per query: 0 ms, total results: 765, results/query: 0
ply: 29, #total queries: 1000, elapsed: 16 ms, 00:00, time per query: 0 ms, total results: 751, results/query: 0
The load and convert the whole database into an in-memory one took about 5 minutes, quite long. However, the search is so fast. At ply 1, a query needs only 2 seconds instead of 2.6 minutes as the previous attempt. From ply 2, the time is much faster.


Attempt 4th - B

I don’t like to spend 5 minutes waiting for loading a database. I don’t like either loading over 13 GB of data into my computer’s memory. My old computer has only 16 GB RAM, loading that size makes the whole system almost runs out of memory, slow down significantly. Furthermore, even that database (3.45 million games) is runnable on my computer, any bigger one, say, 7 million games, may overflow/stop my computer.

Thus I try another way. The idea is simple: drop/omit the index since it is too large and check if the speed without the indexes is acceptable. The database without indexing (for hash keys) is 8.87 GB, a reduction of 5 GB. The size of the database now is fine for my computer.

Below are the new stats:

Code: Select all

Loaded into memory, elapsed: 133295 ms, 02:13
ply: 1, #total queries: 1000, elapsed: 12514400 ms, 3:28:34, time per query: 12514 ms, total results: 1138017428, results/query: 1138017
ply: 2, #total queries: 1000, elapsed: 12721859 ms, 3:32:01, time per query: 12721 ms, total results: 359616424, results/query: 359616
ply: 3, #total queries: 1000, elapsed: 12378915 ms, 3:26:18, time per query: 12378 ms, total results: 199687182, results/query: 199687
ply: 4, #total queries: 1000, elapsed: 11123044 ms, 3:05:23, time per query: 11123 ms, total results: 37938700, results/query: 37938
ply: 5, #total queries: 1000, elapsed: 11782850 ms, 3:16:22, time per query: 11782 ms, total results: 5548104, results/query: 5548
The loading time was cut more than a half - a good result. Each query took only 12 seconds - another good result compared with 2.6 minutes. Just a bit different is that the query time is almost the same by plies (not reduced as previous attempts), perhaps caused by scanning all records.

Code has been pushed into a new branch named “searchwithhashkeysinmem”.
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Sopel wrote: Fri Dec 10, 2021 2:37 pm who even needs millions of results, just slap LIMIT 100 on it and be done
Definitely not for viewing only. But sometimes we need some stats/studies.

We create tools, users know what to do with those tools.
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

phhnguyen wrote: Sat Dec 11, 2021 12:08 am
Sopel wrote: Fri Dec 10, 2021 2:37 pm who even needs millions of results, just slap LIMIT 100 on it and be done
Definitely not for viewing only. But sometimes we need some stats/studies.

We create tools, users know what to do with those tools.
Dont feed. Sopel is a known troll. Admins are watching him closely. Just report his posts.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

phhnguyen wrote: Fri Dec 10, 2021 11:59 pm The loading time was cut more than a half - a good result. Each query took only 12 seconds - another good result compared with 2.6 minutes. Just a bit different is that the query time is almost the same by plies (not reduced as previous attempts), perhaps caused by scanning all records.

Code has been pushed into a new branch named “searchwithhashkeysinmem”.
Can we merge that into the master? I think this is better in every way compared to the current state of the main branch! If the indexes are removed then even inserting is so fast again.
I think 5gb ram usage is very fine for such a project. Also 16++ 32 and 64 gb are becoming more and more mainstream.

You can see that 16Gb or more is in 2021 above 50%
https://store.steampowered.com/hwsurvey ... e-to-Steam

I think the performance cannot be improved more? Its really fast as it is. Parsing and Query is in a userfriendly speed region IMO.
So is it ready for deployment?
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

dangi12012 wrote: Sat Dec 11, 2021 12:20 am
phhnguyen wrote: Sat Dec 11, 2021 12:08 am
Sopel wrote: Fri Dec 10, 2021 2:37 pm who even needs millions of results, just slap LIMIT 100 on it and be done
Definitely not for viewing only. But sometimes we need some stats/studies.

We create tools, users know what to do with those tools.
Dont feed. Sopel is a known troll. Admins are watching him closely. Just report his posts.
Sorry Daniel, I don’t think anyone in this topic trolling, thus I treat them all with equal high respect as always.
Last edited by phhnguyen on Sat Dec 11, 2021 12:53 am, edited 1 time in total.
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Dann Corbit wrote: Fri Dec 10, 2021 5:55 pm Here are some fide games with the hash:

Some issues:
We do not need the Elo in both the header and the player table, that is redundant.
The players’ Elos in a PGN are ones when players play that game. However, their Elos are not the same by time so are theirs in games. Carlsen may have Elo much lower in games 10 years ago, compared with recent ones. In contrast, Elo in the table Players are recent or the highest ones.

The Elo column in Players could be considered as redundant if it is only extracted from Games table. However, it will be useful if:
- we want to speed up queries with Elo, say, count all players with Elo > 2000
- we want Players is independent from Games, say, it has some players without games, extracted from other databases, their Elo updated from other sources…
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

phhnguyen wrote: Sat Dec 11, 2021 12:51 am
Dann Corbit wrote: Fri Dec 10, 2021 5:55 pm Here are some fide games with the hash:

Some issues:
We do not need the Elo in both the header and the player table, that is redundant.
The players’ Elos in a PGN are ones when players play that game. However, their Elos are not the same by time so are theirs in games. Carlsen may have Elo much lower in games 10 years ago, compared with recent ones. In contrast, Elo in the table Players are recent or the highest ones.

The Elo column in Players could be considered as redundant if it is only extracted from Games table. However, it will be useful if:
- we want to speed up queries with Elo, say, count all players with Elo > 2000
- we want Players is independent from Games, say, it has some players without games, extracted from other databases, their Elo updated from other sources…
Yes we need both. Normally ELO is the current for player and from game you can get players elo history over time.
So the ELO for a player *should* be identical to:

Code: Select all

select top 1 ELO from games order by date desc
Is this master now? I think its better in most aspects now and that may warrant merging. For any project its a good policy to merge when a feature has benefits to the 'customer'
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Dann Corbit wrote: Fri Dec 10, 2021 5:58 pm It would be nice to have a user defined function that can make the hash from a FEN/EPD.
That way we could do simple lookups like
SELECT gameid from hash where hash = makehash('rnbqkbnr/pppppppp/8/8/1P6/8/P1PPPPPP/RNBQKBNR b KQkq -')
I don't know how to create a user-defined function with SQLite. Perhaps it allows only callback function. It means we can't create a user-defined function and store it with databases and then call it with any SQLite browser. Instead, we must create and call the function inside our program only. However, if we must write the code, we can simply create a normal function (not callback) to create hash keys for given FENs (and then use results to query). The below code can run with our test program:

Code: Select all

int64_t makehash(const std::string& fenString)
{
    bslib::ChessBoard board;
    board.newGame(fenString);
    return static_cast<int64_t>(board.hashKey);
}
The hash created by bslib::ChessBoard is actually a Polyglot one (all hash keys are the same as hash keys with the Polyglot opening book). You may create too by other programs/tools supporting the Polyglot. Those hashes are unsigned int64 but SQLite supports only signed int64 thus we need to cast them.
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager