Open Chess Game Database Standard

Discussion of chess software programming and technical issues.

Moderator: Ras

Fulvio
Posts: 396
Joined: Fri Aug 12, 2016 8:43 pm

Re: Open Chess Game Database Standard

Post by Fulvio »

phhnguyen wrote: Wed Dec 29, 2021 2:30 am Having more choices is better, isn't it? ;)
Of course, but you wrote and I replied to:
phhnguyen wrote: Tue Dec 28, 2021 5:25 am I believe the bitboards+indexes can be comparable to any binary format in position searching. We surely win in flexibility: users can run almost unlimited kinds of position searches.
when in reality it is much slower (1491.8 seconds vs 2.2 seconds) and it also occupies much more space (17.6 GB vs 0.29 GB).
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Fulvio, I'm still waiting for your help with using SCID (as you suggested ;) ).

I have managed to get results for the question about 3 White Queens with SCID (it is fine now). However, I don't know how to set up SCID to find all games that have two Black Rooks on the squares d4, d5, e4, e5. For example, if I set it up as in the image below, SCID returns nothing.

Image
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
Fulvio
Posts: 396
Joined: Fri Aug 12, 2016 8:43 pm

Re: Open Chess Game Database Standard

Post by Fulvio »

phhnguyen wrote: Thu Dec 30, 2021 3:29 am Fulvio, I'm still waiting for your help with using SCID (as you suggested ;) ).
Just to avoid misunderstandings: I certainly don't mean that you should give up and just use SCID!
I suggested it because it would be much easier if you compared the speeds directly (writing "How long does it take ...?" on the forum and then waiting for me to answer is not the best way :) )
Also consider that SCID's Material Search is rather inefficient: it uses only one thread and recreates all the database positions in a rather brute-force way. I'm sure that if one adapted, for example, the Stockfish code, it would get much faster results.

However, in the picture you posted:
- "Material" refers to the number of pieces on the board
- "Patterns" are additional restrictions where you can choose whether there must be (or must not be) a piece in a square (or in a file or in a rank).

So that picture is searching for a position with:
2 black rooks (anywhere on the board)
AND a black rook on d4
AND a black rook on d5
AND a black rook on e5
AND a black rook on e4
which obviously does not exist.

You can search for a position with:
2 black rooks (anywhere on the board)
AND there is no black rook in rank 1
AND there is no black rook in rank 2
AND there is no black rook in rank 3
AND there is no black rook in rank 6
AND there is no black rook in rank 7
AND there is no black rook in rank 8
AND there is no black rook in file a
...
AND there is no black rook in file h
In any case, the number of patterns does not particularly affect the speed of the search.
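
To make the AND semantics concrete, here is a minimal Python sketch of the two queries described above. The position model (a dict from square name to a piece code such as "bR") and all function names are hypothetical and are not SCID's internals. The files hidden behind the "..." are taken to be b, c, f and g, which is only my reading of the intent.

```python
# Hypothetical model: a position maps square names to piece codes, e.g. {"d4": "bR"}.
# This is an illustration only, not SCID's internal representation.
FILES = "abcdefgh"
RANKS = "12345678"

def count(position, piece):
    """Material: how many copies of `piece` are on the board."""
    return sum(1 for p in position.values() if p == piece)

def piece_on(position, piece, square):
    """Pattern: `piece` must stand on `square`."""
    return position.get(square) == piece

def none_on_rank(position, piece, rank):
    """Pattern: no `piece` anywhere on the given rank."""
    return all(position.get(f + rank) != piece for f in FILES)

def none_on_file(position, piece, file):
    """Pattern: no `piece` anywhere on the given file."""
    return all(position.get(file + r) != piece for r in RANKS)

def impossible_query(position):
    # 2 black rooks AND a rook on each of four squares: that needs 4 rooks, so never true.
    return count(position, "bR") == 2 and all(
        piece_on(position, "bR", sq) for sq in ("d4", "d5", "e5", "e4"))

def rooks_in_centre_query(position):
    # 2 black rooks AND none outside ranks 4-5 and files d-e (assumed reading of "...").
    return (count(position, "bR") == 2
            and all(none_on_rank(position, "bR", r) for r in "123678")
            and all(none_on_file(position, "bR", f) for f in "abcfgh"))

example = {"d4": "bR", "e5": "bR", "g1": "wK", "g8": "bK"}
print(impossible_query(example))       # False
print(rooks_in_centre_query(example))  # True
```

Every pattern is one more cheap test on the same position, which is consistent with the remark that the number of patterns barely affects search speed.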

The lower part, "Operation on current filter", refers to previous searches (even of different types, such as header search) and is for super advanced users.
Let's say you want to search for a position with:
(a black rook on d4 AND a black rook on d5) OR (a black rook on d4 AND a black rook on e4)
Set the two patterns:
black rook in d4
AND black rook in d5
click "search"
select "or (add to filter)"
change the two patterns:
black rook in d4
AND black rook in e4
click "search"
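
The steps above amount to a set union over two search results. A rough sketch, assuming the filter can be modelled as a set of game ids (the data layout and the names `search` and `rooks_on` are made up for illustration, not SCID's API):

```python
def search(games, predicate):
    """Return the ids of games with at least one position matching `predicate`."""
    return {gid for gid, positions in games.items()
            if any(predicate(pos) for pos in positions)}

def rooks_on(*squares):
    """Build a pattern: a black rook on every listed square (ANDed together)."""
    return lambda pos: all(pos.get(sq) == "bR" for sq in squares)

# Toy database: game id -> list of positions (each a dict of square -> piece code).
games = {
    1: [{"d4": "bR", "d5": "bR"}],
    2: [{"d4": "bR", "e4": "bR"}],
    3: [{"a8": "bR", "h8": "bR"}],
}

current_filter = search(games, rooks_on("d4", "d5"))   # first "search"
current_filter |= search(games, rooks_on("d4", "e4"))  # "or (add to filter)", then "search"
print(sorted(current_filter))  # [1, 2]
```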

All of this explains quite well why I think one should focus on searches that make sense from a chess point of view.
Being too general, in the belief that searches can then be composed, increases the complexity too much.
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Open Chess Game Database Standard

Post by Sopel »

I'll just say that this thread motivated me to develop an efficient chess game storage+API suited for arbitrary parallel linear scan searches on positions/moves in chess games; as an alternative to this. It has become apparent that specialized solutions are needed for different use cases.
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Fulvio wrote: Thu Dec 30, 2021 9:10 am
phhnguyen wrote: Thu Dec 30, 2021 3:29 am Fulvio, I'm still waiting for your help with using SCID (as you suggested ;) ).
Just to avoid misunderstandings: I certainly don't mean that you should give up and just use SCID!
I suggested it because it would be much easier if you compared the speeds directly (writing "How long does it take ...?" on the forum and then waiting for me to answer is not the best way :) )
Also consider that SCID's Material Search is rather inefficient: it uses only one thread and recreates all the database positions in a rather brute-force way. I'm sure that if one adapted, for example, the Stockfish code, it would get much faster results.
Oh my…

Don’t surprise me too much with… your worries ;)

Of course, I never thought you were suggesting that I give up; I understood that I should learn your program better, and that is what I did!
In short, SCID can’t do position searching with some patterns! Am I correct???

When I started this project, I already clearly understood the main advantages and disadvantages of binary databases (for chess games) vs SQL ones. Note that I implemented my first database in a binary format too. Binary formats are always the best for speed and size. SQL can’t replace binary, but it is the best in many respects, including sharing, and it can serve some apps/purposes well. Almost all of that understanding still holds at the moment. We were just surprised ourselves that SQL is not as bad as we thought (much better than we expected) in the several aspects we have tested directly.

On the other hand, SCID has always surprised me with its crazy speed! You should know that matching its speed, or at least narrowing the gap, is one of my main targets. Frankly speaking, sometimes I thought I had reached a comparable speed, but after verifying, I realized I was still far behind SCID!

Approximate position searching is another story. As I have read/understood, the original SCID itself can’t fully do that task and depends on the CQL library (third-party code, so far implemented for SCIDvsPc). You confused me by confirming that SCID surely could, but it turns out my understanding is still valid (SCID can do it only partly).

I know that if you want, you can implement something to answer what I or a user asked. But there are still many different patterns!!!

I think the key to answering all patterns depends on two things. One is the interface. A few dialog boxes can’t cover all the cases that a query language (such as SQL or CQL) could. The other is how much information you create and keep for each position. If you keep less than a full set of board information (we store it in full as a set of bitboards), you can save time and memory but, again, you can’t answer all position patterns. I am sure SCID has problem 1; I am not sure about problem 2, though I guess it has that too.
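
As an illustration of why a full set of bitboards makes such patterns easy to express, here is a minimal sketch in Python. The bit layout (a1 = bit 0, h8 = bit 63) and the function names are assumptions made for this example; they are not the actual schema of the database discussed in this thread.

```python
def square_bit(square):
    """Map 'a1'..'h8' to a single bit, assuming a1 = bit 0 and h8 = bit 63."""
    file = ord(square[0]) - ord("a")
    rank = int(square[1]) - 1
    return 1 << (rank * 8 + file)

def mask(*squares):
    """Combine several squares into one 64-bit mask."""
    bits = 0
    for sq in squares:
        bits |= square_bit(sq)
    return bits

CENTRE = mask("d4", "d5", "e4", "e5")

def two_black_rooks_in_centre(black_rooks):
    """True when exactly two black rooks stand on d4, d5, e4 or e5."""
    return bin(black_rooks & CENTRE).count("1") == 2

print(two_black_rooks_in_centre(mask("d4", "e5")))  # True
print(two_black_rooks_in_centre(mask("d4", "a8")))  # False
```

With one such integer per piece type and colour, any "which pieces on which squares" pattern reduces to a few mask and popcount operations.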
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Sopel wrote: Thu Dec 30, 2021 1:05 pm I'll just say that this thread motivated me to develop an efficient chess game storage+API suited for arbitrary parallel linear scan searches on positions/moves in chess games; as an alternative to this. It has become apparent that specialized solutions are needed for different use cases.
I’m very glad that my work might motivate other work. As a chess enthusiast, I welcome and highly appreciate any new effort for the community! Having more choices is better, isn't it? (I have just mentioned it in a previous post :D ).

Good luck with your new project!
Fulvio
Posts: 396
Joined: Fri Aug 12, 2016 8:43 pm

Re: Open Chess Game Database Standard

Post by Fulvio »

phhnguyen wrote: Thu Dec 30, 2021 2:09 pm Approximate position searching is another story. As I have read/understood, the original SCID itself can’t fully do that task and depends on the CQL library (third-party code, so far implemented for SCIDvsPc). You confused me by confirming that SCID surely could, but it turns out my understanding is still valid (SCID can do it only partly).
You still look a little bit confused to me :wink:

1) Key fact: generating the positions is super fast
The moves of all the games are stored in the .sg4 file of a SCID4 database.
The material search uses that file and, for every game:
- creates the starting position
- checks if the position satisfies the requirements
- plays the next move
- checks if the position satisfies the requirements
- plays the next move
...
So it doesn't make sense when you write "less than a full set of board information".
You can picture it as Stockfish which, instead of generating the moves of a position, only executes the next move and, instead of evaluating the position, simply checks whether it matches the one sought.
And as I said, it can be vastly improved: wouldn't multi-threaded, super-optimized code like that of Stockfish do at least 100 million positions per second?
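
The replay loop from point 1 can be sketched as follows. This is a toy model rather than SCID's code: a position is a dict of square to piece code, a move is a hypothetical (from, to) pair, and castling, en passant and promotion are deliberately ignored.

```python
def apply_move(position, move):
    """Return a new position with the piece on `src` moved to `dst`.
    Simplification: castling, en passant and promotion are ignored."""
    src, dst = move
    new_position = dict(position)
    new_position[dst] = new_position.pop(src)  # a capture simply overwrites `dst`
    return new_position

def game_matches(start, moves, predicate):
    """Replay one game and report whether any position satisfies `predicate`."""
    position = start
    if predicate(position):
        return True
    for move in moves:
        position = apply_move(position, move)
        if predicate(position):
            return True
    return False

def black_rooks_on_d4_and_d5(position):
    return position.get("d4") == "bR" and position.get("d5") == "bR"

start = {"d8": "bR", "d5": "bR", "e1": "wK", "e8": "bK"}
moves = [("e1", "e2"), ("d5", "d4"), ("e2", "e1"), ("d8", "d5")]
print(game_matches(start, moves, black_rooks_on_d4_and_d5))  # True
```

Only the move list is stored; each full position exists just long enough to be tested, which is why nothing about the board is ever missing at check time.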

2) Key fact: Many chess players are not programmers
Suppose I want to improve my tactics: how do I find all positions where a bishop pins a knight, the value of the pieces is equal, but the engine evaluation is at least +2 because there is a piece that is only apparently defended by the knight?
Writing a Tcl script is not the right answer for most users.
Trying to solve that need they created CQL (using some SCID code, not the other way around as you seem to believe: "it uses SCID code by Shane Hudson" http://www.gadycosteff.com/cql/).
Same procedure described in point 1, with the difference that each position is compared with a CQL filter.
Unfortunately it has become a very complex language, too complex for most users (and in some cases it is even easier to write the Tcl script).

3) Key fact: it is better to use the right tool for the job
Lichess uses MongoDB where nearly 3 billion (!) games are stored.
And yet there is a database explorer (https://lichess.org/analysis) where I can study the openings and immediately get the stats for over 380 million games...
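
One way such instant statistics can work is a precomputed index instead of a scan at query time. The sketch below is only an illustration and makes no claim about how Lichess actually implements its explorer: it keys win/draw/loss counters by the moves played so far (so transpositions are not merged), and the name `build_explorer` is made up.

```python
from collections import defaultdict

def build_explorer(games):
    """Map each opening line (tuple of moves so far) to [white wins, draws, black wins]."""
    stats = defaultdict(lambda: [0, 0, 0])
    column = {"1-0": 0, "1/2-1/2": 1, "0-1": 2}
    for moves, result in games:
        # Count the game once for every prefix of its move list, including the start position.
        for ply in range(len(moves) + 1):
            stats[tuple(moves[:ply])][column[result]] += 1
    return stats

games = [
    (["e4", "e5", "Nf3"], "1-0"),
    (["e4", "c5"], "0-1"),
    (["d4", "d5"], "1/2-1/2"),
]
explorer = build_explorer(games)
print(explorer[()])       # [1, 1, 1]  all games
print(explorer[("e4",)])  # [1, 0, 1]  games that started with e4
```

The work is paid once at indexing time, after which a lookup is a single dictionary access regardless of how many games were counted.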
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Open Chess Game Database Standard

Post by Sopel »

Fulvio wrote: Thu Dec 30, 2021 4:56 pm And yet there is a database explorer (https://lichess.org/analysis) where I can study the openings and immediately get the stats for over 380 million games...
That appears to have only ~2.4M games? At least that's the number of moves for startpos. Maybe 380M positions. But yeah, chess_pos_db could do that for the whole of lichess without issues, if these WDL stats are all one needs.

Edit: okay, correction, now I see there's a "lichess" tab; still, there are "only" about 200M games.
Fulvio
Posts: 396
Joined: Fri Aug 12, 2016 8:43 pm

Re: Open Chess Game Database Standard

Post by Fulvio »

Sopel wrote: Thu Dec 30, 2021 5:34 pm Edit: okay, correction, now I see there's a "lichess" tab; still, there are "only" about 200M games.
There is a "gear" icon which lets you select the games by time control, average rating and date.
If you select everything, there are 383,251,358 games.
amanjpro
Posts: 883
Joined: Sat Mar 13, 2021 1:47 am
Full name: Amanj Sherwany

Re: Open Chess Game Database Standard

Post by amanjpro »

Sopel wrote: Thu Dec 30, 2021 5:34 pm
Fulvio wrote: Thu Dec 30, 2021 4:56 pm And yet there is a database explorer (https://lichess.org/analysis) where I can study the openings and immediately get the stats for over 380 million games...
That appears to have only ~2.4M games? At least that's the number of moves for startpos. Maybe 380M positions. But yeah, chess_pos_db could do that for the whole of lichess without issues, if these WDL stats are all one needs.

Edit: okay, correction, now I see there's a "lichess" tab; still, there are "only" about 200M games.
It is 200M only for e4; the total number of games, marked as Σ, is nearly 400M.

Image