Database snapshot

Posted: Sat Jul 27, 2019 11:54 pm
by noobpwnftw
For those who want to probe my database locally or for other unspecified reasons, here is a full database snapshot of my book project as of today:

ftp://ftp.chessdb.cn/pub/chessdb/data-s ... 190728.tar

The database contains about 3 billion unique chess positions, mostly connected to startpos. They were analyzed by Stockfish to no less than 22 plies at terminal nodes, with very wide multi-PV exploration. The scores have been back-propagated using a weighted averaging function. For most of the positions there is also a special field (encoded as 'a0a0') marking the known shortest distance of the position from startpos.

Using this database snapshot is as simple as putting the data files under your database folder and launching the server. Still, I'd recommend using the online API and making feature requests if you need any, since it is updated constantly and I have no plans to make snapshots like this very frequently (while waiting for a contributor to make incremental snapshots possible).

This database snapshot is released into the public domain.

Re: Database snapshot

Posted: Sun Jul 28, 2019 3:10 am
by Dann Corbit
Please leave it online for a while; I am on vacation and cannot download it right now.

Re: Database snapshot

Posted: Sun Jul 28, 2019 4:31 am
by Ferdy
Thanks for sharing.

I tried to probe from startpos with the following result.

[d]rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

    Move   Score  Rank       Note  winrate%
0   e2e4  15 (8)     2  ! (20-04)     50.61
1   d2d4  15 (4)     2  ! (20-03)     50.30
2   g1f3  15 (2)     2  ! (20-04)     50.15
3   g2g3  10 (2)     2  ! (20-07)     50.15
4   c2c4  10 (2)     2  ! (20-04)     50.15
5   d2d3       0     1  * (20-12)     50.00
6   c2c3       0     1  * (20-08)     50.00
7   e2e3       0     1  * (20-10)     50.00
8   b2b3       0     1  * (20-10)     50.00
9   b1c3       0     1  * (20-04)     50.00
10  a2a3       0     1  * (20-09)     50.00
11  h2h3      -1     1  * (20-09)     49.92
12  f2f4      -4     0  ? (20-14)     49.70
13  a2a4      -5     0  ? (20-11)     49.62
14  b2b4      -6     0  ? (20-11)     49.55
15  g1h3     -41     0  ? (20-05)     46.90
16  b1a3     -51     0  ? (20-01)     46.14
17  h2h4     -57     0  ? (20-01)     45.69
18  f2f3     -82     0  ? (20-01)     43.82
19  g2g4    -103     0  ? (20-01)     42.26
In

0   e2e4  15 (8)     2  ! (20-04)     50.61
What is (8)?
Why rank 2 and not rank 1?
What is (20-04)?

For other positions there is no (value) under the Score column.
[d]rnbqk2r/pppnbppp/4p3/3pP1B1/3P3P/2N5/PPP2PP1/R2QKBNR b KQkq - 0 6

    Move  Score  Rank       Note  winrate%
0   e7g5    -31     2  ! (07-01)     47.65
1   h7h6    -52     0  ? (06-01)     46.07
2   e8g8    -68     0  ? (13-01)     44.87
3   c7c5    -77     0  ? (05-01)     44.19
4   b8c6    -78     0  ? (03-01)     44.12
5   a7a6    -89     0  ? (08-01)     43.30
6   c7c6   -107     0  ? (05-01)     41.96
7   f7f6   -121     0  ? (05-01)     40.93
8   b7b6   -123     0  ? (14-01)     40.79
9   d7b6   -126     0  ? (17-01)     40.57
10  d7f8   -161     0  ? (19-02)     38.04
11  g7g6   -170     0  ? (18-01)     37.40
12  f7f5   -208     0  ? (09-01)     34.74

Re: Database snapshot

Posted: Sun Jul 28, 2019 4:57 am
by noobpwnftw

e2e4  15 (8)     2  ! (20-04)     50.61
This reads:
<Notation of the move> <adjusted score>(<real score>) <rank> <rank mark> (<# of known reply moves>-<# of good reply moves>) <winrate>

For rank, 2 > 1 > 0, where rank=2 means it is a preferred move, rank=1 means it is a good alternative, and rank=0 means it is a bad move (also used when the position itself is bad).

Adjusted score only applies to startpos, mainly to normalize the above calculations.
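
As an illustration of the line format described above, here is a minimal Python sketch that splits one output line into its fields. The names `BookLine`, `LINE_RE`, `RANK_MEANING` and `parse_line` are made up for this example and are not part of the database or its API; the real score in parentheses is treated as optional because it only appears for startpos.

```python
import re
from typing import NamedTuple, Optional


class BookLine(NamedTuple):
    move: str                  # move notation, e.g. "e2e4"
    score: int                 # adjusted score (plain score outside startpos)
    real_score: Optional[int]  # value in parentheses, only shown for startpos
    rank: int                  # 2, 1 or 0
    mark: str                  # rank mark: "!", "*" or "?"
    known_replies: int         # number of known reply moves
    good_replies: int          # number of good reply moves
    winrate: float             # winrate in percent


# Hypothetical pattern written for the table layout shown in this thread.
LINE_RE = re.compile(
    r"^\s*(?:\d+\s+)?"                            # optional leading row index
    r"(?P<move>[a-h][1-8][a-h][1-8][qrbn]?)\s+"
    r"(?P<score>-?\d+)\s+"
    r"(?:\((?P<real>-?\d+)\)\s+)?"                # "(8)" is only present at startpos
    r"(?P<rank>[012])\s+"
    r"(?P<mark>[!*?])\s+"
    r"\((?P<known>\d+)-(?P<good>\d+)\)\s+"
    r"(?P<winrate>\d+(?:\.\d+)?)\s*$"
)

RANK_MEANING = {2: "preferred move", 1: "good alternative", 0: "bad move"}


def parse_line(line: str) -> BookLine:
    """Parse one line of the move table into its fields."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    real = m.group("real")
    return BookLine(
        move=m.group("move"),
        score=int(m.group("score")),
        real_score=int(real) if real is not None else None,
        rank=int(m.group("rank")),
        mark=m.group("mark"),
        known_replies=int(m.group("known")),
        good_replies=int(m.group("good")),
        winrate=float(m.group("winrate")),
    )
```

For example, `parse_line("e2e4  15 (8)     2  ! (20-04)     50.61")` gives adjusted score 15, real score 8, rank 2 and 20 known replies of which 4 are good.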

Scores range within +-10000; anything beyond that is a known mate score, with the mated score at +-30000.

All these calculations are done at the API front-end; the raw database just maps position keys to a set of moves, which in turn map to their eval scores.
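
The winrate formula itself is not given in this thread. As a guess only: the winrate column in the tables posted above is reproduced to two decimals by a logistic curve with a scale of about 330 centipawns, applied to the real score (at startpos, the value in parentheses). The constant `SCALE_CP` and the function name below are assumptions fitted to those tables, not the front-end's actual code.

```python
import math

# Assumption: scale fitted to the tables in this thread, not taken from the author.
SCALE_CP = 330.0


def winrate_percent(score_cp: float, scale: float = SCALE_CP) -> float:
    """Map a centipawn score to a winrate in percent with a logistic curve."""
    return 100.0 / (1.0 + math.exp(-score_cp / scale))
```

With this sketch a score of 0 maps to exactly 50%, and the values for 8, -41, -103 and -208 land within 0.01 of the 50.61, 46.90, 42.26 and 34.74 shown in the tables.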

Position keys are binary-encoded FENs with white-black symmetry (of the two mirrored encodings, the one that is smaller in hex string form is used).
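
To make the white-black symmetry concrete, here is a rough Python sketch that builds the color-mirrored counterpart of a FEN and picks the smaller of the two encodings. `mirror_fen` and `canonical_key` are hypothetical names, and the `encode` argument stands in for the binary FEN encoder, which is not reproduced here.

```python
from typing import Callable


def mirror_fen(fen: str) -> str:
    """Return the same position with colors swapped and the board flipped vertically."""
    board, turn, castling, ep, *rest = fen.split()
    # Reverse the rank order and swap piece colors (digits are unaffected by swapcase).
    mirrored_board = "/".join(rank.swapcase() for rank in reversed(board.split("/")))
    mirrored_turn = "b" if turn == "w" else "w"
    if castling == "-":
        mirrored_castling = "-"
    else:
        swapped = castling.swapcase()
        # Keep the conventional KQkq ordering after swapping colors.
        mirrored_castling = "".join(c for c in "KQkq" if c in swapped)
    # The ep file stays the same, the rank flips (3 <-> 6).
    mirrored_ep = "-" if ep == "-" else ep[0] + str(9 - int(ep[1]))
    return " ".join([mirrored_board, mirrored_turn, mirrored_castling, mirrored_ep, *rest])


def canonical_key(fen: str, encode: Callable[[str], str]) -> str:
    """Encode both mirrored forms and keep the smaller string.

    `encode` is a placeholder for a function returning the hex string of the
    binary FEN; it is an assumption of this sketch.
    """
    return min(encode(fen), encode(mirror_fen(fen)))
```

Whatever encoder is plugged in, a position and its mirror image end up with the same key, which is the point of the symmetry.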

Re: Database snapshot

Posted: Sun Jul 28, 2019 5:53 am
by Ferdy
Thanks, got it.

Re: Database snapshot

Posted: Sun Jul 28, 2019 5:58 am
by noobpwnftw
Binary FEN encoding has the following format:

<board unit>...<board unit><turn><special unit>...<special unit>
Where each board unit has an 8-bit value of:
0 = 1 empty space
1 = 2 empty spaces
2 = 3 empty spaces
3 = p
4 = n
5 = b
6 = r
7 = q
8 = unused to avoid ambiguity
9 = k
a = P
b = N
c = B
d = R
e = Q
f = K

Turn is a 1-bit flag: 0 = white, 1 = black.

The special unit, representing castling and ep information, has an 8-bit value of:
0 = none
1 = a
2 = b
3 = c
4 = d
5 = e
6 = f
7 = g
8 = h
9 = delimiter
a = K
b = Q
c = k
d = q
and the file of the ep square is encoded as its numeric value as-is.

The output is then trailing-zero trimmed to produce the final position key.

Internally, moves are encoded as 16-bit values:

<4-bit src_rank><1-bit promotion flag><3-bit src_file><4-bit dst_rank><4-bit dst_file>
Where if promotion flag is set, dst_rank is redefined as:
0 = q
1 = r
2 = b
3 = n
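
A minimal Python sketch of the 16-bit move layout above. It assumes the fields are packed most-significant first in the order listed, that files and ranks are zero-based (a = 0, rank 1 = 0), and that on promotion the destination rank is implied by the source rank; none of these details are confirmed in this thread, and the function names are made up for the example.

```python
PROMO_CODES = {"q": 0, "r": 1, "b": 2, "n": 3}
PROMO_PIECES = {code: piece for piece, code in PROMO_CODES.items()}


def encode_move(uci: str) -> int:
    """Pack a move like 'e2e4' or 'e7e8q' into a 16-bit integer."""
    src_file = ord(uci[0]) - ord("a")
    src_rank = int(uci[1]) - 1
    dst_file = ord(uci[2]) - ord("a")
    if len(uci) == 5:
        promo = 1
        dst_rank = PROMO_CODES[uci[4]]  # dst_rank field carries the promotion piece
    else:
        promo = 0
        dst_rank = int(uci[3]) - 1
    # <4-bit src_rank><1-bit promo><3-bit src_file><4-bit dst_rank><4-bit dst_file>
    return (src_rank << 12) | (promo << 11) | (src_file << 8) | (dst_rank << 4) | dst_file


def decode_move(value: int) -> str:
    """Unpack a 16-bit move back into coordinate notation."""
    src_rank = (value >> 12) & 0xF
    promo = (value >> 11) & 0x1
    src_file = (value >> 8) & 0x7
    dst_field = (value >> 4) & 0xF
    dst_file = value & 0xF
    src = chr(ord("a") + src_file) + str(src_rank + 1)
    dst_letter = chr(ord("a") + dst_file)
    if promo:
        # Assumption: a pawn on the 7th rank promotes on the 8th, otherwise on the 1st.
        dst_rank = 7 if src_rank == 6 else 0
        return src + dst_letter + str(dst_rank + 1) + PROMO_PIECES[dst_field]
    return src + dst_letter + str(dst_field + 1)
```

Under these assumptions `encode_move("e2e4")` is `0x1434` and `encode_move("e7e8q")` is `0x6C04`, and decoding either value gives back the original move.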

Re: Database snapshot

Posted: Sun Jul 28, 2019 9:10 am
by noobpwnftw
In the board unit encoding above, if there are more than 3 empty spaces, the first unit is set to 8 and the next unit is the number of empty spaces minus 4.
And a correction: turn is an 8-bit flag, not a 1-bit one.
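
Putting the board unit table and the rule for longer empty runs together, here is a rough Python sketch of the board part only. It assumes empty runs are counted per rank exactly as in the FEN piece placement field and that ranks are concatenated without a separator. The turn flag, the special units, the packing of units into bytes and the trailing-zero trimming are left out because their exact layout is not spelled out here. `PIECE_UNITS` and `board_units` are hypothetical names.

```python
from typing import List

# Unit values from the table above (8 is the escape for runs of 4+ empty squares).
PIECE_UNITS = {
    "p": 0x3, "n": 0x4, "b": 0x5, "r": 0x6, "q": 0x7, "k": 0x9,
    "P": 0xA, "N": 0xB, "B": 0xC, "R": 0xD, "Q": 0xE, "K": 0xF,
}


def board_units(placement: str) -> List[int]:
    """Translate the FEN piece placement field into a list of board unit values."""
    units: List[int] = []
    for rank in placement.split("/"):
        for ch in rank:
            if ch.isdigit():
                empties = int(ch)
                if empties <= 3:
                    units.append(empties - 1)        # 0, 1, 2 = 1, 2, 3 empty squares
                else:
                    units.extend([8, empties - 4])   # escape unit, then count minus 4
            else:
                units.append(PIECE_UNITS[ch])
    return units
```

For instance, the rank `4p3` becomes the units 8, 0, 3, 2 under these assumptions, and a fully empty rank becomes 8, 4.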

Re: Database snapshot

Posted: Sun Jul 28, 2019 10:04 am
by Rebel
By any chance, can you offer those 3 billion positions in EPD with SF score and depth, or a utility that converts your database to EPD?

Re: Database snapshot

Posted: Sun Jul 28, 2019 12:22 pm
by Ovyron
noobpwnftw wrote: Sat Jul 27, 2019 11:54 pm analyzed by Stockfish with no less than 22 plies at terminal node
Interesting, my private database uses depth 22 as well. It looks like we independently found it to be optimal (depth 21 having considerably lower quality, depth 23 being considerably slower)?
Ferdy wrote: Sun Jul 28, 2019 4:31 am I tried to probe from startpos with the following result.

[d]rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

    Move   Score  Rank       Note  winrate%
0   e2e4  15 (8)     2  ! (20-04)     50.61
1   d2d4  15 (4)     2  ! (20-03)     50.30
2   g1f3  15 (2)     2  ! (20-04)     50.15
3   g2g3  10 (2)     2  ! (20-07)     50.15
4   c2c4  10 (2)     2  ! (20-04)     50.15
5   d2d3       0     1  * (20-12)     50.00
6   c2c3       0     1  * (20-08)     50.00
7   e2e3       0     1  * (20-10)     50.00
8   b2b3       0     1  * (20-10)     50.00
9   b1c3       0     1  * (20-04)     50.00
10  a2a3       0     1  * (20-09)     50.00
11  h2h3      -1     1  * (20-09)     49.92
12  f2f4      -4     0  ? (20-14)     49.70
13  a2a4      -5     0  ? (20-11)     49.62
14  b2b4      -6     0  ? (20-11)     49.55
15  g1h3     -41     0  ? (20-05)     46.90
16  b1a3     -51     0  ? (20-01)     46.14
17  h2h4     -57     0  ? (20-01)     45.69
18  f2f3     -82     0  ? (20-01)     43.82
19  g2g4    -103     0  ? (20-01)     42.26
Surprising to see scores that high. Mine has everything at 0.00 except for 1.d4 which is 0.03 (all white tries have been refuted to a 0.00 score otherwise).

...

Oh, three billion means your database is 1000 times larger than mine :shock:

I'd wish for a way to check it online (see https://www.365chess.com/opening.php for an example)

Re: Database snapshot

Posted: Sun Jul 28, 2019 4:09 pm
by noobpwnftw
Depth 22 seems to be a good balance between quality and speed.

I have applied penalties to a 0.00 score in back-propagation, maybe that caused it.

For a nice GUI like those, I hope someone would look up the data from my API so that no reinventing of wheels is needed.