Database snapshot

noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: Database snapshot

Post by noobpwnftw »

Rebel wrote: Sun Jul 28, 2019 10:04 am
noobpwnftw wrote: Sat Jul 27, 2019 11:54 pm For those who want to probe my database locally or for other unspecified reasons, here is a full database snapshot of my book project as of today:

ftp://ftp.chessdb.cn/pub/chessdb/data-s ... 190728.tar

The database contains about 3 billion unique chess positions, mostly connected to startpos, analyzed by Stockfish to no less than 22 plies at terminal nodes, with a very wide multi-PV exploration. The scores have been back-propagated using a weighted averaging function. For most of the positions there is also a special field (encoded as 'a0a0') marking the known shortest distance of the position from startpos.

Using this database snapshot is as simple as putting the data files under your database folder and launching the server. Still, I'd recommend you use the online API and make feature requests if you need any, since it is updated constantly and I have no plans to make snapshots like this very frequently (while waiting for a contributor to make incremental snapshots possible).

This database snapshot is released into the public domain.
By any chance, can you offer those 3 billion positions in EPD with SF score and depth, or a utility that converts your database to EPD?
In the project code there is a utility script that exports data to plain text.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Database snapshot

Post by Ovyron »

noobpwnftw wrote: Sun Jul 28, 2019 4:09 pm I have applied penalties to a 0.00 score in back-propagation, maybe that caused it.
Ah, nice. A problem I've been having is that after all lines have been refuted to 0.00, moves like 1.e4, 1.c4 and 1.Nf3 look the same, even though it's much harder for Black to equalize against 1.e4 and there are considerably fewer 0.00 lines after 1.e4 than after the others. If those lines get counter-refuted, 1.e4 is much more likely to end with a positive score for White, so it'd make sense for 1.e4 to show a better score than the others in the starting position.

The problem with penalties for a 0.00 score (like Houdini 6 does) is that they'd hide real unpenalized variations. In this case, if the penalized mainline for d4 is shown as 0.04 but there's a line that is a real 0.03, the database would show as best the line that goes to 0.00 (displayed as 0.04), even though the line that goes to 0.03 is better.

A solution for this is to reserve some scores for penalized variations (...like Houdini 6 does). Say, scores from -0.10 to 0.10 always denote a penalized 0.00 score. For the rest of the positions, if Black has the edge, 0.10 is subtracted from the score, and if White does, 0.10 is added. That way, if the mainline of 1.d4 leads to an unpenalized 0.03, the database would give it a 0.13 score, and the user would know at a glance that the score comes from an actual advantage and not a penalized draw.
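The reserved-band idea can be sketched in a few lines of Python. This is only an illustration of the scheme described above, not how ChessDB or Houdini 6 actually work; the names `RESERVED_CP` and `display_score`, and the use of integer centipawns from White's point of view, are assumptions made for the example.

```python
RESERVED_CP = 10  # assumed width of the reserved band: 0.10 pawns


def display_score(raw_cp, penalized_draw=False, penalty_cp=0):
    """Map a back-propagated score to the score shown to the user.

    raw_cp         -- unpenalized score in centipawns (White's point of view)
    penalized_draw -- True when the line resolves to 0.00 and only carries a penalty
    penalty_cp     -- signed penalty attached to that draw
    """
    if penalized_draw:
        # Penalized draws always stay inside the reserved band.
        return max(-RESERVED_CP, min(RESERVED_CP, penalty_cp))
    if raw_cp > 0:
        return raw_cp + RESERVED_CP  # real edge for White
    if raw_cp < 0:
        return raw_cp - RESERVED_CP  # real edge for Black
    return 0


print(display_score(3))                                     # unpenalized 0.03
print(display_score(0, penalized_draw=True, penalty_cp=4))  # penalized draw
```

With this mapping, any displayed score whose magnitude is above 0.10 comes from a real advantage, and anything inside the band is a penalized draw.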

My suggestion would be to add granularity to scores, showing sub-0.01 scores. That way you can reserve -0.009 to 0.009 for penalized draws, and subtract/add this to everything else. The advantage would be that you could also use this reserved part of the scores to differentiate between lines that have the same back-propagated score, denoting how hard or easy it is for one side to maintain it. Say, a 0.14 position that is very sharp, where refuting it would lead to a lower score, could be shown as 0.140, while another that has many lines reaching 0.14, so that you'd need to refute them all to reduce the score, could be shown as 0.149; the rest of the 0.140-0.149 interval would be used for intermediate cases.
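A rough sketch of the third-digit idea, assuming scores are kept as integer centipawns and that some separate, hypothetical `robustness` value between 0.0 (very sharp) and 1.0 (many lines hold the score) is already available. How to compute that value is not covered here, and treating negative scores symmetrically is also an assumption. A score of exactly 0 is not handled specially, so it would land in the band reserved for penalized draws.

```python
def encode_millipawns(score_cp, robustness):
    """Append a robustness digit (0-9) to a centipawn score.

    Returns the score in thousandths of a pawn, e.g. 14 cp -> 140..149.
    """
    clamped = max(0.0, min(1.0, robustness))
    digit = round(clamped * 9)
    sign = -1 if score_cp < 0 else 1
    return sign * (abs(score_cp) * 10 + digit)


def format_score(millipawns):
    """Render thousandths of a pawn as a string with three decimals."""
    sign = '-' if millipawns < 0 else ''
    whole, frac = divmod(abs(millipawns), 1000)
    return '{}{}.{:03d}'.format(sign, whole, frac)


print(format_score(encode_millipawns(14, 0.0)))  # sharp 0.14 position
print(format_score(encode_millipawns(14, 1.0)))  # robust 0.14 position
```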

We still don't have an online chess tree resource where people can take a look at a database and find back-propagated scores of chess engines. With a snapshot like this, hopefully it's possible.
Your beliefs create your reality, so be careful what you wish for.
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: Database snapshot

Post by noobpwnftw »

My weighted averaging function also factors in the number of good reply moves so that it can tell the difference between different 0.00 lines and apply penalties accordingly. It also gradually degrades sharp scores to prevent the back-propagation from becoming fully min-maxed.

Regardless of how you want to apply those back-propagation algorithms, what matters are the scores of all terminal nodes and the shape of the tree, which both take much effort to construct and are now available.
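For illustration only: the actual weighting function is not documented in this thread, so the following is a generic sketch of one way a weighted average can sit between plain min-max and a flat mean. The function name `backprop`, the geometric `decay` parameter and its default are all assumptions, not ChessDB's method.

```python
def backprop(child_scores, decay=0.25):
    """Blend child scores (centipawns, from the mover's point of view).

    The best move gets weight 1, the next one `decay`, then `decay**2`,
    and so on. A position with several good moves therefore scores higher
    than one with a single good move, and the result never exceeds the
    plain maximum that min-max would return.
    """
    if not child_scores:
        raise ValueError('no child scores to back-propagate')
    ranked = sorted(child_scores, reverse=True)
    weights = [decay ** i for i in range(len(ranked))]
    return sum(w * s for w, s in zip(weights, ranked)) / sum(weights)


print(backprop([20, 10]))    # two reasonable moves
print(backprop([20, -100]))  # one good move, the alternative loses
```

Setting `decay` to 0 would reduce this to pure min-max, while 1 would give a flat average, so the parameter controls how strongly sharp, single-move scores are degraded.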
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Database snapshot

Post by Ovyron »

noobpwnftw wrote: Sun Jul 28, 2019 6:35 pm My weighted averaging function also factors in the number of good reply moves so that it can tell the difference between different 0.00 lines and apply penalties accordingly. It also gradually degrades sharp scores to prevent the back-propagation from becoming fully min-maxed.
Thanks. Have you documented how you do it somewhere? It might also be helpful for people to apply these weighted averages/penalties to their own databases, to get an accurate score at the root (along with correspondence chess players using these concepts to choose moves, when even engines give 0.00 scores to non-transposing lines).
Nelson Hernandez
Posts: 101
Joined: Sun Nov 14, 2010 9:36 pm
Location: U.S.

Re: Database snapshot

Post by Nelson Hernandez »

Nice work, Noob. Sounds like you've got 35-40 million games in that database. Any word on the composition, i.e. % human, % engine?

This work would be greatly improved with a custom GUI that included your evaluation data, the empirical WLD data (flawed though it may be) and information on who played moves first. But I'm sure you've already thought of that!
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: Database snapshot

Post by noobpwnftw »

Nelson Hernandez wrote: Sun Jul 28, 2019 11:56 pm Nice work, Noob. Sounds like you've got 35-40 million games in that database. Any word on the composition, i.e. % human, % engine?

This work would be greatly improved with a custom GUI that included your evaluation data, the empirical WLD data (flawed though it may be) and information on who played moves first. But I'm sure you've already thought of that!
It is not built from any existing games; rather, it is built recursively by self-play, 100% engine.
Any GUI that is capable of probing my API would also become a data source providing information that needs to be explored, aside from the automated process.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Database snapshot

Post by Ferdy »

noobpwnftw wrote: Sat Jul 27, 2019 11:54 pm For those who want to probe my database locally or for other unspecified reasons, here is a full database snapshot of my book project as of today:
Your sample probing code is very helpful; I am able to probe book moves using Python.


For others who are interested, here is a Python script. You need to install Requests via pip install requests.

Code:

#!/usr/bin/env python3
"""
book_probe.py

Probe book moves in ChessDB

Requirements:
    * Python 3
    * Requests
        pip install requests

"""

import requests
import json
import time


def probe_book(fen):
    data = []

    # A full FEN has six space-separated fields; indexing a shorter one
    # would raise IndexError, so reject it up front.
    fields = fen.split()
    if len(fields) != 6:
        print('Invalid FEN: expected 6 fields, got {}'.format(len(fields)))
        return

    # Join the six FEN fields with URL-encoded spaces.
    base = 'https://www.chessdb.cn/cdb.php?action=queryall&json=1&board='
    url = base + '%20'.join(fields)
    r = requests.get(url, timeout=30)

    reply = json.loads(r.text)
    if reply['status'] != 'unknown':
        # .get() guards against replies that carry a status but no move list.
        for n in reply.get('moves', []):
            data.append([n['san'], int(n['score']), float(n['winrate'])])
    else:
        print('Position is not found!')

    # Print SAN, score and win rate, one move per line.
    for san, score, winrate in data:
        print('{:6s}  {:<5d}  {:0.2f}'.format(san, score, winrate))
    
def main():    
    while True:
        fen = input('enter fen? ')
        probe_book(fen)
        time.sleep(2)
    

if __name__ == '__main__':
    main()

Sample run

Code:

enter fen? rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR b KQkq - 0 1
Nf6     -5     49.62
d5      -6     49.55
e6      -9     49.32
c6      -20    48.49
g6      -38    47.12
a6      -38    47.12
d6      -40    46.97
f5      -51    46.14
c5      -65    45.09
Nc6     -66    45.02
h6      -75    44.34
a5      -79    44.04
b6      -81    43.89
Na6     -100   42.48
b5      -106   42.04
Nh6     -107   41.96
h5      -118   41.15
e5      -128   40.42
f6      -155   38.47
g5      -227   33.45

enter fen?
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Database snapshot

Post by Ovyron »

I take it there's no chess GUI that can currently probe the database (without knowing programming/how to use scripts)?
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Database snapshot

Post by Ferdy »

Here is an exe file to probe opening book moves in ChessDB. It just asks for a FEN.
https://drive.google.com/file/d/1_KM89V ... sp=sharing
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Database snapshot

Post by Ovyron »

Thanks, I downloaded your exe, ran it in some directory and pasted a FEN. I got this error:

Traceback (most recent call last):
  File "book_probe2.py", line 55, in <module>
  File "book_probe2.py", line 50, in main
  File "book_probe2.py", line 26, in probe_book
IndexError: list index out of range
[172] Failed to execute script book_probe2
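An IndexError inside probe_book is consistent with a pasted FEN that has fewer than six space-separated fields (some tools omit the two move counters), although the thread does not confirm that this was the cause. A minimal sketch of a guard, using a hypothetical helper named `split_fen` that is not part of the posted script:

```python
def split_fen(fen):
    """Return the six FEN fields, padding missing move counters.

    Raises ValueError instead of IndexError when the input cannot be
    a FEN, so the caller can print a readable message.
    """
    fields = fen.split()
    if len(fields) == 4:
        # EPD-style input without counters: assume halfmove 0, move 1.
        fields += ['0', '1']
    if len(fields) != 6:
        raise ValueError(
            'expected 6 FEN fields, got {}'.format(len(fields)))
    return fields


print(split_fen('rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR b KQkq -'))
```

Padding with '0' and '1' is a convenience assumption; rejecting four-field input outright would be equally reasonable.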