ChessDBCN

Posted: Mon Sep 09, 2019 2:50 am
by noobpwnftw
There have been a few threads and related discussions about my work on chess; now it has come to some fruition.

I have built and documented unified APIs for both chess and Xiangqi, with probing interfaces to demonstrate their usage.

https://www.chessdb.cn/queryc_en/

The APIs provide online position analysis as well as EGTB results where available, along with recommendations to GUI makers and engine authors on how to move into the age of the cloud without requiring end users to download terabytes of EGTB data locally.

For starters, the APIs will also provide you with a move in ANY legal position: it may be a move from the database, a move from one of the strongest engines, or a move from the EGTB.
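
For anyone who wants to try this from a script, here is a minimal sketch of building one probing request. The endpoint path (cdb.php) and the parameter names (action, board, querybest) are assumptions on my part and are not stated in the post, so check them against the documentation linked above before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: endpoint and parameter names are not given in the post;
# verify them against the API documentation.
API_URL = "https://www.chessdb.cn/cdb.php"


def build_query_url(fen, action="querybest"):
    """Return the URL for one probing request for the given FEN."""
    return API_URL + "?" + urlencode({"action": action, "board": fen})


def probe(fen, action="querybest", timeout=10):
    """Send the request and return the raw response text."""
    with urlopen(build_query_url(fen, action), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace").strip()
```

Calling probe() with a FEN string returns whatever the server sends back as text; the response format is deliberately not parsed here since the post does not describe it.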

All the code and data, insofar as they can be legally copyrighted by me, are hereby released into the public domain; the rest follows its original licences.
Note that for Xiangqi there are proprietary parts for which I have obtained permanent licenses to provide a public service.

Full database snapshots are available upon request; all EGTB files are hosted at ftp://ftp.chessdb.cn/pub/.
Data collection (automatic learning) from probing requests can be opted out of as per the documentation, with the penalty that only one best move per position will be returned by the API. Fair-use rules apply, with a limit of 100k probing requests per IP in a period of 24 hours to discourage excessive crawling.
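
A client that probes in bulk can keep itself under the fair-use limit with a small local counter. The sketch below uses a sliding 24-hour window, which is my own choice; the post only says "in a period of 24 hours", so how the server actually counts is an assumption. The class name ProbeBudget is hypothetical.

```python
import time
from collections import deque


class ProbeBudget:
    """Sliding-window counter for a per-IP request allowance."""

    def __init__(self, limit=100_000, window_seconds=24 * 60 * 60):
        self.limit = limit
        self.window = window_seconds
        self.sent = deque()  # timestamps of requests inside the window

    def try_acquire(self, now=None):
        """Record a request and return True if it fits the allowance."""
        now = time.time() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.limit:
            return False
        self.sent.append(now)
        return True
```

Call try_acquire() before each probe and back off when it returns False.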

Thanks to everyone who was involved in the process of making this happen, especially:

Ferdinand Mosca - for his work on making a probing interface before I made my own.
Niklas Fiekas - for his work on chess libraries which powered many fundamental parts of the system.
Andrew Grant - for teaching me the word "fruition" and ideas on getting the shortest distance of a position from the starting position.

Re: ChessDBCN

Posted: Tue Sep 10, 2019 3:40 am
by tmokonen
Very cool resource. Thank you for all the effort you put into it.

Re: ChessDBCN

Posted: Tue Sep 10, 2019 7:49 am
by Guenther
Very nice work. Could you tell a bit more about the 'large cluster' which runs the analysis engine?
Also which version of SF is used and will it change with time?

Re: ChessDBCN

Posted: Tue Sep 10, 2019 8:46 am
by noobpwnftw
There are two queues. The first is for move exploration: each position is evaluated by a shallow multi-PV SF search (depth 12), sieving from all moves a minimum of 5 and those within 200cp of the best score; these moves are then sent to the second queue. The second queue evaluates each position after making those moves by performing a normal depth 22 search, then, without clearing the hash, searches the original position again and stores the results to the database. Some back-propagation work is done during queries (looking up to 20 plies deep), upon analysis requests (100 plies deep), or in the background.
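
The sieving step in the first queue can be sketched roughly as follows. This is one reading of the description (keep every move within 200cp of the best, and top up to at least 5 moves when fewer qualify), not the code from the siever branch; the function name and the input format are hypothetical.

```python
def sieve_moves(multipv, min_moves=5, margin_cp=200):
    """Pick moves from a shallow multi-PV result.

    multipv: list of (move, score_cp) pairs from the side to move's view.
    Keeps every move within margin_cp of the best score, and tops up
    with the next-best moves until min_moves are kept (if available).
    """
    ranked = sorted(multipv, key=lambda ms: ms[1], reverse=True)
    if not ranked:
        return []
    best = ranked[0][1]
    kept = [move for move, score in ranked if best - score <= margin_cp]
    if len(kept) < min_moves:
        # Too few moves inside the margin: take the top min_moves instead.
        kept = [move for move, _ in ranked[:min_moves]]
    return kept
```

The moves returned here are the ones that would be handed to the second queue for the deeper search.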

Currently there are about 200 cores on the first queue, 4500 cores on the second, and a few machines generating self-play games with various engines. Parameters are chosen from experience; the engine code is at https://github.com/noobpwnftw/Stockfish/tree/siever. Due to the brute-force nature of this method, not everything needs to be up to date for it to work, yet using the latest engine code can be more efficient.
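
To illustrate the depth-limited back propagation mentioned above, here is a toy negamax over stored scores. The post does not say how the back propagation is actually computed, so the negamax rule, the dictionary layout, and the function name are all assumptions for illustration only.

```python
def backed_up_score(position, children, leaf_score, max_plies):
    """Negamax over stored positions, looking at most max_plies deep.

    children:   dict mapping a position key to its known successor keys.
    leaf_score: dict mapping a position key to its stored score (cp),
                from the point of view of the side to move there.
    """
    successors = children.get(position, [])
    if max_plies == 0 or not successors:
        return leaf_score[position]
    # A child's score is from the opponent's view, so negate it.
    return max(-backed_up_score(c, children, leaf_score, max_plies - 1)
               for c in successors)
```

With max_plies=20 this would correspond to the lookup done during queries, and with max_plies=100 to the one done upon analysis requests.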

Re: ChessDBCN

Posted: Tue Sep 10, 2019 9:29 am
by Guenther
Thanks for the extra info!

Re: ChessDBCN

Posted: Wed Sep 11, 2019 3:56 am
by gladius
This is an incredibly cool project! Could get close to a soft-solve. Engines prune so much that there are still positions where they make serious mistakes, but these are fewer and fewer over time. Congrats, it's really well put together.

Re: ChessDBCN

Posted: Fri Sep 13, 2019 10:38 pm
by cdani
Very nice work!! Congratulations!!

Re: ChessDBCN

Posted: Fri Sep 13, 2019 10:45 pm
by Ovyron
Agreed. You especially want to feed it stuff and come back after a few hours. It's a kind of live database powered by AI: you need to let it chew on the positions so the bad lines fall to the bottom of the ranking.

What I found curious is that back when I started using it, the DB size was already at 146GB and the queue was growing without bounds; at that rate the DB would have doubled in size in two weeks :shock: Now the DB has shrunk to 130GB and the queue remains stable at around 8GB, so I wonder what happened.

Re: ChessDBCN

Posted: Sat Sep 14, 2019 5:40 am
by noobpwnftw
The main DB is built upon RocksDB with a custom merge operator for performance reasons: it has a very intense random-write workload due to having that many workers. All writes are append-only: before compaction, reads need to merge them on the fly; once there is a substantial amount of outstanding merges, compaction triggers and all merges are combined. Refer to https://github.com/facebook/rocksdb/wik ... Compaction and https://github.com/facebook/rocksdb/wiki/Merge-Operator for details.
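
The append-then-merge idea can be shown with a toy in-memory model. This is not the RocksDB API and not the custom merge operator used by the site; the class, the per-key compaction trigger, and the merge_scores function are simplifications I made up for illustration (real RocksDB compaction is triggered by the state of the LSM tree, not by a per-key counter).

```python
class ToyMergeStore:
    """Toy model of append-only merge operands with lazy combination."""

    def __init__(self, merge_fn, compact_after=4):
        self.merge_fn = merge_fn   # combines (existing, operand)
        self.base = {}             # fully merged values
        self.pending = {}          # key -> outstanding operands
        self.compact_after = compact_after

    def merge(self, key, operand):
        """Append an operand without reading the existing value."""
        ops = self.pending.setdefault(key, [])
        ops.append(operand)
        if len(ops) >= self.compact_after:
            self.compact(key)

    def get(self, key):
        """Fold outstanding operands on the fly; nothing is written."""
        value = self.base.get(key)
        for op in self.pending.get(key, []):
            value = self.merge_fn(value, op)
        return value

    def compact(self, key):
        """Combine all outstanding operands into the stored value."""
        self.base[key] = self.get(key)
        self.pending.pop(key, None)


def merge_scores(existing, operand):
    """Hypothetical merge rule: newer move scores overwrite older ones."""
    merged = dict(existing or {})
    merged.update(operand)
    return merged
```

Writers only ever call merge(), so they never block on a read-modify-write cycle; the cost is paid by readers until compaction folds the operands together.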

The queue DB, however, is another thing: there are two busy priority queues, one for each stage of the processing. Intermediate results update the queues very often, while sorting has to be done to ensure the best user experience when submitting analysis requests, so that they get prioritized over automatic exploration. There were also times when I was importing the Lichess database into the queue, so it could grow "without bounds". Later I found that not very useful anyway, so I stopped importing but kept what was done. This DB is backed by the WiredTiger B-Tree from MongoDB; it does in-place updates and has a greedy allocation scheme for disk files, but the space should be freed up when the queue is dry.
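
The prioritization of user requests over automatic exploration can be sketched with a small in-memory heap. The real queue is disk-backed by WiredTiger and certainly more involved; the class name, the two priority constants, and the FIFO tie-break are assumptions for illustration.

```python
import heapq
import itertools

USER_REQUEST = 0       # smaller number = served first
AUTO_EXPLORATION = 1


class AnalysisQueue:
    """Priority queue where user requests jump ahead of auto exploration."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # FIFO tie-break within a class

    def push(self, position, priority=AUTO_EXPLORATION):
        heapq.heappush(self._heap, (priority, next(self._order), position))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

Even with millions of automatically queued positions, a position pushed with USER_REQUEST is the next one popped.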

Re: ChessDBCN

Posted: Sat Sep 14, 2019 6:25 am
by Ovyron
Thanks, I appreciate that my requests really seem to be prioritized, and that after I've made the DB aware of some variation, it's explored in depth in a timely fashion. Especially when there seem to be around 60 million positions in the queue, I was afraid it would take a long while to get to it, but one always gets fast results.