stegemma wrote: I've implemented a simple opening book in Satana. I'm missing some of the math here: from a statistical point of view, how can we compute the probability of a win when each book entry has a different score? In my book, I store every position as an entry with the white/black score. Since some positions are played often and some rarely, just computing a percentage like this one isn't always good:
Code:
perc = white_score / (white_score + black_score)
If I compare two positions, for example one played 100000 times with a white win rate of 53% and another played only 10 times with a white win rate of 70%, just comparing percentages would make the engine prefer the rare continuation.
I think that I should compute a more sophisticated percentage, but I don't know how (my statistics is weak).
For example, this is my engine output for the starting position:
Code:
hash white_score black_score %
# 26B24248B0B213A7 Ng1-h3 3 1 75.00%
# A6F5BE14996CE34B Ng1-f3 141781 105371 57.30%
# A9CD43D265BB27C1 Nb1-c3 8670 8386 50.80%
# A2D57A5C5D03A1B4 Nb1-a3 ???
# A65545D9385B5A3B Ph2-h3 2 0 100.00%
# A9D5495D9934AFB0 Pg2-g3 4782 3596 57.00%
# A6CABC12D080A7CD Pf2-f3 2 0 100.00%
# A6CB76195BE75793 Pe2-e3 33 59 35.80%
# B6D55CBAA963D796 Pd2-d3 40 32 55.50%
# B6CD5B84A78F4849 Pc2-c3 32 32 50.00%
# 46D5AEC46E7F602A Pb2-b3 11438 11100 50.70%
# E6B53D23459CEBA5 Pa2-a3 ???
# 9FCA61F1E64C0CF0 Ph2-h4 ???
# 804C915C8920E580 Pg2-g4 ???
# 9DD7B0A819627060 Pf2-f4 13996 15814 46.90%
# 98A8776C1A9B3E65 Pe2-e4 1271383 1108327 53.40%
# 80DCEC6D58853A63 Pd2-d4 686201 550747 55.40%
# 9DE3F5BD01E05A54 Pc2-c4 198730 162554 55.00%
# 9B31EA80413BAD98 Pb2-b4 3965 4551 46.50%
# 9EEDB481C8A96A61 Pa2-a4 ???
Going by percentage alone, the best entry would be:
Code:
# A65545D9385B5A3B Ph2-h3 2 0 100.00%
but, of course, that's not good!
I know that I can filter out entries below a lower bound on the number of games (I already do that, in fact), but I would like to know a more correct way.
First, if you have fewer than 32 games, the statistical significance of the wins and losses is so small that you should ignore it and rely purely on the computer evaluation. If the computer evaluation is so shallow that at the current time control you can out-think it, then you are officially out of book.
I think the question you are asking is: how do we know the breakpoint at which the data can be trusted?
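To make the small-sample point concrete, here is a minimal sketch of one common way to penalise rarely played moves: rank by the lower end of the Wilson score interval instead of the raw percentage. This is a swapped-in technique, not something Satana does. The function name `wilson_lower_bound` and the 95% level (`z = 1.96`) are assumptions for illustration, and it treats `white_score` as a count of wins out of `white_score + black_score` games, which is only approximate if the scores include half points for draws.

```python
import math

def wilson_lower_bound(wins: float, games: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a win rate.

    z = 1.96 corresponds to roughly 95% confidence (an assumed choice).
    """
    if games == 0:
        return 0.0
    p = wins / games
    denom = 1.0 + z * z / games
    centre = p + z * z / (2.0 * games)
    margin = z * math.sqrt(p * (1.0 - p) / games + z * z / (4.0 * games * games))
    return (centre - margin) / denom

# (move, white_score, black_score) copied from the listing in the question.
entries = [
    ("Ph2-h3", 2, 0),
    ("Ng1-f3", 141781, 105371),
    ("Pe2-e4", 1271383, 1108327),
    ("Pd2-d4", 686201, 550747),
]

# Rank by the pessimistic estimate rather than by the raw percentage.
ranked = sorted(
    entries,
    key=lambda e: wilson_lower_bound(e[1], e[1] + e[2]),
    reverse=True,
)

for move, white, black in ranked:
    print(move, round(wilson_lower_bound(white, white + black), 4))
```

With this ordering the 2-0 entry for Ph2-h3 falls to the bottom of the list, because two games leave a very wide interval, while the heavily played moves keep a bound close to their raw percentage.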
Clearly, this depends on how good your data is.
For instance, with human games: is the data from correspondence chess between world championship candidates? Or is it from FICS games between 'beanhead' and 'gizmo' with Elo of 800 and 750?
Is the data from bullet games? Is the data from TCEC games?
If you separate the data you collect into wins, losses, and draws for each type of game, then you can gain a lot more from it.
You should also store your own engine's wins/losses/draws from the given position. Even if it is a good position, you may want to avoid it if your engine does not play it well.
So, I think that the best way to form a book is to use as much carefully focused data as possible. Separate the games into their sources and you can statistically calibrate the value of each source. Consider the time control and also the strength of the players/engines.
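As a rough sketch of what storing results per source could look like, the following keeps a separate win/draw/loss record for each source and combines them with per-source weights. Everything here is hypothetical: the names `Record`, `BookEntry`, and `SOURCE_WEIGHTS`, the source labels, and the weight values are placeholders that would have to be calibrated by testing.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Record:
    """Win/draw/loss counts from White's point of view for one source."""
    wins: int = 0
    draws: int = 0
    losses: int = 0

    @property
    def games(self) -> int:
        return self.wins + self.draws + self.losses

    @property
    def points(self) -> float:
        # A draw counts as half a point.
        return self.wins + 0.5 * self.draws

# Placeholder weights (assumed values, not calibrated).
SOURCE_WEIGHTS: Dict[str, float] = {
    "correspondence": 1.0,
    "engine_long_tc": 0.8,
    "own_engine": 0.8,
    "bullet": 0.1,
}

@dataclass
class BookEntry:
    """One book position with its results split by data source."""
    by_source: Dict[str, Record] = field(default_factory=dict)

    def add(self, source: str, wins: int = 0, draws: int = 0, losses: int = 0) -> None:
        rec = self.by_source.setdefault(source, Record())
        rec.wins += wins
        rec.draws += draws
        rec.losses += losses

    def weighted_score(self, weights: Dict[str, float]) -> Optional[float]:
        """Weighted fraction of points for White, or None if there is no usable data."""
        total_games = sum(weights.get(s, 0.0) * r.games for s, r in self.by_source.items())
        if total_games == 0:
            return None
        total_points = sum(weights.get(s, 0.0) * r.points for s, r in self.by_source.items())
        return total_points / total_games

entry = BookEntry()
entry.add("correspondence", wins=6, losses=4)
entry.add("bullet", losses=100)
print(entry.weighted_score(SOURCE_WEIGHTS))
```

Sources missing from the weight table get weight 0 and are ignored, so an unvetted source cannot influence the book by accident.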
When you have finely divided data, perform experiments exactly as you would for chess engine evaluation parameters (try different weights and run SPRT tests).
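For readers who have not used SPRT, here is a deliberately simplified sketch of Wald's sequential probability ratio test on decisive games only. It is not what engine-testing frameworks actually run (those typically work with Elo hypotheses and account for draws). The function name `sprt` and the default hypotheses `p0 = 0.50`, `p1 = 0.55` and error rates are assumptions chosen for illustration.

```python
import math

def sprt(wins: int, losses: int,
         p0: float = 0.50, p1: float = 0.55,
         alpha: float = 0.05, beta: float = 0.05) -> str:
    """Wald's SPRT for a win probability, ignoring draws.

    H0: win probability is p0.  H1: win probability is p1 (p1 > p0).
    Returns 'accept_h1', 'accept_h0', or 'continue' (keep playing games).
    """
    # Log-likelihood ratio of the observed results under H1 versus H0.
    llr = wins * math.log(p1 / p0) + losses * math.log((1.0 - p1) / (1.0 - p0))
    lower = math.log(beta / (1.0 - alpha))   # stop and accept H0 at or below this
    upper = math.log((1.0 - beta) / alpha)   # stop and accept H1 at or above this
    if llr >= upper:
        return "accept_h1"
    if llr <= lower:
        return "accept_h0"
    return "continue"

# Hypothetical results of a book with new source weights against the old book.
print(sprt(10, 10))
print(sprt(200, 100))
print(sprt(100, 200))
```

The point of the sequential form is that the test is checked after every batch of games and stops as soon as either bound is crossed, instead of fixing the number of games in advance.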