Maybe you can help me with this statistics question related to opening books and chess databases.

I have n moves. For each resulting position I have empirical information from past games: a list of game results (win, loss, draw) together with the Elo values of both players. Is there any objective way to rank the moves so as to maximize the likelihood of winning?

I was thinking of something similar to Elo values for moves, or even LOS (likelihood of superiority) statistics.

Thanks for your input!

## Ranking moves based on empirical information


**Eelco de Groot** (Posts: 4161; Joined: Sun Mar 12, 2006 1:40 am; Location: Groningen)

### Re: Ranking moves based on empirical information

I think that for each of the moves you can compute a TPR (tournament performance rating) based on the games played with the move and the Elo of the opposition. But the quality of the data will vary wildly; for some moves there will be only one game and for others hundreds of games. Clearly, moves that are played a lot deserve a somewhat higher ranking than moves with the same TPR but very few games.

One simple approach would be to require some minimum number of games, say 50. If a move has only 5 games, you add 45 virtual games with the average score of all the moves in this position against the average opposition (in the position before you make the move, so you pool all the games from this position and calculate the win percentage and the average opposition). I think that would be a bit better than, for instance, adding 45 draws against the opposition encountered by this move alone.

Alternatively, you could first exclude the moves that seem to be blunders in this position, since those are not reasonable moves to compare the other moves with (on the other hand, including everything dampens the score of moves with few games, and then you don't need an arbitrary rule to decide which moves are the blunders).

The rule that makes the best book wins.
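The padding rule above can be sketched in a few lines. This is only an illustration under an assumed data layout: each move maps to a list of `(score, opponent_elo)` pairs with score 1, 0.5 or 0; the helper names and the `MIN_GAMES` constant are hypothetical.

```python
# Padding sparse moves with virtual games at the position's pooled average.
MIN_GAMES = 50

def pooled_stats(moves):
    """Average score and opposition Elo over all games from this position."""
    games = [g for results in moves.values() for g in results]
    avg_score = sum(s for s, _ in games) / len(games)
    avg_opp = sum(e for _, e in games) / len(games)
    return avg_score, avg_opp

def padded_performance(moves):
    """Score percentage per move after padding sparse moves up to MIN_GAMES."""
    avg_score, avg_opp = pooled_stats(moves)
    perf = {}
    for move, games in moves.items():
        padded = list(games)
        # Virtual games: pooled average score against the average opposition.
        padded += [(avg_score, avg_opp)] * max(0, MIN_GAMES - len(games))
        perf[move] = sum(s for s, _ in padded) / len(padded)
    return perf
```

A move with only 5 wins is then pulled toward the position's overall score instead of keeping a perfect 100%.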

Regards

Eelco


Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

-- Brian W. Kernighan

### Re: Ranking moves based on empirical information

I had a read into TPR; however, your way of dealing with the different quantities seems pretty arbitrary to me.

My idea was to incorporate the sample-size into the variance:

For a move I calculate the average result and convert it to an Elo value; then, for all occurrences, I can calculate the "Elo of the move" as (Elo_result - Elodelta_players). I then take the average and variance of these datapoints and use the t-distribution to calculate the probability that the average Elo of one of the moves is higher.
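As a rough sketch of this idea: below, each game is turned into a per-game performance datapoint using the common rule of thumb (opponent's Elo plus 400 for a win, minus 400 for a loss), and a Welch-style statistic compares two moves. Note that this substitutes a normal approximation for the t CDF, which is only adequate once both samples are sizeable; the function names and the per-game conversion are my assumptions, not an exact rendering of the proposal.

```python
import math
from statistics import mean, variance

def game_elo(score, opp_elo):
    """Per-game performance datapoint: rule of thumb opp + 400 for a win,
    opp for a draw, opp - 400 for a loss."""
    return opp_elo + 400 * (2 * score - 1)

def prob_first_move_stronger(games_a, games_b):
    """P(mean move-Elo of A > mean move-Elo of B), using a Welch statistic
    with a normal approximation to the t CDF (fine for sizeable samples)."""
    xa = [game_elo(s, e) for s, e in games_a]
    xb = [game_elo(s, e) for s, e in games_b]
    se = math.sqrt(variance(xa) / len(xa) + variance(xb) / len(xb))
    z = (mean(xa) - mean(xb)) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))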


**Eelco de Groot** (Posts: 4161; Joined: Sun Mar 12, 2006 1:40 am; Location: Groningen)

### Re: Ranking moves based on empirical information

I think that what I mean by TPR is about the same as what you are calculating as the Elo of the move; it is the Elo based on the results you have available, as if the move had played some tournament. I proposed some arbitrary-seeming rules because it does not seem so clear to me what statistics you would want. There are error margins up and down for every move, and the margins do not seem of equal importance to you as the user, whereas the statistics, without further modification, do not make any distinction.

If a simple goal were to increase the accuracy of all the moves in the database as much as possible, you would start playing the moves that you have the least data for. If you base the statistics on the largest possible upward margin, the effect would be about the same: the moves with the least data have a large upward margin. Such rules would be fine if you have the luxury of improving your database first by playing many test games, even if they are just low-quality but massive numbers of Monte Carlo games.

But once you would like to start using the database for games that matter, you would have to almost reverse the strategy. Moves with a large downward error margin are now to be avoided at all costs, because they are far too risky; the chance of giving the game away already in the opening is too large. You are better off playing moves that at least have a large number of games (and theory) behind them, even if their performance is not so great. So you would need to tailor the statistics to the way you want to use the database (maintenance, testing and expanding the database, or immediate performance), and some simple rules may actually give better results than rather complicated t-tests and such.

> Edmund wrote: I had a read into TPR; however, your way of dealing with the different quantities seems pretty arbitrary to me.
>
> My idea was to incorporate the sample size into the variance:
>
> For a move I calculate the average result and convert it to an Elo value; then, for all occurrences, I can calculate the "Elo of the move" as (Elo_result - Elodelta_players). I then take the average and variance of these datapoints and use the t-distribution to calculate the probability that the average Elo of one of the moves is higher.

For instance, if you don't have the "Elo of the move" yet because there are too few games with that move, the "Elo of the position" seems a good substitute. If the position as a whole also has few games behind it, it is probably better to choose a different opening. The statistical rules you would need to come up with seem very complicated, but some rules of thumb are not so hard to devise.
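One such rule of thumb, blending the move's Elo toward the position's Elo in proportion to how many games back it up, could look like this (a minimal sketch; the function name and the pseudo-count of 50, echoing the minimum-games rule above, are my own choices):

```python
def shrunk_elo(move_elo, n_games, position_elo, prior_games=50):
    """Blend a move's performance Elo toward the position's pooled
    performance Elo, weighted by the number of real games behind the move.
    With zero games the estimate is just the position's Elo; with many
    games it converges to the move's own Elo."""
    w = n_games / (n_games + prior_games)
    return w * move_elo + (1 - w) * position_elo
```

This is the same idea as padding with virtual games, expressed as a weighted average of two Elo estimates instead.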

Eelco

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

-- Brian W. Kernighan

### Re: Ranking moves based on empirical information

For this instance I am mainly interested in statistics to be displayed to the analyzing user, so I am especially interested in the error margins. The way this information is displayed in current GUIs doesn't satisfy me much (usually just the number of games and the win/draw percentages). This (a) fails to cover the relative strength of the two opponents and (b) fails to make the moves objectively comparable.

What you can do with the information for automated engine games is a different issue. A straightforward approach, after leaving the main book, would be to set a certain threshold and prune every move that, for example, has a < 5% chance of succeeding against the candidate move. You could also set a certain risk factor in the engine options: a 10% risk setting would choose the move whose Elo reached with at least 90% certainty is the highest. Furthermore, you can use this statistic as a threshold for deciding when to leave the book and start searching.
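The risk-factor selection can be sketched as picking the move with the highest lower confidence bound on its Elo. This assumes a normal model of the performance estimate and a hypothetical `(name, mean_elo, stdev, n_games)` tuple per candidate; it is one plausible reading of the 10%-risk rule, not a prescribed implementation.

```python
import math
from statistics import NormalDist

def lcb_elo(mean_elo, stdev, n_games, risk=0.10):
    """Elo the move achieves with probability 1 - risk, under a normal
    model of the performance estimate."""
    z = NormalDist().inv_cdf(1 - risk)  # ~1.28 for 10% risk
    return mean_elo - z * stdev / math.sqrt(n_games)

def pick_move(candidates, risk=0.10):
    """Choose the (name, mean_elo, stdev, n_games) candidate whose
    risk-adjusted Elo bound is highest."""
    return max(candidates, key=lambda c: lcb_elo(c[1], c[2], c[3], risk))
```

With this rule a well-established move can beat a sharper move whose higher average rests on only a handful of games, which matches the "avoid large downward margins" strategy discussed earlier in the thread.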

A lot of unexplored territory. Given the ever-increasing databases, I think more sound approaches to automatic analysis are worth the effort.
