Fixed nodes games and the pentanomial model.
Posted: Sat Dec 29, 2018 9:59 am
I was going to experiment with this on fishtest, but I am currently a bit busy. (Moreover, it is a tradition, started by Marco Costalba, to forcibly deride _any_ kind of research on fishtest, even from those who contribute more games than they use, so it is always an uphill battle.)
I wonder whether anyone has tried combining self-testing, the pentanomial model and fixed-node games. When closely related engines are self-tested at fixed nodes, the return games with reversed colors will (supposedly) be heavily correlated with the original games. This will compress the score towards 50%, but the pentanomial variance will also be heavily reduced. So in the end it could be a win in normalized Elo, which is all that matters for the efficiency of engine testing. Here is a note on normalized Elo: http://hardy.uhasselt.be/Toga/normalized_elo.pdf (the pentanomial model is discussed in section 4). It does not seem possible to predict theoretically whether this would be a win, since Elo models do not take correlation into account.
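To make the comparison concrete, here is a minimal Python sketch of what I mean. It takes the results of game pairs (same opening, colors reversed, both scored from the point of view of the engine under test) and computes the variance twice: once treating all games as independent (trinomial), and once from the pair averages (pentanomial). The function names, the made-up sample data and the convention of quoting the pentanomial variance per game (twice the variance of the pair average) are my assumptions for illustration; the note linked above is the authority on the exact definitions.

```python
from math import sqrt

def score_stats(pairs):
    """pairs: list of (g1, g2) results in {0, 0.5, 1}, both from the
    point of view of the engine under test. Returns the mean score,
    the trinomial per-game variance and the pentanomial variance
    expressed per game (assumed convention: 2 * variance of pair mean)."""
    games = [r for pair in pairs for r in pair]
    mu = sum(games) / len(games)
    # Trinomial: every game is treated as an independent sample.
    var_tri = sum((r - mu) ** 2 for r in games) / len(games)
    # Pentanomial: the pair is the sample; its mean lies in
    # {0, 0.25, 0.5, 0.75, 1}, so correlation inside a pair is captured.
    pair_means = [(a + b) / 2 for a, b in pairs]
    var_penta = 2 * sum((m - mu) ** 2 for m in pair_means) / len(pairs)
    return mu, var_tri, var_penta

def normalized_score(mu, var):
    """Score advantage in units of the per-game standard deviation
    (unscaled; any constant factor is left out on purpose)."""
    return (mu - 0.5) / sqrt(var)

# Made-up data: many openings decided by color (win + loss in the pair),
# many double draws, and a few pairs where the engine gains half a point.
pairs = [(1, 0)] * 40 + [(0.5, 0.5)] * 50 + [(1, 0.5)] * 10
mu, var_tri, var_penta = score_stats(pairs)
print(mu, var_tri, var_penta)
print(normalized_score(mu, var_tri), normalized_score(mu, var_penta))
```

With heavily correlated pairs like these, the pentanomial variance comes out far smaller than the trinomial one, so the same score advantage is worth more in normalized terms. Whether real fixed-node games behave like this is exactly what would need to be measured.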
A similar idea could be applied when testing against a third-party engine. It is common wisdom that in that case one needs 4 times the number of games to attain the same resolution. However, this is only true if the games against the third engine are uncorrelated. If we use fixed nodes and the same openings, then presumably we create correlation, and we should again use the pentanomial model (this time for the score differences) to get a more accurate variance for the score differences.
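The third-engine case can be sketched in the same way. Suppose both candidate engines play the same openings with the same colors against the third engine, so the results can be matched game by game. The difference of two matched results takes one of five values (-1, -0.5, 0, 0.5, 1), which is where the pentanomial model comes in. The Python sketch below compares the variance of the score difference under the independence assumption with the variance computed from the matched differences. The function name and the sample results are hypothetical and only serve as an illustration.

```python
def diff_variances(a_scores, b_scores):
    """a_scores[i], b_scores[i]: results in {0, 0.5, 1} of engines A and B
    against the same third engine, from the same opening and color.
    Returns (mean difference, per-game variance of the difference assuming
    independence, per-game variance of the matched differences)."""
    def mean(xs):
        return sum(xs) / len(xs)

    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    diffs = [a - b for a, b in zip(a_scores, b_scores)]
    # Independent games: the variances simply add.
    var_indep = var(a_scores) + var(b_scores)
    # Matched games: var(a) + var(b) - 2*cov(a, b), estimated directly.
    var_paired = var(diffs)
    return mean(diffs), var_indep, var_paired

# Made-up results: A and B mostly get the same result from each opening.
a = [1, 1, 0.5, 0, 0.5, 1, 0, 0.5]
b = [1, 0.5, 0.5, 0, 0.5, 1, 0, 0.5]
print(diff_variances(a, b))
```

If the matched games are positively correlated, the paired variance is smaller than the sum of the two individual variances, and the factor of 4 becomes too pessimistic. If they turn out to be uncorrelated, the two estimates agree and nothing is gained.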