It seems like +4 is not conclusive either. Sometimes a tweaked version scores +4, then I reload the self-play match and this tweaked version does not win the very next game.
abulmo2 wrote: ↑Mon Sep 22, 2025 8:23 am
No, for small numbers you have to compute the trinomial probabilities directly. The formula 40%/sqrt(N) holds for large numbers (N>30).
jaroslav.tavgen wrote: ↑Sun Sep 21, 2025 11:21 pm
Thank you for clarifying!
hgm wrote: ↑Sun Sep 21, 2025 9:04 pm
The standard deviation in the score of an N-game match of independent games should be about 40%/sqrt(N). Which for 150 games means about 3%, or 4.5 points. So a deviation of 21 or 11 should be very unlikely.
The explanation is probably that the games are not independent. By starting all games from the same position, many games could be identical until after the point where a decisive error is made. To avoid this you could start each game from a different position.
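For reference, the rule of thumb quoted above can be checked with a few lines of Python. This is only a sketch: the function name `match_score_sd` is made up for illustration, and the 0.4 per-game standard deviation is the approximate figure from the post, not a measured value.

```python
import math

def match_score_sd(n_games, per_game_sd=0.4):
    """Standard deviation of the match score for n independent games.

    Returns (sd as a fraction of the maximum score, sd in points).
    per_game_sd=0.4 is the rule-of-thumb value from the post (an assumption).
    """
    sd_fraction = per_game_sd / math.sqrt(n_games)
    return sd_fraction, sd_fraction * n_games

frac, points = match_score_sd(150)
print(f"{frac:.1%} of the maximum score, {points:.1f} points")
```

For 150 games this gives roughly 3% of the maximum score, or a little under 5 points, so a deviation of 11 or 21 points is more than two standard deviations away if the games really are independent.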
Now I do the following: I play 4 games from different positions (the starting position, after 1. e4 e5, after 1. d4 d5 and after 1. c4 e5), alternating whether the "tweaked" version plays White or Black.
40%/sqrt(N) = 0.4 / sqrt(4) = 0.4 / 2 = 0.2, and 4 x 0.2 = 0.8, or about a 1 point difference.
So I assume that if the "tweaked" version has +3 points after playing those 4 games, it is stronger, and this "tweaked" version becomes the new "original" version.
Is this reasonable?
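The four-game schedule described here could be generated along these lines. It is a minimal sketch: `OPENINGS`, `build_schedule` and the `"startpos"` label are hypothetical names, and how the moves are actually fed to the engines is left out.

```python
from itertools import cycle

# The four starting lines mentioned in the post; "" stands for the normal start position.
OPENINGS = ["", "1. e4 e5", "1. d4 d5", "1. c4 e5"]

def build_schedule(openings):
    """Pair each opening with the colour the tweaked version plays, alternating per game."""
    colours = cycle(["white", "black"])
    return [(moves or "startpos", colour) for moves, colour in zip(openings, colours)]

for opening, colour in build_schedule(OPENINGS):
    print(f"tweaked plays {colour} from: {opening}")
```

Note that with this scheme each opening is played with only one colour, so an opening that favours one side also favours whichever version happens to get that side.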
Assuming, for 2 engines of equal strength, a probability of winning Pw = 30%, of drawing Pd = 40% and of losing Pl = 30%, with 4 games you can compute the following probabilities of scoring x points:
P(4) = 0.0081
P(3.5) = 0.0432
P(3) = 0.1188
P(2.5) = 0.2064
P(2) = 0.2470
... (the probabilities are symmetrical).
Thus, the probability P(x>=3) = 17.01% is too large to conclude anything other than that the engines are of equal strength. Only a result of +4 can lead to the conclusion that one engine is superior.
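The table above can be reproduced by summing the trinomial probabilities over every win/draw/loss split. The sketch below assumes the same Pw = 30%, Pd = 40%, Pl = 30% as the post; the function name `score_distribution` is made up for illustration.

```python
from collections import defaultdict
from math import factorial

def score_distribution(n_games, p_win, p_draw, p_loss):
    """Return {points: probability} for n independent games (win = 1, draw = 0.5)."""
    dist = defaultdict(float)
    for wins in range(n_games + 1):
        for draws in range(n_games - wins + 1):
            losses = n_games - wins - draws
            # Number of orderings that give this win/draw/loss split.
            ways = factorial(n_games) // (
                factorial(wins) * factorial(draws) * factorial(losses)
            )
            prob = ways * p_win**wins * p_draw**draws * p_loss**losses
            dist[wins + 0.5 * draws] += prob
    return dict(dist)

dist = score_distribution(4, 0.30, 0.40, 0.30)
for points in sorted(dist, reverse=True):
    print(f"P({points}) = {dist[points]:.4f}")

tail = sum(p for points, p in dist.items() if points >= 3)
print(f"P(x>=3) = {tail:.4f}")
```

Changing the three probabilities shows how sensitive the tail is to the assumed draw rate, which is worth trying before settling on a threshold.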
How do you organize a self-play?
- Full name: Jaroslav Tavgen
- Full name: H G Muller
The formula actually holds for any number, even for N=1. (With the caveat that the 40% depends a bit on the fraction of draws, and assumes a 32% draw rate.) It is just that you cannot assume the resulting distribution can be approximated by a Gaussian. And the situation is not as bad as you say: to have +3 in 4 games the result has to be 3.5-0.5, and P(x>=3.5) according to your table is only about 5%.
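To see how the 40% figure moves with the draw rate, here is a small sketch. It assumes two equally strong engines, so wins and losses each get half of the non-draw probability; under that assumption the per-game variance is (1 - d)/4 for draw rate d. The function name `per_game_sd` is made up for illustration.

```python
import math

def per_game_sd(draw_rate):
    """Per-game score standard deviation for equal engines (win = 1, draw = 0.5, loss = 0)."""
    p_win = p_loss = (1.0 - draw_rate) / 2.0
    mean = p_win * 1.0 + draw_rate * 0.5  # 0.5 by symmetry
    variance = (
        p_win * (1.0 - mean) ** 2
        + draw_rate * (0.5 - mean) ** 2
        + p_loss * (0.0 - mean) ** 2
    )
    return math.sqrt(variance)

for d in (0.32, 0.40):
    sd = per_game_sd(d)
    print(f"draw rate {d:.0%}: per-game sd {sd:.3f}, 4-game sd {sd * math.sqrt(4):.3f} points")
```

With 32% draws the per-game value comes out at about 0.41, and with the 40% draw rate used in the table at about 0.39. Because variances of independent games add, the sd/sqrt(N) scaling is exact for every N; only the Gaussian shape is an approximation for small N.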