It seems Delphil 2.9g x64 1CPU cannot reach the rating of the 32-bit version: is that wrong?
I'm not sure what you are asking. When you play a very short match, the results cannot be trusted - the longer the match, the more confidence you can have in the final result. This is not just about who is stronger, but about the relative difference between them.
This is because the outcome of a game has a lot of randomness built in. If you and I play a game and you are 50 Elo stronger, I still have a good chance of beating you. 50 Elo isn't much; all it means is that you are slightly more likely to win.
It doesn't mean the results are wrong, it only means that you should not place too much confidence in the answer.
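A small sketch of the point above, using the standard Elo expected-score formula (the logistic model, not anything specific to Don's tester): a 50-point edge translates into only a slim per-game advantage.

```python
# Expected score under the standard Elo logistic model:
# E = 1 / (1 + 10^(-diff/400)), where diff is the rating advantage.
def expected_score(elo_diff):
    """Expected score (win = 1, draw = 0.5) for the stronger player."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# A 50 Elo advantage means winning only slightly more than half the points.
for diff in (0, 50, 100, 200):
    print(diff, round(expected_score(diff), 3))
# 50 Elo gives an expected score of about 0.571 - far from a sure thing.
```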
In my high school, if you played a game against someone and beat them, it was assumed that you were the stronger player. A one-game sample doesn't prove anything. However, that doesn't mean you are not stronger; it just means you need a lot more games to prove it.
I like tennis. Same thing. Sometimes a match goes 5 sets. The player who wins the match did not win every game or even every set.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Don wrote:
You may notice that in some cases someone will make a minor modification to some open source program and based on some 100 game match declare a breakthrough.
This is SO true!
The world is swamped with derivatives of IvanHoe and Stockfish, that claim to be some kind of breakthrough, because they solved this position faster than the original, or whatever futile criterion they may find.
But proper testing always confirms that the derivatives are weaker than the original. It takes a lot more work and humility to improve significantly on something like IvanHoe or Stockfish...
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
That looks quite close to what it should look like, which is the quantile function of the logistic distribution.
That also looks a lot like the cumulative of the normal distribution with standard deviation 14 Elo points, i.e. erf[z/14], or, even better, normalized: (1 + erf(z/14))/2. It shows that the results in Don's test are pretty normally distributed.
The world is swamped with derivatives of IvanHoe and Stockfish, that claim to be some kind of breakthrough, because they solved this position faster than the original, or whatever futile criterion they may find.
Yes, unfortunately.
Some creative guys spew out 3000+ engines like a volcano, the ones who never had to debug their own move generator, of course; no problem as long as they comply with the license, sure.
What I don't understand, however, is why these get room in some tournaments (yes, I'm referring to CCRL). PS: I wonder why there's still the grey area; it seems quite pointless nowadays, IMHO.
So basically they test x versions of the same engine, just renamed and "improved". All that remains is to test hex-edited masterpieces.
There is not much need to test this, as there is a theoretical background for these normal distributions. If here A = 3003 is the average Elo, N = 120 the number of bins, and S = 14 points the standard deviation for each bin (300 games), then one needs only to plot the function N*(erf[(x - A)/S] + 1)/2.
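As a quick sketch, that curve can be evaluated directly with Python's math.erf; the constants A, S, and N are taken from the post as written, so this just reproduces the suggested function rather than re-deriving it.

```python
import math

# The suggested theoretical curve: cumulative count of bins with
# measured Elo below x, using the constants from the post.
A = 3003  # average Elo
S = 14    # standard deviation per bin (300 games each)
N = 120   # number of bins

def cumulative_bins(x):
    """N * (erf((x - A)/S) + 1) / 2, as given in the post."""
    return N * (math.erf((x - A) / S) + 1) / 2

# Sanity check: at the mean, exactly half the bins lie below.
print(cumulative_bins(3003))  # 60.0
```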
Yet, it was another way to demonstrate the result of a match is itself a random variable, and that short matches have more variability than longer matches.
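The same point can be shown with a toy simulation (this is not Don's actual tester; it just treats each game as a win for the stronger side with probability p, ignoring draws for simplicity) and measuring how the final score spreads out across many matches of the same length.

```python
import random

def score_spread(games, p=0.571, matches=2000, seed=1):
    """Standard deviation of the match score over many simulated matches."""
    rng = random.Random(seed)
    scores = [sum(rng.random() < p for _ in range(games)) / games
              for _ in range(matches)]
    mean = sum(scores) / matches
    var = sum((s - mean) ** 2 for s in scores) / matches
    return var ** 0.5

# The spread shrinks roughly as 1/sqrt(games): quadrupling the match
# length about halves the standard deviation of the result.
print(score_spread(100), score_spread(400))
```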
That looks quite close to what it should look like, which is the quantile function of the logistic distribution.
That also looks a lot like the cumulative of the normal distribution with standard deviation 14 Elo points, i.e. erf[z/14], or, even better, normalized: (1 + erf(z/14))/2. It shows that the results in Don's test are pretty normally distributed.
Kai
Yeah, I agree. I had the Elo model on the brain, and so I wrote logistic distribution. The logit and probit functions are quite similar, though.
That looks quite close to what it should look like, which is the quantile function of the logistic distribution.
That also looks a lot like the cumulative of the normal distribution with standard deviation 14 Elo points, i.e. erf[z/14], or, even better, normalized: (1 + erf(z/14))/2. It shows that the results in Don's test are pretty normally distributed.
I do want to point out that I usually plot using gnuplot with curve smoothing. "smooth bezier."
The lines look the same; they are just a little more jagged.
Kai
There is not much need to test this, as there is a theoretical background for these normal distributions. If here A = 3003 is the average Elo, N = 120 the number of bins, and S = 14 points the standard deviation for each bin (300 games), then one needs only to plot the function N*(erf[(x - A)/S] + 1)/2.
The reason I actually ran the numbers is the audience I was trying to appeal to - they won't necessarily respect theory but need to see data from real games.
Ok, I understand, though you are fighting a losing battle with the audience; they won't listen.