Daniel Shawul wrote:
You didn't do regressions on the 3 data sets; you just took a numerical average to get the average a=2. At least your later test followed my suggestion to fix a=2 and a=1, which is better, but it still doesn't fill a glaring hole: this is no more than an exercise in data selection to fit a favourite model. I am not sure we would see the same difference if I used half the data for training and half for prediction. I can even see from one of your other plots that it is just one data point at the top that is making all the 'difference'. You need to predict results (draw/win ratios) from calculated Elos, which is why the deltas are important. Your methodology is seriously flawed.
Please don't call your 'data' data. You have what, like 6 data points to make your regressions with, and you call that more clinical? And on top of that they are all correlated. And then you didn't do predictions with your draw model; you just computed how well it fits. That is completely meaningless.
BayesElo uses three different draw models now. Ordo didn't know shit about draw models before we even showed that Davidson was the better model for computer games. Then the bandwagoners jumped in about how Ordo is this or that... spare me.
What are you bragging about? You again seem not to understand that I am not using Elos, LogisticElos, BayesElos, whatever. Comparing a=1 and a=2 was my first comparison between models, before your smart "advice" (check the other thread in "Tournaments"). I do NOT have to train anything, and I am NOT using ratings; do you understand that?
My database is smaller, some 50,000 games, for some 50 points to fit. But you are again mistaken that the middle points are making the difference; I was very careful to include the tails. My Elo span between engines covers 1,500 Elo points across thousands of games; how many points like that are in the CCRL or CEGT databases? Each of my points has 1,000 games behind it; how many games does each point have in CCRL?
Rao-Kupper (the model BayesElo used) is completely ruled out by my tests, as the chi-square tests show. Do you know how to interpret those tests? Overfitting again?
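For readers following along, the kind of test being invoked here is a Pearson chi-square goodness-of-fit check of a model's predicted win/draw/loss probabilities against observed counts. A minimal sketch (the counts and probabilities below are made-up illustrations, not anyone's actual data):

```python
# Hypothetical illustration: chi-square goodness-of-fit of a draw
# model's predicted W/D/L probabilities against observed game counts.

def chi_square(observed, expected_probs):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over cells."""
    n = sum(observed)
    stat = 0.0
    for obs, p in zip(observed, expected_probs):
        exp = n * p  # expected count under the model
        stat += (obs - exp) ** 2 / exp
    return stat

# 1,000 games at one Elo gap (hypothetical): wins, draws, losses
observed = [450, 400, 150]
# W/D/L probabilities some candidate draw model predicts for that gap
model_probs = [0.44, 0.42, 0.14]

stat = chi_square(observed, model_probs)
print(round(stat, 3))
```

With three outcome cells and one fitted draw parameter there is 1 degree of freedom left, and the 5% critical value of chi-square(1) is about 3.84; a statistic well above that is what would "rule out" a model in this sense.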
Then, until I revealed that Davidson (1 win + 1 loss = 2 draws) does apply to computer chess draw models, you and Rémi stayed silent as rats about BayesElo using the wrong draw model, although you had known it for months. I am very happy that BayesElo will have 3 draw models from now on, and your advice (on which we agree) would be: use Davidson. BayesElo could finally offer a good challenge to Ordo. I will check the new BayesElo; Davidson should give quite different results from the previously used Rao-Kupper.
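To make the contrast concrete, here is a sketch of the two draw models' W/D/L probabilities in a common parameterisation (my own notation, not BayesElo's source code; `theta` and `nu` values are illustrative):

```python
# Sketch: Rao-Kupper vs Davidson draw models on the Elo scale.
import math

def gamma(elo):
    """Strength parameter on the logistic Elo scale."""
    return 10.0 ** (elo / 400.0)

def rao_kupper(elo_i, elo_j, theta=1.5):
    """Rao-Kupper: a draw parameter theta >= 1 inflates the opponent's
    strength in each win probability; the leftover mass is the draw."""
    gi, gj = gamma(elo_i), gamma(elo_j)
    p_win = gi / (gi + theta * gj)
    p_loss = gj / (gj + theta * gi)
    return p_win, 1.0 - p_win - p_loss, p_loss

def davidson(elo_i, elo_j, nu=2.0):
    """Davidson: draw strength proportional to sqrt(gi * gj)."""
    gi, gj = gamma(elo_i), gamma(elo_j)
    denom = gi + gj + nu * math.sqrt(gi * gj)
    p_win, p_loss = gi / denom, gj / denom
    return p_win, 1.0 - p_win - p_loss, p_loss

# Davidson's defining property: P(draw)^2 = nu^2 * P(win) * P(loss),
# so (at nu = 1) one win plus one loss is exactly as likely as two
# draws -- the "1 win + 1 loss = 2 draws" equivalence referred to above.
w, d, l = davidson(100.0, 0.0, nu=2.0)
assert abs(d * d - 4.0 * w * l) < 1e-12
```

The two formulas give genuinely different draw-rate curves as a function of Elo gap, which is why switching BayesElo from Rao-Kupper to Davidson should move the ratings.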