I took a look at your original post suggesting D/[µ*(1 - µ)] when you posted it, and I must admit that I liked the idea.
My numeric method works very badly if µ is too close to 0 or 1. My correction factor µ*(1 - µ) was chosen more or less at random when I was writing my original post, so it is to be expected that this factor (and also k) is a source of errors... I do not know whether a more adequate factor could be used instead of µ*(1 - µ); surely yes... but which one? Please let me explain a little more with an extreme case where my model breaks down:
Code:
(+99 =1 -0): µ = 0.995; D = 0.01.
k*µ*(1 - µ) = 1*0.995*0.005 = 0.004975 (µ*(1 - µ) tends to 0 if µ tends to 0 or 1).
------------
(+99 =0 -1): µ = 0.99; D = 0.
k*µ*(1 - µ) = 0*0.99*0.01 = 0.
Code:
(+99 =1 -0): µ = 0.995; D = 0.01.
D/[µ*(1 - µ)] = 0.01/(0.995*0.005) ~ 2.01005.
------------
(+99 =0 -1): µ = 0.99; D = 0.
D/[µ*(1 - µ)] = 0/(0.99*0.01) = 0.
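For anyone who wants to reproduce the two extreme cases above, here is a minimal Python sketch. The function names are hypothetical (they are not from any library), and the only assumptions are µ = (w + d/2)/n and D = d/n.

```python
def score_and_draw_ratio(wins, draws, losses):
    """Return (mu, D): the score fraction and the draw ratio of a match."""
    n = wins + draws + losses
    mu = (wins + 0.5 * draws) / n
    draw_ratio = draws / n
    return mu, draw_ratio


def draw_factor(wins, draws, losses):
    """Return D / [mu * (1 - mu)]; undefined when mu is exactly 0 or 1."""
    mu, draw_ratio = score_and_draw_ratio(wins, draws, losses)
    denominator = mu * (1.0 - mu)
    if denominator == 0.0:
        raise ValueError("mu is 0 or 1, so D / [mu * (1 - mu)] is undefined")
    return draw_ratio / denominator


# The two extreme cases from the post: (+99 =1 -0) and (+99 =0 -1).
for w, d, l in [(99, 1, 0), (99, 0, 1)]:
    mu, draw_ratio = score_and_draw_ratio(w, d, l)
    print(f"(+{w} ={d} -{l}): mu = {mu}, D = {draw_ratio}, "
          f"D/[mu*(1 - mu)] = {draw_factor(w, d, l):.5f}")
```

The sketch shows how strongly the factor reacts near µ = 1: a single draw instead of a loss moves it from 0 to about 2.01.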
At least I did post that my parameter k*µ*(1 - µ) could be useful in reasonable Round Robin tournaments where there are no huge Elo differences... A score of 90% means a difference of many Elo points between the winning engine and the rest (I suppose that 90% is enough to win unless there are more strong engines, which would make it a senseless 'GM vs. novices' Round Robin tournament), but of course I cannot say anything about such unbalanced matches.
------------------------
Laskos wrote:
I didn't quite get this k factor, and I use only score*(1-score), or µ(1-µ) in the notation of Jesus, when comparing to draw ratio. The problem with the assumption that k*score*(1-score) is somehow constant is evidenced by this:
Score = 0.5, Draw Ratio = d = 0.40
Then k=0.4
k*s*(1-s)=0.1 assumed to be a constant for an engine
Same engine:
Score = 0.9
k is smaller than 1 by definition
Then k*s*(1-s) is smaller than 0.09
There is no k to match the old k*s*(1-s) of 0.1, and even the maximum k=1 is unrealistic, as there would be lots of wins and draws, but no losses.
Good point, but I will try to defend myself a little: I think that 'draw aversion' depends not only on the engine but also on the opponents (so, IMHO, it is not constant for a given engine across tournaments). For example, taking SF in a tournament where µ_SF = 50% (and, say, D_SF = 40%), it should mean that the rest of the engines were strong enough to hold against SF; OTOH, if µ_SF = 90% with enough games, I understand that the rest of the engines were clearly weaker than SF, and of course SF will avoid more draws because the other engines are not as strong as SF and will blunder more often, which SF will take advantage of.
Laskos wrote:
On the other hand, if the factor is d / s*(1-s) then
Score = 0.5, Draw Ratio = d = 0.40
d / s*(1-s) = 1.6
Same engine:
Score = 0.9
The prediction for the same 1.6 is that d / 0.1*0.9 =! 1.6
Then d=0.09*1.6=0.144
So it predicts a result of 82.8% wins, 14.4% draws, and 2.8% losses, which is pretty realistic.
Therefore I think that d / s*(1-s) is a more useful quantity than k*s*(1-s).
Good try, but your method can also give errors when you estimate the draw ratio in this way. Please take a look at this slightly modified version of your example:
Code:
µ = 0.5, D = 0.6 (perfectly possible: +20% =60% -20%).
D/[µ*(1 - µ)] = 0.6/(0.5)² = 2.4
------------
µ' = 0.9; here you suppose D/[µ*(1 - µ)] = D'/[(µ')*(1 - µ')] = 2.4 = constant.
Prediction: D' = 2.4*(µ')*(1 - µ') = 2.4*0.9*0.1 = 0.216 = 21.6%.
But (D')_max = 2*min(µ', 1 - µ') = 2*0.1 = 0.2 < D'.
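The check above can be sketched in Python. The function names are hypothetical; the bound (D)_max = 2*min(µ, 1 - µ) is the one used in the example (all the missing points come from draws, with no games lost by the stronger side or won by the weaker one).

```python
def max_draw_ratio(mu):
    """Largest draw ratio compatible with a score fraction mu."""
    return 2.0 * min(mu, 1.0 - mu)


def predict_draw_ratio(mu_ref, draw_ref, mu_new):
    """Predict D' at mu_new, assuming D / [mu * (1 - mu)] stays constant."""
    factor = draw_ref / (mu_ref * (1.0 - mu_ref))
    return factor * mu_new * (1.0 - mu_new)


# Modified example: mu = 0.5, D = 0.6, then mu' = 0.9.
predicted = predict_draw_ratio(0.5, 0.6, 0.9)
limit = max_draw_ratio(0.9)
print(f"predicted D' = {predicted:.3f}, (D')_max = {limit:.3f}, "
      f"feasible: {predicted <= limit}")

# Kai's example: mu = 0.5, D = 0.4, then mu' = 0.9.
predicted_kai = predict_draw_ratio(0.5, 0.4, 0.9)
print(f"predicted D' = {predicted_kai:.3f}, feasible: {predicted_kai <= limit}")
```

With D = 0.6 at µ = 0.5 the constant-factor prediction exceeds the bound at µ' = 0.9, while with D = 0.4 it stays below it, so whether the prediction is feasible depends on the starting draw ratio.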
Code:
w = wins; d = draws; l = losses; n = games = w + d + l.
(Rating difference) = 400*log{[1 + (w - l)/n]/[1 - (w - l)/n]} by definition (just toy with variables).
(Rating performance): win ---> rating + 400; draw ---> rating; loss ---> rating - 400.
(Average rating performance) = <rating> + 400*(w - l)/n; (average opponent's rating: <rating>).
(Rating difference) = (average rating performance) - <rating> = 400*(w - l)/n.
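A minimal Python sketch to compare the two rating-difference expressions above. The function names are hypothetical, and I take "log" as the base-10 logarithm, as in the usual Elo formula (this is an assumption, since the block does not state the base).

```python
import math


def elo_diff_logistic(wins, draws, losses):
    """400*log10{[1 + (w - l)/n] / [1 - (w - l)/n]}."""
    n = wins + draws + losses
    x = (wins - losses) / n
    if abs(x) >= 1.0:
        raise ValueError("score of 0% or 100%: the rating difference is unbounded")
    return 400.0 * math.log10((1.0 + x) / (1.0 - x))


def elo_diff_linear(wins, draws, losses):
    """400*(w - l)/n, from the +400 / 0 / -400 performance rule."""
    n = wins + draws + losses
    return 400.0 * (wins - losses) / n


# Compare both expressions for a few results out of 30 games.
for w, d, l in [(10, 10, 10), (12, 10, 8), (20, 10, 0)]:
    print(f"(+{w} ={d} -{l}): logistic = {elo_diff_logistic(w, d, l):.1f}, "
          f"linear = {elo_diff_linear(w, d, l):.1f}")
```

Both expressions agree at an even score and drift apart as (w - l)/n grows.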
Code:
µ_min. ===> w = 0, l = 2n/3, d = n/3: µ = d/(2n) = 1/6.
µ_max. ===> w = 2n/3, l = 0, d = n/3: µ = (w + d/2)/n = 5/6.
[µ_min., µ_max.] = [1/6, 5/6]; for not being so strict: [0.15, 0.85].
------------------------
Laskos wrote:
Anyway, I ran another test with engines adjusted for strength (not perfectly adjusted):
Code:
   Program              Score       %     Elo   Draws
 1 Stockfish 2.3.1  : 520.5/834   62.4   3073   32.5 %
 2 Komodo 5         : 421.5/842   50.1   3000   35.0 %
 3 Rybka 4.1        : 404.5/819   49.4   2996   34.8 %
 4 Hiarcs 14        : 415.0/845   49.1   2995   29.1 %
 5 Houdini 3        : 394.0/840   46.9   2982   33.6 %
 6 Junior 13        : 353.5/838   42.2   2954   26.1 %
And the draw averseness (smaller - more averse) is:
Code:
Engine             d / s*(1-s)
Junior 13             1.07
Hiarcs 14             1.16
Houdini 3             1.35
Stockfish 2.3.1       1.39
Rybka 4.1             1.39
Komodo 5              1.40
Again, "older style" engines seem more draw-averse.
Kai
I computed my dubious number k*µ*(1 - µ), rounded to 0.0001, just for comparison (a higher k*µ*(1 - µ) means less 'draw aversion'). I hope there are no typos:
Code:
Engine             d / s*(1-s)   k*µ*(1 - µ)
Junior 13             1.07         0.1791
Hiarcs 14             1.16         0.1508
Houdini 3             1.35         0.1900
Stockfish 2.3.1       1.39         0.2697
Rybka 4.1             1.39         0.1783
Komodo 5              1.40         0.1756
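The d / s*(1-s) column can be rechecked from the rounded score and draw percentages in Kai's table with a short Python sketch. The names are hypothetical, and the column is read as d / [s*(1 - s)]. It does not recompute the k*µ*(1 - µ) column, since k is not defined in this post.

```python
# (engine, score %, draw %) copied from Kai's table.
RESULTS = [
    ("Stockfish 2.3.1", 62.4, 32.5),
    ("Komodo 5", 50.1, 35.0),
    ("Rybka 4.1", 49.4, 34.8),
    ("Hiarcs 14", 49.1, 29.1),
    ("Houdini 3", 46.9, 33.6),
    ("Junior 13", 42.2, 26.1),
]


def draw_factor_from_percent(score_pct, draw_pct):
    """Return d / [s * (1 - s)] from percentages."""
    s = score_pct / 100.0
    d = draw_pct / 100.0
    return d / (s * (1.0 - s))


def ranked_by_draw_aversion(results):
    """Sort engines from most to least draw-averse (smallest factor first)."""
    rows = [(name, draw_factor_from_percent(s, d)) for name, s, d in results]
    return sorted(rows, key=lambda row: row[1])


for name, factor in ranked_by_draw_aversion(RESULTS):
    print(f"{name:<16} {factor:.2f}")
```

Working from the rounded percentages is enough to get the same two-decimal values and the same ordering as in the table.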
Both models (yours and mine) have some drawbacks whose solutions are not trivial. I encourage people to find or suggest such solutions, because I am sure that I would not come up with anything better even if I thought more about it.
It would be nice to compare our results with those of Adam, who has worked a lot on this. 'Draw aversion' is an interesting topic to investigate, but it is difficult to reach a plausible criterion!
Regards from Spain.
Ajedrecista.