I did. Balanced is still best according to the model I am using, although the difference is small.
When I have time I will redo the computation for Davidson. Do you know what a sensible value for d/sqrt(w*l) would be? I have no experience with Davidson.
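For reference, the statistic d/sqrt(w*l) is just the draw count over the geometric mean of the win and loss counts, so it can be read straight off a match result. A minimal sketch (function name and example counts are mine):

```python
import math

def davidson_draw_ratio(wins, draws, losses):
    """Draw ratio d/sqrt(w*l), the natural draw statistic for a
    Davidson-type model: draws relative to the geometric mean of
    decisive results."""
    return draws / math.sqrt(wins * losses)

# Made-up counts for a drawish engine match.
print(davidson_draw_ratio(wins=2000, draws=6000, losses=2000))  # 3.0
```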
Say, 6 would be fine. I am not sure about the model: you seem to use drawelo both for the draw model and for the unbalanced-positions model, and at high values of drawelo the model becomes dubious.
Thanks! I'll try 6 then.
I am just using the standard BayesElo model. The "Advantage" parameter, which is supposed to measure White's advantage in the opening position, is now used to measure the unbalancedness of the positions that are being used.
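To make the parameters concrete: as I understand the BayesElo model, "Advantage" and "drawelo" enter the win/draw/loss probabilities as Elo offsets in a base-10 logistic. A sketch under that assumption (function names are mine):

```python
def logistic(x):
    # Base-10 logistic used by BayesElo, scaled in Elo points.
    return 1.0 / (1.0 + 10.0 ** (x / 400.0))

def bayeselo_probs(elo_white, elo_black, advantage, drawelo):
    """(P(white win), P(draw), P(black win)) in the BayesElo model.

    'advantage' is the white bonus in Elo (here reinterpreted as the
    unbalancedness of the opening position); 'drawelo' is the draw
    parameter in Elo.
    """
    p_white = logistic(elo_black - elo_white - advantage + drawelo)
    p_black = logistic(elo_white - elo_black + advantage + drawelo)
    return p_white, 1.0 - p_white - p_black, p_black
```

With equal strengths and advantage=0, the model is symmetric and drawelo alone sets the draw rate; a positive advantage shifts probability mass from Black to White.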
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Thanks!
Even if I trust Davidson more, and even if it correctly describes the draw model, the opening-positions model is still drawelo-based here. As these things seem to be model dependent, I took a database of 30,000 games between closely related recent Stockfishes and performed the manipulations with the Python tool that Ferdinand provided several months ago. The initial openings are from 2moves_v2.pgn, used in the SF testing framework.
I can confirm your findings, although I do not quite understand your numbers. Perhaps I made an arithmetic mistake, since I did this quickly before going to work. Here is what I get (with hopefully self-explanatory notation):
You were giving the games where the bias was respectively in [0.00,0.40] and [-0.40,0.00] for SF1. But now I see that this is impossible, since the total number of games is only 30,000.
However, if it concerns the games where the bias is in the interval [-0.40,0.40], then we should have wins(SF1)=losses(SF2), which is not satisfied.
Can you clarify?
Actually, Gcnt shows how many times a position with an SF1 eval (respectively an SF2 eval) in the interval [0.00,0.40] occurred between moves [6,8] of the 30,000 "real" games. If an eval in [0.00,0.40] occurs at each of moves 6, 7 and 8 of a game, it counts inside Gcnt all three times, as three independent positions. Evals in [-0.40,0.00] do not appear at all. Then, for each position picked inside Gcnt, it assigns the result of the game in which that position occurred.
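The counting procedure described above can be sketched as follows; the data layout (a list of (result, {move_number: eval}) records) and all names are mine, not the actual tool's:

```python
def count_window_positions(games, lo=0.00, hi=0.40, move_range=(6, 8)):
    """Count every occurrence (Gcnt) of an eval in [lo, hi] at the
    given move numbers, and tally the game result once per occurrence.

    `games` is a list of (result, {move_number: eval}) records --
    an illustrative layout, not the real tool's format.
    """
    gcnt = 0
    results = {"1-0": 0, "1/2-1/2": 0, "0-1": 0}
    for result, evals in games:
        for move in range(move_range[0], move_range[1] + 1):
            e = evals.get(move)
            if e is not None and lo <= e <= hi:
                gcnt += 1          # each hit counts as its own "game"
                results[result] += 1
    return gcnt, results
```

Note that a game whose eval stays in the window for all three moves contributes three entries, which is exactly the multiple-counting issue discussed below.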
Ok, I understand. I was assuming the eval was provided by a referee engine to avoid bias, but what you are doing might be ok.
However, if you are really taking consecutive positions from the same game and counting them as different games, then I think there are too many unknowns to do a valid statistical analysis. For one thing, the computation I did to check that the result is significant would no longer be valid.
The (theoretical) statistical analysis I did is for an eng1-eng2 match with the unbalancedness of the end-of-book position being +-A with equal probability. (*)
If we want to refute the theoretical result we should do it under those conditions.
(*) The reason for these conditions is that the analysis can be tested empirically without reference to any Elo model, as the outcome of the match is controlled by a trinomial distribution, and under the null hypothesis we have w=l. The value of +-A is not so important, but care should be taken that the average bias is zero from the POV of eng1, for example by assigning sides randomly. Replaying games with different colors is also ok (with the caveat that has already been discussed).
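Under the null hypothesis w=l, the draws carry no information and the decisive games are Binomial(w+l, 1/2), so the significance check reduces to an exact sign test. A minimal sketch of that test (names are mine):

```python
from math import comb

def sign_test_p(wins, losses):
    """Two-sided exact test of H0: w = l, ignoring draws.

    Under H0 the number of wins among the n = w + l decisive games
    is Binomial(n, 1/2); we double the upper tail probability.
    """
    n = wins + losses
    k = max(wins, losses)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

For example, a 10-0 decisive score rejects H0 at any reasonable level, while 6-4 is nowhere near significant.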
2) Humans still play the game (without consulting the engines).
3) There are many, many paths to all three results.
Exhibit A: Magnus Carlsen has made his way to World Champion and the number-1 rating, often using "sub-optimal" moves in the opening. He has stated that he wants to just "play chess", and I would say he has thus far proven very capable of achieving exactly that.
The only problem I can foresee is a savant with a photographic memory who is capable of memorizing billions of computer evals and also has GM-strength natural chess-playing ability. That seems pretty unlikely, but even if someone like that were to come along... fair play to them.