That is probably a bit better approach than the one I am using at the moment. However, you probably want to factor in both the win/loss ratio and the percentage of total games in which the opening was played, so that you don't wander off into oddball opening systems that might show a lot of draws because of weak opponents or whatever. Another issue is the kind of correlation you would see between positions that are different, but only barely so; for example, one where white has played a3 and one where he has not. In thinking about your idea, I think there are three components that have to factor in (the numbered list further down).

Tony wrote:
Ah, yes.

bob wrote:
The issue was "independent or non-correlated results." In a 2-game match on an unbalanced position, the two results are correlated. Think about the extreme point: 100 positions, all unbalanced. So you get 100 wins and 100 losses whatever you change. Now take 200 positions, 100 unbalanced, 100 pretty even. Changes you make are not going to affect the unbalanced results, but they will affect the other 100 games. Which set will give the most useful information?

Tony wrote:
Is this true?

hgm wrote:
bob wrote:
It now appears that there was a correlation issue, but not one anyone seemed to grasp until Karl came along.
This is still absolute bullshit. Karl stated that the results would be farther from the truth when you used fewer positions. But they would have been closer to each other, as they used the same small set of positions. Karl's remark that being closer to the truth necessarily implies that they were closer to each other was even plain wrong, as my counter-example shows.

bob wrote:
OK. Here you go. First, a direct quote from Karl:
============================================================
can lead to different moves and even different game outcomes. However, we
are doing _almost_ the same thing in each repetition, so although the
results of the 64 repetitions are not perfectly correlated, they are highly
correlated, and far from mathematically independent.
When we do the calculation of the standard deviation, we will not be
understating it by a full factor of 8 as we did in the case of Trials C & D,
but we will still be understating it by almost that much, enough to explain
away the supposed mathematical impossibility. Note that I am specifically
not assuming that whatever changed between Trials E & F gave a systematic
disadvantage to Crafty. I am allowing that the change had a random effect
that sometimes helped and sometimes hurt. My assumption is merely that the
random effect didn't apply to each playout independently, but rather
affected each block of 64 playouts in coordinated fashion.
============================================================
Now, based on that, either (a) "bullshit" is simply the first idea you get whenever you read a post here, or (b) you wouldn't recognize bullshit if you stepped in it.
He said _exactly_ what I said he said. Notice the "enough to explain away..." This quote followed the first one I posted from him last week when we started this discussion.
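Purely to illustrate what Karl is describing, here is a throwaway simulation (my own sketch, not his math; the block size of 64 comes from his example, but the bias/noise numbers and the 200-run outer loop are made up). It shows how the naive standard-deviation calculation, which pretends every playout is independent, understates the real run-to-run spread when playouts are correlated in blocks:
============================================================
import random
import statistics

# Toy sketch: each "position" contributes a block of 64 playouts, and results
# inside a block share a common random bias (the correlated part) on top of
# independent noise, so the playouts are far from independent.

def run_experiment(num_positions=40, playouts_per_block=64, bias_sd=0.8, noise_sd=0.2):
    results = []
    for _ in range(num_positions):
        block_bias = random.gauss(0.0, bias_sd)          # shared by the whole block
        for _ in range(playouts_per_block):
            results.append(1 if block_bias + random.gauss(0.0, noise_sd) > 0 else 0)
    return results

# Naive standard error of the mean score: treats every playout as independent.
one_run = run_experiment()
naive_se = statistics.pstdev(one_run) / len(one_run) ** 0.5

# Actual run-to-run spread of the mean score, measured by repeating everything.
means = [statistics.mean(run_experiment()) for _ in range(200)]
observed_sd = statistics.pstdev(means)

print(f"naive standard error:   {naive_se:.4f}")
print(f"observed run-to-run SD: {observed_sd:.4f}   (much larger: block correlation)")
============================================================
The gap between those two numbers is exactly the kind of understatement the quote is talking about.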
Again, don't buy it at all. If a position is so unbalanced, the two outcomes will be perfectly correlated and cancel out. A single game per position gives twice as many games, hopefully twice as many that are not too unbalanced.

This is also wrong. Unbalanced positions are bad whether you pair them or not. It becomes more difficult to express a small improvement in a game that you are almost certainly going to lose anyway; the improvement then usually only means you can delay the inevitable somewhat longer.

And based on the results so far, his idea of eliminating the black/white pairs may also be a good one, since a pair of games, same players, same position, is going to produce a significant correlation for positions that are not absolutely equal, or that are not equal with respect to the two opponents.
With equal strength (50% win chance):
1 unbalanced position, played twice  => 1 - 1
1 unbalanced, 1 balanced             => 1.5 - 0.5
perfect-world result: 1 - 1

With unequal strength (100% win chance for one side):
1 unbalanced position, played twice  => 1 - 1
1 unbalanced, 1 balanced, two possibilities:
  stronger side gets the winning position => 2 - 0
  weaker side gets the winning position   => 1 - 1
perfect-world result: 2 - 0
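For what it's worth, the dilution effect being argued about here is easy to see in a quick simulation (everything below is my own assumption for illustration: a "true" 55% engine, and unbalanced positions treated as forced wins for whoever gets the strong side):
============================================================
import random

# Toy model of paired games from the same position with colours reversed.

def game_score_for_a(strength_edge, unbalanced, a_has_strong_side):
    if unbalanced:
        return 1.0 if a_has_strong_side else 0.0   # the position decides, not the engines
    return 1.0 if random.random() < 0.5 + strength_edge else 0.0

def paired_match_score(strength_edge, frac_unbalanced, num_positions=20000):
    score = 0.0
    for _ in range(num_positions):
        unbalanced = random.random() < frac_unbalanced
        score += game_score_for_a(strength_edge, unbalanced, True)    # A gets the strong side
        score += game_score_for_a(strength_edge, unbalanced, False)   # colours reversed
    return score / (2 * num_positions)

# The more unbalanced positions in the set, the more the measured edge of a
# truly 55% engine is diluted back toward 50%.
for frac in (0.0, 0.5, 0.9):
    print(frac, round(paired_match_score(0.05, frac), 3))
============================================================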
Tony
Playing 2 matches per position might bring us closer to the "real" Elo, but we're only interested in the relative results.

New proposal: how about varying the starting positions?

Put enormous.pgn into a game database and play with the 10,000 positions that scored closest to 50% (as black and white). Add these new games to the database, again take the positions closest to 50%, and so on.

That way we improve the chance of getting "random" positions (i.e. we filter out the unbalanced ones).

We could even do this on a per-opponent basis, to make sure that the kinds of positions a certain opponent handles badly don't get overvalued.
Tony
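If it helps, here is roughly what that filtering step could look like once the per-opening results have been counted (the opening names, the numbers, and the most_balanced() helper are hypothetical placeholders, not anything taken from the real database):
============================================================
# Sketch of the filtering idea, assuming the games in enormous.pgn have already
# been reduced to per-opening result counts by whatever tools you already have.

opening_results = {
    # key: opening identifier, value: (white wins, draws, black wins)
    "sicilian_najdorf": (4200, 3900, 3500),
    "petroff":          (2100, 5200, 1800),
    "oddball_gambit":   (300, 50, 900),
}

def white_score(stats):
    wins, draws, losses = stats
    return (wins + 0.5 * draws) / (wins + draws + losses)

def most_balanced(results, count=10000, min_games=100):
    """Openings whose historical score is closest to 50%, best first."""
    candidates = [
        (abs(white_score(stats) - 0.5), key)
        for key, stats in results.items()
        if sum(stats) >= min_games            # ignore rarely played lines
    ]
    candidates.sort()
    return [key for _, key in candidates[:count]]

print(most_balanced(opening_results, count=2))
============================================================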
Coming back to the three components I mentioned at the top:

1. some sort of "chess hamming distance", a measure of how different two positions are, favoring positions with more significant differences.
2. some sort of popularity measure so that you test on mainstream rather than oddball openings.
3. result of the game, so that you pick openings that are pretty balanced rather than one that leads to a quick win or loss every time.
How to do that is a completely different question. I obviously have the W/L/D data for each game in PGN form; by counting, I could discover how many times each opening was played compared to the others, and a basic hamming-distance approach would work, although the degree of "difference" is not so easy to pin down. Some positions with only slight differences in piece placement might be extremely different in terms of how they are played: e4/e5 versus d4/d5 is a minor change on the board but a major one in terms of the ensuing game.
Needs some thought and discussion. And there is also the issue of whether you should include all openings, or just the ones you are actually going to play in real games.
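Just to show how the three measures could be glued together, here is a greedy selection sketch (the weights, the square-by-square FEN comparison, and the select_openings() helper are all placeholders I made up; the real scoring is exactly what needs the thought and discussion above):
============================================================
# One possible way to combine balance, popularity and position distance when
# picking test openings.  Nothing here reflects how any existing tester works.

def board_distance(fen_a, fen_b):
    """Crude 'chess hamming distance': number of squares whose occupant differs."""
    def squares(fen):
        result = []
        for rank in fen.split()[0].split("/"):
            for ch in rank:
                result.extend(["."] * int(ch) if ch.isdigit() else [ch])
        return result
    return sum(a != b for a, b in zip(squares(fen_a), squares(fen_b)))

def select_openings(candidates, count, w_balance=1.0, w_popularity=0.5, w_distance=0.25):
    """candidates: list of (fen, games_played, white_score) tuples."""
    total_games = sum(games for _, games, _ in candidates) or 1
    chosen, pool = [], list(candidates)
    while pool and len(chosen) < count:
        def combined_score(entry):
            fen, games, score = entry
            balance = 1.0 - 2.0 * abs(score - 0.5)   # 1 at a 50% score, 0 at 0% or 100%
            popularity = games / total_games          # favour mainstream openings
            # favour positions that differ a lot from everything already chosen
            distance = min((board_distance(fen, c[0]) for c in chosen), default=64) / 64.0
            return w_balance * balance + w_popularity * popularity + w_distance * distance
        best = max(pool, key=combined_score)
        chosen.append(best)
        pool.remove(best)
    return chosen
============================================================
The greedy loop is only one choice; whether distance from already-chosen positions should count at all, and how to weight it against popularity, is part of the open question.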