testing: reducing the computational workload

Discussion of chess software programming and technical issues.

Moderator: Ras

hgm
Posts: 28356
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: testing: reducing the computational workload

Post by hgm »

Strange. I was already starting to doubt my eyesight, but the browser's "find on this page" function can also only find the term "major changes" in this last post of yours... :?

So you must be very happy with my answer, then, which addresses exactly this case. No comment on that? Or did you overlook it in your desperate zeal to find something you could be nasty about?
xsadar
Posts: 147
Joined: Wed Jun 06, 2007 10:01 am
Location: United States
Full name: Mike Leany

Re: testing: reducing the computational workload

Post by xsadar »

bob wrote:OK, different topic. I have about 4,000 positions that I have been using recently, chosen randomly from games played in a large PGN collection, after white and black had both played 10 moves (all positions are currently WTM). The games were played at least 3 times, so that oddballs were excluded.

Now, how to reduce this? I can test most anything, so the issue is to find a good idea; then I can test it and watch the fur fly...

Possibilities:

1. play 1 game per position rather than 2, with alternating colors to balance black vs. white effects. Easy to do. This reduces the workload by 50%, to about 20,000 games (keeping 5 opponents for new versions of Crafty to play against, 5 x 4,000).

2. play a black/white pair of games per position, but play each position against only one opponent. Opponent 1 would play position 1, opponent 2 would play position 2, etc. This would reduce the workload by 80%: 4,000 positions x 2 games per position x 1 opponent per position, or 8,000 games.

3. I could take the first N positions, since they are in a fairly random order (the FEN strings are sorted lexically), or I could randomly choose N positions from the set. With N=1,000 that reduces the workload by 75%. But this might invite remnants of the old correlation effect back in.
From the options you listed, I would recommend 1 or 2. The reduction in correlation from 1 or 2 may help counteract the reduction in games, even if only slightly. I think number 3 could introduce additional correlation, but only because your set of positions likely contains positions that are similar to each other as well as unbalanced positions. However, even after removing these issues from your set, I think 1 and 2 would still be better options, because they work to remove correlation.
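For what it's worth, here is a minimal sketch of the bookkeeping behind options 1 and 2, just to make the game counts concrete. The function names, the stand-in FEN strings and opponent labels are all placeholders of mine, not anyone's actual test harness:

Code: Select all

def schedule_option1(positions, opponents):
    """Option 1: one game per (position, opponent) pair, alternating
    colors across positions so black/white effects roughly balance."""
    games = []
    for i, fen in enumerate(positions):
        color = "white" if i % 2 == 0 else "black"
        for opp in opponents:
            games.append((fen, opp, color))   # engine under test plays 'color'
    return games   # 4,000 positions x 5 opponents = 20,000 games

def schedule_option2(positions, opponents):
    """Option 2: a black/white pair per position, but each position is
    assigned to exactly one opponent, round-robin."""
    games = []
    for i, fen in enumerate(positions):
        opp = opponents[i % len(opponents)]
        games.append((fen, opp, "white"))
        games.append((fen, opp, "black"))
    return games   # 4,000 positions x 2 games = 8,000 games

if __name__ == "__main__":
    fens = ["fen%d" % i for i in range(4000)]   # stand-ins for the real FEN strings
    opps = ["opp%d" % i for i in range(1, 6)]
    print(len(schedule_option1(fens, opps)))    # 20000
    print(len(schedule_option2(fens, opps)))    # 8000

With 4,000 positions and 5 opponents the first gives 20,000 games and the second 8,000, matching the 50% and 80% reductions quoted above.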
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: testing: reducing the computational workload

Post by bob »

hgm wrote:Strange. I was already starting to doubt my eyesight, but the browser's "find on this page" function can also only find the term "major changes" in this last post of yours... :?

So you must be very happy with my answer, then, which addresses exactly this case. No comment on that? Or did you overlook it in your desperate zeal to find something you could be nasty about?
OK, I can't find it either. But the quote, in one of these threads, went something like this:

I want to find out how many positions it takes to evaluate major changes, minor changes and tiny changes. A major change might be search extensions. A minor change might be null-move or reduction searches. Tiny changes are simple eval changes.

I do not know where it is, but it is buried in one of these threads. It ought to be easier to test major things with fewer games. For example, the oft-asked question "how many Elo does parallel search on N processors add over using just one?" That is a major change, since going from 1 to 2 processors nearly doubles the speed, and likewise for 2 to 4. A doubling is generally considered to be worth 50-70 Elo, so measuring the gain on 8 processors ought to be doable with far fewer games than what is needed for tiny changes. Yes, my main interest is in tiny changes. And if using fewer positions (but still a huge number) gives me X +/- N where X is pretty constant, even if N is larger, that would be of some use, at least for quick-and-dirty tests...
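To put rough numbers on that, here is a back-of-the-envelope sketch of the usual binomial error-bar argument. It assumes independent games, a ~40% draw rate and the logistic Elo model; those are assumptions for illustration only, and they ignore the correlation issues discussed elsewhere in this thread:

Code: Select all

import math

def elo_error_95(games, draw_rate=0.4):
    """Approximate 95% error bar (in Elo) on a match score near 50%."""
    win = (1.0 - draw_rate) / 2.0
    var = win * 1.0 + draw_rate * 0.25 - 0.25      # per-game score variance at 50%
    se_score = math.sqrt(var / games)              # standard error of the mean score
    elo_per_score = 400.0 / (math.log(10) * 0.25)  # d(Elo)/d(score) at 50%
    return 1.96 * se_score * elo_per_score

def games_needed(elo_diff, draw_rate=0.4):
    """Games until the 95% error bar shrinks below elo_diff."""
    n = 100
    while elo_error_95(n, draw_rate) > elo_diff:
        n += 100
    return n

print(games_needed(60))  # a ~60 Elo SMP gain: about a hundred games
print(games_needed(5))   # a ~5 Elo eval tweak: on the order of ten thousand games

Under those assumptions a 60 Elo difference resolves in roughly a hundred games, while a 5 Elo difference needs on the order of ten thousand, which is why the tiny changes are the expensive ones to test.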
BubbaTough
Posts: 1154
Joined: Fri Jun 23, 2006 5:18 am

Re: testing: reducing the computational workload

Post by BubbaTough »

Just to add a little more confusion to the mix, I thought I'd bring up that there are not only correlations between similar positions (with an independence benefit from having a large/diverse pool of positions), but also result correlations between similar opponents (with an independence benefit from having a large/diverse pool of opponents).

When I have increased the size of my test runs (on those occasions when I actually use them), I have never played a position more than twice per opponent, even though I admit my results are not perfectly repeatable; instead I have always just added more positions and/or more opponents, resulting in more total games. This always seemed good enough, independent of theoretical models, because my program seemed to improve dramatically over time, hinting that whatever method I was using to throw out bad changes and keep good ones must be working sufficiently.

Efficiency of testing is still a big interest though, as I find it very boring waiting for test runs to finish. Since efficiency seems to be the point of this thread, I thought I would ask whether any of you smart folks have developed (or could develop) a methodology or heuristic for deciding when to add positions vs. opponents to a testbed (assuming doing either has the same effect on the number of games played). It might make sense if the technique were an equation into which you plugged a variable (which the user would have to estimate from their "experience") for how correlated a position is with the other entries (for example, results after 1.d4 are reasonably correlated with results after 1.d4 d5), or how correlated a new opponent is expected to be with the current pool (for example, Crafty 21.5 is probably pretty correlated with Crafty 21.6 in terms of results against my program).
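To make concrete the kind of equation I have in mind, one possible shape is to treat games that share a position, or share an opponent, as clustered, and plug guessed correlations into the standard cluster-sampling design effect n_eff = n / (1 + (m - 1) * rho). The correlation values below are exactly the made-up "experience" numbers mentioned above; nothing here is calibrated:

Code: Select all

def effective_games(positions, opponents, games_per_cell,
                    rho_position=0.05, rho_opponent=0.002):
    """Crude effective-sample-size estimate for a positions x opponents testbed.

    rho_position: guessed correlation between games sharing a position
    rho_opponent: guessed correlation between games sharing an opponent
    """
    total = positions * opponents * games_per_cell
    games_per_position = opponents * games_per_cell
    games_per_opponent = positions * games_per_cell
    # treat the two clusterings as independent and multiply their design effects
    deff = (1 + (games_per_position - 1) * rho_position) * \
           (1 + (games_per_opponent - 1) * rho_opponent)
    return total / deff

# Same 40,000 total games split two ways: more positions, or more opponents?
print(effective_games(positions=4000, opponents=5,  games_per_cell=2))
print(effective_games(positions=2000, opponents=10, games_per_cell=2))

With these particular guesses, adding opponents buys more effective games than adding positions, but the answer flips around depending on the correlations you plug in, which is exactly the part someone would have to estimate from experience.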

-Sam