Testing using many computers and architectures

AndrewGrant · Post by **AndrewGrant** » Wed Sep 14, 2016 5:17 pm

Up until now I have only had one computer running tests. Now, I have an additional computer running tests. However, the results from each computer vary greatly.

I am scaling the time control based on a benchmark test run using a version of Ethereal. Note that my suite plays against a handful of engines, not just the current version of my program.
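As a rough illustration of the scaling described above, here is a minimal sketch. It assumes the benchmark reports a nodes-per-second figure and that the time control is a base time plus increment; the function name `scale_time_control` and its parameters are hypothetical and not taken from Ethereal or any test framework.

```python
def scale_time_control(base_seconds, increment_seconds, reference_nps, local_nps):
    """Scale a base+increment time control by relative benchmark speed.

    reference_nps: benchmark speed of the machine the time control was tuned on
                   (hypothetical input, e.g. from an engine's bench command).
    local_nps:     benchmark speed measured on the machine running the games.
    """
    if reference_nps <= 0 or local_nps <= 0:
        raise ValueError("benchmark speeds must be positive")
    # A machine half as fast as the reference gets twice the time.
    factor = reference_nps / local_nps
    return base_seconds * factor, increment_seconds * factor


# Example: 10s+0.1s tuned on a 2.0 Mnps machine, run on a 1.0 Mnps machine.
base, inc = scale_time_control(10.0, 0.1, 2_000_000, 1_000_000)
print(base, inc)
```

A single factor like this only reflects how one engine's speed differs between machines, which is the limitation raised in the list of possible issues.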

I see a few possible issues...

My benchmark is not sufficient
I need a benchmark for each engine
I could play only against myself, but my engine is too weak to get good results

Does anyone have a similar experience when they started adding additional machines to their testing setup?

bob · Post by **bob** » Wed Sep 14, 2016 6:48 pm

AndrewGrant wrote:Up until now I have only had one computer running tests. Now, I have an additional computer running tests. However, the results from each computer vary greatly.

I am scaling the time control based on a benchmark test run using a version of Ethereal. Note that my suite plays against a handful of engines, not just the current version of my program.

I see a few possible issues...

My benchmark is not sufficient
I need a benchmark for each engine
I could play only against myself, but my engine is too weak to get good results

Does anyone have a similar experience when they started adding additional machines to their testing setup?

I've been testing like this for several years. I have simply normalized CPU speeds by modifying the time control for each architecture class. I don't see results that "vary greatly", unless you are only playing 500 games and then looking at the results. There will always be some variability, but then again, there will be variability if you use identical hardware or even the exact same machine.
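To put a number on the point about small samples, the sketch below estimates a 95% error margin in Elo from a win/draw/loss count, using the usual logistic Elo formula and a normal approximation. The function names and the 40%/20%/40% result split are assumptions chosen for illustration, not figures from anyone's actual test runs.

```python
import math


def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin(wins, draws, losses, z=1.96):
    """Approximate half-width of the 95% confidence interval, in Elo."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(variance / n)
    # Clamp so the Elo conversion stays defined for lopsided results.
    low = max(score - z * stderr, 1e-9)
    high = min(score + z * stderr, 1.0 - 1e-9)
    return (elo_from_score(high) - elo_from_score(low)) / 2.0


# Same assumed result split at two sample sizes.
print(round(elo_margin(200, 100, 200), 1))          # 500 games
print(round(elo_margin(12000, 6000, 12000), 1))     # 30,000 games
```

Under that assumed split the margin is roughly 27 Elo at 500 games and roughly 3.5 Elo at 30,000 games, so two 500-game runs can disagree by tens of Elo even on identical hardware.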

AndrewGrant · Post by **AndrewGrant** » Wed Sep 14, 2016 7:28 pm

So are you testing only against other versions of Crafty? Or with a pool of other engines? I'm trying to see if the following is an issue:

My opponent's engine runs faster on some Intel chip, but slower on some AMD chip.

The solution to that problem would be to normalize CPU speeds with some benchmark, but for each engine.

bob · Post by **bob** » Wed Sep 14, 2016 7:54 pm

AndrewGrant wrote:So are you testing only against other versions of Crafty? Or with a pool of other engines? I'm trying to see if the following is an issue:

My opponent's engine runs faster on some Intel chip, but slower on some AMD chip.

The solution to that problem would be to normalize CPU speeds with some benchmark, but for each engine.

No, I do not trust that kind of testing. I test against a gauntlet of other programs. If you are worried about the minor influence this will have, why not use a deterministic scheduling algorithm so that the same games and opponents are ALWAYS played on the same machine? Then the runs will be more consistent, although I would not expect a big difference anyway. My recent testing has been using wildly variable hardware, i.e. a cluster with a new IBM power chip, an Intel 2660, and some older processors. But when I started I also ran on just one machine, with the same number of games (30K per test), and I didn't see any more variability than what I saw when running two matches on the same hardware...
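One possible way to get the deterministic scheduling suggested above is to derive the machine from a stable hash of the game's identity. This is only a sketch under assumptions: the key fields (opponent, opening, color), the function name `assign_machine`, and the machine names are all hypothetical, and it is not a description of how any existing cluster setup works.

```python
import hashlib


def assign_machine(opponent, opening_id, color, machines):
    """Pick a machine for a game so the same game always lands on the same host.

    machines must be the same list, in the same order, on every run.
    """
    if not machines:
        raise ValueError("need at least one machine")
    key = f"{opponent}|{opening_id}|{color}".encode("utf-8")
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process for strings and so would not be stable between runs.
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(machines)
    return machines[index]


# Hypothetical machine names for illustration.
machines = ["intel-box", "amd-box", "power-node"]
first = assign_machine("OpponentA", 17, "white", machines)
second = assign_machine("OpponentA", 17, "white", machines)
print(first, first == second)
```

Note that a plain hash spreads games evenly by count and ignores machine speed, so a slower machine would finish its share later; adding or removing a machine also reshuffles the assignments.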
