Testing using many computers and architectures

Discussion of chess software programming and technical issues.

AndrewGrant
Posts: 1754
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Post by AndrewGrant »

Until now I have had only one computer running tests. Now I have a second computer running tests as well. However, the results from the two computers vary greatly.

I scale the time control on each machine based on a benchmark run with a version of Ethereal. Note that my suite plays against a handful of engines, not just the current version of my program.
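A minimal sketch of this kind of benchmark-based scaling, assuming the benchmark reports nodes per second (NPS) and that one machine is picked as the reference. The function name and all numbers are hypothetical, not Ethereal's actual bench interface:

```python
def scaled_time_control(base_seconds, base_increment, reference_nps, machine_nps):
    """Scale a base time control so a machine searches roughly as many nodes
    per move as the reference machine does at the base time control.

    Hypothetical helper: assumes NPS from one fixed benchmark is a fair proxy
    for search speed on every machine."""
    if reference_nps <= 0 or machine_nps <= 0:
        raise ValueError("benchmark speeds must be positive")
    factor = reference_nps / machine_nps  # slower machine -> factor > 1
    return base_seconds * factor, base_increment * factor


# Hypothetical example: the reference machine benches 2.0M NPS, this one 1.6M NPS.
base, inc = scaled_time_control(10.0, 0.1, 2_000_000, 1_600_000)
print(f"{base:.3f}+{inc:.3f}")  # prints 12.500+0.125
```

This scales every engine's clock by the same factor, which is only fair if all engines in the pool speed up or slow down by the same ratio from one machine to the next.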

I see a few possible issues...

My benchmark is not sufficient
I need a benchmark for each engine
I could play only against myself, but my engine is too weak to get good results

Has anyone had a similar experience when adding machines to their testing setup?
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Post by bob »

I've been testing like this for several years. I simply normalize CPU speeds by modifying the time control for each architecture class. I don't see results that "vary greatly", unless you are only playing 500 games and then looking at the results. There will always be some variability, but there will also be variability if you use identical hardware or even the exact same machine.
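To put "only playing 500 games" in numbers, here is a common normal-approximation estimate of the Elo error margin for a match. It is a standard textbook calculation, not necessarily what bob uses, and the win/draw/loss counts are hypothetical (40% wins, 20% draws, 40% losses):

```python
import math


def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high): the point estimate and an approximate 95% interval.

    Uses the per-game variance of the results (1, 0.5, 0) and a normal
    approximation for the mean score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(variance / n)
    return (elo_from_score(score),
            elo_from_score(score - z * stderr),
            elo_from_score(score + z * stderr))


# Same hypothetical result ratios at two match lengths.
for w, d, l in [(200, 100, 200), (12000, 6000, 12000)]:
    elo, low, high = elo_with_margin(w, d, l)
    print(f"{w + d + l:>6} games: {elo:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}]")
```

With these ratios the interval is roughly ±27 Elo at 500 games and roughly ±3.5 Elo at 30,000 games, since the margin shrinks with the square root of the number of games.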
AndrewGrant

Post by AndrewGrant »

So are you testing only against other versions of Crafty? Or with a pool of other engines? I'm trying to see if the following is an issue:

An opponent's engine runs faster on some Intel chips but slower on some AMD chips, relative to my engine.

The solution to that problem would be to normalize CPU speeds with a benchmark, but separately for each engine.
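A minimal sketch of that idea, assuming each engine's own benchmark has been run once on every machine and reports nodes per second. The engine names, machine names, and speeds are all hypothetical. Raw NPS is not comparable between different engines, so each engine is only compared with itself across machines:

```python
def per_engine_factors(bench, reference_machine):
    """Compute a time-control multiplier for every (engine, machine) pair.

    bench[engine][machine] holds that engine's own benchmark speed (NPS) on
    that machine. Each engine is compared only with itself on the reference
    machine, so engines that favor different CPUs get different multipliers."""
    factors = {}
    for engine, by_machine in bench.items():
        reference_nps = by_machine[reference_machine]
        factors[engine] = {machine: reference_nps / nps
                           for machine, nps in by_machine.items()}
    return factors


# Hypothetical numbers: engine_a is 20% slower on the AMD box, engine_b is not.
bench = {
    "engine_a": {"intel_box": 2_000_000, "amd_box": 1_600_000},
    "engine_b": {"intel_box": 1_000_000, "amd_box": 1_000_000},
}
factors = per_engine_factors(bench, "intel_box")
print(factors["engine_a"]["amd_box"], factors["engine_b"]["amd_box"])  # prints 1.25 1.0
```

Each engine's clock on a given machine would then be the base time control multiplied by its own factor, which assumes the match runner can give the two sides different time controls.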
bob

Post by bob »

No, I do not trust that kind of testing. I test against a gauntlet of other programs. If you are worried about the minor influence this will have, why not use a deterministic scheduling algorithm so that the same games and opponents are ALWAYS played on the same machine? Then the runs will be more consistent, although I would not expect a big difference anyway.

My recent testing has used wildly variable hardware, i.e. a cluster with a new IBM POWER chip, an Intel 2660, and some older processors. But when I started I also ran on just one machine, with the same number of games (30K per test), and I didn't see any more variability than what I saw when running two matches on the same hardware...
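One way to get that kind of deterministic scheduling is to derive each game's machine from a stable hash of the game itself (opponent, opening index, color), so every run sends the same game to the same machine. This is a hypothetical sketch, not the scheduler bob uses; all names are illustrative:

```python
import hashlib


def machine_for_game(opponent, opening_index, color, machines):
    """Pick a machine for one game from a stable hash of the game's identity.

    hashlib is used instead of the built-in hash(), whose result for strings
    changes between Python processes unless PYTHONHASHSEED is fixed."""
    key = f"{opponent}|{opening_index}|{color}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return machines[int.from_bytes(digest[:8], "big") % len(machines)]


def build_schedule(opponents, num_openings, machines):
    """Map every machine to the list of games it should always play."""
    schedule = {machine: [] for machine in machines}
    for opponent in opponents:
        for opening_index in range(num_openings):
            for color in ("white", "black"):
                machine = machine_for_game(opponent, opening_index, color, machines)
                schedule[machine].append((opponent, opening_index, color))
    return schedule


machines = ["power_node", "intel_node", "old_node"]  # hypothetical hosts; keep this order fixed
schedule = build_schedule(["opp1", "opp2", "opp3"], num_openings=100, machines=machines)
for machine, games in schedule.items():
    print(machine, len(games))
```

Hashing keeps existing assignments unchanged when an opponent is added (as long as the machine list stays the same), at the cost of only approximately equal load per machine. A plain round-robin over a fixed game order would balance the load exactly but can reassign games whenever the game list changes.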