Up until now I have only had one computer running tests. Now I have a second computer running tests as well, but the results from the two machines vary greatly.
I am scaling the time control based on a benchmark test run using a version of Ethereal. Note that my suite plays against a handful of engines, not just the current version of my program.
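As a rough illustration of what scaling the time control from a benchmark could look like, here is a minimal sketch. It assumes the benchmark reports nodes per second and that a simple linear ratio is good enough; the function name, the base time control, and the NPS figures are all hypothetical and are not taken from any actual test harness.

```python
def scale_time_control(base_time_s, base_inc_s, reference_nps, machine_nps):
    """Scale a base+increment time control by the ratio of benchmark speeds.

    A machine that benches slower than the reference gets proportionally
    more time. Assumes playing strength tracks benchmark NPS linearly,
    which is only an approximation.
    """
    if reference_nps <= 0 or machine_nps <= 0:
        raise ValueError("NPS values must be positive")
    factor = reference_nps / machine_nps
    return base_time_s * factor, base_inc_s * factor


# Hypothetical numbers: the reference machine benches 2.0M nps,
# the newly added machine benches 1.6M nps.
time_s, inc_s = scale_time_control(10.0, 0.1, 2_000_000, 1_600_000)
print(f"scaled time control: {time_s:.2f}s + {inc_s:.3f}s")
```

Note that this uses one benchmark (one engine) to scale the clock for every engine in the pool, which is exactly the assumption questioned in the list that follows.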
I see a few possible issues:
- My benchmark is not sufficient
- I need a benchmark for each engine
- I could play only against myself, but my engine is too weak to get good results
Does anyone have a similar experience when they started adding additional machines to their testing setup?
Testing using many computers and architectures
-
- Posts: 1754
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Testing using many computers and architectures
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Testing using many computers and architectures
AndrewGrant wrote:
Up until now I have only had one computer running tests. Now, I have an additional computer running tests. However, the results from each computer vary greatly.
I am scaling the time control based on a benchmark test run using a version of Ethereal. Note that my suite plays against a handful of engines, not just the current version of my program.
I see a few possible issues...
My benchmark is not sufficient
I need a benchmark for each engine
I could play only against myself, but my engine is too weak to get good results
Does anyone have a similar experience when they started adding additional machines to their testing setup?

I've been testing like this for several years. I simply normalize CPU speeds by modifying the time control for each architecture class. I don't see results "vary greatly", unless you are only playing 500 games and then looking at the results. There will always be some variability, but then again, there will be variability if you use identical hardware or even the exact same machine.
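To put the remark about 500 games in perspective, the sketch below estimates an Elo difference and an approximate 95% interval from a win/draw/loss record, using a normal approximation on the per-game score. The function names and the example records are made up for illustration; the draw ratio in particular is an assumption.

```python
import math


def score_to_elo(score):
    """Convert an expected score in (0, 1) to an Elo difference."""
    score = min(max(score, 1e-6), 1.0 - 1e-6)  # avoid log of 0 or division by 0
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_interval(wins, draws, losses, z=1.96):
    """Return (elo, lower, upper) using a normal approximation of the score."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result, where a game scores 1, 0.5 or 0.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    stderr = math.sqrt(variance / games)
    return (
        score_to_elo(score),
        score_to_elo(score - z * stderr),
        score_to_elo(score + z * stderr),
    )


# Hypothetical records with the same win/draw/loss ratio at two sample sizes.
small = elo_with_interval(150, 200, 150)      # 500 games
large = elo_with_interval(9000, 12000, 9000)  # 30,000 games
for label, (elo, lo, hi) in (("500 games", small), ("30000 games", large)):
    print(f"{label}: {elo:+.1f} Elo, interval [{lo:+.1f}, {hi:+.1f}]")
```

With identical underlying results, the interval from the small run is considerably wider than the one from the large run, so two machines can disagree noticeably over a few hundred games without the hardware being the cause.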
-
- Posts: 1754
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: Testing using many computers and architectures
So are you testing only against other versions of Crafty? Or with a pool of other engines? I'm trying to see if the following is an issue:
My opponent's engine runs faster on some Intel chip, but slower on some AMD chip.
The solution to that problem would be to normalize CPU speeds with some benchmark, but for each engine.
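A per-engine normalization along these lines might look like the following sketch. It assumes every engine in the pool can report a nodes-per-second figure on each machine; the engine name "OpponentA" and all the numbers are hypothetical.

```python
def per_engine_time_factors(reference_nps, machine_nps):
    """Map each engine to its own time multiplier on a given machine.

    Both arguments are dicts of engine name -> benchmark nodes per second,
    measured on the reference machine and on the machine being added.
    """
    factors = {}
    for engine, ref in reference_nps.items():
        local = machine_nps[engine]  # KeyError if an engine was not benched
        if ref <= 0 or local <= 0:
            raise ValueError(f"non-positive NPS for {engine}")
        factors[engine] = ref / local
    return factors


# Hypothetical case: Ethereal is 20% slower on the new machine,
# while OpponentA runs at the same speed on both.
reference = {"Ethereal": 2_000_000, "OpponentA": 1_500_000}
new_machine = {"Ethereal": 1_600_000, "OpponentA": 1_500_000}

factors = per_engine_time_factors(reference, new_machine)
for engine, factor in sorted(factors.items()):
    # How far off a single Ethereal-based factor would be for this engine.
    mismatch = factors["Ethereal"] / factor
    print(f"{engine}: own factor {factor:.2f}, "
          f"Ethereal-based factor is off by x{mismatch:.2f}")
```

In this made-up case a single Ethereal-based factor would hand OpponentA 25% more time than its own benchmark justifies, which is the kind of skew the question is about.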
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Testing using many computers and architectures
AndrewGrant wrote:
So are you testing only against other versions of Crafty? Or with a pool of other engines? I'm trying to see if the following is an issue:
My opponent's engine runs faster on some Intel chip, but slower on some AMD chip.
The solution to that problem would be to normalize CPU speeds with some benchmark, but for each engine.

No, I do not trust that kind of testing. I test against a gauntlet of other programs. If you are worried about the minor influence this will have, why not use a deterministic scheduling algorithm so that the same games and opponents are ALWAYS played on the same machine? Then the runs will be more consistent, although I would not expect a big difference anyway. My recent testing has been using wildly variable hardware, i.e. a cluster with a new IBM power chip, an Intel 2660, and some older processors. But when I started I also ran on just one machine, with the same number of games (30K per test), and I didn't see any more variability than what I saw when running two matches on the same hardware...
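One way such a deterministic schedule could be sketched is to hash each game's identity to a machine, so the same opponent and opening always land on the same hardware from run to run. This is only an illustration under assumed names (machine labels, opening ids, and `assign_machine` are all hypothetical), not a description of any existing test framework.

```python
import hashlib


def assign_machine(opponent, opening_id, machines):
    """Deterministically pick a machine for a given opponent and opening.

    Uses SHA-256 rather than the built-in hash(), because Python randomizes
    string hashes per process and the mapping must be stable across runs.
    """
    if not machines:
        raise ValueError("machine list is empty")
    key = f"{opponent}|{opening_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(machines)
    return machines[index]


# Hypothetical machine labels and gauntlet opponents.
machines = ["intel-box", "amd-box"]
schedule = {
    (opponent, opening): assign_machine(opponent, opening, machines)
    for opponent in ("OpponentA", "OpponentB", "OpponentC")
    for opening in range(4)
}
for (opponent, opening), machine in sorted(schedule.items()):
    print(f"{opponent} opening {opening} -> {machine}")
```

The mapping stays fixed only while the machine list keeps the same contents and order; adding or removing a machine reshuffles the assignments.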