another interesting cluster test result

xsadar
Full name: Mike Leany

Re: another interesting cluster test result

Post by xsadar

bob wrote:
xsadar wrote:
bob wrote: The current set of opponents is Stockfish, Glaurung 2.x (the last released version, whatever that is), Toga (most recent version), Fruit 2.something, and Glaurung 1.
I have always been under the impression that you were using 5 unrelated engines rather than a few versions of 2 unrelated engines. Wouldn't you expect there to be a correlation between the games against the three different versions of glaurung/stockfish? And wouldn't you expect that to affect your overall results? Of course the fact that you use 4000 starting positions helps a lot, but it still makes me wonder if your results are as accurate as you think they are.

To me, it seems a little like doing cancer research with 500 participants, where 300 are related to me and 200 are related to you, and then trying to generalize the results to everybody. It doesn't make sense, and I can't imagine any scientist ever doing that. They want the participants to be as diverse as possible.
Possibly. Fruit and Toga play significantly differently. Glaurung 1 and 2 are significantly different. And Stockfish plays nothing like the version of Glaurung 2 I am using.

My primary concern is that the program(s) I use have to be reliable. A few can't deal with fast time controls. A few misbehave in other ways. I would be more concerned if I were using just one program, of course. And I have a few others I have thrown into the mix from time to time, but I do not want too many that are significantly weaker than Crafty, as that doesn't provide much useful information. The results to date have clearly shown improvement in every testing tournament I have seen. So, optimal? Probably not. But working? Yep.

Finding reliable opponents that work correctly on Unix is a problem. Most programs are Windows-based, and I can't run Windows applications on our Linux cluster. I have used Arasan, GNU Chess, the infamous Ippolit, etc. Ippolit would be a good opponent, but it is beyond unreliable, so I removed it completely.
That makes sense, I suppose. If you can't find enough reliable opponents in the appropriate strength range, I guess you have to make do with the ones you have, even if they're related.
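
One rough way to look at the correlation concern is to pool the win/draw/loss counts by engine family and compare the families side by side. The following is only a sketch: the family grouping is my assumption based on the lineage mentioned in this thread, the function names are made up for illustration, and the counts in the demo are placeholders, not real cluster results. Note that the standard error here assumes independent games, which is exactly the assumption in question.

```python
import math

# Assumed grouping, following the lineage discussed in the thread:
# Stockfish descends from Glaurung, and Toga from Fruit.
FAMILY = {
    "Stockfish": "Glaurung",
    "Glaurung 2": "Glaurung",
    "Glaurung 1": "Glaurung",
    "Toga": "Fruit",
    "Fruit 2": "Fruit",
}

def score_and_stderr(wins, draws, losses):
    """Mean score per game and its standard error, assuming independent games."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the outcome, which is 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    return score, math.sqrt(var / n)

def elo_from_score(score):
    """Elo difference implied by a score strictly between 0 and 1 (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def pool_by_family(results, family=FAMILY):
    """Sum (wins, draws, losses) over all opponents in the same family."""
    totals = {}
    for opponent, (w, d, l) in results.items():
        fw, fd, fl = totals.get(family[opponent], (0, 0, 0))
        totals[family[opponent]] = (fw + w, fd + d, fl + l)
    return totals

if __name__ == "__main__":
    # Placeholder counts for illustration only, NOT real test results.
    results = {
        "Stockfish": (300, 200, 500),
        "Glaurung 2": (400, 250, 350),
        "Glaurung 1": (550, 200, 250),
        "Toga": (420, 230, 350),
        "Fruit 2": (450, 250, 300),
    }
    for fam, wdl in pool_by_family(results).items():
        s, se = score_and_stderr(*wdl)
        print(f"{fam}: score {s:.3f} +/- {se:.3f}, Elo {elo_from_score(s):+.1f}")
```

If a change to the program under test shifts the scores within a family much more consistently than between families, that would hint that the related opponents are not contributing independent information.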