fixed nodes testing

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: fixed nodes testing

Post by bob »

Don wrote:
bob wrote:
Don wrote:Larry and I are experimenting with doing some of our testing at fixed nodes. It seems that time testing is not giving us very consistent results. Fixed depth testing is very consistent but has its own problems and issues and is not how programs naturally play.

I fixed Komodo up to support fixed node levels and now it does this perfectly, searching EXACTLY the number of nodes that the user sets and stopping.

One problem is that the strong programs that we need for testing have lousy support for this. Some don't have this support at all, and others such as stockfish may exceed the nodes by an order of magnitude or more, or else abort much earlier. Because of this, we seem to be reduced to self-testing and we prefer testing against a variety of opponents.

The UCI standard says, "search x nodes only" so I would like to suggest that the authors implement this level correctly. The next release of Komodo will have this working correctly, of course. Komodo does require at least a 1-ply search to complete before it will abort, primarily to ensure that an actual move is returned and the result is not totally random.
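The stopping rule Don describes (abort on the node budget, but never before a 1-ply iteration has completed) can be sketched roughly as follows. This is a toy illustration, not Komodo's actual code; all names and the `evaluate` stand-in are invented.

```python
# A minimal sketch of fixed-node stopping: the search halts once the node
# budget is spent, but never before the 1-ply iteration has completed, so a
# sensible move is always returned. Illustrative only, not Komodo's code.

def fixed_node_search(root_moves, node_limit, evaluate):
    nodes = 0
    best_move = None
    depth = 0
    while True:
        depth += 1
        iteration_best = None
        for move in root_moves:
            # Only abort mid-iteration once depth 1 has fully completed.
            if depth > 1 and nodes >= node_limit:
                return best_move, nodes
            nodes += 1                     # charge this node to the budget
            score = evaluate(move, depth)  # stand-in for a real alpha-beta call
            if iteration_best is None or score > iteration_best[0]:
                iteration_best = (score, move)
        best_move = iteration_best[1]      # iteration finished: commit result
        if nodes >= node_limit:
            return best_move, nodes

# Even with a budget of 1 node, a move is still returned, because the
# 1-ply iteration runs to completion first.
move, used = fixed_node_search(["e4", "d4", "c4"], 1, lambda m, d: len(m) * d)
```

The key design point is that the budget check is deferred until after the first iteration, which is exactly the guarantee Don says Komodo makes.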

Does anyone have any experience with this kind of testing? For the most part I think I am aware of the pros and cons but I would like to hear from others on this.
I have played with it some. One down-side is that you eliminate any time-control influence, which means that things like "fail low" or "easy move" become meaningless.
I understand this, but it's our hope that we can deal with time control issues separately using time control games. Our plan is to use this sparingly, when we think it makes most sense.

We have easy move and trigger conditions for searching longer, but presumably these have only the most minor interactions with minor evaluation changes, for instance. In other words, if we measure an improvement we probably don't need to check whether the improvement interacts badly with the easy move algorithm, but we know it's possible; for instance, I can imagine it happening with certain kinds of aggressive extensions.


The other issue is how many nodes to give each program. The same number will give repeatable results, but probably won't accurately reflect the strength difference between the two programs if one searches way slower, but they both get the same number of nodes.
We rarely use the tester to measure other programs (or where we stand relative to them.) Our tester allows us to set time control independently for each entity. We can even mix fixed depth with fischer, etc.

So generally we see how some baseline version tests against 3 foreign opponents, then we test other versions of Komodo to compare how they do under the same conditions.

Another issue is "time skew". If your program speeds up (or slows down) as the game progresses, then you can get bad results.
Yes, I am aware of that problem. In real games we spend less time per move on endings. Since we speed up in the endings, hopefully it won't be too skewed, but I'm sure it won't be the same.

For example, suppose your program is something like Ferret, which had a 4x-5x speedup as the endgame approaches. That speedup disappears in fixed-node testing: you just move faster but search the same-sized trees. So in testing with a program like that, you lose significantly in the endgame, because your faster speed does not help you in fixed-node testing. If your endgame skills depend on that speed gain, you could well tune the wrong way, because you would want to be more aggressive and do what you can in the middlegame, before you get to the endgame where you effectively slow down and lose more games than expected...
I have not done a careful measurement of how much we speed up in the endgame, if any. If it's modest, then as I said it will hopefully not be too far off. But if the speedup is dramatic, it could be a big issue. I'll check on this.
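The measurement Don mentions could be done crudely from per-move search logs: bucket moves into middlegame and endgame by piece count and compare NPS between the buckets. A sketch with an invented log format and an arbitrary piece-count threshold:

```python
# Estimate NPS separately for middlegame and endgame positions to check for
# the "time skew" Bob describes. The log format and the 10-piece endgame
# threshold are invented for illustration.

def phase_nps(moves):
    """moves: list of (piece_count, nodes, seconds) tuples, one per move."""
    totals = {"middlegame": [0, 0.0], "endgame": [0, 0.0]}
    for pieces, nodes, secs in moves:
        phase = "endgame" if pieces <= 10 else "middlegame"
        totals[phase][0] += nodes
        totals[phase][1] += secs
    return {phase: n / t for phase, (n, t) in totals.items() if t > 0}

log = [(28, 3_000_000, 3.0), (24, 2_800_000, 2.8),   # middlegame moves
       (8, 6_000_000, 3.0), (6, 6_400_000, 3.2)]     # endgame moves
rates = phase_nps(log)
ratio = rates["endgame"] / rates["middlegame"]        # ~2x speedup here
```

If the ratio comes out near 1, fixed-node results should transfer reasonably well; a Ferret-like 4x-5x ratio would be a red flag for this testing mode.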

Thanks for the feedback Bob.
For reference, I believe Crafty is around 2x faster in endgames... Ferret had the most pronounced gain I ever saw...
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: fixed nodes testing

Post by hgm »

Well, I would always prefer functionality over performance, and performance over elegance. No matter how elegant a protocol is, if it cannot do what I want, it is utterly useless to me. But that does not mean you can't have all, of course.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: fixed nodes testing

Post by Don »

hgm wrote:Well, I would always prefer functionality over performance, and performance over elegance. No matter how elegant a protocol is, if it cannot do what I want, it is utterly useless to me. But that does not mean you can't have all, of course.
One has to strike some sort of balance of course. A top playing chess program can only be so elegant ...
Onno Garms
Posts: 224
Joined: Mon Mar 12, 2007 7:31 pm
Location: Bonn, Germany

Re: fixed nodes testing

Post by Onno Garms »

Don wrote: One problem is that the strong programs that we need for testing have lousy support for this.
ACK. I modified some programs (Toga, Strelka) to support fixed nodes. Also, some of the strong programs of those days (Shredder, Rybka) do support fixed nodes. I filed a feature request with Mark for HIARCS, but he ignored it.

Rybka sometimes hangs on fixed node search. I used a 1 second timeout to work around this. Rybka still responds to the "stop" command.

You can also use weaker engines if you give them more nodes. You will need engine-specific node numbers anyway, for example because of Rybka's node count obfuscation.
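Onno's timeout workaround for a hanging fixed-node search can be sketched as follows. The `Engine` interface here is an invented stand-in for a real UCI pipe; the pattern is: block on the search, and if no result arrives in time, send "stop" (which the engine is assumed to still honor, as Rybka does).

```python
# A sketch of the workaround: ask for a fixed-node search, and if no
# bestmove arrives within the timeout, send "stop" to force termination.
# The engine object is a hypothetical stand-in for a real UCI connection.
import threading

def search_with_timeout(engine, nodes, timeout):
    """Run engine.go_nodes(nodes); on timeout, send 'stop' and wait for it."""
    done = threading.Event()
    result = {}

    def run():
        result["bestmove"] = engine.go_nodes(nodes)  # blocks until bestmove
        done.set()

    threading.Thread(target=run, daemon=True).start()
    if not done.wait(timeout):
        engine.stop()   # engine is expected to answer 'stop' with a bestmove
        done.wait()     # wait for the aborted search to return
    return result["bestmove"]
```

With a real engine, `go_nodes` and `stop` would write `go nodes N` and `stop` to the engine's stdin and read until `bestmove`; the second `done.wait()` would itself deserve a timeout in case the engine ignores `stop` entirely.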
The UCI standard says, "search x nodes only" so I would like to suggest that the authors implement this level correctly.
Well, implementing it absolutely correctly would have been difficult in my case. In multi-CPU mode this would even have been costly in terms of runtime. I considered it sufficient to exceed the given number of nodes just slightly (by less than 100 nodes) and to stop deterministically in single-CPU mode.
Does anyone have any experience with this kind of testing?
I did a lot of testing this way. But I always used both testing by node count and testing by time control.

Your post gives me the chance to promote my fixed node testing tool:
http://www.onnochess.com/ufte.html
I gave some pros of fixed node testing on the website.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: fixed nodes testing

Post by Don »

Onno Garms wrote:
Don wrote: One problem is that the strong programs that we need for testing have lousy support for this.
ACK. I modified some programs (Toga, Strelka) to support fixed nodes. Also, some of the strong programs of those days (Shredder, Rybka) do support fixed nodes. I filed a feature request with Mark for HIARCS, but he ignored it.

Rybka sometimes hangs on fixed node search. I used a 1 second timeout to work around this. Rybka still responds to the "stop" command.

You can also use weaker engines if you give them more nodes. You will need engine-specific node numbers anyway, for example because of Rybka's node count obfuscation.
The UCI standard says, "search x nodes only" so I would like to suggest that the authors implement this level correctly.
Well, implementing it absolutely correctly would have been difficult in my case. In multi-CPU mode this would even have been costly in terms of runtime. I considered it sufficient to exceed the given number of nodes just slightly (by less than 100 nodes) and to stop deterministically in single-CPU mode.
Does anyone have any experience with this kind of testing?
I did a lot of testing this way. But I always used both testing by node count and testing by time control.

Your post gives me the chance to promote my fixed node testing tool:
http://www.onnochess.com/ufte.html
I gave some pros of fixed node testing on the website.
I also have my own automated tester which I plan to release to the public. It is written in Java and is capable of 25 games per second on my i7-980x (6-core) machine. It only works with UCI programs. Although it normally plays all programs against all programs, you can assign a family name to each program, and family never plays family. It sets up a separate process to manage each game.
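The "family never plays family" pairing rule Don describes amounts to a round-robin with same-family pairs filtered out. A minimal sketch (engine and family names below are made up; Don's tester is in Java, but the idea is language-independent):

```python
# Generate round-robin pairings, excluding any pairing where both programs
# share a family name (so versions of the same engine never play each other).
# Engine and family names are invented for illustration.
from itertools import combinations

def make_pairings(engines):
    """engines: dict mapping engine name -> family name."""
    return [(a, b) for a, b in combinations(engines, 2)
            if engines[a] != engines[b]]

engines = {"Komodo-1": "komodo", "Komodo-2": "komodo",
           "Stockfish": "stockfish", "Toga": "toga"}
pairs = make_pairings(engines)
# Komodo-1 vs Komodo-2 is excluded; the other 5 pairings remain.
```

This matters for self-testing setups like the one described earlier in the thread: each Komodo version meets every foreign opponent under identical conditions, but never a sibling version, keeping the comparison between versions indirect and unbiased by intra-family quirks.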

Your list of advantages is very comprehensive; I agree there are many advantages to this approach. The bulk of our testing is done with timed games, but this mode can be very useful.