A testing mystery

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

A testing mystery

Post by lkaufman »

I observed something in testing today that is unlike anything I've ever observed. I tested a new version of Komodo (call it Y) against our normal version (call it X). In fixed-depth testing at several depths Y was between 3 and 4% faster than X, with a corresponding Elo loss. But in timed play averaging about the same depths as the fixed depth tests, the average depth reached by Y was about .15 ply more, which implies roughly a 10% speedup.
How can a change that is only 3-4% faster in fixed depth be 10% faster in timed play at similar levels? Any theories are welcome.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A testing mystery

Post by bob »

lkaufman wrote:I observed something in testing today that is unlike anything I've ever observed. I tested a new version of Komodo (call it Y) against our normal version (call it X). In fixed-depth testing at several depths Y was between 3 and 4% faster than X, with a corresponding Elo loss. But in timed play averaging about the same depths as the fixed depth tests, the average depth reached by Y was about .15 ply more, which implies roughly a 10% speedup.
How can a change that is only 3-4% faster in fixed depth be 10% faster in timed play at similar levels? Any theories are welcome.
You can change the effective branching factor as well as the speed. A 3-4% speed increase plus a 2-4% reduction in nodes from a smaller EBF and you are at that "10% faster to a specific depth" figure.

I don't see this as mysterious at all. It is the classic trade-off. Sometimes you reduce knowledge, increase speed, and improve the EBF, and show a gain. This has happened to me many times. Often, however, there is a faster way to implement the knowledge so that you don't see a speed loss, and then you have to choose between slightly smarter with slightly bigger EBF, or slightly dumber with slightly smaller EBF. When they offset, that's an interesting choice.
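bob's compounding argument is easy to put in numbers. A minimal sketch (the values are illustrative, not Komodo's actual figures), assuming nodes-to-depth grows as EBF**depth and time equals nodes divided by NPS:

```python
# Toy model: how a small NPS gain and a small per-ply EBF reduction
# compound into a much larger time-to-depth gain. Numbers are illustrative.

def time_to_depth(nps, ebf, depth):
    # assumes the nodes needed for a d-ply search grow as ebf**d
    return ebf ** depth / nps

base = time_to_depth(nps=1.000, ebf=2.000, depth=9)   # version X
new  = time_to_depth(nps=1.035, ebf=1.985, depth=9)   # Y: +3.5% NPS, ~0.75% lower EBF

speedup = base / new - 1
print(f"combined time-to-depth speedup: {speedup:.1%}")  # roughly 10%
```

Because the EBF term is raised to the search depth, even a sub-1% EBF reduction per ply contributes more to time-to-depth than the raw NPS gain does.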
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: A testing mystery

Post by lkaufman »

Unfortunately your explanation does not seem applicable in this case, because the speedup seemed to be fairly constant at different depths. Therefore the effective branching factor was almost identical between the two versions.
My current hypothesis is that the change affected the endgame much more than the middlegame. Since the endgame takes up little time in fixed-depth play, the average per-move speedup was not properly captured by the speedup in total time taken.
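That hypothesis can be sanity-checked with arithmetic. A sketch with made-up numbers (the 3%/25% phase speedups and the phase shares are assumptions, chosen only to show the mechanism):

```python
# Toy model: a speedup concentrated in the endgame is underweighted by a
# total-time measurement in fixed-depth play (endgame trees are cheap),
# but weighted per move in timed play, where every move gets similar time.

mg_speedup, eg_speedup = 0.03, 0.25   # assumed per-phase speedups

# fixed depth: phases weighted by share of total *time* spent searching
mg_time_share, eg_time_share = 0.95, 0.05
fixed_depth_measure = mg_time_share * mg_speedup + eg_time_share * eg_speedup

# timed play: phases weighted by share of *moves*, since each move gets a
# similar time budget and average depth is taken over moves
mg_move_share, eg_move_share = 0.70, 0.30
timed_measure = mg_move_share * mg_speedup + eg_move_share * eg_speedup

print(f"fixed depth sees {fixed_depth_measure:.1%}, timed play sees {timed_measure:.1%}")
# fixed depth sees 4.1%, timed play sees 9.6%
```

With these assumed weights the two measurements land near the 3-4% and 10% figures from the original post, so the mechanism is at least arithmetically plausible.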
Ron Langeveld
Posts: 140
Joined: Tue Jan 05, 2010 8:02 pm

Re: A testing mystery

Post by Ron Langeveld »

lkaufman wrote:I observed something in testing today that is unlike anything I've ever observed. I tested a new version of Komodo (call it Y) against our normal version (call it X). In fixed-depth testing at several depths Y was between 3 and 4% faster than X, with a corresponding Elo loss. But in timed play averaging about the same depths as the fixed depth tests, the average depth reached by Y was about .15 ply more, which implies roughly a 10% speedup.
How can a change that is only 3-4% faster in fixed depth be 10% faster in timed play at similar levels? Any theories are welcome.
Not being a chess programmer, I don't know whether it is appropriate to speculate about a possible explanation, but maybe an outsider's idea can point you in the right direction. This is just a basic thought of mine and not intended to start a debate about it:

Could it be that fixed-depth play (for example x = 12 plies) does not skip some sort of pruning logic when it is close to reaching the fixed depth, whereas timed play would trigger this 'skip-pruning' logic because it is already at depth x minus 1 or 2 plies? As a result, fixed-depth play would do some irrelevant pruning (maybe aborted), whereas timed play would only do the pruning within the set constraints.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: A testing mystery

Post by lkaufman »

Ron Langeveld wrote:
lkaufman wrote:I observed something in testing today that is unlike anything I've ever observed. I tested a new version of Komodo (call it Y) against our normal version (call it X). In fixed-depth testing at several depths Y was between 3 and 4% faster than X, with a corresponding Elo loss. But in timed play averaging about the same depths as the fixed depth tests, the average depth reached by Y was about .15 ply more, which implies roughly a 10% speedup.
How can a change that is only 3-4% faster in fixed depth be 10% faster in timed play at similar levels? Any theories are welcome.
Not being a chess programmer, I don't know whether it is appropriate to speculate about a possible explanation, but maybe an outsider's idea can point you in the right direction. This is just a basic thought of mine and not intended to start a debate about it:

Could it be that fixed-depth play (for example x = 12 plies) does not skip some sort of pruning logic when it is close to reaching the fixed depth, whereas timed play would trigger this 'skip-pruning' logic because it is already at depth x minus 1 or 2 plies? As a result, fixed-depth play would do some irrelevant pruning (maybe aborted), whereas timed play would only do the pruning within the set constraints.
Actually that could happen, though it is probably not relevant to what we were doing. For example, suppose Singular Extension only kicks in at depth 9 plies. If we change the algorithm to extend more, and test at 9 ply, it would show no slowdown (but also no benefit), whereas in timed play averaging 9 ply it would show a slowdown in terms of average depth reached. So your answer is not a silly one, but as we were not testing a change to singular extension (or anything with a similar cutoff), it seems unlikely. It could be that our change interacted with singular extension and therefore your idea could have some validity, but I cannot imagine that this second-order effect could cause a near-tripling of the speedup.
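The depth-gating effect in that example can be sketched with a toy search. Everything here is hypothetical (a branching factor of 3, a "singular" first move, an extension that fires only when remaining depth reaches 10); the point is only that a depth-gated extension is invisible in a fixed-depth test below its trigger, but costs nodes as soon as timed play iterates past it.

```python
# Toy node-counting search with an extension gated on remaining depth.
# Hypothetical parameters; not Komodo's actual search.

SE_TRIGGER = 10  # extension fires only when remaining depth >= 10

def search(depth, use_ext, extended=False):
    if depth <= 0:
        return 1                      # count a leaf node
    nodes = 1
    for m in range(3):                # toy branching factor of 3
        # extend the first ("singular") move once, if the depth gate allows it
        ext = 1 if (use_ext and not extended and m == 0 and depth >= SE_TRIGGER) else 0
        nodes += search(depth - 1 + ext, use_ext, extended=bool(ext))
    return nodes

print(search(9, True) == search(9, False))    # True: no cost at fixed depth 9
print(search(10, True) > search(10, False))   # True: extra nodes past the gate
```

A fixed 9-ply test never reaches the gate, so the two versions search identical trees; any iteration beyond 9 plies pays for the extension.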
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A testing mystery

Post by bob »

lkaufman wrote:Unfortunately your explanation does not seem applicable in this case, because the speedup seemed to be fairly constant at different depths. Therefore the effective branching factor was almost identical between the two versions.
My current hypothesis is that the change affected the endgame much more than the middlegame, and since the endgame takes up little time in fixed-depth play the average percentage speedup was not properly measured by the percentage speedup in total time taken.
I'm not sure why the "speedup" would vary. If you simply speed the code up, as measured by NPS, the gain should be constant unless you change something that only affects the opening or the endgame. If you improve the EBF, fixed-depth testing won't show the improvement, which is why I detest that kind of testing... If you improve the EBF and you measure speedup as time to depth rather than NPS, then the improved version should go deeper, given the same amount of time...

Fixed depth testing is not something I would consider, ever. Perhaps fixed number of nodes would work, but then that fails if all you did was speed up the NPS, because you don't gain from the faster speed with a node count limit...
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: A testing mystery

Post by rbarreira »

I'm curious why you use fixed-depth tests, with all their problems, such as the drastic one you highlighted regarding the different time taken in the middlegame vs the endgame.

I don't see a single advantage of fixed depth compared to a fixed-time or fixed-nodes testing strategy (using fixed nodes only when NPS doesn't change, or when you want to see the raw impact of a new eval feature that is not optimized yet... and even then it has the problem of also potentially speeding up in the endgame).
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: A testing mystery

Post by lkaufman »

rbarreira wrote:I'm curious why you use fixed-depth tests, with all their problems, such as the drastic one you highlighted regarding the different time taken in the middlegame vs the endgame.

I don't see a single advantage of fixed depth compared to a fixed-time or fixed-nodes testing strategy (using fixed nodes only when NPS doesn't change, or when you want to see the raw impact of a new eval feature that is not optimized yet... and even then it has the problem of also potentially speeding up in the endgame).
Mostly we just use fixed-depth testing to see the properties of a change, to make sure it does what we expect. For example, if we do more severe pruning, we expect the time to go down as well as the Elo. However, in view of what happened today, we will probably do even less of it. I think it is fine for eval changes that have minimal impact on the search. Fixed time or nodes has the drawback of being "wasteful": you spend time starting iterations and cutting them off without finishing; for example, the score drops but you don't take the time to find a better move. So there are pros and cons.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: A testing mystery

Post by lkaufman »

bob wrote:
lkaufman wrote:Unfortunately your explanation does not seem applicable in this case, because the speedup seemed to be fairly constant at different depths. Therefore the effective branching factor was almost identical between the two versions.
My current hypothesis is that the change affected the endgame much more than the middlegame. Since the endgame takes up little time in fixed-depth play, the average per-move speedup was not properly captured by the speedup in total time taken.
I'm not sure why the "speedup" would vary. If you simply speed the code up, as measured by NPS, the gain should be constant unless you change something that only affects the opening or the endgame. If you improve the EBF, fixed-depth testing won't show the improvement, which is why I detest that kind of testing... If you improve the EBF and you measure speedup as time to depth rather than NPS, then the improved version should go deeper, given the same amount of time...

Fixed depth testing is not something I would consider, ever. Perhaps fixed number of nodes would work, but then that fails if all you did was speed up the NPS, because you don't gain from the faster speed with a node count limit...
I was measuring speedup in time per iteration. It does show a change in EBF: for example, if our new version is 3% faster at 9 ply and 5% faster at 10 ply, then the EBF (in that region at least) is about 2% less. I think fixed-depth testing does have merit, and Komodo has reached #2 status with fairly heavy reliance on it. But after what I saw today, I'm not so keen on it anymore.
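The arithmetic behind that EBF inference, with hypothetical iteration times:

```python
# If Y is 3% faster at 9 ply and 5% faster at 10 ply, the ratio of
# successive iteration times (an EBF estimate in that depth range) is
# about 2% lower for Y. The times below are hypothetical.

t_x9, t_x10 = 100.0, 300.0                 # version X, time per iteration
t_y9, t_y10 = t_x9 * 0.97, t_x10 * 0.95   # Y: 3% faster at 9, 5% at 10

ebf_x = t_x10 / t_x9   # 3.00
ebf_y = t_y10 / t_y9   # ~2.94
print(f"Y's EBF relative to X's: {ebf_y / ebf_x:.3f}")  # ~0.979, about 2% lower
```

The same comparison at other adjacent depths would show whether the EBF difference holds across the whole range or is local to one region, which bears on whether the speedup is constant across depths.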