A reason for testing at fixed number of nodes.

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

bob
Posts: 20643
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: A reason for testing at fixed number of nodes.

Post by bob » Tue Nov 10, 2009 6:46 pm

michiguel wrote:
bob wrote:
hgm wrote:Decoupling the measurement of strength and speed is very useful. To know if a change improves my engine in time-based play, I would be obliged to implement the change in the maximally optimized way of the most clever algorithm. That would require a lot of effort, and it might all be wasted, because the idea might even be a bust in node-based play.

Testing first in node-based play does allow me to use the most quick and dirty solution I can imagine, I just hack it in without having to pay any attention to efficiency at all. Then the node-based play will tell me how much the idea is worth independent of the quality of the implementation.
Or not, as I have previously explained, because not all programs are constant in their NPS over the course of a game. If your eval change pushes the game toward positions where you are slower, or where your opponent is faster, you get a time advantage you did not think about. That makes the change look good when it might be better or worse in real timed games. Vice-versa as well.
In experimental science many preliminary experiments are performed not to collect data, but to have an idea what other (if any) experiments should follow.

Testing with nodes may fall in the first category. You keep finding the defects of preliminary experiments, when they are not necessarily supposed to be perfect (if there is such a thing in experimental science...).

Miguel

And from that info I can then get an estimate of how much of a speed hit would be affordable in the implementation of that idea. And that would give me a pretty clear impression of whether it is feasible or not. But most of the time you don't even get to that stage. So it saves tons of time.
What I am trying to point out is that you learn _more_ from using actual time controls, than you learn from fixed node searches. Except perhaps for _very_ simple-minded programs that have a fairly constant speed throughout the game, and when using a set of opponents that have that same characteristic. But if a program varies its speed by 2x or 3x over the course of a game, fixed node searches _will_ introduce an unexpected (and nearly undetectable) effect for the reasons I have given.

I don't really care if an idea is good or bad if it can't be implemented in a way that makes it useful, speed-wise. That just doubles the work, to find that it looks good in a somewhat defective testing methodology, only to find it fails miserably in timed matches because it is too slow. Who really writes code without regard to speed, in the computer chess world? He who does, continually rewrites. Good design up front on an idea addresses both correctness and speed issues at the same time. No point in wasting test time if you already know it can't be done efficiently.
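The NPS-drift effect described above is easy to make concrete with a toy calculation (all numbers are invented for illustration): under a fixed node budget, an engine whose speed halves in the endgame silently receives twice the wall-clock time there, something a timed game would never grant it.

```python
# Illustrative sketch with made-up numbers: a fixed node budget turns
# NPS variation into a hidden time handicap or bonus.

NODE_BUDGET = 10_000_000  # fixed nodes per move

# Hypothetical nodes-per-second by game phase for two engines.
nps = {
    "A": {"middlegame": 2_000_000, "endgame": 1_000_000},  # slows 2x
    "B": {"middlegame": 2_000_000, "endgame": 2_000_000},  # constant
}

def effective_seconds(engine, phase):
    """Wall-clock time the fixed node budget translates into."""
    return NODE_BUDGET / nps[engine][phase]

for phase in ("middlegame", "endgame"):
    ta = effective_seconds("A", phase)
    tb = effective_seconds("B", phase)
    print(f"{phase}: A gets {ta:.0f}s, B gets {tb:.0f}s")
```

In the endgame A effectively thinks for 10 seconds to B's 5, so an eval change that steers games toward endgames would look stronger at fixed nodes than it really is on the clock.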

User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 2:27 pm

Re: A reason for testing at fixed number of nodes.

Post by Don » Tue Nov 10, 2009 8:43 pm

bob wrote:
michiguel wrote:
bob wrote:
hgm wrote:Decoupling the measurement of strength and speed is very useful. To know if a change improves my engine in time-based play, I would be obliged to implement the change in the maximally optimized way of the most clever algorithm. That would require a lot of effort, and it might all be wasted, because the idea might even be a bust in node-based play.

Testing first in node-based play does allow me to use the most quick and dirty solution I can imagine, I just hack it in without having to pay any attention to efficiency at all. Then the node-based play will tell me how much the idea is worth independent of the quality of the implementation.
Or not, as I have previously explained, because not all programs are constant in their NPS over the course of a game. If your eval change pushes the game toward positions where you are slower, or where your opponent is faster, you get a time advantage you did not think about. That makes the change look good when it might be better or worse in real timed games. Vice-versa as well.
In experimental science many preliminary experiments are performed not to collect data, but to have an idea what other (if any) experiments should follow.

Testing with nodes may fall in the first category. You keep finding the defects of preliminary experiments, when they are not necessarily supposed to be perfect (if there is such a thing in experimental science...).

Miguel

And from that info I can then get an estimate of how much of a speed hit would be affordable in the implementation of that idea. And that would give me a pretty clear impression of whether it is feasible or not. But most of the time you don't even get to that stage. So it saves tons of time.
What I am trying to point out is that you learn _more_ from using actual time controls, than you learn from fixed node searches. Except perhaps for _very_ simple-minded programs that have a fairly constant speed throughout the game, and when using a set of opponents that have that same characteristic. But if a program varies its speed by 2x or 3x over the course of a game, fixed node searches _will_ introduce an unexpected (and nearly undetectable) effect for the reasons I have given.

I don't really care if an idea is good or bad if it can't be implemented in a way that makes it useful, speed-wise. That just doubles the work, to find that it looks good in a somewhat defective testing methodology, only to find it fails miserably in timed matches because it is too slow. Who really writes code without regard to speed, in the computer chess world? He who does, continually rewrites. Good design up front on an idea addresses both correctness and speed issues at the same time. No point in wasting test time if you already know it can't be done efficiently.
Good design up front is a myth. How many years have you been working on Crafty? Why didn't you just design it correctly in the first place and have it done in a week or two of coding? You've been fooling with that thing for years!

Miguel got it exactly right. If you are a good engineer you don't just implement and test; you are concerned with actually UNDERSTANDING the thing you are experimenting with, because it has an impact on how you will proceed. What do you do when one of your tests fails? Do you move on to something else entirely and just give up? Don't you care why it failed? Or do you just try to fix it without knowing why it failed in the first place?

Can you imagine NASA taking this approach to putting a man on the moon?

- Don

User avatar
hgm
Posts: 23795
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: A reason for testing at fixed number of nodes.

Post by hgm » Tue Nov 10, 2009 9:12 pm

bob wrote:What I am trying to point out is that you learn _more_ from using actual time controls, than you learn from fixed node searches.
But that is an irrelevant metric. It does not matter how much you learn, but how much you learn per unit of invested effort. And in that metric, testing by nodes wins big time...
Except perhaps for _very_ simple-minded programs that have a fairly constant speed throughout the game, and when using a set of opponents that have that same characteristic. But if a program varies its speed by 2x or 3x over the course of a game, fixed node searches _will_ introduce an unexpected (and nearly undetectable) effect for the reasons I have given.
Not "and", but "or". If the speed is constant I can use any opponent, as I let the opponent play by time, not by nodes. And if the program is not "simple minded" enough, and speeds up just like all the others, I can play both by nodes, and the effect would cancel out if they had the same characteristic, no matter what that characteristic was.

Besides, if I had a program that varied wildly in NPS, I would simply make it lie about its node count in nps mode, and the problem would go away.
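The "lie about the node count" trick can be sketched in a few lines (the names and the reference speed here are hypothetical, not any engine's actual protocol): instead of its true node count, the engine reports the nodes a constant-speed engine would have searched in the same wall-clock time, so a node budget behaves like a time budget.

```python
REFERENCE_NPS = 1_000_000  # the constant speed we pretend to have

def reported_nodes(elapsed_seconds):
    """Node count a constant REFERENCE_NPS engine would have reached;
    reporting this instead of the true count hides NPS swings from
    node-based match play."""
    return int(REFERENCE_NPS * elapsed_seconds)

# Searching 3M real nodes in 2s (1.5M true NPS) is reported as the
# 2M nodes a 1M-NPS engine would have searched in those 2 seconds.
print(reported_nodes(2.0))  # 2000000
```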

bob
Posts: 20643
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: A reason for testing at fixed number of nodes.

Post by bob » Tue Nov 10, 2009 9:51 pm

Don wrote:
bob wrote:
michiguel wrote:
bob wrote:
hgm wrote:Decoupling the measurement of strength and speed is very useful. To know if a change improves my engine in time-based play, I would be obliged to implement the change in the maximally optimized way of the most clever algorithm. That would require a lot of effort, and it might all be wasted, because the idea might even be a bust in node-based play.

Testing first in node-based play does allow me to use the most quick and dirty solution I can imagine, I just hack it in without having to pay any attention to efficiency at all. Then the node-based play will tell me how much the idea is worth independent of the quality of the implementation.
Or not, as I have previously explained, because not all programs are constant in their NPS over the course of a game. If your eval change pushes the game toward positions where you are slower, or where your opponent is faster, you get a time advantage you did not think about. That makes the change look good when it might be better or worse in real timed games. Vice-versa as well.
In experimental science many preliminary experiments are performed not to collect data, but to have an idea what other (if any) experiments should follow.

Testing with nodes may fall in the first category. You keep finding the defects of preliminary experiments, when they are not necessarily supposed to be perfect (if there is such a thing in experimental science...).

Miguel

And from that info I can then get an estimate of how much of a speed hit would be affordable in the implementation of that idea. And that would give me a pretty clear impression of whether it is feasible or not. But most of the time you don't even get to that stage. So it saves tons of time.
What I am trying to point out is that you learn _more_ from using actual time controls, than you learn from fixed node searches. Except perhaps for _very_ simple-minded programs that have a fairly constant speed throughout the game, and when using a set of opponents that have that same characteristic. But if a program varies its speed by 2x or 3x over the course of a game, fixed node searches _will_ introduce an unexpected (and nearly undetectable) effect for the reasons I have given.

I don't really care if an idea is good or bad if it can't be implemented in a way that makes it useful, speed-wise. That just doubles the work, to find that it looks good in a somewhat defective testing methodology, only to find it fails miserably in timed matches because it is too slow. Who really writes code without regard to speed, in the computer chess world? He who does, continually rewrites. Good design up front on an idea addresses both correctness and speed issues at the same time. No point in wasting test time if you already know it can't be done efficiently.
Good design up front is a myth. How many years have you been working on Crafty? Why didn't you just design it correctly in the first place and have it done in a week or two of coding? You've been fooling with that thing for years!
There we disagree. The basic structure of Crafty has not changed in 15 years. It is still a bitboard-based approach. It has evolved from rotated bitboards to magic bitboards, as one very small change. Small because those parts of the program were designed so that they are encapsulated in a way that makes the bitboard attack generation really independent from the users of that information.

One thing I can tell you I didn't do is design it sloppily from the get-go, just to get it working. I spent time on each and every part so that the original implementations were as good as I could make 'em, so I could understand what interacted with what and how I could make those interactions more efficient, and so forth.

Good design certainly doesn't happen by accident, I agree. But it _does_ happen. No way one could have predicted the development of magic bitboard operations; rotated bitboards were unheard of at the time and represented a significant jump in bitboard knowledge. But because things were designed properly, changing to magic took all of 30 minutes or less. Because of program design.
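The encapsulation point can be illustrated with a sketch (not Crafty's actual code): callers ask one function for sliding-piece attacks, so the backend, whether a ray scan, rotated bitboards, or a magic lookup, can be swapped without touching any caller. Here a slow ray scan and a cached lookup share one interface on a plain 8x8 bitboard.

```python
def rook_attacks_slow(sq, occupied):
    """Reference backend: ray-scan the bitboard of squares a rook on
    sq (0..63) attacks, given an occupancy bitboard."""
    attacks = 0
    r, f = divmod(sq, 8)
    for dr, df in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nf = r + dr, f + df
        while 0 <= nr < 8 and 0 <= nf < 8:
            bit = 1 << (nr * 8 + nf)
            attacks |= bit
            if occupied & bit:     # blocker: include it, stop the ray
                break
            nr, nf = nr + dr, nf + df
    return attacks

_table = {}

def rook_attacks(sq, occupied):
    """Public interface; today a cached table, tomorrow magics.
    The callers never change when the backend does."""
    key = (sq, occupied)
    if key not in _table:
        _table[key] = rook_attacks_slow(sq, occupied)
    return _table[key]

# Rook on a1, empty board: the whole a-file and first rank.
assert rook_attacks(0, 0) == 0x01010101010101FE
```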

Miguel got it exactly right. If you are a good engineer you don't just implement and test; you are concerned with actually UNDERSTANDING the thing you are experimenting with, because it has an impact on how you will proceed. What do you do when one of your tests fails? Do you move on to something else entirely and just give up? Don't you care why it failed? Or do you just try to fix it without knowing why it failed in the first place?
I am not sure where this is supposed to be heading. I _never_ write code that I don't understand. I never write code without considering speed/performance issues and how they can be addressed. I'm not going to build my first airplane out of concrete because that is the simplest material to work with. None of your questions above make any sense at all to me, in the context of testing. Fixed-node testing, IMHO, provides no useful information that is not subject to significant hidden bias. As a result, I am not interested. Time testing is easy to manage, easy to understand, and only requires that you don't add crappy code to test an idea. And I do not buy the idea that one first wants to write crappy code to see if the idea is good, before writing real code to make it more efficient. Otherwise, you are left with that concrete airplane that won't ever fly, even though the principle flight relies on is known to be valid.

If a test fails, I do my best to understand why. But fixed-node testing serves no useful purpose to further that goal that I can see. I know what the code was supposed to address, why it needed addressing, and then it becomes a matter of understanding why it failed. Quite often the basic idea is flawed in some basic way that a little analysis can explain (search extensions, where too much is not better, or reductions based on history counters, which don't work well at all).

I don't get your continual implication that we make changes in a vacuum, with no idea of whether they are good or bad, or, if they are bad, why. We often have to iterate on an idea before it works. For the most part, intuition is more than good enough to make us look deeper when something fails that we thought would be better.

As far as the "do you try to fix it without knowing why it failed?" question, that sounds like a suggestion from a freshman CS student. How can you fix something you can't understand? Put the code in a roomful of monkeys and let 'em make random changes and hope you find something better? We certainly don't develop like that, and never have.


Can you imagine NASA taking this approach to putting a man on the moon?

- Don
Yes I can. They actually did it, in fact. :) At least they did things very similar to what I have been doing. Nothing happens inside Crafty when testing that we don't understand before moving on. That would defeat the very purpose of our involved testing methodology. And it would make no sense at all.

As far as NASA goes, you ought to watch the "moon or bust" series. They didn't iterate over and over on most of their hardware. They designed it with a specific goal in mind from the get-go. And they designed it such that it would work from the get-go as well.

bob
Posts: 20643
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: A reason for testing at fixed number of nodes.

Post by bob » Tue Nov 10, 2009 9:54 pm

hgm wrote:
bob wrote:What I am trying to point out is that you learn _more_ from using actual time controls, than you learn from fixed node searches.
But that is an irrelevant metric. It does not matter how much you learn, but how much you learn per unit of invested effort. And in that metric, testing by nodes wins big time...
We simply do not agree here. Crafty is complex enough that its NPS varies significantly. Making changes and measuring them in a way that discounts the speed issue leads to wrong conclusions. The opponents I use exhibit the same variable speed and are influenced by fixed-node searches as well. I don't want to have to play 40,000 games at fixed nodes, then repeat using a timed match to make sure my change is better when timed, whether or not it was better at fixed nodes. That's 2x the work, not 1/2.
Except perhaps for _very_ simple-minded programs that have a fairly constant speed throughout the game, and when using a set of opponents that have that same characteristic. But if a program varies its speed by 2x or 3x over the course of a game, fixed node searches _will_ introduce an unexpected (and nearly undetectable) effect for the reasons I have given.
Not "and", but "or". If the speed is constant I can use any opponent, as I let the opponent play by time, not by nodes. And if the program is not "simple minded" enough, and speeds up just like all the others, I can play both by nodes, and the effect would cancel out if they had the same characteristic, no matter what that characteristic was.

Besides, if I had a program that varied wildly in NPS, I would simply make it lie about its node count in nps mode, and the problem would go away.
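The 40,000-game figure in the reply above can be put in perspective with a standard back-of-envelope error-bar calculation (textbook statistics, not either poster's actual harness): the 95% confidence interval on a measured Elo difference shrinks only with the square root of the game count, which is why duplicating every run at both fixed nodes and time control is so expensive.

```python
import math

def elo_error_bar(games, score=0.5):
    """Approximate 95% half-width, in Elo, of a measured match result.
    Normal approximation; draws are ignored, which overstates the
    error somewhat."""
    se = math.sqrt(score * (1.0 - score) / games)           # SE of score
    slope = 400.0 / math.log(10) / (score * (1.0 - score))  # dElo/dscore
    return 1.96 * se * slope

for n in (1_000, 10_000, 40_000):
    print(f"{n:>6} games: about +/- {elo_error_bar(n):.1f} Elo")
```

Around 40,000 games the bar is roughly +/- 3.4 Elo, so detecting a change worth only a few Elo already demands that scale; a second full run at a different control doubles the bill.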

User avatar
hgm
Posts: 23795
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: A reason for testing at fixed number of nodes.

Post by hgm » Tue Nov 10, 2009 10:03 pm

bob wrote:I'm not going to build my first airplane out of concrete because that is the simplest material to work with.
Bad mistake. If you want to do wind-tunnel testing just to map out the airflow around the rudders, it is exactly what you should do. NASA always does this. When they test the launch vehicle (not sure if it will explode or crash in the sea) they put a dummy on top, never an expensive spacecraft. When they train divers to recover the astronauts after a splash-down, they do not use a command module with electronics in it, or with expensive heat-shield material on the outside. They call these "mock-ups". The space shuttle Enterprise that was used for testing the landing procedure could never fly in space, because they did not bother to glue on the ceramic heat-shield tiles needed for re-entry. (They would have had no effect on the aerodynamics.) When they were testing the Apollo spacecraft, they were not using the rocket that could propel it to the Moon, because they could obtain the information they wanted in near-earth orbit just as well.

There are really zillions of examples from the space program alone that refute your philosophy.

User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 2:27 pm

Re: A reason for testing at fixed number of nodes.

Post by Don » Tue Nov 10, 2009 10:50 pm

bob wrote:
Don wrote:
bob wrote:
michiguel wrote:
bob wrote:
hgm wrote:Decoupling the measurement of strength and speed is very useful. To know if a change improves my engine in time-based play, I would be obliged to implement the change in the maximally optimized way of the most clever algorithm. That would require a lot of effort, and it might all be wasted, because the idea might even be a bust in node-based play.

Testing first in node-based play does allow me to use the most quick and dirty solution I can imagine, I just hack it in without having to pay any attention to efficiency at all. Then the node-based play will tell me how much the idea is worth independent of the quality of the implementation.
Or not, as I have previously explained, because not all programs are constant in their NPS over the course of a game. If your eval change pushes the game toward positions where you are slower, or where your opponent is faster, you get a time advantage you did not think about. That makes the change look good when it might be better or worse in real timed games. Vice-versa as well.
In experimental science many preliminary experiments are performed not to collect data, but to have an idea what other (if any) experiments should follow.

Testing with nodes may fall in the first category. You keep finding the defects of preliminary experiments, when they are not necessarily supposed to be perfect (if there is such a thing in experimental science...).

Miguel

And from that info I can then get an estimate of how much of a speed hit would be affordable in the implementation of that idea. And that would give me a pretty clear impression of whether it is feasible or not. But most of the time you don't even get to that stage. So it saves tons of time.
What I am trying to point out is that you learn _more_ from using actual time controls, than you learn from fixed node searches. Except perhaps for _very_ simple-minded programs that have a fairly constant speed throughout the game, and when using a set of opponents that have that same characteristic. But if a program varies its speed by 2x or 3x over the course of a game, fixed node searches _will_ introduce an unexpected (and nearly undetectable) effect for the reasons I have given.

I don't really care if an idea is good or bad if it can't be implemented in a way that makes it useful, speed-wise. That just doubles the work, to find that it looks good in a somewhat defective testing methodology, only to find it fails miserably in timed matches because it is too slow. Who really writes code without regard to speed, in the computer chess world? He who does, continually rewrites. Good design up front on an idea addresses both correctness and speed issues at the same time. No point in wasting test time if you already know it can't be done efficiently.
Good design up front is a myth. How many years have you been working on Crafty? Why didn't you just design it correctly in the first place and have it done in a week or two of coding? You've been fooling with that thing for years!
There we disagree. The basic structure of Crafty has not changed in 15 years. It is still a bitboard-based approach. It has evolved from rotated bitboards to magic bitboards, as one very small change. Small because those parts of the program were designed so that they are encapsulated in a way that makes the bitboard attack generation really independent from the users of that information.

One thing I can tell you I didn't do is design it sloppily from the get-go, just to get it working. I spent time on each and every part so that the original implementations were as good as I could make 'em, so I could understand what interacted with what and how I could make those interactions more efficient, and so forth.

Good design certainly doesn't happen by accident, I agree. But it _does_ happen. No way one could have predicted the development of magic bitboard operations; rotated bitboards were unheard of at the time and represented a significant jump in bitboard knowledge. But because things were designed properly, changing to magic took all of 30 minutes or less. Because of program design.

Miguel got it exactly right. If you are a good engineer you don't just implement and test; you are concerned with actually UNDERSTANDING the thing you are experimenting with, because it has an impact on how you will proceed. What do you do when one of your tests fails? Do you move on to something else entirely and just give up? Don't you care why it failed? Or do you just try to fix it without knowing why it failed in the first place?
I am not sure where this is supposed to be heading. I _never_ write code that I don't understand. I never write code without considering speed/performance issues and how they can be addressed.
You are completely ignoring my point. I never claimed you didn't understand the CODE, but you certainly don't understand how fast it is or what impact it has on the strength of the program. If you were this "all knowing" then you would not have to test it. But my accusation is that you don't really know, even AFTER testing it, why it failed. By your own testimony you DO NOT CARE whether it was because the idea was bad or the implementation was slow. I think a good engineer would WANT to know that.

You took what I said, and put a completely different twist on it.

I'm not going to build my first airplane out of concrete because that is the simplest material to work with. None of your questions above make any sense at all to me, in the context of testing. Fixed-node testing, IMHO, provides no useful information that is not subject to significant hidden bias. As a result, I am not interested. Time testing is easy to manage, easy to understand, and only requires that you don't add crappy code to test an idea. And I do not buy the idea that one first wants to write crappy code to see if the idea is good, before writing real code to make it more efficient.
I never advocated writing crappy code! When I test most ideas the code I write is the same code I keep and it's tight. But if the idea is really sophisticated and I anticipate that I will be spending a lot of time getting it perfect, then I write BUG-FREE simple code as a first pass. This is the opposite of crappy code. When the idea is debugged then I optimize it. If you feel like that is stupid and foolish, then you need to go back to school and resign your professorship.

This is actually a well-established principle. Ever heard of premature optimization? Now I realize that we are talking about "pedal to the metal" highly optimized chess programs, but it's still a valid idea. When I wrote my move generator I started with a super simple, bug-free move generator, used perft to make sure it was right, then I wrote it all over again in an optimized way. I do this with almost every routine in the program that is non-trivial. I use the first routine to check the optimized routine. I even did this with Zobrist hashing - I have a slow function that computes the hash from scratch, and I used this routine for a while to make sure the incrementally updated hash agreed with the from-scratch hash.
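The slow-reference-validates-fast-path idea described above can be sketched for Zobrist hashing on a bare 64-square piece array (a toy, not any engine's real code): a from-scratch recomputation checks the incrementally updated hash after every move.

```python
import random

random.seed(42)
NUM_PIECES = 12  # 6 piece types x 2 colors
ZOBRIST = [[random.getrandbits(64) for _ in range(64)]
           for _ in range(NUM_PIECES)]

def hash_from_scratch(board):
    """Slow reference: XOR the key of every occupied square
    (board[sq] is a piece index or None)."""
    h = 0
    for sq, piece in enumerate(board):
        if piece is not None:
            h ^= ZOBRIST[piece][sq]
    return h

def make_move(board, h, frm, to):
    """Fast path: move a piece and update the hash incrementally."""
    piece = board[frm]
    h ^= ZOBRIST[piece][frm]          # piece leaves frm
    if board[to] is not None:
        h ^= ZOBRIST[board[to]][to]   # capture: victim leaves to
    h ^= ZOBRIST[piece][to]           # piece arrives at to
    board[frm], board[to] = None, piece
    return h

board = [None] * 64
board[0], board[12] = 3, 9            # two arbitrary pieces for the demo
h = hash_from_scratch(board)
h = make_move(board, h, 0, 12)        # a capture on square 12
assert h == hash_from_scratch(board)  # incremental agrees with scratch
```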

I have to say it seems really odd to have a computer science professor preach that you should always write the most optimized version on the first pass and berate any student for not getting it right the very first time.
Otherwise, you are left with that concrete airplane that won't ever fly, even though the principle flight relies on is known to be valid.

If a test fails, I do my best to understand why. But fixed-node testing serves no useful purpose to further that goal that I can see. I know what the code was supposed to address, why it needed addressing, and then it becomes a matter of understanding why it failed.
This is what fixed-depth testing does for me. It separates the idea from the implementation, so that I know whether the implementation was slow or the idea was bad. This just seems to me like elementary detective work that one should want to do. I don't care if you have a different way of doing the same thing, but you have not admitted to it until now. Before, it was that it doesn't matter why it failed, that time control tells you all you need; and now all of a sudden it matters after all.


Quite often the basic idea is flawed in some basic way that a little analysis can explain (search extensions, where too much is not better, or reductions based on history counters, which don't work well at all).

I don't get your continual implication that we make changes in a vacuum, with no idea of whether they are good or bad, or, if they are bad, why. We often have to iterate on an idea before it works. For the most part, intuition is more than good enough to make us look deeper when something fails that we thought would be better.

As far as the "do you try to fix it without knowing why it failed?" question, that sounds like a suggestion from a freshman CS student. How can you fix something you can't understand? Put the code in a roomful of monkeys and let 'em make random changes and hope you find something better? We certainly don't develop like that, and never have.
You certainly make it seem like that. You described your process as coding up an idea carefully the "right way the first time" and then running time control games to see if it works, and in several emails you said loudly something to the effect of, "I don't care if an idea is good or bad if ....." and this is always in the context of only needing to run time control games because other tests are a big waste of your time.


Can you imagine NASA taking this approach to putting a man on the moon?

- Don
Yes I can. They actually did it, in fact. :) At least they did things very similar to what I have been doing. Nothing happens inside Crafty when testing that we don't understand before moving on. That would defeat the very purpose of our involved testing methodology. And it would make no sense at all.

As far as NASA goes, you ought to watch the "moon or bust" series. They didn't iterate over and over on most of their hardware. They designed it with a specific goal in mind from the get-go. And they designed it such that it would work from the get-go as well.

bob
Posts: 20643
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: A reason for testing at fixed number of nodes.

Post by bob » Thu Nov 12, 2009 1:35 am

hgm wrote:
bob wrote:I'm not going to build my first airplane out of concrete because that is the simplest material to work with.
Bad mistake. If you want to do wind-tunnel testing just to map out the airflow around the rudders, it is exactly what you should do. NASA always does this. When they test the launch vehicle (not sure if it will explode or crash in the sea) they put a dummy on top, never an expensive spacecraft. When they train divers to recover the astronauts after a splash-down, they do not use a command module with electronics in it, or with expensive heat-shield material on the outside. They call these "mock-ups". The space shuttle Enterprise that was used for testing the landing procedure could never fly in space, because they did not bother to glue on the ceramic heat-shield tiles needed for re-entry. (They would have had no effect on the aerodynamics.) When they were testing the Apollo spacecraft, they were not using the rocket that could propel it to the Moon, because they could obtain the information they wanted in near-earth orbit just as well.

There are really zillions of examples from the space program alone that refute your philosophy.
Hardly. First, the models are not made out of concrete. Second, the models are not crude approximations, but are incredibly accurate with respect to various features such as airfoil thickness and shape, location of center of pressure, drag coefficients, and such.

Your "zillions of examples" show a pretty serious lack of understanding. I've worked with quite a few fluid dynamics folks at Mississippi State doing experiments in a multi-mach wind tunnel.

Yes, they test models first, but not for the reasons you suggest. There's no need to go to the expense of making a full-sized prototype complete with engines and instruments. But when they are testing flight characteristics in a wind tunnel, the shape being tested is _not_ a "rough approximation". And that's exactly how I test inside Crafty. Except most of the ideas we test are not hundreds or thousands of lines of code, so we don't need a rapid prototype to test soundness. We want a real implementation.

bob
Posts: 20643
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: A reason for testing at fixed number of nodes.

Post by bob » Thu Nov 12, 2009 1:58 am

Don wrote:
bob wrote:
Don wrote:
bob wrote:
michiguel wrote:
bob wrote:
hgm wrote:Decoupling the measurement of strength and speed is very useful. To know if a change improves my engine in time-based play, I would be obliged to implement the change in the maximally optimized way of the most clever algorithm. That would require a lot of effort, and it might all be wasted, because the idea might even be a bust in node-based play.

Testing first in node-based play does allow me to use the most quick and dirty solution I can imagine, I just hack it in without having to pay any attention to efficiency at all. Then the node-based play will tell me how much the idea is worth independent of the quality of the implementation.
Or not, as I have previously explained, because all programs are not constant in their NPS over the course of a game. If your eval change pushes the game toward positions where you are slower, or where your opponent is faster, you get a time advantage you did not think about. Which makes the change look good when it might be better or worse in real timed games. Vice-versa as well.
In experimental science many preliminary experiments are performed not to collect data, but to have an idea what other (if any) experiments should follow.

Testing with nodes may fall in the first category. You keep finding the defects of preliminary experiments, when they are not necessarily supposed to be perfect (if there is such a thing in experimental science...).

Miguel

And from that info I can then get an estimate of how much of a speed hit would be affordable in the implementation of that idea. And that would give me a pretty clear impression of whether that is feasible or not. But most of the time you don't even get to that stage. So it saves tons of time.
What I am trying to point out is that you learn _more_ from using actual time controls, than you learn from fixed node searches. Except perhaps for _very_ simple-minded programs that have a fairly constant speed throughout the game, and when using a set of opponents that have that same characteristic. But if a program varies its speed by 2x or 3x over the course of a game, fixed node searches _will_ introduce an unexpected (and nearly undetectable) effect for the reasons I have given.
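The hidden-time effect described here can be illustrated with a toy calculation; every number below (node budget, NPS profile, phase lengths) is hypothetical, chosen only to show the mechanism:

```python
# Toy illustration of the fixed-node bias: under a fixed node budget,
# an engine whose NPS varies over the game consumes different amounts
# of wall-clock time depending on which positions its eval steers it into.

def wall_time(nodes_per_move, nps_by_phase, moves_per_phase):
    """Total wall-clock seconds for a game played at a fixed node budget."""
    return sum(moves * nodes_per_move / nps
               for nps, moves in zip(nps_by_phase, moves_per_phase))

NODES = 2_000_000                      # fixed nodes per move (hypothetical)
baseline = [1.0e6, 1.5e6, 3.0e6]       # NPS in opening/middlegame/endgame
moves    = [15, 25, 20]                # moves spent in each phase

# An eval change that drags games into the slow middlegame phase:
moves_after_change = [15, 35, 10]

t_before = wall_time(NODES, baseline, moves)
t_after  = wall_time(NODES, baseline, moves_after_change)
print(f"before: {t_before:.0f}s  after: {t_after:.0f}s")   # prints: before: 77s  after: 83s
```

Same node count per move, yet roughly 9% more wall-clock time; in a timed game that difference would have to be paid for somewhere, while a fixed-node test never sees it.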

I don't really care if an idea is good or bad if it can't be implemented in a way that makes it useful, speed-wise. That just doubles the work, to find that it looks good in a somewhat defective testing methodology, only to find it fails miserably in timed matches because it is too slow. Who really writes code without regard to speed, in the computer chess world? He who does, continually rewrites. Good design up front on an idea addresses both correctness and speed issues at the same time. No point in wasting test time if you already know it can't be done efficiently.
Good design up front is a myth. How many years have you been working on Crafty? Why didn't you just design it correctly in the first place and have it done in a week or two of coding? You've been fooling with that thing for years!
There we disagree. The basic structure of Crafty has not changed in 15 years. It is still a bitboard-based approach. It has evolved from rotated bitboards to magic bitboards, as one very small change. Small because those parts of the program were designed so that they are encapsulated in a way that makes the bitboard attack generation really independent from the users of that information.

One thing I can tell you I didn't do, is to design it sloppy from the get-go, just to get it working. I spent time on each and every part so that the original implementations were as good as I could make 'em, I could understand what interacted with what and how I could make those interactions more efficient, and so forth.

Good design certainly doesn't happen by accident, I agree. But it _does_ happen. No way one could have predicted the development of magic bitboard operations; rotated bitboards were unheard of at the time and represented a significant jump in bitboard knowledge. But by designing things properly, changing to magic took all of 30 minutes or less. Because of program design.
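For readers unfamiliar with the technique under discussion: the core idea behind rotated and magic bitboards is to answer sliding-piece attack queries with a precomputed table lookup instead of an on-the-fly ray walk. The sketch below is a simplified stand-in, not Crafty's code; a Python dict keyed by the masked occupancy plays the role that the magic multiply-and-shift index plays in a real engine:

```python
# Sketch of table-driven sliding attacks: precompute rook attacks for
# every relevant blocker subset, then answer queries with one lookup.

def rook_attacks_slow(sq, occ):
    """On-the-fly ray walk; 64-bit board, bit i = square i."""
    attacks = 0
    r, f = divmod(sq, 8)
    for dr, df in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nf = r + dr, f + df
        while 0 <= nr < 8 and 0 <= nf < 8:
            bit = 1 << (nr * 8 + nf)
            attacks |= bit
            if occ & bit:              # ray stops at the first blocker
                break
            nr, nf = nr + dr, nf + df
    return attacks

def relevant_mask(sq):
    """Blocker squares that can affect the attacks (board edges excluded)."""
    r, f = divmod(sq, 8)
    mask = 0
    for nr in range(r + 1, 7): mask |= 1 << (nr * 8 + f)
    for nr in range(1, r):     mask |= 1 << (nr * 8 + f)
    for nf in range(f + 1, 7): mask |= 1 << (r * 8 + nf)
    for nf in range(1, f):     mask |= 1 << (r * 8 + nf)
    return mask

def build_table(sq):
    """Enumerate every subset of the mask (the 'carry-rippler' trick)."""
    mask, table, subset = relevant_mask(sq), {}, 0
    while True:
        table[subset] = rook_attacks_slow(sq, subset)
        subset = (subset - mask) & mask
        if subset == 0:
            return table

SQ = 27                                # d4, an arbitrary example square
TABLE = build_table(SQ)
MASK = relevant_mask(SQ)

def rook_attacks_fast(occ):
    return TABLE[occ & MASK]           # one lookup per query
```

The encapsulation point being made in the post is visible here: callers only see "occupancy in, attack set out", so swapping the indexing scheme underneath (rotated, magic, or a dict) does not touch the users of that information.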

Miguel got it exactly right. If you are a good engineer you don't just implement and test; you are concerned with actually UNDERSTANDING the thing you are experimenting with, because it has an impact on how you will proceed. What do you do when one of your tests fails? Do you move on to something else entirely and just give up? Don't you care why it failed? Or do you just try to fix it without knowing why it failed in the first place?
I am not sure where this is supposed to be heading. I _never_ write code that I don't understand. I never write code without considering speed/performance issues and how they can be addressed.
You are completely ignoring my point. I never claimed you didn't understand the CODE, but you certainly don't understand how fast it is or what impact it has on the strength of the program. If you were this "all knowing" then you would not have to test it. But my accusation is that you don't really know even AFTER testing it why it failed. By your own testimony you DO NOT CARE if it was because the idea was bad or the implementation was slow. I think a good engineer would WANT to know that.

You took what I said, and put a completely different twist on it.
As did you on my explanations. I _never_ walk away from a change that is worse without knowing why it was worse. _Never_. And I have never said otherwise. The tests tell us whether to keep the change or not. If it is a good result, we keep it. If not, we _do_ work to understand why. You make it sound like this is just a group of hackers working together and randomly inserting lines of code to see if we can improve the score. No change we ever make is tried unless we believe it will improve things. If it doesn't, we absolutely figure out why before moving on, as on occasion the idea can be improved.

Absolutely none of this has a single thing to do with fixed node testing however. I do not like that kind of testing, for reasons already explained a dozen times. We run go/no-go tests all the time. But each and every no-go is explained before we move on so that we know why it didn't work (usually a surprise effect we did not expect/anticipate) and we can consider alternatives that address the surprises if possible. Or we say "aha, here's what we overlooked..." and we move on. We do this so that we don't continually make changes that fail for the same reason.

To imply we just change and test shows complete ignorance on what we are doing and how we do it, and I have explained this process a dozen times over the past few years of reporting results.

I'm not going to build my first airplane out of concrete because that is the simplest material to work with. None of your questions above make any sense at all to me, in the context of testing. Fixed-node testing, IMHO, provides no useful information that is not subject to significant hidden bias. As a result, I am not interested. Time testing is easy to manage, easy to understand, and only requires that you don't add crappy code to test an idea. And I do not buy the idea that one first wants to write crappy code to see if the idea is good, before writing real code to make it more efficient.
I never advocated writing crappy code! When I test most ideas the code I write is the same code I keep and it's tight. But if the idea is really sophisticated and I anticipate that I will be spending a lot of time getting it perfect, then I write BUG-FREE simple code as a first pass. This is the opposite of crappy code. When the idea is debugged then I optimize it. If you feel like that is stupid and foolish, then you need to go back to school and resign your professorship.
I never write crappy code either. Crafty is mature enough that there are no "major revisions" done that require a "quick-and-ugly" followed by a "clean-up" if the idea shows improvement. When I did the rewrite a year or two ago to get rid of all the black/white duplication, I did _zero_ testing. Because we were not improving the playing skill, we were improving the readability and maintainability of the code. However, from the days of Cray Blitz until today, I can't think of a single case where I did a "quick and dirty" implementation for anything at all.

I certainly, on occasion, look for ways to optimize things, perhaps incremental update or whatever. And no, I don't always do those up front. But optimization is a different aspect and I don't test after optimization either. I _know_ faster is better if nothing changes.

As far as the hyperbole about "you should go back to school and resign your professorship" goes, that is simply a stupid comment. You know it. I know it. And I know you know it.

Now if you want to tell me that you are going to do a "quick-and-dirty" and then test it with fixed nodes, go for it. And then, so long as you do a _real_ test with time limits, things will be OK. But if you base decisions on that fixed-node search, trouble _will_ show up.


This is actually a well-established principle. Ever heard of premature optimization? Now I realize that we are talking about "pedal to the metal" highly optimized chess programs, but it's still a valid idea. When I wrote my move generator I started with a super simple bug-free move generator, used perft to make sure it was right, then I wrote it all over again in an optimized way. I do this with almost every routine in the program that is non-trivial. I use the first routine to check the optimized routine. I even did this with Zobrist hashing: I have a slow function that computes the hash from scratch, and I used this routine for a while to make sure the incrementally updated hash agreed with the from-scratch Zobrist hash.
That is all well and good. I _have_ been programming since 1968, so one might think I know how to program, both efficiently and in a way that the code can be maintained. But the above has _nothing_ to do with what we are talking about, which is about making a change and then drawing a conclusion based on fixed node count searches. I was not testing my move generator by playing games and looking at results. I did not test any of the basic mechanics of my program that way. Different problem. Now I am testing different evaluation ideas, changing values, adding terms, removing terms, etc. And those I _do_ care about with respect to testing.

This has _way_ too much hyperbole in it, IMHO.



I have to say it seems really odd to have a computer science professor preach that you should always write the most optimized version on the first pass and berate any student for not getting it right the very first time.

It is equally strange to see a former (current?) grad student making statements that are completely unfounded. I've _never_ mentioned a student here at all, so why/where would I berate them? And who would suggest that I would? Got any students to back up such an asinine statement? Where did I say I write "the most optimized version?" I simply said I write code that is not sloppily slow. Nothing I add to the evaluation at present would add a fraction of 1% to the total search time. Mobility was the most time consuming thing we added and we can't even measure the difference in speed with and without the way we wrote this stuff. So rather than claiming to write "perfectly optimized" we _actually_ write "reasonable code" which is all we need to test whether the idea is good or not.

Otherwise, you are left with that concrete airplane that won't ever fly, even though the principle that flight relies on is known to be valid.

If a test fails, I do my best to understand why. But fixed nodes serves no useful purpose to further that goal that I can see. I know what the code was supposed to address, why it needed addressing, and then it becomes a matter of understanding why it failed.
This is what fixed depth testing does for me. It separates the idea from the implementation so that I know if the implementation was slow or the idea was bad. This just seems to me like elementary detective work that one should want to do. I don't care if you have a different way of doing the same thing, but you have not admitted to it until now. Before it was that it doesn't matter that it failed, time control tells you all you need, and now all of a sudden it matters after all.
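One hedged way to combine the two measurements being argued about (the idea's worth at fixed depth or fixed nodes, and the speed cost of the implementation) is the common rule of thumb that a doubling of speed is worth somewhere around 50-70 Elo. The exact figure varies by engine and time control, and the numbers below are purely illustrative:

```python
# Back-of-the-envelope decomposition (all numbers hypothetical):
# fixed-depth testing gives the idea's Elo value, an NPS measurement
# gives the speed cost, and a rule-of-thumb conversion combines them.

import math

ELO_PER_DOUBLING = 60          # assumed; measure it for your own engine

def net_elo(idea_elo, nps_old, nps_new):
    speed_elo = ELO_PER_DOUBLING * math.log2(nps_new / nps_old)
    return idea_elo + speed_elo

# Idea gains +8 Elo at fixed depth, but the first-pass code costs 10% NPS:
print(f"{net_elo(8, 1.00e6, 0.90e6):+.1f} Elo")   # prints: -1.1 Elo
```

On these made-up numbers the idea barely loses to its own speed cost, which is exactly the case where the two testing camps diverge: a timed test would reject it outright, while the decomposition says an implementation only slightly faster would make it a keeper.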


Quite often the basic idea is flawed (search extensions, where too much is not better, or reductions based on history counters, which don't work well at all) in some basic way that a little analysis can explain.

I don't get your continual implication that we make changes in a vacuum with no idea of whether they are good or not, or if they are bad, why? We often have to iterate on an idea before it works. For the most part, intuition is more than good enough to make us look deeper when something fails yet we thought it was better.

As far as the "do you try to fix it without knowing why it failed?" question, that sounds like a suggestion from a freshman CS student. How can you fix something you can't understand? Put the code in a roomful of monkeys and let 'em make random changes and hope you find something better? We certainly don't develop like that, and never have.
You certainly make it seem like that. You described your process as coding up an idea carefully the "right way the first time" and then running time control games to see if it works, and in several emails you said loudly something to the effect of, "I don't care if an idea is good or bad if ....." and this is always in the context of only needing to run time control games because other tests are a big waste of your time.
Does not compute. I have not ever said "I don't care ...". If I didn't care, I wouldn't be testing. So that makes no sense and I have no idea where it comes from. We (the group of 4 that are working on Crafty and testing various changes) have continual discussions about what to try, why something failed, what might be tried as an alternative, etc...


Can you imagine NASA taking this approach to putting a man on the moon?

- Don
Yes I can. They actually did it, in fact. :) At least they did things very similar to what I have been doing. Nothing happens inside Crafty when testing that we don't understand before moving on. That would defeat the very purpose of our involved testing methodology. And it would make no sense at all.

As far as NASA goes, you ought to watch the "moon or bust" series. They didn't iterate over and over on most of their hardware. They designed it with a specific goal in mind from the get-go. And they designed it such that it would work from the get-go as well.

User avatar
michiguel
Posts: 6389
Joined: Thu Mar 09, 2006 7:30 pm
Location: Chicago, Illinois, USA
Contact:

Re: A reason for testing at fixed number of nodes.

Post by michiguel » Thu Nov 12, 2009 5:59 am

bob wrote:
Now if you want to tell me that you are going to do a "quick-and-dirty" and then test it with fixed nodes, go for it. And then, so long as you do a _real_ test with time limits, things will be OK. But if you base decisions on that fixed-node search, trouble _will_ show up.
This is not the point...
The idea is to gather information, not necessarily the same information you get with time-based testing.

Miguel

Post Reply