Repetition detection structure.

Discussion of chess software programming and technical issues.


Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Repetition detection structure.

Post by Uri Blass »

bob wrote:
hgm wrote:Seems to me you are trying to argue that white is black, here. Your last statement is just plain nonsense: perhaps the return can be small, but the return on investment can be infinite if the investment is infinitesimal. So the "regardless of the investment" does not belong there.
I am just following standard software development practice, which says to spend time optimizing the parts of the program that consume the most compute cycles, _not_ the ones that consume fractions of a percentage point. This has always been, and will always be the sane way to optimize programs...

I have no idea why you don't get that. Fortunately, most of the rest of the programming world does. Feel free to show me a quote from any programming book that would suggest rewriting and debugging a procedure that takes 0.5% of the total execution time, when there are other procedures that take 30% and up...
My comments about it.

While I agree that it is not always good to spend time on the parts that take the most time, because the investment also matters, I still think that I would not spend significant time on making the program 0.5% faster.

Speed is not everything, and there are better things to do than to look for every small speed improvement that you can get.

It may be better to optimize a part that takes 20% of the time and not a part that takes 30% of the time, because the job of optimizing the part that takes 30% of the time is harder. But I think it is better not to care at all about a part that takes 0.5% of the time, unless there are many parts like that, and I do not think that chess programs have many parts that each take 0.5% of the time.

Another point is that I am not sure it is always possible to give the percentage of the time that the program spends on task X, because it is possible that the program does task X and task Y at the same time and cannot continue from task X to the next task before finishing task Y.

In that case, if task Y takes more time than task X, you will not get a speed improvement from making task X faster, so it may be better to optimize task Y first.

Uri
pijl

Re: Repetition detection structure.

Post by pijl »

mathmoi wrote:Is it possible to use something else, like a small hash table or even a binary tree?

I guess that what I'm asking is: are there other ways to do repetition detection, or does everyone use a simple array?
In the Baron I am using repetition checks not only in the regular nodes, but also in quiescence and during ETC (if you are producing the hash keys for transposition table queries, why not try to get easy cuts by checking for repetitions as well).
I noticed that the initially low cost of repetition checking became a matter of some concern, so I did the following:
In most cases there is no repetition. So what if you change the repetition check to a two-stage approach, where the first stage would remove 95% of the non-repetitions, so that a proper check only needs to be done for the remaining cases?
What I did was to make a small table, indexed by the hash key. Whenever makemove entered a position, the element of the table belonging to the new position was incremented. With takeback the opposite was done: the element belonging to the old position was decremented.
Now the first thing to do in repetition detection was to check whether the element for the current position had a value of at least 2. If not, there cannot be a repetition.
The main challenge is when to clear this table. As I did not want to restore the table in search, I decided to forget about that, and only clear the table when actual irreversible moves were played, so not within search. In that case you only have to worry about reconstructing this small table when doing a takeback in a game (which usually doesn't happen). As the second (slower) stage would check any potential repetitions in a robust way, that is not a problem.
In practice this meant that the cost of repetition checks dropped to almost zero.
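(A minimal sketch of the two-stage filter described above, assuming 64-bit Zobrist keys. The table size and the names are illustrative, not code from the Baron; the caller falls back to its normal, exact repetition scan only when the cheap test cannot rule a repetition out.)

Code:

#define REP_BITS 14
#define REP_SIZE (1 << REP_BITS)

/* small counter table, indexed by the hash key; cleared only when an
   irreversible move is actually played in the game, never inside the search */
static unsigned char rep_count[REP_SIZE];

/* called from makemove: count the position just entered */
void rep_enter(unsigned long long key) { rep_count[key & (REP_SIZE - 1)]++; }

/* called from takeback: uncount the position just left */
void rep_leave(unsigned long long key) { rep_count[key & (REP_SIZE - 1)]--; }

/* stage 1: returns 0 if a repetition is impossible, 1 if the robust
   (slower) stage-2 check still has to be done for this position */
int rep_possible(unsigned long long key)
{
    /* the current position is already counted, so a count below 2 means it
       cannot have occurred before; counts >= 2 may still be hash collisions,
       which is why stage 2 does the exact check */
    return rep_count[key & (REP_SIZE - 1)] >= 2;
}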
Richard.
hgm
Posts: 27791
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Repetition detection structure.

Post by hgm »

bob wrote:I am just following standard software development practice, which says to spend time optimizing the parts of the program that consume the most compute cycles, _not_ the ones that consume fractions of a percentage point. This has always been, and will always be the sane way to optimize programs...
That is only a relevant criterion insofar as the differences in execution time are caused by the various parts being executed a different number of times. Not if they are executed equally often, and the time difference is caused by the code section being much larger.
I have no idea why you don't get that.
Could the fact that it is not true perhaps have something to do with it? :roll:
Fortunately, most of the rest of the programming world does. Feel free to show me a quote from any programming book that would suggest rewriting and debugging a procedure that takes 0.5% of the total execution time, when there are other procedures that take 30% and up...
Ah, so for you wisdom comes from books? For me it flows from brains. That might also explain the difference. :lol:
The "regardless of the investment" does make one assumption, that you want to improve your program the most, for the least effort.
Seems to me that that is exactly the assumption it does _not_ make. If you don't equate 'investment' to 'effort', then what does 'investment' mean for you? "least effort" puts conditions on the effort, while "regardless of effort" dismisses any such conditions there might be. But feel free to explain how your understanding of the English language differs from mine...
If that is not your goal, then the conversation has taken a twist I am not interested in. Chances are good that a couple of hours spent to rewrite and debug a 0.5% piece of code would pay off far more spent modifying a 50% piece of code. It is just common sense (to me at least)...
"Chances are"... But prudent people don't leave such things to chance. They apply their effort where the _know_ it will pay off most! If that is not common, too bad. But it definitely makes more sense. :lol:
It does not require a calculus degree to see that starting with a near-infinitely massive job, just because it gives slightly bigger returns than an easy job, is a self-defeating strategy, which slows down your engine development immensely. So I am not going to argue this point any further.
I would hope not, as the advice I gave has been given for years now in books discussing optimization principles. You can go off into never-never-land whenever you want, ...
The more relevant question is: can you get out of it?
but advising anyone to spend time working on a 1/2 percent usage module over one that is much larger is just silly. And poor advice. No matter how you cut it. Everyone understands Amdahl's Law. And almost everyone uses it in making their decisions on what to optimize and what to leave as is. If you don't want to follow that, that's fine. But it is the _right_ way to improve a program's speed, rather than trying to speed up parts that won't affect the final time at all...
It of course depends a bit on your definition of 'right'. Apparently you consider it 'right' because it is in books. But I judge it merely on the basis of how much speed improvement I get per man-hour of invested programming work. And by that standard it seems very _wrong_...
Even if you speed it up an infinite amount, at a cost of only one hour total, that is still an hour that was wasted as you can't measure a 0.5% speed improvement in terms of increased playing strength, when compared to a small improvement in a 50% module (my evaluation).
Yes, but that of course is due to the fact that your evaluation is not quantitative, and little more than gut feeling. If the 'small improvement' in the 50% module reduced its profile time from 50% to 49.5%, and it had taken you 2 hours to implement it, you would just have wasted an hour compared to the one who went for the infinite improvement of the 0.5%. Only the ratio speedGained / timeSpent determines what is smarter to do. In themselves the quantities mean _nothing_.
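(To make the arithmetic above concrete, a small illustrative calculation, not taken from the thread: by Amdahl's law the overall speedup from reducing a fraction p of total runtime by a factor s is 1 / ((1 - p) + p/s). Trimming the 50% module from 50% to 49.5% and removing the 0.5% module entirely both eliminate the same half percentage point of runtime, so both give roughly a 1.005x overall speedup, and only the hours invested differ.)

Code:

#include <stdio.h>

/* Amdahl's law: overall speedup when a fraction p of the runtime is sped up by factor s */
static double amdahl(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}

int main(void)
{
    /* 50% module trimmed from 50 to 49.5 time units: local speedup of 50/49.5 */
    printf("small improvement of the 50%% module:       %.4f\n", amdahl(0.50, 50.0 / 49.5));
    /* 0.5% module made (effectively) infinitely faster */
    printf("'infinite' improvement of the 0.5%% module: %.4f\n", amdahl(0.005, 1e12));
    return 0;   /* both print ~1.0050 */
}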
Why do you try to take indefensible positions like this and carry on an argument that is foolish to anyone that actually writes software???
The question is more why _you_ do that...
As for the SMP stuff: Joker is not SMP, but this discussion came up here before. The conclusion then was that copying data structures on thread start-up was not really a competitive method, and that it is more efficient to have each thread update its own data structures based on a short list of moves passed to it. And then the issue becomes just the same as for updating during the normal search.
They are not the same at all. I only need to copy a small fraction of the repetition list. I need to duplicate the entire repetition hash table for each thread that searches at any split point. It isn't free. And since it is hardly clear that the hashing approach is any faster at all, why waste time rewriting something that will hurt when the SMP search is developed, and there is no tangible return in terms of speed either???
Why do you need to duplicate the entire repetition hash table at every split point? As explained above, I certainly would not do that. I would only duplicate it at the root (together with the board and piece list), and then differentially update the lot by only entering into it the positions on the branch between the root and the split point. There are plenty of cases where this would involve fewer entries than you have to copy. (Namely when the branch has no irreversible moves in it, and the game history ended with a large number of reversible moves as well.)
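(A rough sketch of that root-copy-plus-differential-update idea, under the assumption that the repetition structure is a small counter table like the one discussed earlier in the thread; the struct layout and helper names are illustrative, not Joker code.)

Code:

#define REP_SIZE (1 << 14)

typedef struct {
    unsigned char rep_count[REP_SIZE];
    /* board, piece list, game history, ... would travel with the thread the same way */
} ThreadState;

/* one-time duplication when a helper thread is started at the root */
void thread_copy_root(ThreadState *t, const ThreadState *root)
{
    *t = *root;
}

/* differential update: enter only the positions on the branch that leads from
   the thread's current location to the split point it attaches to */
void thread_enter_branch(ThreadState *t, const unsigned long long *branch_keys, int n)
{
    for (int i = 0; i < n; i++)
        t->rep_count[branch_keys[i] & (REP_SIZE - 1)]++;
}

/* when the thread leaves that part of the tree again, the same positions are removed */
void thread_leave_branch(ThreadState *t, const unsigned long long *branch_keys, int n)
{
    for (int i = 0; i < n; i++)
        t->rep_count[branch_keys[i] & (REP_SIZE - 1)]--;
}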
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Repetition detection structure.

Post by Zach Wegner »

hgm wrote:
bob wrote: They are not the same at all. I only need to copy a small fraction of the repetition list. I need to duplicate the entire repetition hash table for each thread that searches at any split point. It isn't free. And since it is hardly clear that the hashing approach is any faster at all, why waste time rewriting something that will hurt when the SMP search is developed, and there is no tangible return in terms of speed either???
Why do you need to duplicate the entire repetition hash table at every split point? As explained above, I certainly would not do that. I would only duplicate it at the root (together with the board and piece list), and then differentially update the lot by only entering into it the positions on the branch between the root and the split point. There are plenty of cases where this would involve fewer entries than you have to copy. (Namely when the branch has no irreversible moves in it, and the game history ended with a large number of reversible moves as well.)
There are two reasons I disagree:

When you copy the move list at a split point, which I agree is the better way, you are already copying the list for the repetition check. When you use a hashtable, the difference in copying and updating is quite possibly moot because of the overhead of updating an extra data structure. I don't have any extra overhead on a split, because I don't need any extra data structures for my repetition check. I store the complete game, and I store the hashkeys for undoing moves.

This leads to my second reason, complexity. I'd rather not add another complete data structure in my program for an unnoticeable speed gain, no matter how easy it is to implement.
hgm
Posts: 27791
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Repetition detection structure.

Post by hgm »

I am not sure I completely understand what you are saying, but your SMP implementation is probably quite different from what I have in mind. I won't even have to copy the move list. All my search threads will be walking the tree under their own power, updating their private data structures (board, piece list, history) differentially in the normal way. Either searching new parts of the tree, or, when they become idle, walking along the currently active part of the tree (as defined in the main hash) in search of split points to attach to. And, during most of the search, split points will only be two moves (make or unmake) away from the previous split point they were attached to.
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Repetition detection structure.

Post by Zach Wegner »

Mine is pretty similar to what you described in your last post. On a split, the list of moves from the root to the split point is copied. When another processor attaches, it gets all these moves and makes them. When it detaches, it unmakes the moves back to the root. So each time one of these actions occurs, with your repetition structure the table needs to be updated several times, so the difference between copying and updating is not as large as you would expect.

On the other hand, in your SMP implementation it wouldn't be so bad. Making and unmaking a few moves isn't much. I am interested in how that will turn out, speedup-wise. But I think that, in general, typical splitting approaches (basically everything but what you describe :)) will still make a separate repetition structure cause a slowdown on splits.
hgm
Posts: 27791
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Repetition detection structure.

Post by hgm »

It seems the main difference then is that you go all the way back to the root all the time to switch split points, while I just let them migrate to the first split point they encounter.

My idea was that the search almost entirely consists of refutation trees, where every all-node is a possible split point. So typically there will be a split point at every even or odd level. As soon as you run out of unsearched moves in the split point you are currently in, just move two levels closer to the root, or two levels deeper into the tree, and you are likely to find another one. So why back up all the way to the root, only to return to nearly the same spot in the tree?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Repetition detection structure.

Post by bob »

hgm wrote:
bob wrote:I am just following standard software development practice, which says to spend time optimizing the parts of the program that consume the most compute cycles, _not_ the ones that consume fractions of a percentage point. This has always been, and will always be the sane way to optimize programs...
That is only a relevant criterion insofar as the differences in execution time are caused by the various parts being executed a different number of times. Not if they are executed equally often, and the time difference is caused by the code section being much larger.
Sheer baloney in this context. Chess is a pretty interesting game, because nearly every function gets called about the same number of times within a factor of 10 or so. You generate moves, then you make 'em, then you evaluate 'em, and after making each one, you do a hash probe and/or repetition check.

But even if one module used 50% of the time and was called 1,000x more often than the next largest time user, that would _still_ be the correct place to start optimizing.

This is simple software engineering/optimizing 101...
I have no idea why you don't get that.
Could the fact that it is not true perhaps have something to do with it? :roll:

If that were the case, perhaps. But it clearly is not, so?

Fortunately, most of the rest of the programming world does. Feel free to show me a quote from any programming book that would suggest rewriting and debugging a procedure that takes 0.5% of the total execution time, when there are other procedures that take 30% and up...
Ah, so for you wisdom comes from books? For me it flows from brains. That might also explain the difference. :lol:

So we do have serious differences in how we approach science, it seems. Yes, I rely on books, and confirm by experiment and then build experience from doing so. But the underlying concepts certainly come from books. I doubt you developed alpha/beta on your own, as an example???
The "regardless of the investment" does make one assumption, that you want to improve your program the most, for the least effort.
Seems to me that that is exactly the assumption it does _not_ make. If you don't equate 'investment' to 'effort', then what does 'investment' mean for you? "least effort" puts conditions on the effort, while "regardless of effort" dismisses any such conditions there might be. But feel free to explain how your understanding of the English language differs from mine...
If that is not your goal, then the conversation has taken a twist I am not interested in. Chances are good that a couple of hours spent to rewrite and debug a 0.5% piece of code would pay off far more spent modifying a 50% piece of code. It is just common sense (to me at least)...
"Chances are"... But prudent people don't leave such things to chance. They apply their effort where the _know_ it will pay off most! If that is not common, too bad. But it definitely makes more sense. :lol:
That is about the most difficult-to-follow reasoning I have seen here. Spending significant amounts of time on a 0.5% profiled module makes no sense when compared to spending that same amount of time on a 50% usage module. It just doesn't make sense, and no "normal" computer scientist is going to think otherwise. It is just one of those "intuitively obvious to the casual observer" examples we face regularly.
It does not require a calculus degree to see that starting with a near-infinitely massive job, just because it gives slightly bigger returns than an easy job, is a self-defeating strategy, which slows down your engine development immensely. So I am not going to argue this point any further.
I would hope not, as the advice I gave has been given for years now in books discussing optimization principles. You can go off into never-never-land whenever you want, ...
The more relevant question is: can you get out of it?

Since I don't go there, there is nothing to escape from...



but advising anyone to spend time working on a 1/2 percent usage module over one that is much larger is just silly. And poor advice. No matter how you cut it. Everyone understands Amdahl's Law. And almost everyone uses it in making their decisions on what to optimize and what to leave as is. If you don't want to follow that, that's fine. But it is the _right_ way to improve a program's speed, rather than trying to speed up parts that won't affect the final time at all...
It of course depends a bit on your definition of 'right'. Apparently you consider it 'right' because it is in books. But I judge it merely on the basis of how much speed improvement I get per man-hour of invested programming work. And by that standard it seems very _wrong_...
Even if you speed it up an infinite amount, at a cost of only one hour total, that is still an hour that was wasted as you can't measure a 0.5% speed improvement in terms of increased playing strength, when compared to a small improvement in a 50% module (my evaluation).
Yes, but that of course is due to the fact that your evaluation is not quantitative, and little more than gut feeling. If the 'small improvement' in the 50% module reduced its profile time from 50% to 49.5%, and it had taken you 2 hours to implement it, you would just have wasted an hour compared to the one who went for the infinite improvement of the 0.5%. Only the ratio speedGained / timeSpent determines what is smarter to do. In themselves the quantities mean _nothing_.
Why do you try to take indefensible positions like this and carry on an argument that is foolish to anyone that actually writes software???
The question is more why _you_ do that...

For this reason. Let's start a citation war here. I will list 10 well-known and recognized references that explain optimization as proposed by me. Then you see if you can find 10 well-known and recognized references that explain optimization as proposed by you.

Or does past research and literature supporting it mean nothing at all to you?

As for the SMP stuff: Joker is not SMP, but this discussion came up here before. The conclusion then was that copying data structures on thread start-up was not really a competitive method, and that it is more efficient to have each thread update its own data structures based on a short list of moves passed to it. And then the issue becomes just the same as for updating during the normal search.
They are not the same at all. I only need to copy a small fraction of the repetition list. I need to duplicate the entire repetition hash table for each thread that searches at any split point. It isn't free. And since it is hardly clear that the hashing approach is any faster at all, why waste time rewriting something that will hurt when the SMP search is developed, and there is no tangible return in terms of speed either???
Why do you need to duplicate the entire repetition hash table at every split point? As explained above, I certainly would not do that. I would only duplicate it at the root (together with the board and piece list), and then differentially update the lot by only entering into it the positions on the branch between the root and the split point. There are plenty of cases where this would involve fewer entries than you have to copy. (Namely when the branch has no irreversible moves in it, and the game history ended with a large number of reversible moves as well.)
Apparently you don't understand parallel searching very well either. How many of those "duplicate blocks" are you going to create at the root and update? Two won't cut it because of splitting and re-splitting. On an 8-way box, I usually need 256 split blocks and occasionally will run out (doesn't make me crash, but slightly hurts efficiency). So you are going to create 256 of those, and update them all each time? And that is somehow going to be faster than the repetition list where I copy a tiny fraction of the whole thing? Remember, I have _done_ the hash-repetition approach already. It is more complex than the simple repetition list approach, it has to be done in more than one place in Search(), and the parallel issues are non-trivial, where the repetition list approach can be done blindfolded.

So you have an approach that is no faster (at least in my implementation, which was similar to what Bruce did in Ferret; the hash table approach was no faster than the current repetition list with the 50-move-limit trick, where you only search backward through the moves counted by the 50-move counter, not back through the entire list), and the list approach is _far_ simpler. My repetition check is a simple for/if loop, a couple of lines of code. If it is smaller, _and_ as fast or faster, and is easier to use in an SMP search, what's the point of going more complex, for no speed gain, and also to make life more difficult when it becomes time to tackle a parallel implementation???
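(For reference, a minimal sketch of the kind of for/if loop meant here, assuming the game and search history is kept as an array of hash keys next to a counter of reversible half-moves; the names are illustrative, not Crafty's actual code.)

Code:

typedef unsigned long long u64;

#define MAX_HISTORY 1024

extern u64 rep_list[MAX_HISTORY]; /* hash keys from the game start up to the current position */
extern int rep_index;             /* index of the current position in rep_list */
extern int fifty;                 /* reversible half-moves since the last capture or pawn move */

int RepetitionCheck(void)
{
    /* only positions reached since the last irreversible move can repeat, and
       the side to move must match, so step back two plies at a time */
    for (int i = rep_index - 2; i >= rep_index - fifty && i >= 0; i -= 2)
        if (rep_list[i] == rep_list[rep_index])
            return 1;
    return 0;
}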