Repetition detection structure.

mathmoi · Post by **mathmoi** » Sat Feb 23, 2008 1:33 am

Hi,

I examined Crafty and Fruit code and found that they both use a simple array (accessed like a stack) to store the hash keys for the repetition detection code.

I understand that most of the time only a couple of positions needs to be verified (up to the last irreversible move), but in the case where we approach the 50 moves limit, we will need to scan most of the array at every nodes.

Is it possible to use something else, like a small hash table or event a binary tree?

I guess that what I'm asking is. Is there other ways to look for repetition detection or does everyone use a simple array?

MP

gladius · Post by **gladius** » Sat Feb 23, 2008 2:05 am

Yes, it's certainly possible to use a hashtable for doing rep-draw detection. I'm not certain whether it would be faster or slower than just checking until the last non-reversible move, but my suspicion is that it doesn't matter too much, even in the endgame.

All that being said, I still use it, here is my implementation:

Code: Select all

void Position&#58;&#58;AddSeenPosition&#40;u64 hashKey&#41;
&#123;
	const int mask = SeenPositionCount - 1;
	for &#40;int index = hashKey & mask; ; index = &#40;index + 7&#41; & mask&#41;
	&#123;
		if &#40;this->seenPositionLock&#91;index&#93; == hashKey || this->seenPositions&#91;index&#93; == 0&#41;
		&#123;
			assert&#40;this->seenPositions&#91;index&#93; <= 3&#41;;
			this->seenPositionLock&#91;index&#93; = hashKey;
			this->seenPositions&#91;index&#93;++;
			return;
		&#125;
	&#125;
&#125;

void Position&#58;&#58;RemoveSeenPosition&#40;u64 hashKey&#41;
&#123;
	const int mask = SeenPositionCount - 1;
	for &#40;int index = hashKey & mask; ; index = &#40;index + 7&#41; & mask&#41;
	&#123;
		if &#40;this->seenPositionLock&#91;index&#93; == hashKey&#41;
		&#123;
			assert&#40;this->seenPositions&#91;index&#93; != 0&#41;;
			this->seenPositions&#91;index&#93;--;
			return;
		&#125;
	&#125;
&#125;

u8 Position&#58;&#58;GetSeenPositionCount&#40;u64 hashKey&#41; const
&#123;
	const int mask = SeenPositionCount - 1;
	for &#40;int index = hashKey & mask; ; index = &#40;index + 7&#41; & mask&#41;
	&#123;
		if &#40;this->seenPositionLock&#91;index&#93; == hashKey || this->seenPositions&#91;index&#93; == 0&#41;
		&#123;
			return this->seenPositions&#91;index&#93;;
		&#125;
	&#125;
&#125;

bob · Post by **bob** » Sat Feb 23, 2008 6:49 am

mathmoi wrote:Hi,

I examined Crafty and Fruit code and found that they both use a simple array (accessed like a stack) to store the hash keys for the repetition detection code.

I understand that most of the time only a couple of positions needs to be verified (up to the last irreversible move), but in the case where we approach the 50 moves limit, we will need to scan most of the array at every nodes.

Is it possible to use something else, like a small hash table or event a binary tree?

I guess that what I'm asking is. Is there other ways to look for repetition detection or does everyone use a simple array?

MP

A hash table is doable, but there is an efficiency issue. When you start a node, you have to enter the position, then when you leave the node, you have to remove the position. There are other sticky issues dealing with SMP search, in the repetition hash table has to be a reasonable size to prevent overwriting which would be fatal, and it has to somehow handle an SMP search which makes duplicating it expensive...

repetition list scanning is not a big issue, once you profile an engine to see how little time it actually spends in there, particularly since you don't even probe it in q-search...

Uri Blass · Post by **Uri Blass** » Sat Feb 23, 2008 7:37 am

bob wrote:
mathmoi wrote:Hi,

I examined Crafty and Fruit code and found that they both use a simple array (accessed like a stack) to store the hash keys for the repetition detection code.

I understand that most of the time only a couple of positions needs to be verified (up to the last irreversible move), but in the case where we approach the 50 moves limit, we will need to scan most of the array at every nodes.

Is it possible to use something else, like a small hash table or event a binary tree?

I guess that what I'm asking is. Is there other ways to look for repetition detection or does everyone use a simple array?

MP
A hash table is doable, but there is an efficiency issue. When you start a node, you have to enter the position, then when you leave the node, you have to remove the position. There are other sticky issues dealing with SMP search, in the repetition hash table has to be a reasonable size to prevent overwriting which would be fatal, and it has to somehow handle an SMP search which makes duplicating it expensive...

repetition list scanning is not a big issue, once you profile an engine to see how little time it actually spends in there, particularly since you don't even probe it in q-search...

The last part is correct only for q-search that include only captures but many programmers include also checks and replies to check in the qsearch
I still do not plan to care about more efficient repetition detection
(at least not in the near future) and cases when there are many moves with no conversion are not the majority of cases in games.

Uri

hgm · Post by **hgm** » Sat Feb 23, 2008 11:10 am

Joker uses a small hash table (256 entries) for repetition detection. Collisions are resolved by a simple linear search (wrapping around) starting from the primary entry. But typically you find the entry on the first or second try, as the table is always at least half empty (and typycally 90% empty).

I store a code in the table identifying the number and type of repeats (how many in the game history, how many in the tree), so that the search can take the appropriate decisions. And I store the level at which the position was first encountered in the tree. (And of course the hash key.)

On an irreversible move at game level I clear the entire table.

As the table is so small that it always resides in the L1 cache, this must be faster than the linear-array method.

bob · Post by **bob** » Sat Feb 23, 2008 10:12 pm

hgm wrote:Joker uses a small hash table (256 entries) for repetition detection. Collisions are resolved by a simple linear search (wrapping around) starting from the primary entry. But typically you find the entry on the first or second try, as the table is always at least half empty (and typycally 90% empty).

I store a code in the table identifying the number and type of repeats (how many in the game history, how many in the tree), so that the search can take the appropriate decisions. And I store the level at which the position was first encountered in the tree. (And of course the hash key.)

On an irreversible move at game level I clear the entire table.

As the table is so small that it always resides in the L1 cache, this must be faster than the linear-array method.

here is what you have to beat:

time seconds seconds calls s/call s/call name
0.51 205.48 1.07 113029036 0.00 0.00 RepetitionCheck

I played a few pretty quick games, and picked the first one that ended by the 50 move rule, which is likely the worst-case for repetition list scanning.

RepetitionCheck() accounted for 0.51% of the total cpu time used (1/2 percent) which is pretty insignificant...

here is everything from most expensive down to RepetitionCheck():

Code: Select all

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 16.93     35.40    35.40 499704723     0.00     0.00  Evaluate
  8.60     53.38    17.98 563432081     0.00     0.00  MakeMove
  8.24     70.61    17.24 802244362     0.00     0.00  EvaluatePassedPawns
  8.12     87.59    16.98   115810     0.00     0.00  Search
  7.62    103.52    15.93 1006239194     0.00     0.00  Attacked
  6.11    116.30    12.79 371224516     0.00     0.00  Quiesce
  6.10    129.06    12.76 339036879     0.00     0.00  Swap
  4.73    138.96     9.90 116047645     0.00     0.00  GenerateCaptures
  4.57    148.52     9.56 563432081     0.00     0.00  UnmakeMove
  3.75    156.35     7.84 112735079     0.00     0.00  HashProbe
  2.74    162.09     5.74 413850292     0.00     0.00  NextMove
  2.73    167.80     5.72 49404108     0.00     0.00  EvaluateMobility
  2.66    173.36     5.56 370399537     0.00     0.00  AttacksTo
  2.48    178.54     5.18 421418977     0.00     0.00  SearchControl
  2.25    183.24     4.70 31303371     0.00     0.00  GenerateCheckEvasions
  2.13    187.69     4.45 125674880     0.00     0.00  EvaluateRooks
  1.76    191.36     3.67  6584992     0.00     0.00  EvaluatePawns
  1.51    194.52     3.17 12631503     0.00     0.00  GenerateNonCaptures
  0.97    196.55     2.03 125674880     0.00     0.00  EvaluateKnights
  0.88    198.39     1.84 36017477     0.00     0.00  NextEvasion
  0.86    200.20     1.81 125674880     0.00     0.00  EvaluateKings
  0.83    201.93     1.73 96027327     0.00     0.00  HashStore
  0.61    203.21     1.28 115866513     0.00     0.00  PinnedOnKing
  0.58    204.42     1.21      375     0.00     0.00  InitializeHashTables
  0.51    205.48     1.07 113029036     0.00     0.00  RepetitionCheck

So while it is possible to beat this, the savings are going to be microscopic...

And this is simpler to write/debug...

hgm · Post by **hgm** » Sat Feb 23, 2008 11:34 pm

True, the repetition check takes a pretty insignificant fraction of the time. But that is true for very many snippets of code in the engine. Some of the tasks you mention only take far more time because they are much bigger tasks, that happen to be all done by one routine. If you would not specify time per routine, but, say, time per inner loop, you would have a hundred sub-tasks that all nly use ~ 0.5% of the time each. And if you could all make them twice as fast, all these insignificant gains would add up to 25% more speed...

So the move generator tkes 8% of the time in stead of 0.5%, and doubling its speed would save you 4% in stead of 0.25%. Does that mean it is better to work on doubling the speed of the move generator? Not necessarily, as the move generator is a far larger and more complicated piece of code than the repetion check. So to double its speed will involve far more work. If you would only work on improving as many lines of code of the move generator as the entire repetition check, it is not obvious at all that you would gain more total time than what you gain for improving the repetition check. And if that is the case, improving the speed of the repetition check would give more speed improvement per unit of coding work!

bob · Post by **bob** » Sun Feb 24, 2008 12:06 am

hgm wrote:True, the repetition check takes a pretty insignificant fraction of the time. But that is true for very many snippets of code in the engine. Some of the tasks you mention only take far more time because they are much bigger tasks, that happen to be all done by one routine. If you would not specify time per routine, but, say, time per inner loop, you would have a hundred sub-tasks that all nly use ~ 0.5% of the time each. And if you could all make them twice as fast, all these insignificant gains would add up to 25% more speed...

So the move generator tkes 8% of the time in stead of 0.5%, and doubling its speed would save you 4% in stead of 0.25%. Does that mean it is better to work on doubling the speed of the move generator? Not necessarily, as the move generator is a far larger and more complicated piece of code than the repetion check. So to double its speed will involve far more work. If you would only work on improving as many lines of code of the move generator as the entire repetition check, it is not obvious at all that you would gain more total time than what you gain for improving the repetition check. And if that is the case, improving the speed of the repetition check would give more speed improvement per unit of coding work!

Sorry but that is just backward, IMHO. return on effort is far less important than return, period. Otherwise I would spend all my time getting a fraction of a percent here, a fraction there, because it is always far easier to modify the short/simple parts of the code compared to the biggies. If you diff crafty 21.7 against 22.0 you will see exactly what I mean...The changes were massive.and pervasive, But worth it. The path of least resistance doesn't always lead to the best final destination..

If one module takes 0.5% and another takes 8%, working on the 0.5% module is pointless. It doesn't matter how much effort is involved. If you make it go to 0.0% it has essentially no effect. The 8% module, on the other hand, offers more although I would rarely work on the 8% module unless it is at the top of the profile list. In my case, the evaluation stuff is the killer... When you add it all up it is well over 1/3 of the total time used, and it is getting a good going-over as this has been reduced from the 21.x versions where it was even higher.

I'm not sure the hash approach will be much (if any) faster. I used that approach way back, and the extra work of removing an entry (or decrementing the counter), plus the extra effort of running a chain (and with 256 entries you will get some chains that are longer than you might expect even though the table should never be more than maybe 50% full.) And having to copy the entire table when doing a parallel search is yet another issue that can be partially avoided with a plain list...

So, bottom line, spending any time trying to improve that 0.5% is going to have a very poor return on investment, regardless of how small the time investment is and how much faster the new module becomes.

hgm · Post by **hgm** » Sun Feb 24, 2008 9:26 am

Seems to me you are trying to argue that white is black, here. Your last statement is just plain nonsense: perhaps the return can be small, but the return on investment can be infinite if the investment is infinitesimal. So the "regardless the investment" does not belong there.

It does not require a calculus degree to see that starting with a near-infinitely massive job, just because it gives slightly bigger returns than an easy job, is a self-defeating strategy, which slows down your engine development immensely. So I am not going to argue this point any further.

As for the SMP stuff: Joker is not SMP, but this discussion came up here before. The conclusion then was that copying data structures on thread start-up was not really a competative method, and that it is more efficient to have each thread update its own data structures based on as hort list of moves passed to it. And then the issue becomes just the same as for update during the normal search.

bob · Post by **bob** » Mon Feb 25, 2008 4:55 am

hgm wrote:Seems to me you are trying to argue that white is black, here. Your last statement is just plain nonsense: perhaps the return can be small, but the return on investment can be infinite if the investment is infinitesimal. So the "regardless the investment" does not belong there.

I am just following standard software development practice, which says to spend time optimizing the parts of the program that consume the most compute cycles, _not_ the ones that consume fractions of a percentage point. This has always been, and will always be the sane way to optimize programs...

I have no idea why you don't get that. Fortunately, most of the rest of the programming world does. Feel free to show me a quote from any programming book that would suggest rewriting and debugging a procedure that takes 0.5% of tht total execution time, when there are other procedures that take 30% and up...

The "regardless of the investment" does make one assumption, that you want to improve your program the most, for the least effort. If that is not your goal, then the conversation has taken a twist I am not interested in. Chances are good that a couple of hours spent to rewrite and debug a 0.5% piece of code would pay off far more spent modifying a 50% piece of code. It is just common sense (to me at least)...

It does not require a calculus degree to see that starting with a near-infinitely massive job, just because it gives slightly bigger returns than an easy job, is a self-defeating strategy, which slows down your engine development immensely. So I am not going to argue this point any further.

I would hope not, as the advice I gave has been given for years now in books discussing optimization principles. You can go off into never-never-land whenever you want, but advising anyone to spend time working on a 1/2 percent usage module over one that is much larger is just silly. And poor advice. No matter how you cut it. Everyone understands Amdahl's Law. And almost everyone uses it in making their decisions on what to optimize and what to leave as is. If you don't want to follow that, that's fine. But it is the _right_ way to improve a program's speed, rather than trying to speed up parts that won't affect the final time at all...

Even if you speed it up an infinite amount, at a cost of only one hour total, that is still an hour that was wasted as you can't measure a 0.5% speed improvement in terms of increased playing strength, when compared to a small improvement in a 50% module (my evaluation).

Why do you try to take indefensible positions like this and carry on an argument that is foolish to anyone that actually writes software???

As for the SMP stuff: Joker is not SMP, but this discussion came up here before. The conclusion then was that copying data structures on thread start-up was not really a competative method, and that it is more efficient to have each thread update its own data structures based on as hort list of moves passed to it. And then the issue becomes just the same as for update during the normal search.

They are not the same at all. I only need to copy a small fraction of the repetition list. I need to duplicate the entire repetition hash table for each thread that searches at any split point. It isn't free. And since it is hardly clear that the hashing approach is any faster at all, why waste time rewriting something that will hurt when the SMP search is developed, and there is no tangible return in terms of speed either???

Repetition detection structure.

Repetition detection structure.

Re: Repetition detection structure.

Re: Repetition detection structure.

Re: Repetition detection structure.

Re: Repetition detection structure.

Re: Repetition detection structure.

Re: Repetition detection structure.

Re: Repetition detection structure.

Re: Repetition detection structure.

Re: Repetition detection structure.