Zenmastur wrote:The people that develop Stockfish and Komodo use “tournaments” to test their patches. They play the old version against the new version in a contest to see which is better. When you look at SF testing you see lines that look like this:
Code:
24-05-16 Vo sp-killer diff LLR: -2.95 (-2.94,2.94) [0.00,5.00] sprt @ 10+0.1 th 1 Don't SEE prune main killer. (Sometimes killer moves act like a bait)
Total: 12889 W: 2322 L: 2392 D: 8175
This is in fact the results of a tournament held between two different versions of SF to determine which is better. So, I have to strongly disagree!
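A W/L/D line like the one quoted can be turned into a rough Elo point estimate with the standard logistic formula. This is only a back-of-the-envelope conversion, not the SPRT likelihood-ratio machinery fishtest actually runs:

```python
import math

def elo_estimate(wins, losses, draws):
    """Logistic Elo point estimate from a W/L/D record.
    (Point estimate only; fishtest's SPRT stopping rule is not reproduced.)"""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n          # average score per game
    return 400 * math.log10(score / (1 - score))

# Numbers from the fishtest line quoted above
print(round(elo_estimate(2322, 2392, 8175), 1))  # about -1.9 Elo
```

So the quoted run measured the patch at roughly -1.9 Elo, which is why the LLR walked to the lower bound (-2.95) and the patch was rejected.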
Well, it is pretty stupid to use that for testing modifications to the hashing scheme, as you know that hash-table size only weakly affects speed, and speed only weakly affects Elo. Doubling the hash size typically produces only a 6% speedup (the 12th root of 2), which corresponds to about 6 Elo. A 10% more efficient use of the hash table would thus cause only about 0.8 Elo. It would take >100,000 games to see that, while the speedup could already be measured accurately from a few hundred test positions (the equivalent of 10 games or so).
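The arithmetic behind those figures is easy to check. The sketch below uses the rule of thumb implied in the post (about 1 Elo per 1% speed at these time controls) and an assumed per-game score standard deviation of 0.3, which is typical for draw-heavy self-play; both are assumptions, not measurements:

```python
import math

# One extra ply per ~12 hash doublings, i.e. a factor 2**(1/12) in speed
speedup = 2 ** (1 / 12) - 1
print(f"{speedup:.1%}")            # ~5.9% speedup per hash doubling

# Rule of thumb from the post: ~1 Elo per 1% speed
elo_per_doubling = speedup * 100   # ~6 Elo

# A 10% more effective hash table is worth log2(1.1) of a doubling
elo_gain = math.log2(1.1) * elo_per_doubling
print(f"{elo_gain:.2f} Elo")       # ~0.82 Elo

# Games needed to resolve ~0.8 Elo at 2 standard deviations.
# Near a 50% score, d(Elo)/d(score) = 1600 / ln(10) ~ 695 Elo per score point;
# 0.3 is an assumed per-game score s.d. for draw-heavy engine self-play.
sigma_elo_per_game = (1600 / math.log(10)) * 0.3
games = (2 * sigma_elo_per_game / elo_gain) ** 2
print(f"{games:,.0f} games")       # well over 100,000 games
```

Which is the point: a few hundred fixed test positions pin down the speedup directly, while the Elo route needs a few hundred thousand games for the same information.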
When a program like Komodo participates in tournaments like TCEC there is a lot of risk, since a bad result could have bad economic consequences for the authors. A lot of buyers look at a program's tournament performance when considering a purchase. They want good value for their money, and a poor or very poor performance may make them think twice about buying. So the developers may not want to risk their program's tournament prowess on a patch that will only be “good” during long analysis sessions.
Volkswagen also took a lot of risk when they had their cars tested for emission of pollutants, because if they scored badly on that count they would lose a lot of business from people who wanted to buy environmentally friendly cars and from governments that offered subsidies for those. So they had the engine run in a different mode during tests than it ran in when doing what people actually bought it for: driving on roads from one place to another.
So now they are widely known as frauds and cheaters, who swindled people out of their money with false promises, because that would have better 'economic consequences' for them.
What you propose is that the Komodo developers should likewise defraud their customers, by beefing up their test performance with methods they know would fail badly in actual use. I would not recommend that method of doing business.
This is true, but the more complex the change, the more time, code, and testing is required. People with limited resources (which is every chess engine developer on the planet) would be reluctant to make massive changes if there is little to be gained compared to a simpler implementation. So, unless you want to demonstrate that your method is superior and that the difference is worth the extra effort to implement it, I think most people will choose a simpler option.
It is fortunate then that equidistributed-draft replacement is pretty simple to implement.
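The scheme itself is not spelled out in this thread, so the following is only one plausible reading of “equidistributed-draft” replacement, with all names, the bucket size, and the victim-selection rule being illustrative assumptions rather than the author's actual code: keep a global histogram of the drafts (depths) stored in the table, and on a collision overwrite the in-bucket entry whose draft is currently the most numerous, which pushes the draft distribution toward uniform and protects rare deep entries:

```python
# Illustrative sketch only; names and constants are assumptions.
MAX_DRAFT = 64
BUCKET_SIZE = 4

draft_count = [0] * MAX_DRAFT      # global histogram: entries per draft

def pick_victim(bucket):
    """Index of the bucket entry to overwrite: the one whose draft is
    currently most numerous table-wide, so overwriting it drives the
    draft distribution toward uniform."""
    return max(range(BUCKET_SIZE), key=lambda i: draft_count[bucket[i]])

def store(bucket, new_draft):
    v = pick_victim(bucket)
    draft_count[bucket[v]] -= 1    # old entry leaves the table
    bucket[v] = new_draft
    draft_count[new_draft] += 1    # new entry enters it

# Usage: shallow draft-1 entries flood the table, so draft 1 keeps being
# picked as the victim and the deep entries (9 and 12) survive.
bucket = [1, 1, 9, 12]
for d in bucket:
    draft_count[d] += 1
store(bucket, 1)
print(bucket)   # a draft-1 slot was recycled; 9 and 12 survive
```

The appeal of such a rule is that it needs no tuned aging parameter: over-represented drafts automatically become the preferred victims.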
If you get Larry Kaufman to post in this thread that he couldn't care less how Komodo does in TCEC, then I might take this point seriously. The fact is, winning tournaments sells Komodo better than any advertisement ever could. So I don't think you're going to convince me, or many other people, that “no one cares” about tournament results.
I was talking about users. Of course a supplier often has interests opposed to his customers'. He would earn much more by selling them cheap crap at a high price, while the customers want high quality for a low price. So a dishonest supplier could indeed consider it very important to create false expectations in his customers. That doesn't mean, however, that the customers find it equally important to have false expectations...
What I propose is a simple, quick, and easy method to solve an analysis problem without screwing with the program's tournament performance.
And what I propose is a simple, safe, fundamentally correct, and robust method to solve the analysis problem and improve tournament performance at the same time.
Well, if it's so damn easy, why don't you lay down some numbers that support your claim?
As I said, there is little to gain from improving hash performance, so for me this doesn't have a very high priority even for my own engines. I am currently working on a tablebase generator, which must be finished before the ICGA event.
Until it's shown to be “inferior” I have no reason to believe that it is.
What you believe or not is of course entirely your business. You could believe that the Earth is flat, for all I care. Technological progress is usually brought about by people with a vision, who recognize the reasons why something would work better before it has been shown to them that it does. Imagine that Edison had said "I won't spend any time on building a light bulb, because I have no reason to believe that would work, unless you first show me one that does"...
In fact there are of course many theoretical reasons why some things are inferior to others, and ignoring those as a matter of principle doesn't get you very far. You yourself suggested that it would be more important to protect exact entries from overwriting than bounds. And it is easily calculated that a minimal alpha-beta tree has many more nodes in the PV sub-trees than in those hanging from cut- or all-nodes.
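That last claim can be checked by counting nodes in a minimal (perfectly ordered) alpha-beta tree, using the usual node-type model: a PV node has one PV child and b-1 cut children, a cut node needs to search only one (all) child, and an all node searches every child, each of which is a cut node. The branching factor here is an arbitrary illustrative choice:

```python
from functools import lru_cache

B = 3   # branching factor; illustrative, any b > 1 shows the same pattern

@lru_cache(None)
def pv(d):    # nodes in the subtree below a PV node, remaining depth d
    return 1 if d == 0 else 1 + pv(d - 1) + (B - 1) * cut(d - 1)

@lru_cache(None)
def cut(d):   # a cut node searches just one child, an all node
    return 1 if d == 0 else 1 + all_(d - 1)

@lru_cache(None)
def all_(d):  # an all node searches all B children, each a cut node
    return 1 if d == 0 else 1 + B * cut(d - 1)

for d in (4, 8):
    print(d, pv(d), cut(d), all_(d))
# At every depth the PV subtree is the largest, dwarfing the cut subtrees
```

For B = 3 at depth 8 this gives 393 nodes under a PV node against 161 under a cut node, so an exact (PV) entry represents far more search effort than a bound of the same draft.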
In case of doubt, I would always try the simplest method first. I.e., the “KISS” (keep it simple, stupid) strategy is a well-known strategy for a reason. It works!
Well, "simple" doesn't necessarily always translate to "dumb". When in doubt, the first thing I would do is remove the doubt...