bob wrote:2 MB pages don't require any shutdowns in the world of Linux. Nor do 1 GB pages. The only problem is that, currently, 1 GB pages have to be established at boot time and they are forever 1 GB pages. But still, no fragmentation issues that require restarts.
I am not convinced that a special table for just the last few plies is worthwhile.
Well, you said yourself that this is where most of the nodes are. The problem is that these nodes overwrite entries. You said that you always write data to the TT at every node. So by your own admission you will be overwriting, and therefore losing, large quantities of data in a heavily oversubscribed TT. No replacement policy can prevent this from happening other than to NOT WRITE new nodes to the TT, and if you discard them they do you no good at all. You and several other posters seem to think that there is some policy that can remove the deleterious effects of this “forced” loss of data. And yet no one has offered any suggestion as to what this policy might be. “Being careful” is not a usable policy. It would be a simple matter to post the policy criteria, or even sample code, if such a policy existed.

bob wrote:If you have a heavily oversubscribed tt, I am also not sure that this is the answer, either. Seems to me that the best approach is always a good overwrite decision more than something simpler such as a separate tt.
My personal opinion is, no one has offered this information because they don't know of any policy that has these properties, and the reason no one knows of such a policy, is because it doesn't exist.
I'm not saying that commonly used policies don't help, but rather that the nature of the help they provide is linear or multiplicative while the nature of the problem is exponential. Of course, there are linear solutions to exponential problems that work, but only if the problem size is small enough. So while current policies work for node counts that exceed the TT entry count, as the oversubscription factor increases they become less and less effective at mitigating the effects of data loss. You may not be sure my approach is correct, but I'm positive that a policy change will not be a member of the solution set for the size of problems I'm interested in. But if you, or anyone else, has a miracle policy that you've been hiding all these years, please feel free to post it in this thread.
bob wrote:I can transpose near a tip, with a position near the root, just as easily as I can transpose between branches near the tip.

I'm not sure you can transpose just as easily from root to tip. Long transpositions, while clearly possible, seem harder. There are several different types of transpositions (e.g. transpositions of different lengths, transpositions that simply reorder some or all of the moves, transpositions in which few or none of the moves in one variation are found in the other variations, etc.). Different types occur more frequently in different parts of the game. Some types are easier to construct than others, so I suppose they are more prevalent than other types. Certain types of moves restrict transpositions. Even the complexity of the position influences how often transpositions occur.
My problem is that I'm not familiar enough with graph theory and how it relates to chess trees to know or prove whether your statement is true or false. My intuitive sense is that it is true in some circumstances and false in others. In late-middlegame and endgame positions I think your statement is probably true, AND many, many of these transpositions are actually playable. Since so many of them are sound, they will be searched and will appear in the TT in very large numbers. This is one of the reasons, although not the only reason, that the TT hit rate goes up so much during endgames. It's not that there are more transpositions in an endgame; it's that many become playable and even important.
The opening and middlegame are a completely different story. The types of moves that restrict transpositions seem more prevalent in the opening and middlegame. Although there are many transpositions here as well, they are much more constrained. I think one of the differences is that the complexity of the positions inhibits most of the possible transpositions from being playable. If they're unsound, LMR (or other methods) will see that they never appear, or at least defer them until much deeper searches are made. I think, but can't prove, that long transpositions in the later part of the opening and in the middlegame have less influence because they're fewer in number, and still fewer are sound and therefore playable, so fewer will be found in the TT.
The most important point:
The most important parameter of a transposition is not its length. It's the distance from the ply that completes the transposition to the end of the search. This has a direct impact on why I don't really care about long transpositions.

So, let's take a 45-ply search from the most complicated position we can find, with the transposition starting at ply 1 and ending at ply 42. Ply 42 is the first ply that would be directed into a split TT that has the last four plies of the search directed to it. If this transposition is not found, it will cost you, at most, 483 nodes of searching (see the last column of the table I posted earlier in the discussion; these node counts are from the most complicated “real” position I could find on short notice, from a correspondence game). 483 nodes is not a big deal. Now let's shorten it by one ply. The ending ply of this shorter transposition will be found in the L2 TT, and finding it will save about 2,173 nodes of searching.

Now, if either of these searches were conducted using a “normal” TT that is heavily oversubscribed, say 250 to 1 (I chose this number because any correspondence player who lets their computer run overnight or while they're at work will see this happen almost every day of the week; that's also why I chose 45 plies as the search depth), what are the chances of either of them being found? My suspicion is less than 1% for each, since these nodes are at the bottom of the search tree and entries there are overwritten as fast as you can process nodes. So (483 + 2,173) * 1% = ~27 nodes saved on average.

With a split TT the oversubscription is reduced, in this case by a factor of 2,173. So we go from a highly oversubscribed state to an undersubscribed one: we will be filling the L2 TT to about 11.5% of capacity. There will still be some overwriting, but only a fraction of a percent. I'm going to take your position and assume that ALL transpositions that cross the L1 TT boundary are missed.
So (0 + 2,173) * 99% = ~2,151 nodes saved on average. Now, if we start moving the end node of the transposition towards the root (i.e. shortening it one ply at a time), the number of nodes saved when the transposition is found goes up by a factor that is approximately the EBF of the engine. The hit rate of the L2 TT will stay about the same regardless, because it's undersubscribed. The “normal” TT's hit rate will go up as the end node approaches the root, because those nodes are given preferential treatment due to their search depth. At ply 3 the savings difference between the two will be almost zero. It's hard to say at which ply the largest savings will occur, because it will change with search length, type of position, and complexity. It will tend to be in the middle of the search; for this search it should be between ply 15 and ply 30. The worst savings will be in the least complex positions, and the best in the most complicated positions like this example.
bob wrote:My response was primarily related directly to the question(s) asked of me, starting with NUMA. Which is a completely different issue from what gets overwritten or not...

Yeah... I should have given you a little intro before inviting you in. I was interested in your opinion on NUMA since I knew you had extensive knowledge of the subject. I figured if it works on a single-CPU system it would be killer on a NUMA system. We got a bit sidetracked on “will it work”.
In the end, I think it's complicated enough that without trying it there is no way to know for sure. I don't have any illusions about this working on the first attempt. If it's to work as I think it could, it will have to be well thought out. But you opened my eyes to a few details that I hadn't considered, so that's good. I have better ideas about certain aspects, things to look out for, and new things to consider. So, all in all, I think it was constructive. One of the reasons I asked you specifically was that I wanted an unbiased opinion from someone who had reason to know these types of things, and I think that's what I got. So I'm good with it.
Zenmastur wrote:You only probe the L1 TT in the last part of the search because all the data from the last part of the search is written to this TT.

syzygy wrote:What do you mean by "last part of the search"? Do you mean the last N ply of the search?

Precisely!
Most of the nodes that will be searched are in the last N plies of the search. That is why I chose to split it where I did.

syzygy wrote:That's an enormous amount of nodes. The L1 would cover only a uselessly small fraction of it. You'll have practically no hits.
The only way a position from an earlier part of the search can match a position in the later part of the search is if no pawn moves or captures have been made.
syzygy wrote:The search tree is a tree, not a single branch.

And your point is what exactly?
hgm wrote:Something accessed as infrequently as a hash-table entry would never survive in L1 anyway. There is so much data in a chess engine that is accessed more frequently than once per node, and it would just flush it out of L1. Even PST and Zobrist key tables will have multiple accesses per node (from- and to-square, victim) focused in a smaller area.

The number of accesses to ANY array is almost irrelevant. The only thing that matters is the number of cache lines that will be dragged in from memory. This specifically excludes references to cache lines already in the cache.
The only way to tell is to actually look at the code, which I haven't done yet. But I'm getting the feeling that you're making a mountain out of a molehill. If you can give me a specific example of an array or structure that you believe will be accessed enough to drag in a significant number of cache lines, it would narrow the search when I go to see what's what.
Regards,
Forrest