bob wrote:You do realize that in fine 70 q-search is minimal?
What I realize is this:
1.) If you call your q-search 2 million times and only 5% of those calls add one or more nodes to the count, then your total count will be well over 100,000 nodes. This will make a 4,096-entry TT look like it's over-subscribed by more than 20 to 1 when it may not even be full yet.
2.) Many of the nodes counted in an iterative-deepening search get counted again with each successive iteration. This happens whenever the evaluation doesn't change from one iteration to the next AND no re-searches are conducted. If the search fails and has to be re-searched one or more times, then the number of times some nodes are counted can grow beyond the iteration count. The lower the EBF is, the higher the fraction of nodes that is recounted with every iteration. Fine #70 has a VERY LOW EBF. This, by itself, is enough to drive the node counts high enough to make a 4,096-entry TT appear highly over-subscribed, when in fact it isn't.
3.) There aren't many legal positions until after the first pawn is captured. At ply 6 there are only 130, and at ply 12 there are 565 possible positions that can be reached from the starting position. At ply 26 the number is still less than 5,200. Every single engine I've tested gives huge over-counts at these depths because a node count isn't an accurate reflection of TT subscription rates. PERIOD!
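The recounting effect in point 2 is easy to demonstrate with a toy model (my own construction, not from any engine): approximate a depth-d tree as a geometric sum in the EBF, then total the counts over every iteration of iterative deepening. The lower the EBF, the larger the cumulative count is relative to the final tree.

```python
# Toy model (an assumption for illustration, not measured engine data):
# a depth-d tree with effective branching factor b has roughly
# 1 + b + b^2 + ... + b^d nodes, and iterative deepening revisits the
# shallow nodes on every iteration.

def tree_size(b, d):
    """Approximate node count of one depth-d search tree."""
    return sum(b**k for k in range(d + 1))

def cumulative_count(b, D):
    """Nodes counted over all iterations 1..D of iterative deepening."""
    return sum(tree_size(b, d) for d in range(1, D + 1))

for b in (1.3, 2.0, 4.0):
    ratio = cumulative_count(b, 20) / tree_size(b, 20)
    print(f"EBF {b}: cumulative count is {ratio:.1f}x the final tree")
```

With a low EBF the cumulative counter reports several times more nodes than the deepest tree actually contains, which is exactly why a raw node count overstates how full the TT is in a position like Fine #70.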
bob wrote:But none of this is really relevant to the discussion about two tables...
Well, let me point out something that you either haven't considered or haven't considered very carefully.
By not probing the TT in the q-search you are throwing away a huge number of transpositions.
bob wrote:...The q-search alone can be 90% of the total nodes searched.
Finally, at least in my code, EVERY node that is searched causes a probe, AND a store when that node is completed. No matter what the outcome, it is stored.
The 90% figure you gave is nothing more than a convenient round number. The fact is that this number varies with position complexity and could be much higher than 90% in complicated positions. That being the case, you are throwing away a huge number of transpositions. You have zero chance of catching any of them because the data isn't even stored. In compensation you get a TT that is less over-subscribed than it otherwise would be, you cut down on TLB thrashing, and you avoid a lot of memory wait time.
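The probe-everywhere/store-everything discipline being argued for can be sketched on a toy DAG (my own illustration; the node names, scores, and dict-based "table" are all invented for the example, not anyone's engine code). Two paths reach the same node "X", standing in for a transposition; probing before expanding and storing on exit means "X" is searched only once.

```python
# Toy DAG search illustrating "probe at every node, store every result".
# All names and values here are invented for the example.

LEAF_SCORES = {"X": 5, "Y": -2}           # terminal evaluations
CHILDREN = {"root": ["a", "b"],           # two different paths...
            "a": ["X"], "b": ["X", "Y"]}  # ...both reach node "X"

tt = {}        # stand-in transposition table: node -> score
tt_hits = 0    # how many searches the probe saved us

def search(node):
    global tt_hits
    if node in tt:                # probe at EVERY node before expanding
        tt_hits += 1
        return tt[node]
    if node in LEAF_SCORES:
        score = LEAF_SCORES[node]
    else:                         # negamax over the children
        score = max(-search(c) for c in CHILDREN[node])
    tt[node] = score              # store on the way out, whatever the outcome
    return score

best = search("root")
print(best, tt_hits)
```

If the probe were skipped below some depth, the shared node would be searched once per path instead of once total, which is the wasted work the post is objecting to.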
bob wrote:... If you disallow transpositions between the last N plies and the rest of the search, that hurts performance, no way around it. And that is exactly what you are doing.
I NEVER said I was going to disallow anything. You seem to think that a transposition will be missed JUST BECAUSE it crosses the boundary between the two TTs. I disagree!
The most common type of transposition is produced by a simple re-ordering of moves. We'll call these Type I transpositions. Since Black and White alternate moves, you end up with two ordered lists that are interleaved: one list for Black and the other for White. If we represent one side's moves by capital letters and the other side's moves by numbers, we can construct lists of all possible transpositions of any length. Below are four lists of transpositions with lengths of up to six plies.
Three plies:
A1B B1A
Four plies:
A1B2 B1A2 A2B1 B2A1
Five Plies:
A1B2C A1C2B B1A2C B1C2A
C1A2B C1B2A A2B1C A2C1B
B2A1C B2C1A C2A1B C2B1A
Six Plies:
A1B2C3 A1B3C2 A2B1C3 A2B3C1 A3B1C2 A3B2C1
A1C2B3 A1C3B2 A2C1B3 A2C3B1 A3C1B2 A3C2B1
B1A2C3 B1A3C2 B2A1C3 B2A3C1 B3A1C2 B3A2C1
B1C2A3 B1C3A2 B2C1A3 B2C3A1 B3C1A2 B3C2A1
C1A2B3 C1A3B2 C2A1B3 C2A3B1 C3A1B2 C3A2B1
C1B2A3 C1B3A2 C2B1A3 C2B3A1 C3B1A2 C3B2A1
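The lists above can be generated mechanically (a sketch of my own, not from the post): permute each side's moves independently, then interleave the two permutations with the sides alternating. That yields n! x m! sequences for n letter-moves and m number-moves, matching the 2, 4, 12, and 36 entries shown.

```python
# Enumerate the Type I transposition lists shown above: every alternating
# interleaving of a permutation of one side's moves with a permutation of
# the other side's moves. (My own generator, written to reproduce the lists.)
from itertools import permutations

def type1_transpositions(side1, side2):
    """All alternating sequences, side1 moving on even plies."""
    seqs = []
    for w in permutations(side1):
        for b in permutations(side2):
            total = len(w) + len(b)
            seq = [w[i // 2] if i % 2 == 0 else b[i // 2] for i in range(total)]
            seqs.append("".join(seq))
    return seqs

print(type1_transpositions("AB", "1"))         # three plies: A1B, B1A
print(len(type1_transpositions("AB", "12")))   # four plies: 4
print(len(type1_transpositions("ABC", "12")))  # five plies: 12
print(len(type1_transpositions("ABC", "123"))) # six plies: 36
```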
I expect a split TT like the one I've described to catch the vast majority of the transpositions listed above with high probability. Being able to do this depends on the size of the L1 TT. It will require more than just a 2K-entry L1 TT, which is precisely why I made these comments:
Zenmastur wrote: By allowing the TT more entries than is required for an N-ply search and allowing the shallowest plies more space than they would normally occupy, they will be preserved longer in the L1 TT ...
Zenmastur wrote:So, by allowing a larger L1 TT than necessary to hold an N-ply search and reserving extra space for the shallowest entries, which are, in fact, the fewest in number, I believe that most transposition will be caught. For those that are not caught, they may expand the tree, but this part of the tree will be entirely contained in the L1 TT....
I've been thinking all along that an 8K-entry TT would be sufficient for this purpose. It provides enough room to store a 4-ply search (~2K nodes) plus the nodes that matter most for finding transpositions that cross the L1/L2 TT boundary.
But if you are right about the L1 TT not residing in one of the three CPU caches, then there is little reason to keep the L1 TT this small. As long as the number of memory pages it uses is well under the TLB size, the TLB problem is still solved, and I see no problem with making it larger than 8K entries. (E.g., 128 pages of 4 KB each would allow a 512 KB L1 TT holding 64K entries of 8 bytes each.) That seems more than sufficient to do the job.
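The sizing figures work out as plain arithmetic (nothing assumed here beyond the numbers in the text):

```python
# 128 pages of 4 KB each, holding 8-byte TT entries.
PAGE_SIZE = 4 * 1024   # bytes per page
PAGES = 128            # pages mapped, kept well under typical TLB reach
ENTRY_SIZE = 8         # bytes per TT entry

table_bytes = PAGES * PAGE_SIZE
entries = table_bytes // ENTRY_SIZE
print(table_bytes // 1024, "KB,", entries, "entries")  # 512 KB, 65536 entries
```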
Regards,
Zen