Suggestion for hash tables and analysis.

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by bob »

Zenmastur wrote:
bob wrote:Why are you stopping at ply=26. Feel free to go to ply 50.
What would be the point? The root move will not change after ply 26 if your q-search is working properly. The q-search should discover that the f5 pawn is free at ply 26; there is no larger piece that can be captured, and no earlier search can force the loss of a pawn. Therefore nothing but an erroneous search result can change the root move after ply 26.
bob wrote:Here are the actual counts.

ply=26 hash stores done = 36840 Positions overwritten = 2697
ply=30 stores=66755 overwrites=6817
Just as I suspected. At 26 plies you have 36,840 stores and only 2,697 of them are overwrites. This means that AT MOST the engine saw 6,793 unique positions (the 4,096 table entries plus the 2,697 overwrites). Not exactly what I would describe as a highly over-subscribed TT. In fact, most moves made during TCEC suffer from a greater over-subscription factor than this (i.e. 6,793/4,096 = ~1.66).

It's actually very unlikely that the engine saw 6,793 unique nodes. It probably saw about 5,200 just like I said. The rest “appear” to be unique as I explained here:
Zenmastur wrote:...Third, in an over-subscribed TT, entries will be overwritten. If the same position is searched again, a very likely proposition in an iterative deepening search, it will be written to the TT again, but it will appear as if it's unique data when in fact it isn't. This becomes a huge problem for over reporting unique writes...
It probably saw approximately 5,200, just like I said, but only had room for 4,096. About 1,100 or so were overwritten during the course of 26 iterations and any re-searches done due to a fail-high/low condition. When the nodes that were overwritten were seen again in the next iteration, they “appeared” to be unique data because they didn't match anything in the bucket they were being written to. Just like I said!
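To make that arithmetic concrete, here is a minimal C sketch (the struct and field names are just for illustration, not Crafty's) of the upper bound on unique positions implied by the store and overwrite counters:

[code]
#include <stdio.h>

/* Hypothetical counters gathered during a search; the names are illustrative,
 * not Crafty's.                                                               */
struct tt_stats {
    unsigned long long entries;    /* slots in the table                       */
    unsigned long long stores;     /* total TT stores performed                */
    unsigned long long overwrites; /* stores on top of a non-matching signature */
};

/* Upper bound on distinct positions stored: non-overwrite stores can fill at
 * most 'entries' distinct slots, and each overwrite can introduce at most one
 * additional distinct position.                                               */
static unsigned long long max_unique_positions(const struct tt_stats *s) {
    unsigned long long non_overwrites = s->stores - s->overwrites;
    unsigned long long filled = non_overwrites < s->entries ? non_overwrites
                                                            : s->entries;
    return filled + s->overwrites;
}

int main(void) {
    struct tt_stats ply26 = { 4096ULL, 36840ULL, 2697ULL };
    printf("at most %llu unique positions (factor %.2f)\n",
           max_unique_positions(&ply26),
           (double)max_unique_positions(&ply26) / (double)ply26.entries);
    return 0;   /* prints: at most 6793 unique positions (factor 1.66) */
}
[/code]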
bob wrote:...(note, overwrites means storing on top of an entry where signatures do NOT match).

ply=35 stores=184757 overwrites=64631
ply=40 stores=1000538 overwrites=654789
ply=45 stores=4729806 overwrites=3698845
ply=50 stores=42M overwrites=36M

Does that prove a thing? Only that my tree is not blowing up, as the branching factor remains consistent. If the TT were getting swamped by shallow entries the search would die.
I highly doubt your last statement. As I recall you have a triangular array that stores the PV at every iteration. This array is undoubtedly larger than a 4,096 entry TT. While I haven't looked to see exactly how and when it's used my suspicion is that if the entire TT were cleared between iterations the previous iteration's PV would still be in the triangular array to guide the search. This array constitutes a large secondary cache of useful data for the engine to draw upon.

After ply 26 nothing should ever cause it to change its root move. So as far as I'm concerned it makes no difference if you end the search at 26 plies or continue forever. The game should be decided by the q-search at ply 26, in which it should be determined that white wins a black pawn without compensation.
bob wrote:Somehow you are overlooking some SERIOUS math flaws.


I think it's more likely that you're misinterpreting the data you presented. My counts at 26 plies might not be perfect, but I don't think they're in serious error.
bob wrote:The shallowest positions are not "the fewest in number". This is an exponential tree. Those nodes near the tip are the huge majority of total positions searched. How does this slip by your analysis???
I think you are confusing a shallow position with a leaf node. A shallow position is near the root, not near a leaf. So in a search, a shallow position would be from ply 1 to some number less than half the search depth. You seem to be thinking I meant a node near the leaf nodes.
bob wrote:My point is this, and only this. You fix the boundary between the two tables wherever you want. I claim, and can probably prove with some coding changes, that wherever you fix that boundary, you prevent transpositions from occurring across that boundary.
I don't want you to go to the trouble if you have other things you would rather attend to. Even so, I don't think it can be proven, but if you attempt to prove it I would love to see your data as well as the conditions under which the experiment(s) were performed.
bob wrote:And I will remind you these trees are NOT searched to a fixed depth. In Crafty, null-move R can range up to 8 or beyond, and the LMR reductions are about the same. So a path can have only 6 moves in it and represent a 30-40 ply search from the root. And it can transpose easily...
I've been aware since the very beginning that various reductions can greatly affect the depth of searches in different branches. This does not worry me in the slightest. I doubt that this will be a problem since these are reductions to the search depth. I believe this will be handled nicely, but just in case it's a problem, I have an easy fix worked out.

Regards,

Zen
The PV array is used to seed the TT at the start of every iteration. But if something doesn't fit and gets overwritten, that would be a deal-breaker.

I don't think anyone really needs a proof to show that a position from the first three plies can be reached via a transposition at the last three plies. I can do one of those by hand. And when you figure that many of the searches are quite shallow, not allowing transpositions across the boundary is a problem.

But more significantly, the OTHER direction is just as important with grafting, which is how programs find the solution to fine #70 well before they have actually searched 26 plies deep.

For fine 70, here is what happens. (note that if you have REALLY good move ordering this will not happen and you will require the full 26+ plies to solve this position).

You search starting with Kb1, and your opponent plays poor moves, allowing you to reach a position somewhere deeper where you force the win of the pawn. Now your opponent plays a better defense, and you eventually reach that same position, but too deep in the tree to be able to see the forced win of the pawn, UNLESS you fetch it from the TT. This tree grafting is an important part of our programs' ability to solve really deep endgames. It is reasonably important that the ENTIRE tree be visible through the tt, throughout the entire tree. Reasonable replacement strategies make this work extremely well.
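Roughly, the mechanism looks like this (a sketch only, not Crafty's actual probe code; the entry layout and names are invented for illustration, and bound handling is omitted):

[code]
/* Sketch of a TT probe that enables grafting: an entry stored when a deep
 * search reached this position earlier (large draft) can satisfy a probe made
 * much later and deeper in the tree, importing the earlier result.            */
typedef struct {
    unsigned long long key;   /* Zobrist signature                             */
    int draft;                /* depth of the search that produced the entry   */
    int score;                /* score from that search (bound type omitted)   */
} tt_entry;

/* Returns 1 and fills *score if the stored result is deep enough to reuse.    */
int tt_probe(const tt_entry *e, unsigned long long key, int depth, int *score) {
    if (e->key != key)
        return 0;             /* different position, no transposition          */
    if (e->draft < depth)
        return 0;             /* stored search too shallow to trust            */
    *score = e->score;        /* graft the earlier, deeper result in here      */
    return 1;
}
[/code]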

If you believe there is a flaw in current programs, that's a task you need to take on to prove. It is so counter to what I know about tree searching (and apparently others as well since they seem to agree with my methodology) that it really doesn't attract my interest for testing...
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by Zenmastur »

bob wrote:The PV array is used to seed the TT at the start of every iteration.
So, in your particular case, some of the most valuable data is stored in a secondary array. Even if it's overwritten in the TT during an iteration, it will be re-written to the TT before the next iteration begins. I agree that this is a very nice feature, but it makes it more difficult to determine how well the TT replacement algorithm is working. This seems to defeat the purpose of this type of test. So, it should probably be turned off if you are trying to determine how well your replacement algorithm works on a highly over-subscribed TT.
bob wrote:But if something doesn't fit and gets overwritten, that would be a deal-breaker.
Well, if you are writing the last iteration's PV to the TT it should easily fit in 4,096 entries. If you're trying to write the entire triangular array to the TT it could be a problem, but honestly I don't know why you would do such a thing. There doesn't seem to be any advantage to it.
bob wrote:I don't think anyone really needs a proof to show that a position from the first three plies can be reached via a transposition at the last three plies. I can do one of those by hand.
I think you greatly underestimate the constraints placed on the problem by the engine's need to find the best line of play. I don't think this is nearly as easy as you make it sound. So, here's a position from the Fine #70 analysis. They aren't going to get any easier than this one.

[d]8/1k6/3p4/p2P1p2/P2P1P2/2K5/8/8 b - - 0 1

Here is some analysis that shows two lines of play that transpose after 3-plies just like you claimed.

[pgn][Event "Analysis of"]
[Site "BCE Fine#70"]
[Date "1901.06.20"]
[Round "70"]
[White "Lasker, Emanuel"]
[Black "Reichhelm, Gustavus Charles"]
[Result "1-0"]
[FEN "8/1k6/3p4/p2P1p2/P2P1P2/2K5/8/8 b - - 0 1"]

1. ... Kc7
( 1. ... Ka6 2. Kd3 Kb7 )
( 1. ... Kc7 2. Kd3 Kb7)
1-0[/pgn]

In case you are wondering, or you think there is some trick to this, these two variations are co-best lines of play in this position.

Give a line of play in which this transposition occurs at the bottom of a long search, just like you claim you can. There is one additional constraint: both lines of play need to appear in the TT simultaneously (i.e. neither line will be pruned by an engine like Crafty, Stockfish, or Komodo). I don't think it's nearly as easy as you claim. In fact, I think the general case is a very hard problem. But what do I know? Maybe you can whip out a solution in a few seconds like you claim. I doubt it, but feel free to demonstrate!

I've made this point before, but just in case it was overlooked, or you didn't understand WHY I was making it: just because a transposition exists doesn't mean the engine will include it in its search tree. If the engine prunes it, it effectively doesn't exist. It won't be in the TT and therefore can't affect the search. Changing the move order is fine in theory “UNTIL” you look at the individual moves with an engine. It will, in many cases, point out why that move order can't be played. The longer the transposition, the higher the probability that this will happen and the engine never sees the transposition, because the line of play needed to complete it is pruned.

The other point I want to make about “long” and/or “deep” transpositions I've made before, but it bears repeating. The single most important factor in determining the value of a transposition is the distance in plies from the last move of the transposition to the end of the search. The length of the transposition is almost irrelevant. The only reason its length has any bearing on its worth is that, in a depth-limited search, a long transposition can't have its ending node as far from the leaf nodes as a short one. Therefore they tend to be less valuable. For the same reason, a transposition that starts deeper in the search is less valuable than one that starts nearer to the root position. If they are the same length, then the one that starts first is more important.

These factors, when combined, mean that it's much more important to find the transpositions nearer the root move than near a leaf node. This is precisely what a split TT is meant to accomplish in a highly over-subscribed TT. It will no doubt miss some of the transpositions at the very end of the search. So this amounts to a trade. We trade some of the least valuable transpositions to facilitate a VERY high percentage of all transpositions higher in the tree being found.
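To make the split-TT idea concrete, here is a minimal sketch (my own illustration; the names and the 4-ply boundary are assumptions, not any engine's actual code) of routing probes and stores by remaining depth:

[code]
#include <stddef.h>

#define L1_PLIES 4   /* boundary between the two tables; an illustrative value */

typedef struct { unsigned long long key; int draft, score; } tt_entry;

typedef struct {
    tt_entry *l1; size_t l1_mask;   /* small per-thread table near the leaves  */
    tt_entry *l2; size_t l2_mask;   /* large shared table for everything else  */
} split_tt;

/* Pick which table a position belongs to, based only on the remaining depth.
 * Transpositions whose two occurrences fall on opposite sides of the boundary
 * are the ones this scheme can miss.                                           */
static tt_entry *select_slot(split_tt *tt, unsigned long long key, int depth) {
    if (depth <= L1_PLIES)
        return &tt->l1[key & tt->l1_mask];
    return &tt->l2[key & tt->l2_mask];
}
[/code]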

The question is how much more important is finding a transposition that is x-plies further from a leaf node? As far as I've been able to determine the relative difference in work saved is:

Difference ≈ C * EBF ^ (Dt1 - Dt2)

Where:
Dt1 is the distance from the last ply of the first transposition to the end of the search in plies.
Dt2 is the distance from the last ply of the second transposition to the end of the search in plies.
The transpositions should be ordered so that Dt1 is always the larger of the two.
EBF is the effective branching factor of the engine.
C is a constant.

If the EBF of the engine is 1.7, the search depth is 45 plies, Dt1 = 42 and Dt2 = 0, then the approximate difference in work saved is C * 1.7^42 ≈ 4,773,695,332 * C.
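As a quick sanity check of that arithmetic (purely illustrative; C is left symbolic by taking it as 1):

[code]
#include <math.h>
#include <stdio.h>

int main(void) {
    double ebf = 1.7;          /* effective branching factor                   */
    int dt1 = 42, dt2 = 0;     /* distances from the hit to the end of search  */
    /* Relative work saved, per Difference ~ C * EBF^(Dt1 - Dt2), with C = 1.  */
    printf("relative work saved: %.0f\n", pow(ebf, dt1 - dt2));
    return 0;                  /* prints roughly 4,773,695,332                 */
}
[/code]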

I have a few other thoughts and comments on your post, but those will have to wait a few days.

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by bob »

Zenmastur wrote:
bob wrote:The PV array is used to seed the TT at the start of every iteration.
So, in your particular case, some of the most valuable data is stored in a secondary array. Even if it's overwritten in the TT during an iteration, it will be re-written to the TT before the next iteration begins. I agree that this is a very nice feature, but it makes it more difficult to determine how well the TT replacement algorithm is working. This seems to defeat the purpose of this type of test. So, it should probably be turned off if you are trying to determine how well your replacement algorithm works on a highly over-subscribed TT.
bob wrote:But if something doesn't fit and gets overwritten, that would be a deal-breaker.
Well, if you are writing the last iteration's PV to the TT it should easily fit in 4,096 entries. If you're trying to write the entire triangular array to the TT it could be a problem, but honestly I don't know why you would do such a thing. There doesn't seem to be any advantage to it.
bob wrote:I don't think anyone really needs a proof to show that a position from the first three plies can be reached via a transposition at the last three plies. I can do one of those by hand.
I think you greatly underestimate the constraints placed on the problem by the engine's need to find the best line of play. I don't think this is nearly as easy as you make it sound. So, here's a position from the Fine #70 analysis. They aren't going to get any easier than this one.

[d]8/1k6/3p4/p2P1p2/P2P1P2/2K5/8/8 b - - 0 1

Here is some analysis that shows two lines of play that transpose after 3-plies just like you claimed.

[pgn][Event "Analysis of"]
[Site "BCE Fine#70"]
[Date "1901.06.20"]
[Round "70"]
[White "Lasker, Emanuel"]
[Black "Reichhelm, Gustavus Charles"]
[Result "1-0"]
[FEN "8/1k6/3p4/p2P1p2/P2P1P2/2K5/8/8 b - - 0 1"]

1. ... Kc7
( 1. ... Ka6 2. Kd3 Kb7 )
( 1. ... Kc7 2. Kd3 Kb7)
1-0[/pgn]

In case you are wondering, or you think there is some trick to this, these two variations are co-best lines of play in this position.

Give a line of play in which this transposition occurs at the bottom of a long search, just like you claim you can. There is one additional constraint: both lines of play need to appear in the TT simultaneously (i.e. neither line will be pruned by an engine like Crafty, Stockfish, or Komodo). I don't think it's nearly as easy as you claim. In fact, I think the general case is a very hard problem. But what do I know? Maybe you can whip out a solution in a few seconds like you claim. I doubt it, but feel free to demonstrate!

I've made this point before, but just in case it was overlooked, or you didn't understand WHY I was making it: just because a transposition exists doesn't mean the engine will include it in its search tree. If the engine prunes it, it effectively doesn't exist. It won't be in the TT and therefore can't affect the search. Changing the move order is fine in theory “UNTIL” you look at the individual moves with an engine. It will, in many cases, point out why that move order can't be played. The longer the transposition, the higher the probability that this will happen and the engine never sees the transposition, because the line of play needed to complete it is pruned.

The other point I want to make about “long” and/or “deep” transpositions I've made before, but it bears repeating. The single most important factor in determining the value of a transposition is the distance in plies from the last move of the transposition to the end of the search. The length of the transposition is almost irrelevant. The only reason its length has any bearing on its worth is that, in a depth-limited search, a long transposition can't have its ending node as far from the leaf nodes as a short one. Therefore they tend to be less valuable. For the same reason, a transposition that starts deeper in the search is less valuable than one that starts nearer to the root position. If they are the same length, then the one that starts first is more important.

These factors, when combined, mean that it's much more important to find the transpositions nearer the root move than near a leaf node. This is precisely what a split TT is meant to accomplish in a highly over-subscribed TT. It will no doubt miss some of the transpositions at the very end of the search. So this amounts to a trade. We trade some of the least valuable transpositions to facilitate a VERY high percentage of all transpositions higher in the tree being found.

The question is how much more important is finding a transposition that is x-plies further from a leaf node? As far as I've been able to determine the relative difference in work saved is:

Difference ≈ C * EBF ^ (Dt1 - Dt2)

Where:
Dt1 is the distance from the last ply of the first transposition to the end of the search in plies.
Dt2 is the distance from the last ply of the second transposition to the end of the search in plies.
The transpositions should be ordered so that Dt1 is always the larger of the two.
EBF is the effective branching factor of the engine.
C is a constant.

If the EBF of the engine is 1.7, the search depth is 45 plies, Dt1 = 42 and Dt2 = 0, then the approximate difference in work saved is C * 1.7^42 ≈ 4,773,695,332 * C.

I have a few other thoughts and comments on your post, but those will have to wait a few days.

Regards,

Zen
A very tiny bit of data is stored in the PV array. But the PV is a tiny part of the total search tree, and ordering and scores at the other nodes are also extremely important. And no, I ONLY store the PV itself in the tt after an iteration. This is more efficient than having a special bit of move ordering/selection code to follow the PV, since the tt will now produce each PV move as we advance ply by ply, with no extra in-search code that then has to be skipped over everywhere.
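Roughly, seeding looks like this (a sketch, not Crafty's actual source; Position, Move, tt_store and friends are stand-ins for the engine's real routines):

[code]
/* Minimal stand-in declarations so the sketch compiles on its own; a real
 * engine supplies all of these.                                               */
typedef struct Position Position;
typedef int Move;
enum { BOUND_EXACT };
extern unsigned long long hash_key(const Position *pos);
extern void tt_store(unsigned long long key, Move m, int draft, int score, int bound);
extern void make_move(Position *pos, Move m);
extern void unmake_move(Position *pos, Move m);

/* Sketch: after an iteration, walk the previous PV and store each move in the
 * TT so the next iteration's ordinary TT-move ordering follows the PV with no
 * special in-search code.                                                      */
void seed_tt_from_pv(Position *pos, const Move *pv, int pv_len,
                     int depth, int score) {
    for (int ply = 0; ply < pv_len; ply++) {
        /* Draft shrinks and the score flips sign as we walk down the line
         * (negamax convention; mate-score adjustment is omitted here).         */
        tt_store(hash_key(pos), pv[ply], depth - ply, score, BOUND_EXACT);
        make_move(pos, pv[ply]);
        score = -score;
    }
    for (int ply = pv_len - 1; ply >= 0; ply--)
        unmake_move(pos, pv[ply]);
}
[/code]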
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by jwes »

I think what we need is better data. If you create a test version that includes path info in the tt, we can see how many transpositions occur at what depth and when the paths diverge.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by bob »

jwes wrote:I think what we need is better data. If you create a test version that includes path info in the tt, we can see how many transpositions occur at what depth and when the paths diverge.
I don't know how to do that. I did do a counter version that took the depth at which a hit occurred, and incremented a counter for the difference between the current depth and the table draft...

This is from fine 70 searched up to depth=26:

difference= 0 count=12195
difference= 1 count=7098
difference= 2 count=4623
difference= 3 count=3106
difference= 4 count=2165
difference= 5 count=1550
difference= 6 count=1195
difference= 7 count=964
difference= 8 count=709
difference= 9 count=470
difference=10 count=425
difference=11 count=294
difference=12 count=189
difference=13 count=112
difference=14 count=64
difference=15 count=36
difference=16 count=17
difference=17 count=3
difference=18 count=3


If you look at, say, difference=7, that says that 964 times we got a hash hit that came from at least a 7-ply deeper search. And these counters were ONLY incremented if current depth <= 5, which means within 5 plies of the tip.

However, we don't need any data IMHO, the idea is straightforward. Transpositions should be recognized WHEREVER they occur, there is no justification for saying that the above are not allowed.

BTW if I change the restriction above to count ALL transpositions, I get this:

difference= 0 count=17686
difference= 1 count=11317
difference= 2 count=7885
difference= 3 count=5178
difference= 4 count=3437
difference= 5 count=2416
difference= 6 count=1910
difference= 7 count=1408
difference= 8 count=994
difference= 9 count=632
difference=10 count=523
difference=11 count=316
difference=12 count=205
difference=13 count=118
difference=14 count=68
difference=15 count=40
difference=16 count=17
difference=17 count=3
difference=18 count=3

That is, the latter represent ANY transposition recognized anywhere in the tree, counting the number of times the stored draft - depth was equal to the difference. The bigger the difference value, the farther away from the last 5 plies we were when we stored the entry we are now matching against (the usual grafting problem).
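For reference, the instrumentation amounts to something like this (a sketch, not the actual Crafty source; names are invented), hooked in where a usable hash hit is detected:

[code]
/* Sketch of the draft-difference counters described above: on every usable
 * hash hit (stored draft >= remaining depth), bump a counter indexed by how
 * much deeper the stored search was than the depth needed right now.          */
#define MAX_DIFF 64

static unsigned long long diff_count[MAX_DIFF];      /* all usable hits        */
static unsigned long long diff_count_tip[MAX_DIFF];  /* hits with depth <= 5   */

void record_tt_hit(int stored_draft, int depth) {
    int diff = stored_draft - depth;
    if (diff < 0 || diff >= MAX_DIFF)
        return;                        /* not a usable hit, or out of range    */
    diff_count[diff]++;
    if (depth <= 5)                    /* "within 5 plies of the tip"          */
        diff_count_tip[diff]++;
}
[/code]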

Who wants to pass on improving the tree search by this much?
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by Zenmastur »

bob wrote:
jwes wrote:I think what we need is better data. If you create a test version that includes path info in the tt, we can see how many transpositions occur at what depth and when the paths diverge.
I don't know how to do that. I did do a counter version that took the depth at which a hit occurred, and incremented a counter for the difference between the current depth and the table draft...

This is from fine 70 searched up to depth=26:
I read this several times and I'm still not sure I understand what this data represents. I want to make sure we are using the same terminology when talking about different depths. I always seem to have trouble with some of the definitions. But before I do that, I have a few questions that have a direct bearing on the validity of this test.

My original objection to using Fine #70 still stands. The whole point of using split TT's is to unload the “main” or L2 TT when a “normal” TT of the same size would be highly over-subscribed. The definition I've been using for “highly over-subscribed” is based on the mathematics for the simplest of hash tables (i.e. one entry per bucket and always overwrite existing data). The number I use to determine if a TT is highly over-subscribed is the minimum number of unique writes to the TT that statistically could cause every entry in the TT to be overwritten, “if” the data is IID. I use this even though most TT's have larger bucket sizes, use a different replacement policy, and the data isn't IID. Because of these differences the number given WILL NOT overwrite all entries of a more sophisticated hash table, but it will overwrite a sufficient number of entries to seriously degrade its performance. So I find this a useful definition of “highly over-subscribed”. If the number of unique writes is merely greater than the number of table entries then it's “just” an over-subscribed TT. It's easier if this number is expressed as a multiple of the number of entries in the TT. This number isn't a static constant; it changes with the number of entries in the hash table. For a 2^12 entry TT the minimum number of unique writes needed is 36,118, which is 8.82 times the number of entries. So the over-subscription factor needed to meet the definition of "highly over-subscribed" is 8.82 in this particular case. For a TT with 2^24 entries the factor is 17.14, and for 2^32 entries the factor is 22.68.
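For illustration, the coupon-collector expectation n * H_n (the expected number of IID random writes before every one of n slots has been hit at least once) gives factors in the same ballpark as the ones above, though not identical to them. A quick C sketch:

[code]
#include <math.h>
#include <stdio.h>

/* Coupon-collector estimate: the expected number of IID random writes before
 * every one of n slots has been written at least once is about n*(ln n + g),
 * where g is the Euler-Mascheroni constant.  An approximation for "highly
 * over-subscribed", not necessarily the exact figure used above.              */
static double writes_to_cover(double n) {
    const double g = 0.5772156649;
    return n * (log(n) + g);
}

int main(void) {
    int exps[] = { 12, 24, 32 };
    for (int i = 0; i < 3; i++) {
        double n = ldexp(1.0, exps[i]);          /* 2^exp entries              */
        double w = writes_to_cover(n);
        printf("2^%-2d entries: ~%.0f unique writes, factor %.2f\n",
               exps[i], w, w / n);
    }
    return 0;   /* factors come out near 8.9, 17.2 and 22.8 respectively */
}
[/code]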

So my first objection to using Fine #70 is that at ply 26 the over-subscription factor is ~5,200 / 4,096 = 1.27, so it doesn't qualify as highly over-subscribed. Therefore no benefit will be had from using a split TT.

The transposition space of a 26-ply search is only 26 - 3 = 23 plies. Directing 5 plies of this to an L1 TT represents more than 21.7% of the total transposition space. A 45-ply search has 42 plies of transposition space; 4 plies directed to an L1 TT represents less than 9.5% of that space. This disparity is greater than a factor of two.

How many threads did you use? The reason I ask is that each thread has its own L1 TT. If you used 20 threads this is a problem. There are only about 5,200 unique nodes in a 26-ply search of Fine #70, so if you divide these among 20 threads each thread is only going to search about 260 unique nodes. Each thread will have its own L1 TT. If you're directing 5 plies of the search to the L1 TT's then each one should have at least 16K entries and probably more, which means that combined the 20 threads will have about 320K entries of L1 TT space. This seems like overkill for a problem with only 5,200 unique nodes. The L1 and L2 TT's are too different in size to make this work. Having a 4K-entry L2 TT and multiple 16+K-entry L1 TT's isn't the way this should be configured.

In order for this test to have any validity at all, the sum of all the L1 TT space should be at least an order of magnitude less than the L2 TT space. So, if you want to use 20 threads with 4 plies directed to an L1 TT, then I suggest, AT A MINIMUM, 4 million entries in the L2 TT. The L1 TT's should have 8K entries each with 4 plies directed to them, so the L2 TT needs at least 20 * 8K * 10 = 1,600K, or about 2^21 entries. Even this seems to be a little on the small side, so something a little larger should be preferred. A 4-million entry L2 TT needs an over-subscription factor of about 16 to be considered “highly over-subscribed”. This means 4 * 16 = 64 million UNIQUE nodes need to be written to the TT. NOT 64 million nodes total; they have to be unique nodes, so more will be required. A search of approximately 200 million nodes should be used because, depending on the EBF, only about a third of the node count will be unique in an iterative deepening search. For Crafty, which counts nodes that aren't even probed, considerably more nodes will be needed to ensure that enough unique writes are generated to guarantee that the L2 TT is actually "highly over-subscribed". So something like a 30-second search at 100 Mnps should be sufficient, depending on what fraction of those nodes are from the q-search. It would be even better to use fewer cores with the figures given above, as this will be much closer to what this type of TT will see in service.
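Putting those sizing rules into a few lines of C (my own rough calculator, with the numbers from above plugged in as illustrative defaults):

[code]
#include <stdio.h>

/* Rough calculator for the sizing rules above (my own sketch): combined L1
 * space at least an order of magnitude smaller than the L2 space, and the L2
 * table needing roughly factor-times-entries unique writes to count as
 * "highly over-subscribed".                                                    */
int main(void) {
    long long threads    = 20;
    long long l1_entries = 8 * 1024;              /* per-thread L1 size         */
    long long l2_entries = 4LL * 1024 * 1024;     /* suggested minimum L2 size  */
    double    oversub    = 16.0;                  /* factor for a ~4M entry TT  */

    long long l1_total = threads * l1_entries;
    printf("combined L1 space  : %lld entries\n", l1_total);
    printf("minimum L2 space   : %lld entries (10x the combined L1)\n",
           10 * l1_total);
    printf("suggested L2 space : %lld entries\n", l2_entries);
    printf("unique writes for a highly over-subscribed L2: %.0f\n",
           oversub * (double)l2_entries);
    return 0;
}
[/code]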

Also, a test should be done with the same position with the largest TT possible. The idea is to see the difference in numbers between a “normal” TT that is highly over-subscribed and one that isn't. The problem is you need a HUGE TT to do this test if you use 20 threads and a 30-second search as described above. So maybe a good compromise would be to use 2 threads, a TT size of 256K entries, and a search time set so the number of non-q-search nodes is at least 13 times the number of TT entries, or about 3.5 million nodes. Then a control test with a 4-billion entry TT should catch almost all transpositions. This way you can see how many transpositions are missed by a normal TT that is highly over-subscribed versus one that isn't over-subscribed.

I think a good source of positions to use in this type of test is a balanced position from an 8 ply book.

One of the points of using a split TT is so you can let a search run overnight without overwriting every entry in the TT. That way, if you have to back up the search a few plies, the data from earlier searches will still be in the TT, which is what the OP was complaining about. Fine #70 shouldn't ever be considered for testing this kind of a patch, period. You can't scale the problem down that far and expect it to work the same.

Where did you come up with the number 5? I don't recall using it in any example. The number of plies directed to the L1 TT has a direct effect on the size of the search that can be conducted with a given L2 TT size. For example, a 2-billion entry L2 TT (32GB if using Crafty) with the last 5 plies directed to an L1 TT should handle a search with about 10 trillion unique writes. On your average home computer this is going to be an extremely long search: at roughly three total nodes per unique write, that's around 30 trillion nodes, which at 10 Mnps will take about 3,000,000 seconds, or about 35 days. This seems a little excessive even for correspondence chess. I think 4 is more than enough for most needs, even if the TT is restricted to 4GB or 8GB in size.

I also don't think the numbers you posted accurately reflect what is going to happen. First, if you're directing 5 plies to the L1 TT and the TT is sized properly, it will find all 3-, 4-, and 5-ply transpositions that occur entirely within its confines, and it's also going to find almost 100% of all transpositions that occur entirely within the L2 TT, something that a “highly over-subscribed” TT won't do. This isn't reflected in your numbers as far as I can tell. Second, you seem to be assuming that just because a transposition crosses the L1/L2 TT boundary it will be missed. I know that's not the case. I really haven't worked out what size the TT should be if 5 plies are directed to it, but for 4 plies an 8K-entry TT should catch almost all 3- and 4-ply transpositions that cross the boundary between the two TT's, as well as some of the 5- and 6-ply transpositions.

I think part of the problem here is that you don't have any idea of how this is to be used, so let me give one example. Imagine you are playing in a correspondence tournament with eleven players, so you have 10 games to play. You have a 12-core system with 128GB of RAM (or three or four smaller systems). The normal way to handle this is to work on each game one at a time: you use all 12 cores and a 64GB TT, and when you're finished analyzing one game you make your move and then go on to the next game that needs attention. The problem comes when you are “on the move” in more than one game. On average you're going to be “on the move” in half of your games if you and your opponent are moving at about the same pace, so in this case you'll be on the move in 5 games on average. This is a problem because you're burning time on 5 of your game clocks but only working on one of the games. If you're already a day into the analysis of the game you are working on and aren't close to having a move ready, it's not very efficient to stop this analysis, work on other games, and then restart the analysis later, as it could take a full day to get back to where you were. You can be forced into doing this if you are low on time in one of the other games. So you end up stopping the current analysis, working on any game in which you are low on time, and then going back and restarting your analysis on the other games when you have time. In the meantime you're always burning time on about 4 game clocks while you are analyzing.

Instead of doing this, you use a split TT of 8GB with the last 4 plies directed to an 8K-entry L1 TT. This should allow a 1-trillion node search without any problems. Each game gets one core that will run 24/7. If a single core can process 2 Mnps, then a trillion-node search requires 500,000 seconds or about 6 days, which is about right for most time controls. The advantages are: you don't have any efficiency loss from using multiple threads, you can work on any game at any time without losing days of work, all games will have long searches completed by the time your opponent decides to move, and a split TT won't incur any losses from having a transposition missed because the positions occurred in two different threads.


I still haven't had time to address all of either of your last two posts; maybe this weekend.

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by Zenmastur »

I forgot one thing about the tests I was suggesting: the search depth isn't going to be deep enough. Too many things in this problem don't scale very well. But maybe we can find a late middlegame position that has enough nodes, is balanced but can't be resolved to a draw, and has a high enough nps that a deep search can still be conducted without too large a node count or too much time.

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by Zenmastur »

bob wrote:
jwes wrote:I think what we need is better data. If you create a test version that includes path info in the tt, we can see how many transpositions occur at what depth and when the paths diverge.
I don't know how to do that. I did do a counter version that took the depth at which a hit occurred, and incremented a counter for the difference between the current depth and the table draft...

This is from fine 70 searched up to depth=26:

difference= 0 count=12195
difference= 1 count=7098
difference= 2 count=4623
difference= 3 count=3106
difference= 4 count=2165
difference= 5 count=1550
difference= 6 count=1195
difference= 7 count=964
difference= 8 count=709
difference= 9 count=470
difference=10 count=425
difference=11 count=294
difference=12 count=189
difference=13 count=112
difference=14 count=64
difference=15 count=36
difference=16 count=17
difference=17 count=3
difference=18 count=3


If you look at, say, difference=7, that says that 964 times we got a hash hit that came from at least a 7-ply deeper search. And these counters were ONLY incremented if current depth <= 5, which means within 5 plies of the tip.

However, we don't need any data IMHO, the idea is straightforward. Transpositions should be recognized WHEREVER they occur, there is no justification for saying that the above are not allowed.

BTW if I change the restriction above to count ALL transpositions, I get this:

difference= 0 count=17686
difference= 1 count=11317
difference= 2 count=7885
difference= 3 count=5178
difference= 4 count=3437
difference= 5 count=2416
difference= 6 count=1910
difference= 7 count=1408
difference= 8 count=994
difference= 9 count=632
difference=10 count=523
difference=11 count=316
difference=12 count=205
difference=13 count=118
difference=14 count=68
difference=15 count=40
difference=16 count=17
difference=17 count=3
difference=18 count=3

That is, the latter represent ANY transposition recognized anywhere in the tree, counting the number of times the stored draft - depth was equal to the difference. The bigger the difference value, the farther away from the last 5 plies we were when we stored the entry we are now matching against (the usual grafting problem).

Who wants to pass on improving the tree search by this much?
How can you have two transpositions that have a move difference of one? This doesn't seem possible since the opposite side will be on the move. What am I missing here?

This data doesn't seem very useful. Assuming it's accurate, the only thing I can conclude from it is that you lose approximately 1/3 of your transpositions per ply of difference. It doesn't contain any useful information about what happens when the TT is highly over-subscribed.

I ran a bunch of tests on Crafty v24.0 to test the effects of an overloaded TT. The main problem is the node count doesn't represent the TT probe count. Even so, when the node count reached 5 times the number of slots in the TT there was a noticeable rise in the number of nodes required to reach a given depth of search (i.e. the TT starts losing efficiency). This becomes very marked when the node count reaches ~250 times the size of the TT; it effectively doubles the search time to depth. It would be nice to know how many TT probes have actually occurred. As the pseudo-over-subscription factor rises, so do the search times relative to an under-subscribed TT. It seems to rise without bound, but I was only able to test it up until the time to depth was about 12 times that of an under-subscribed TT. At that point I had insufficient memory to keep the control TT under-subscribed.
hgm wrote:A doubling of the Hash size typically produces only 6% speedup (the 12th root of 2), which corresponds to 6 Elo. A 10% more efficient use of the hash table would only cause 0.8 Elo. It would take >100,000 games to see that, while the speedup could already be measured accurately from a few hundred test positions (the equivalent of 10 games or so).
I always doubted the first statement. It may be true if the TT's are kept at the same subscription factor (i.e. if the TT size is doubled, the number of search nodes is also doubled). It's definitely false if the problem size stays the same and the TT size is increased. For Crafty I was getting more than a 20% decrease in tree size per doubling until the TT was sufficiently large, and then no further decrease was observed regardless of TT size. With large differences in TT sizes I was able to reduce the tree size by a factor of about 40. So unloading a TT can pay big dividends in search time.


The position I'm using is this one.

[d]6k1/2p2rp1/p6p/P2b3N/5P2/7P/2p4K/2B1R3 b - - 0 41

The primary advantages are that if it's a win it's unlikely that anyone has enough computing power to find it with a normal engine, there is no shortage of unique nodes to fill a TT with, and it runs pretty quickly on most systems so a high search depth can be reached in a reasonable time.

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by bob »

Zenmastur wrote:
bob wrote:
jwes wrote:I think what we need is better data. If you create a test version that includes path info in the tt, we can see how many transpositions occur at what depth and when the paths diverge.
I don't know how to do that. I did do a counter version that took the depth at which a hit occurred, and incremented a counter for the difference between the current depth and the table draft...

This is from fine 70 searched up to depth=26:

difference= 0 count=12195
difference= 1 count=7098
difference= 2 count=4623
difference= 3 count=3106
difference= 4 count=2165
difference= 5 count=1550
difference= 6 count=1195
difference= 7 count=964
difference= 8 count=709
difference= 9 count=470
difference=10 count=425
difference=11 count=294
difference=12 count=189
difference=13 count=112
difference=14 count=64
difference=15 count=36
difference=16 count=17
difference=17 count=3
difference=18 count=3


If you look at, say, difference=7, that says that 964 times we got a hash hit that came from at least a 7-ply deeper search. And these counters were ONLY incremented if current depth <= 5, which means within 5 plies of the tip.

However, we don't need any data IMHO, the idea is straightforward. Transpositions should be recognized WHEREVER they occur, there is no justification for saying that the above are not allowed.

BTW if I change the restriction above to count ALL transpositions, I get this:

difference= 0 count=17686
difference= 1 count=11317
difference= 2 count=7885
difference= 3 count=5178
difference= 4 count=3437
difference= 5 count=2416
difference= 6 count=1910
difference= 7 count=1408
difference= 8 count=994
difference= 9 count=632
difference=10 count=523
difference=11 count=316
difference=12 count=205
difference=13 count=118
difference=14 count=68
difference=15 count=40
difference=16 count=17
difference=17 count=3
difference=18 count=3

That is, the latter represent ANY transposition recognized anywhere in the tree, counting the number of times the stored draft - depth was equal to the difference. The bigger the difference value, the farther away from the last 5 plies we were when we stored the entry we are now matching against (the usual grafting problem).

Who wants to pass on improving the tree search by this much?
How can you have two transpositions that have a move difference of one? This doesn't seem possible since the opposite side will be on the move. What am I missing here?

This data doesn't seem very useful. Assuming it's accurate, the only thing I can conclude from it is that you lose approximately 1/3 of your transpositions per ply of difference. It doesn't contain any useful information about what happens when the TT is highly over-subscribed.

I ran a bunch of tests on Crafty v24.0 to test the effects of an overloaded TT. The main problem is the node count doesn't represent the TT probe count. Even so, when the node count reached 5 times the number of slots in the TT there was a noticeable rise in the number of nodes required to reach a given depth of search (i.e. the TT starts losing efficiency). This becomes very marked when the node count reaches ~250 times the size of the TT; it effectively doubles the search time to depth. It would be nice to know how many TT probes have actually occurred. As the pseudo-over-subscription factor rises, so do the search times relative to an under-subscribed TT. It seems to rise without bound, but I was only able to test it up until the time to depth was about 12 times that of an under-subscribed TT. At that point I had insufficient memory to keep the control TT under-subscribed.
hgm wrote:A doubling of the Hash size typically produces only 6% speedup (the 12th root of 2), which corresponds to 6 Elo. A 10% more efficient use of the hash table would only cause 0.8 Elo. It would take >100,000 games to see that, while the speedup could already be measured accurately from a few hundred test positions (the equivalent of 10 games or so).
I always doubted the first statement. It may be true if the TT's are kept at the same subscription factor (i.e. if the TT size is doubled, the number of search nodes is also doubled). It's definitely false if the problem size stays the same and the TT size is increased. For Crafty I was getting more than a 20% decrease in tree size per doubling until the TT was sufficiently large, and then no further decrease was observed regardless of TT size. With large differences in TT sizes I was able to reduce the tree size by a factor of about 40. So unloading a TT can pay big dividends in search time.


The position I'm using is this one.

[d]6k1/2p2rp1/p6p/P2b3N/5P2/7P/2p4K/2B1R3 b - - 0 41

The primary advantages are that if it's a win it's unlikely that anyone has enough computing power to find it with a normal engine, there is no shortage of unique nodes to fill a TT with, and it runs pretty quickly on most systems so a high search depth can be reached in a reasonable time.

Regards,

Zen
Easy. One idea is a king triangulation where you do nothing but change the side to move. If this were no worry, no one would find it necessary to include side to move in the hash signature.

You have made a couple of statements in this thread that leave me wondering. The above is one. The other had to do with no transpositions after a capture. Either of these can happen trivially, in fact, and they highlight how important and useful the TT can be.

BTW this is not exactly "move distance". It is "draft difference": the difference between the draft of the entry that was stored and the depth required at the point where we get an acceptable match (draft >= depth) at the current position.
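For illustration, here is a minimal Zobrist sketch (standard textbook form, not Crafty's exact code; castling and en-passant keys are omitted). It shows why the side to move has to be part of the signature, since a pure triangulation differs only by that one key:

[code]
#include <stdint.h>

/* Sketch of a Zobrist signature that includes the side to move.  With stm_key,
 * two positions with identical piece placement but different sides to move
 * hash differently, which is exactly what a king triangulation produces;
 * without it, they would collide.                                              */
extern uint64_t piece_key[12][64];   /* one random key per piece type/square    */
extern uint64_t stm_key;             /* one random key for "black to move"      */

uint64_t zobrist(const int board[64] /* -1 = empty, 0..11 = piece code */,
                 int black_to_move) {
    uint64_t h = 0;
    for (int sq = 0; sq < 64; sq++)
        if (board[sq] >= 0)
            h ^= piece_key[board[sq]][sq];
    if (black_to_move)
        h ^= stm_key;                /* the only difference after a pure
                                        triangulation                           */
    return h;
}
[/code]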
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by Zenmastur »

bob wrote:Easy. One idea is a king triangulation where you do nothing but change the side to move. If this were no worry, no one would find it necessary to include side to move in the hash signature.
A transposition starts in one position and ends in another position; the path between the two positions is what changes. If there is an odd number of plies between the starting positions then it can't be a transposition, because a different side is on the move. Likewise, if there is an odd number of plies between the ending positions then the ending positions can't be identical, for the same reason. If you are allowing transpositions that don't start from the same position, then their length has little meaning.

I think these ideas would be obvious even to the most casual observer.

So, I don't know what you're measuring but it doesn't seem to be transpositions.

In any case, I assume that these numbers were supposed to support this claim of yours:
bob wrote:I can transpose near a tip, with a position near the root, just as easily as I can transpose between branches near the tip.
I don't think they do. In fact, they seem to support this idea:
Zenmastur wrote:I'm not sure you can transpose just as easily from root to tip. Long transpositions, while clearly possible, seem harder.
Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.