Suggestion for hash tables and analysis.

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by bob »

Zenmastur wrote:
bob wrote:I very carefully hang on to tt entries with deep drafts. And I very carefully allow those to be probed near the tips. This is common enough that it is what lets a program solve fine #70 WAY before the exhaustive search can force the loss of a black pawn with best play by both sides.

So they DO happen. With your case they will NEVER happen. In the last few plies you will NEVER recognize a position that occurred earlier in the tree. Seems wrong to me...
bob wrote:Again, they are easy enough to find. Just run fine #70 with the normal hash approach and with your approach. I suspect you will be surprised.

As far as your "prove me wrong" that's not going to work. YOU prove that your idea is better and folks will look at it. The idea of such a split tt is inherently wrong IMHO. One for near the tips and one for the rest misses out on too many transpositions.
I have no doubt that 10-ply transpositions exist. I also have no doubt that you won't be finding many of them deep in a search with a highly over-subscribed TT.

I have run fine #70 with a very small TT.

Code: Select all

2016-06-15 17:35:00.701<--1:info depth 11 seldepth 13 multipv 1 score cp 55 nodes 9132 nps 3044000 tbhits 0 time 3 pv a1b1 a7b7 b1c2 b7b6 c2d2 b6c7 d2e2 c7d8 e2d3 d8c7 d3c3 c7b7 c3b37 c3d3 c7b6 d3e3 b6c7 e3f3 c7d7 f3g3 d7e7 g3h4 e7f6 h4h5 f6f7 h5g5 f7g7 g5f5 g7f7 f5g5 f7g7 f4f5 g7f7 f5f6.......
It finds the correct move in 3 milliseconds and stays with it. Notice the node count is 9,132.

If all of chess was as simple as Fine#70 the OP wouldn't be complaining, chess would be solved, and no chess engine would need more than a 1mb TT.

Regards,

Zen
It is not about finding the right move. It is about finding the right move for the right reason. But there is still more here.

1. Crafty finds Kb1 at depth=9. But the eval is wrong. You find the right move for the right answer when your eval hits +2 or so...

2. Crafty finds Kb1 with eval of +2 at depth 19 to 21 depending on hash size, in less than 10 milliseconds.

3. More importantly, it hits depth 42 with a score of +12 in under one second, using just one core.

So just playing Kb1 doesn't show what you have is working, until you have an evaluation and a PV that shows the f5 pawn falling, at least.

Note all of the above with a 64kb (4k entry) hash table. To get to depth 42 requires 7.2M nodes. That GREATLY overwrites 4K entries. Yet it works just as well as 64M entries...
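
For anyone who wants to see what "careful replacement" means in practice, here is a minimal sketch of a two-slot bucket: one depth-preferred slot plus one always-replace slot. The names and layout are made up for illustration; this is not the actual Crafty hashing code.

Code: Select all

/* Minimal two-slot bucket: one depth-preferred entry plus one always-replace
   entry.  Illustration only -- hypothetical layout, not Crafty's hash.c.    */
#include <stdint.h>
#include <stddef.h>

typedef struct {
  uint64_t key;        /* full Zobrist signature                        */
  int16_t  score;
  uint8_t  depth;      /* remaining draft of the search that stored it  */
  uint8_t  flags;      /* EXACT / LOWER / UPPER                         */
  uint32_t move;
} TTEntry;

typedef struct { TTEntry deep, recent; } TTBucket;

#define TT_BUCKETS 2048             /* 2048 buckets * 2 entries = 4K entries */
static TTBucket tt[TT_BUCKETS];

void tt_store(uint64_t key, int16_t score, uint8_t depth,
              uint8_t flags, uint32_t move) {
  TTBucket *b = &tt[key & (TT_BUCKETS - 1)];
  TTEntry e = { key, score, depth, flags, move };
  /* protect deep drafts: the depth-preferred slot is only displaced by an
     entry at least as deep, or by a re-store of the same position          */
  if (key == b->deep.key || depth >= b->deep.depth)
    b->deep = e;
  else
    b->recent = e;                  /* everything else: always replace       */
}

TTEntry *tt_probe(uint64_t key) {
  TTBucket *b = &tt[key & (TT_BUCKETS - 1)];
  if (b->deep.key == key)   return &b->deep;
  if (b->recent.key == key) return &b->recent;
  return NULL;
}
The point of the sketch: the deep entries stored early survive no matter how often the tip positions hammer the always-replace slot, which is why a tiny table can still return the deep transpositions.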
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by Zenmastur »

bob wrote:It is not about finding the right move. It is about finding the right move for the right reason. But there is still more here.

1. Crafty finds Kb1 at depth=9. But the eval is wrong. You find the right move for the right answer when your eval hits +2 or so.

2. Crafty finds Kb1 with eval of +2 at depth 19 to 21 depending on hash size, in less than 10 milliseconds.

3. More importantly, it hits depth 42 with a score of +12 in under one second, using just one core.

So just playing Kb1 doesn't show what you have is working, until you have an evaluation and a PV that shows the f5 pawn falling, at least.
I don't trust evals for the most part. A position can be a forced draw even if you're a full piece to the good. So a position that is +2 means nothing other than that one side has an advantage. It doesn't mean it's winning and it doesn't mean the first move of the PV is correct.

I've found that a much better method is to ignore the evaluation altogether. Instead, if the first move, the opponent's response, and the continuation move are all correct, the engine has found the correct line of play. The problem is most opening and middle-game positions aren't forced wins, so there is no trail of bread crumbs to follow (i.e. a monotonically increasing score with search depth). Positions that are roughly even are much harder to analyze. In these cases, if you rely solely on the engine's evaluation function, even if it comes from very deep searches, you could easily be led astray.

bob wrote:Note all of the above with a 64kb (4k entry) hash table. To get to depth 42 requires 7.2M nodes. That GREATLY overwrites 4K entries. Yet it works just as well as 64M entries...
There is a problem with your node counting. It tells you nothing about the number of unique positions seen. Without any pawns being captured there are only about 2,600 possible positions in fine#70. So, I'm not impressed by the fact that the solution can be found with a 4,096-entry TT. Even a 256-entry TT should be sufficient to solve this problem.
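
If anyone wants to sanity-check that figure, a crude upper bound is easy to compute by brute force: place the two kings on any non-pawn squares that aren't adjacent and count the placements. The throwaway program below hard-codes the Fine#70 pawn squares (White a4, d4, d5, f4; Black a5, d6, f5), ignores pawn checks and reachability, and ignores the side to move (double the result for that), so it over-counts somewhat. A sketch, not a proof.

Code: Select all

/* Crude upper bound on the number of king placements in Fine #70 before any
   pawn is captured.  Ignores pawn checks, reachability, and side to move,
   so it over-counts somewhat.  Illustration only.                           */
#include <stdio.h>
#include <stdlib.h>

static const int pawn_sq[] = { 24, 27, 35, 29, 32, 43, 37, -1 };
                       /* a4  d4  d5  f4  a5  d6  f5  (square = rank*8+file) */

static int is_pawn(int sq) {
  for (int i = 0; pawn_sq[i] >= 0; i++)
    if (pawn_sq[i] == sq) return 1;
  return 0;
}

static int kings_touch(int a, int b) {              /* also true when a == b */
  return abs(a % 8 - b % 8) <= 1 && abs(a / 8 - b / 8) <= 1;
}

int main(void) {
  long count = 0;
  for (int wk = 0; wk < 64; wk++) {
    if (is_pawn(wk)) continue;
    for (int bk = 0; bk < 64; bk++) {
      if (is_pawn(bk) || kings_touch(wk, bk)) continue;
      count++;                     /* side to move ignored; double for that  */
    }
  }
  printf("upper bound on king placements: %ld\n", count);
  return 0;
}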

Fine#70 has nothing to do with the OP's problem, so all of this is way off the mark.

It seems to me that you're telling the OP that the solution to his problem is to use Crafty with a 4k entry TT.

Because if that's not what you're trying to get across, then you're not addressing the OP's problem. And I don't think Fine#70 will have anything to do with a solution to the problem and neither will “being very careful” with replacement policies.

I think we have gone as far with this discussion as I care to.

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by bob »

Zenmastur wrote:
bob wrote:It is not about finding the right move. It is about finding the right move for the right reason. But there is still more here.

1. Crafty finds Kb1 at depth=9. But the eval is wrong. You find the right move for the right answer when your eval hits +2 or so.

2. Crafty finds Kb1 with eval of +2 at depth 19 to 21 depending on hash size, in less than 10 milliseconds.

3. More importantly, it hits depth 42 with a score of +12 in under one second, using just one core.

So just playing Kb1 doesn't show what you have is working, until you have an evaluation and a PV that shows the f5 pawn falling, at least.
I don't trust evals for the most part. A position can be a forced draw even if you're a full piece to the good. So a position that is +2 means nothing other than that one side has an advantage. It doesn't mean it's winning and it doesn't mean the first move of the PV is correct.

I've found that a much better method is to ignore the evaluation altogether. Instead, if the first move, the opponent's response, and the continuation move are all correct, the engine has found the correct line of play. The problem is most opening and middle-game positions aren't forced wins, so there is no trail of bread crumbs to follow (i.e. a monotonically increasing score with search depth). Positions that are roughly even are much harder to analyze. In these cases, if you rely solely on the engine's evaluation function, even if it comes from very deep searches, you could easily be led astray.

bob wrote:Note all of the above with a 64kb (4k entry) hash table. To get to depth 42 requires 7.2M nodes. That GREATLY overwrites 4K entries. Yet it works just as well as 64M entries...
There is a problem with your node counting. It tells you nothing about the number of unique positions seen. Without any pawns being captured there are only about 2,600 possible positions in fine#70. So, I'm not impressed by the fact that the solution can be found with a 4,096-entry TT. Even a 256-entry TT should be sufficient to solve this problem.

Fine#70 has nothing to do with the OP's problem, so all of this is way off the mark.

It seems to me that you're telling the OP that the solution to his problem is to use Crafty with a 4k entry TT.

Because if that's not what you're trying to get across, then you're not addressing the OP's problem. And I don't think Fine#70 will have anything to do with a solution to the problem and neither will “being very careful” with replacement policies.

I think we have gone as far with this discussion as I care to.

Regards,

Zen
I agree. However, your PV did NOT show the correct line of play. Nor does Crafty's until depth 19 or so, where it actually sees the forced win of a pawn. If you let it search for a minute or so, you get this, which certainly proves it "sees" everything here...

Code: Select all

         63    59.40/16:39   Mat40   1. Kb1 Ka8 2. Kb2 Ka7 3. Kb3 Ka6 4. Kc2
                                     Kb7 5. Kc3 Kb6 6. Kc4 Ka6 7. Kd3 Kb7
                                     8. Ke2 Kc7 9. Kf2 Kd7 10. Kg3 Ke7 11. Kh4
                                     Kf6 12. Kh5 Kf7 13. Kg5 Kg7 14. Kxf5 Kf7
                                     15. Kg5 Kg7 16. f5 Kf7 17. f6 Kf8 18. Kg4
                                     Kg8 19. Kf4 Kf8 20. Kg5 Kf7 21. Kf5 Kf8
                                     22. Ke6 Ke8 23. Kxd6 Kf7 24. Kc7 Kg6
                                     25. d6 Kxf6 26. Kb6 Kf5 27. Kxa5 <EGTB>
But the key here is you HAVE to win the pawn to force the win. And until you see winning the pawn, playing Kb1 is meaningless, right move, wrong reason.

I wasn't telling him using a 4K table solved anything. I was telling YOU that a 4K entry table works quite well, WAY oversubscribed, and it doesn't miss any important transpositions that occur across your self-imposed boundary near the tips...

That "trick" is just (a) not needed and (b) damaging to search efficiency. Granted if you are sloppy about replacement, maybe limiting transpositions to all but the last N plies will work. But I have yet to see a case where this has been shown to be necessary or even useful...

And I can clearly see how it hurts to miss transpositions across that boundary.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by Zenmastur »

bob wrote:I agree. However, your PV did NOT show the correct line of play. Nor does Crafty's until depth 19 or so, where it actually sees the forced win of a pawn. If you let it search for a minute or so, you get this, which certainly proves it "sees" everything here...

Code: Select all

         63    59.40/16:39   Mat40   1. Kb1 Ka8 2. Kb2 Ka7 3. Kb3 Ka6 4. Kc2
                                     Kb7 5. Kc3 Kb6 6. Kc4 Ka6 7. Kd3 Kb7
                                     8. Ke2 Kc7 9. Kf2 Kd7 10. Kg3 Ke7 11. Kh4
                                     Kf6 12. Kh5 Kf7 13. Kg5 Kg7 14. Kxf5 Kf7
                                     15. Kg5 Kg7 16. f5 Kf7 17. f6 Kf8 18. Kg4
                                     Kg8 19. Kf4 Kf8 20. Kg5 Kf7 21. Kf5 Kf8
                                     22. Ke6 Ke8 23. Kxd6 Kf7 24. Kc7 Kg6
                                     25. d6 Kxf6 26. Kb6 Kf5 27. Kxa5 <EGTB>
Yeah... I grabbed the first line that had the correct first move. I was running it on a clean system with no EGTB. I didn't save the analysis. I just re-did it, but I have a bunch of stuff running that I can't stop without losing a day or more of work. It gets the first three moves correct at ply 17, but the time isn't accurate because the CPUs were pegged at 100% utilization running other processes before I started the test. Even so, it only took 5 milliseconds.

Code: Select all

2016-06-16 22:05:46.006<--1:info depth 17 seldepth 25 multipv 1 score cp 213 nodes 90060 nps 18012000 tbhits 0 time 5 pv a1b1 a7b7 b1c1 b7c7 c1d1 c7b8 d1c2 b8b7 c2c3 b7c7 c3d3 c7b7 d3e3 b7b6 e3f3 b6b7 f3g3 b7a8 g3h4 a8b8 h4g5
Later on it finds a mate in 32.
bob wrote:But the key here is you HAVE to win the pawn to force the win. And until you see winning the pawn, playing Kb1 is meaningless, right move, wrong reason.
I personally hate depending on evaluations, but an engine has no choice. It must rely on its evaluations or fail. In this particular case winning a pawn without compensation is enough to conclude you are winning, so I agree with that assessment. Once that occurs it's likely immaterial whether you continue the search or not. But I don't think it's necessary to win material to conclude which move is “best” in general. The fact that you can do it in this particular problem doesn't mean much to the “general” case. I think you can, and do, select the best move without finding material gains on almost every move made. So, that part, I don't agree with.
bob wrote:I wasn't telling him using a 4K table solved anything. I was telling YOU that a 4K entry table works quite well, WAY oversubscribed, and it doesn't miss any important transpositions that occur across your self-imposed boundary near the tips...
I disagree with the “WAY over-subscribed” characterization you use. I think it's dead wrong!

First, I think we need a definition of what subscription means. In every text I've read it is defined in terms of UNIQUE DATA written to the TT. If the data isn't unique it will simply update an entry that is already in the TT. These non-unique updates don't count towards the subscription factor. So stores that hit existing data with matching keys WILL NOT affect the over-subscription level of the TT. The number of such TT hits (even if they don't produce useful data) needs to be subtracted from the total number of writes to the TT. In Fine#70 this will be a huge fraction of all writes.

Second, I found tree->nodes_searched++; in your q-search even though you don't probe the TT there. So your node counts ARE NOT EVEN REMOTELY close to the number of writes to the TT. All of those q-search nodes need to be removed from the node count in order to get a reasonable idea of the “true” subscription factor of the TT.

Third, in an over-subscribed TT, entries will be overwritten. If the same position is searched again, a very likely proposition in an iterative deepening search, it will be written to the TT again, but it will appear to be unique data when in fact it isn't. This leads to a huge over-reporting of unique writes.

Fourth, until a pawn is captured there are fewer than 5,200 possible legal positions in fine#70. I originally said the number was roughly 2,600, but I forgot that each position can have either player “on the move,” so the number doubles. It requires at least 14 moves for the black king to reach a1, a2, or a3 even if the white king were removed from the board. It would require at least 15 moves (30 plies) to reach all legal positions with all the pawns still present, because half of them require an identical position with the opposite side on the move. This requires some form of triangulation. Getting the same position with the opposite side on the move will require at least 1 extra move if the triangulation can be performed simultaneously with the movement to a square. Otherwise more moves will be required.

Fifth, the q-search should see the pawn dropping for free before the main search gets that far. How much sooner depends on your q-search. So, my suspicion is at move 13 or 14. If you count just unique writes to the TT (i.e. don't count TT hits AND don't count nodes in the q-search that by design can't cause information to be written to the TT) a 4,096-entry TT may still have empty entries in it. In any case, it WILL NOT BE anywhere near as heavily over-subscribed as you are claiming, if it's over-subscribed at all!
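
For the record, the kind of bookkeeping I'd want to see in the store routine looks something like the sketch below. The names are invented; the point is simply to separate brand-new entries, re-stores of the same signature, and true overwrites, and to keep q-search node counts out of the picture entirely.

Code: Select all

/* Sketch of store-side instrumentation for measuring the "true" subscription
   of the TT.  Invented names; single-entry slots for simplicity.            */
#include <stdint.h>

typedef struct { uint64_t key; int16_t score; uint8_t depth, flags; } TTEntry;

extern TTEntry  *tt;          /* the table, tt_size entries (power of two)     */
extern uint64_t  tt_size;

static uint64_t new_stores;   /* empty slot filled: counts toward subscription */
static uint64_t updates;      /* same signature re-stored: does NOT count      */
static uint64_t overwrites;   /* different signature displaced                  */

void tt_store(uint64_t key, int16_t score, uint8_t depth, uint8_t flags) {
  TTEntry *e = &tt[key & (tt_size - 1)];
  if (e->key == 0)        new_stores++;
  else if (e->key == key) updates++;
  else                    overwrites++;
  e->key = key;  e->score = score;  e->depth = depth;  e->flags = flags;
}

/* Unique writes are approximated by new_stores + overwrites.  Note this still
   over-counts, because a position that was evicted and is stored again on a
   later iteration shows up as an "overwrite" even though it isn't new data.  */
double subscription_factor(void) {
  return (double)(new_stores + overwrites) / (double)tt_size;
}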

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by hgm »

Zenmastur wrote:Second, I found tree->nodes_searched++; in your q-search even though you don't probe the TT there. So your node counts ARE NOT EVEN REMOTELY close to the number of writes to the TT. All of those q-search nodes need to be removed from the node count in order to get a reasonable idea of the “true” subscription factor of the TT.
There isn't much to q-search in Fine #70...
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by Zenmastur »

I think we should wait until the numbers are corrected before we make that assessment.

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by bob »

Zenmastur wrote:
bob wrote:I agree. However your PV did NOT show the correct line of play. Nor does Crafty's until depth 19 or so where it actually sees the forced win of a pawn. If you let it search for a minute or so, you get this which certainly proves it "sees" everything here...

Code: Select all

         63    59.40/16:39   Mat40   1. Kb1 Ka8 2. Kb2 Ka7 3. Kb3 Ka6 4. Kc2
                                     Kb7 5. Kc3 Kb6 6. Kc4 Ka6 7. Kd3 Kb7
                                     8. Ke2 Kc7 9. Kf2 Kd7 10. Kg3 Ke7 11. Kh4
                                     Kf6 12. Kh5 Kf7 13. Kg5 Kg7 14. Kxf5 Kf7
                                     15. Kg5 Kg7 16. f5 Kf7 17. f6 Kf8 18. Kg4
                                     Kg8 19. Kf4 Kf8 20. Kg5 Kf7 21. Kf5 Kf8
                                     22. Ke6 Ke8 23. Kxd6 Kf7 24. Kc7 Kg6
                                     25. d6 Kxf6 26. Kb6 Kf5 27. Kxa5 <EGTB>
Yeah... I grabbed the first line that had the correct first move. I was running it on a clean system with no EGTB. I didn't save the analysis. I just re-did it, but I have a bunch of stuff running that I can't stop without losing a day or more of work. It gets the first three moves correct at ply 17, but the time isn't accurate because the CPUs were pegged at 100% utilization running other processes before I started the test. Even so, it only took 5 milliseconds.

Code: Select all

2016-06-16 22:05:46.006<--1:info depth 17 seldepth 25 multipv 1 score cp 213 nodes 90060 nps 18012000 tbhits 0 time 5 pv a1b1 a7b7 b1c1 b7c7 c1d1 c7b8 d1c2 b8b7 c2c3 b7c7 c3d3 c7b7 d3e3 b7b6 e3f3 b6b7 f3g3 b7a8 g3h4 a8b8 h4g5
Later on it finds a mate in 32.
bob wrote:But the key here is you HAVE to win the pawn to force the win. And until you see winning the pawn, playing Kb1 is meaningless, right move, wrong reason.
I personally hate depending on evaluations, but an engine has no choice. It must rely on its evaluations or fail. In this particular case winning a pawn without compensation is enough to conclude you are winning, so I agree with that assessment. Once that occurs it's likely immaterial whether you continue the search or not. But I don't think it's necessary to win material to conclude which move is “best” in general. The fact that you can do it in this particular problem doesn't mean much to the “general” case. I think you can, and do, select the best move without finding material gains on almost every move made. So, that part, I don't agree with.
bob wrote:I wasn't telling him using a 4K table solved anything. I was telling YOU that a 4K entry table works quite well, WAY oversubscribed, and it doesn't miss any important transpositions that occur across your self-imposed boundary near the tips...
I disagree with the “WAY over-subscribed” characterization you use. I think it's dead wrong!

First, I think we need a definition of what subscription means. In every text I've read it is defined in terms of UNIQUE DATA written to the TT. If the data isn't unique it will simply update an entry that is already in the TT. These non-unique updates don't count towards the subscription factor. So stores that hit existing data with matching keys WILL NOT affect the over-subscription level of the TT. The number of such TT hits (even if they don't produce useful data) needs to be subtracted from the total number of writes to the TT. In Fine#70 this will be a huge fraction of all writes.

Second, I found tree->nodes_searched++; in your q-search even though you don't probe the TT there. So your node counts ARE NOT EVEN REMOTELY close to the number of writes to the TT. All of those q-search nodes need to be removed from the node count in order to get a reasonable idea of the “true” subscription factor of the TT.

Third, in an over-subscribed TT, entries will be overwritten. If the same position is searched again, a very likely proposition in an iterative deepening search, it will be written to the TT again, but it will appear to be unique data when in fact it isn't. This leads to a huge over-reporting of unique writes.

Fourth, until a pawn is captured there are fewer than 5,200 possible legal positions in fine#70. I originally said the number was roughly 2,600, but I forgot that each position can have either player “on the move,” so the number doubles. It requires at least 14 moves for the black king to reach a1, a2, or a3 even if the white king were removed from the board. It would require at least 15 moves (30 plies) to reach all legal positions with all the pawns still present, because half of them require an identical position with the opposite side on the move. This requires some form of triangulation. Getting the same position with the opposite side on the move will require at least 1 extra move if the triangulation can be performed simultaneously with the movement to a square. Otherwise more moves will be required.

Fifth, the q-search should see the pawn dropping for free before the main search gets that far. How much sooner depends on your q-search. So, my suspicion is at move 13 or 14. If you count just unique writes to the TT (i.e. don't count TT hits AND don't count nodes in the q-search that by design can't cause information to be written to the TT) a 4,096-entry TT may still have empty entries in it. In any case, it WILL NOT BE anywhere near as heavily over-subscribed as you are claiming, if it's over-subscribed at all!

Regards,

Zen
You do realize that in fine 70 q-search is minimal?

But none of this is really relevant to the discussion about two tables. If you disallow transpositions between the last N plies and the rest of the search, that hurts performance, no way around it. And that is exactly what you are doing.

There MUST be some other reason to justify a second local TT for the last few plies to get any attention here. And I can't see what that reason is, and without it, all I can see is a performance loss...

When I have time tonight, I can easily add a counter for the number of overwrites done and print it out ply by ply...

The last time I collected similar data, in a normal position about 1/3 of the hash stores were overwrites of the same position, the rest were writing something new to the table. In fine 70 the ratio was about the same, although the number of stores per ply drops significantly.
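
For reference, the counter amounts to a couple of lines in the store code, something along these lines (a sketch, not the actual Crafty source):

Code: Select all

/* Sketch of the overwrite counter: an "overwrite" is a store on top of an
   entry whose signature does NOT match.  Not the actual Crafty code.       */
#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t key; int16_t score; uint8_t depth, flags; } TTEntry;

extern TTEntry  *tt;
extern uint64_t  tt_size;                       /* power of two             */

static uint64_t stores, overwrites;

void tt_store(uint64_t key, int16_t score, uint8_t depth, uint8_t flags) {
  TTEntry *e = &tt[key & (tt_size - 1)];
  stores++;
  if (e->key && e->key != key) overwrites++;    /* different position displaced */
  e->key = key;  e->score = score;  e->depth = depth;  e->flags = flags;
}

void report_iteration(int ply) {                /* call at the end of each iteration */
  printf("ply=%d stores=%llu overwrites=%llu\n", ply,
         (unsigned long long)stores, (unsigned long long)overwrites);
}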
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by Zenmastur »

bob wrote:You do realize that in fine 70 q-search is minimal?
What I realize is this:

1.) If you call your q-search 2-million times and only 5% add 1 or more nodes to the count then your total count will be way over 100,000 nodes. This will make a 4,096 entry TT look like it's over-subscribed by more than 20 to 1 when it may not even be full yet.
2.) Many of the nodes counted in an iterative deepening search get counted again with each successive iteration. This happens even if the evaluation doesn't change from one iteration to the next AND there are no re-searches conducted. If a search fails or has to be re-searched one or more times, then the number of times some nodes are counted can grow to more than the iteration count. The lower the EBF is, the higher the fraction of nodes that are recounted with every iteration (see the sketch after point 3). Fine#70 will have a VERY LOW EBF. This, by itself, is enough to drive the node counts high enough to make it appear that a 4,096-entry TT is highly over-subscribed, when in fact it isn't.
3.) There aren't many legal positions until after the first pawn is captured. At ply 6 there are only 130, and at ply 12 there are 565 possible positions that can be reached from the starting position. At ply 26 the number is less than 5,200. Every single engine I've tested gives huge over-counts at these depths because a node count isn't an accurate reflection of TT subscription rates. PERIOD!
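
Here's the sketch I referred to under point 2: a toy model where iteration i visits roughly b^i nodes and the distinct positions are approximated by the deepest iteration alone. It ignores fail-high re-searches and the q-search entirely, so treat it as an illustration of the trend, nothing more.

Code: Select all

/* Toy model for point 2: the reported node count is the sum over all
   iterative-deepening iterations, while the distinct positions are roughly
   those of the deepest iteration alone.  The lower the EBF, the worse the
   over-count.  Illustration only.                                          */
#include <math.h>
#include <stdio.h>

int main(void) {
  const double ebf[] = { 2.0, 1.5, 1.2, 1.05 };
  const int depth = 26;
  for (int k = 0; k < 4; k++) {
    double b = ebf[k], counted = 0.0, unique = pow(b, depth);
    for (int i = 1; i <= depth; i++)
      counted += pow(b, i);          /* every iteration re-counts its whole tree */
    printf("EBF %.2f: counted nodes / distinct positions = %.1f\n",
           b, counted / unique);
  }
  return 0;
}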
bob wrote:But none of this is really relevant to the discussion about two tables...
Well, let me point something out to you that you either haven't considered or if you did, you didn't consider it very well.

By not probing the TT in the q-search you are throwing away a huge number of transpositions.
bob wrote:...The q-search alone can be 90% of the total nodes searched.

Finally, at least in my code, EVERY node that is searched causes a probe, AND a store when that node is completed. No matter what the outcome, it is stored.
The 90% figure you gave is nothing more than a convenient figure. The fact is that this number varies depending on position complexity and could be much higher than 90% in complicated positions. This being the case, you are throwing away a huge number of transpositions. You have zero chance of catching any of these transpositions because the data isn't even stored. In compensation for this you get a TT that is less over-subscribed than it otherwise would be, you cut down on TLB thrashing, and you avoid a lot of memory wait time.
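
To be concrete about what not throwing them away would look like, here is a bare-bones quiescence search that probes and stores the table at depth zero. Every helper name here (tt_probe, tt_store, Evaluate, GenerateCaptures, MakeMove, UnmakeMove, hash_key) is a made-up stand-in, not Crafty's code.

Code: Select all

/* Bare-bones quiescence search that probes and stores the TT at depth 0.
   All helpers are hypothetical stand-ins, not Crafty's routines.           */
#include <stdint.h>

#define EXACT 0
#define LOWER 1
#define UPPER 2

typedef struct { uint64_t key; int16_t score; uint8_t depth, flags; } TTEntry;

extern TTEntry *tt_probe(uint64_t key);
extern void     tt_store(uint64_t key, int score, int depth, int flags);
extern int      Evaluate(void);
extern int      GenerateCaptures(int *moves);      /* returns move count      */
extern void     MakeMove(int move);
extern void     UnmakeMove(int move);
extern uint64_t hash_key;                          /* key of current position */

int Quiesce(int alpha, int beta) {
  const int alpha_orig = alpha;
  TTEntry *e = tt_probe(hash_key);                 /* catch the transpositions */
  if (e) {
    if (e->flags == EXACT)                      return e->score;
    if (e->flags == LOWER && e->score >= beta)  return e->score;
    if (e->flags == UPPER && e->score <= alpha) return e->score;
  }

  int best = Evaluate();                           /* stand pat                */
  if (best >= beta) { tt_store(hash_key, best, 0, LOWER); return best; }
  if (best > alpha) alpha = best;

  int moves[128], n = GenerateCaptures(moves);
  for (int i = 0; i < n; i++) {
    MakeMove(moves[i]);
    int score = -Quiesce(-beta, -alpha);
    UnmakeMove(moves[i]);
    if (score > best) {
      best = score;
      if (best >= beta) { tt_store(hash_key, best, 0, LOWER); return best; }
      if (best > alpha) alpha = best;
    }
  }
  tt_store(hash_key, best, 0, best > alpha_orig ? EXACT : UPPER);
  return best;
}
Whether those extra probes pay for the extra memory traffic is exactly the trade-off at issue.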
bob wrote:... If you disallow transpositions between the last N plies and the rest of the search, that hurts performance, no way around it. And that is exactly what you are doing.
I NEVER said I was going to disallow anything. You seem to think that a transposition will be missed JUST BECAUSE it crosses the boundary between the two TT's. I disagree!

The most common type of transposition is constructed by a simple re-ordering of moves. We'll call these Type I transpositions. Since black and white alternate moves, you end up with two ordered lists that are interleaved: one list for black and the other for white. If we represent one side's moves by capital letters and the other side's moves by numbers, then we can construct lists of all possible transpositions of any length. Below are four lists of transpositions with lengths up to six plies.

Three plies:
A1B B1A

Four plies:
A1B2 B1A2 A2B1 B2A1

Five Plies:
A1B2C A1C2B B1A2C B1C2A
C1A2B C1B2A A2B1C A2C1B
B2A1C B2C1A C2A1B C2B1A

Six Plies:
A1B2C3 A1B3C2 A2B1C3 A2B3C1 A3B1C2 A3B2C1
A1C2B3 A1C3B2 A2C1B3 A2C3B1 A3C1B2 A3C2B1
B1A2C3 B1A3C2 B2A1C3 B2A3C1 B3A1C2 B3A2C1
B1C2A3 B1C3A2 B2C1A3 B2C3A1 B3C1A2 B3C2A1
C1A2B3 C1A3B2 C2A1B3 C2A3B1 C3A1B2 C3A2B1
C1B2A3 C1B3A2 C2B1A3 C2B3A1 C3B1A2 C3B2A1

I expect a split TT like the one I've described to catch the vast majority of the transpositions listed with high probability. Being able to do this depends on the size of the L1 TT. This will require more than just a 2K-entry L1 TT, which is precisely why I made these comments:
Zenmastur wrote: By allowing the TT more entries than is required for an N-ply search and allowing the shallowest plies more space than they would normally occupy, they will be preserved longer in the L1 TT ...
Zenmastur wrote:So, by allowing a larger L1 TT than necessary to hold an N-ply search and reserving extra space for the shallowest entries, which are, in fact, the fewest in number, I believe that most transpositions will be caught. For those that are not caught, they may expand the tree, but this part of the tree will be entirely contained in the L1 TT....
I've been thinking all along that an 8K entry TT would be sufficient for this purpose. This provides enough room to store a 4-ply search (~2K nodes) and the nodes that are most important to finding transpositions that cross the L1-L2 TT boundary.

But, if you are right about the L1 TT not residing in one of the 3 CPU caches, then there is little reason to keep the L1 TT this small. As long as the number of memory pages it uses is well under the TLB size, the TLB problem is still solved, and I don't see any problem with making it larger than 8K entries. (i.e. 128 pages of 4 KB each would allow a 512 KB L1 TT to contain 64K entries of 8 bytes each.) This seems more than sufficient to do the job.
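
For clarity, the probe path I'm describing would look roughly like this. The sizes follow the arithmetic above (64K entries of 8 bytes = 512 KB, about 128 4 KB pages), and all the names are invented; take it as a sketch of the idea, not a finished design.

Code: Select all

/* Sketch of the split TT: a small L1 table for the last few plies, the
   normal large table (L2) for everything else.  Invented names and layout. */
#include <stdint.h>

typedef struct { uint32_t sig; int16_t score;
                 uint8_t depth, flags; } L1Entry;        /* 8 bytes */
typedef struct { uint64_t key; int16_t score; uint8_t depth, flags;
                 uint32_t move; } L2Entry;

#define L1_ENTRIES (64 * 1024)   /* 64K entries * 8 bytes = 512 KB, ~128 4 KB pages */
#define TIP_PLIES  4             /* "last N plies" handled by the L1 table          */

static L1Entry  l1[L1_ENTRIES];
extern L2Entry *l2;              /* the normal big table                            */
extern uint64_t l2_size;         /* power of two                                    */

/* remaining_depth = draft left below this node */
int split_probe(uint64_t key, int remaining_depth,
                int16_t *score, uint8_t *depth, uint8_t *flags) {
  if (remaining_depth <= TIP_PLIES) {          /* near the tips: small, TLB-friendly */
    L1Entry *e = &l1[key & (L1_ENTRIES - 1)];
    if (e->sig == (uint32_t)(key >> 32)) {
      *score = e->score; *depth = e->depth; *flags = e->flags;
      return 1;
    }
    return 0;
  }
  L2Entry *e = &l2[key & (l2_size - 1)];       /* everything else: the big table     */
  if (e->key == key) {
    *score = e->score; *depth = e->depth; *flags = e->flags;
    return 1;
  }
  return 0;
}
The stores would be routed the same way, with the shallow (near-root) entries given extra protection in the L1 table as described above.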

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by bob »

Zenmastur wrote:
bob wrote:You do realize that in fine 70 q-search is minimal?
What I realize is this:

1.) If you call your q-search 2-million times and only 5% add 1 or more nodes to the count then your total count will be way over 100,000 nodes. This will make a 4,096 entry TT look like it's over-subscribed by more than 20 to 1 when it may not even be full yet.
2.) Many of the nodes counted in an iterative deepening search get counted again with each successive iteration. This happens even if the evaluation doesn't change from one iteration to the next AND there are no re-searches conducted. If a search fails or has to be re-searched one or more times, then the number of times some nodes are counted can grow to more than the iteration count. The lower the EBF is, the higher the fraction of nodes that are recounted with every iteration. Fine#70 will have a VERY LOW EBF. This, by itself, is enough to drive the node counts high enough to make it appear that a 4,096-entry TT is highly over-subscribed, when in fact it isn't.
3.) There aren't many legal positions until after the first pawn is captured. At ply 6 there are only 130, and at ply 12 there are 565 possible positions that can be reached from the starting position. At ply 26 the number is less than 5,200. Every single engine I've tested gives huge over-counts at these depths because a node count isn't an accurate reflection of TT subscription rates. PERIOD!
bob wrote:But none of this is really relevant to the discussion about two tables...
Well, let me point something out to you that you either haven't considered or if you did, you didn't consider it very well.

By not probing the TT in the q-search you are throwing away a huge number of transpositions.
bob wrote:...The q-search alone can be 90% of the total nodes searched.

Finally, at least in my code, EVERY node that is searched causes a probe, AND a store when that node is completed. No matter what the outcome, it is stored.
The 90% figure you gave is nothing more than a convenient figure. The fact is that this number varies depending on position complexity and could be much higher than 90% in complicated positions. This being the case, you are throwing away a huge number of transpositions. You have zero chance of catching any of these transpositions because the data isn't even stored. In compensation for this you get a TT that is less over-subscribed than it otherwise would be, you cut down on TLB thrashing, and you avoid a lot of memory wait time.
bob wrote:... If you disallow transpositions between the last N plies and the rest of the search, that hurts performance, no way around it. And that is exactly what you are doing.
I NEVER said I was going to disallow anything. You seem to think that a transposition will be missed JUST BECAUSE it crosses the boundary between the two TT's. I disagree!

The most common type of transposition is constructed by a simple re-ordering of moves. We'll call these Type I transpositions. Since black and white alternate moves, you end up with two ordered lists that are interleaved: one list for black and the other for white. If we represent one side's moves by capital letters and the other side's moves by numbers, then we can construct lists of all possible transpositions of any length. Below are four lists of transpositions with lengths up to six plies.

Three plies:
A1B B1A

Four plies:
A1B2 B1A2 A2B1 B2A1

Five Plies:
A1B2C A1C2B B1A2C B1C2A
C1A2B C1B2A A2B1C A2C1B
B2A1C B2C1A C2A1B C2B1A

Six Plies:
A1B2C3 A1B3C2 A2B1C3 A2B3C1 A3B1C2 A3B2C1
A1C2B3 A1C3B2 A2C1B3 A2C3B1 A3C1B2 A3C2B1
B1A2C3 B1A3C2 B2A1C3 B2A3C1 B3A1C2 B3A2C1
B1C2A3 B1C3A2 B2C1A3 B2C3A1 B3C1A2 B3C2A1
C1A2B3 C1A3B2 C2A1B3 C2A3B1 C3A1B2 C3A2B1
C1B2A3 C1B3A2 C2B1A3 C2B3A1 C3B1A2 C3B2A1

I expect a split TT like the one I've described to catch the vast majority of the transpositions listed with high probability. Being able to do this depends on the size of the L1 TT. This will require more than just a 2K-entry L1 TT, which is precisely why I made these comments:
Zenmastur wrote: By allowing the TT more entries than is required for an N-ply search and allowing the shallowest plies more space than they would normally occupy, they will be preserved longer in the L1 TT ...
Zenmastur wrote:So, by allowing a larger L1 TT than necessary to hold an N-ply search and reserving extra space for the shallowest entries, which are, in fact, the fewest in number, I believe that most transpositions will be caught. For those that are not caught, they may expand the tree, but this part of the tree will be entirely contained in the L1 TT....
I've been thinking all along that an 8K entry TT would be sufficient for this purpose. This provides enough room to store a 4-ply search (~2K nodes) and the nodes that are most important to finding transpositions that cross the L1-L2 TT boundary.

But, if you are right about the L1 TT not residing in one of the 3 CPU caches, then there is little reason to keep the L1 TT this small. As long as the number of memory pages it uses is well under the TLB size, the TLB problem is still solved, and I don't see any problem with making it larger than 8K entries. (i.e. 128 pages of 4 KB each would allow a 512 KB L1 TT to contain 64K entries of 8 bytes each.) This seems more than sufficient to do the job.

Regards,

Zen
Why are you stopping at ply=26? Feel free to go to ply 50. Here are the actual counts.

ply=26 hash stores done = 36840 Positions overwritten = 2697
ply=30 stores=66755 overwrites=6817

(note, overwrites means storing on top of an entry where signatures do NOT match).

ply=35 stores=184757 overwrites=64631
ply=40 stores=1000538 overwrites=654789
ply=45 stores=4729806 overwrites=3698845
ply=50 stores=42M overwrites=36M

Does that prove a thing? Only that my tree is not blowing up, as the branching factor remains consistent. If the TT were getting swamped by shallow entries the search would die.

Somehow you are overlooking some SERIOUS math flaws. The shallowest positions are not "the fewest in number". This is an exponential tree. Those nodes near the tip are the huge majority of total positions searched. How does this slip by your analysis???

My point is this, and only this. You fix the boundary between the two tables wherever you want. I claim, and can probably prove with some coding changes, that wherever you fix that boundary, you prevent transpositions from occurring across that boundary. And I will remind you these trees are NOT searched to a fixed depth. In Crafty, null-move R can range up to 8 or beyond, and the LMR reductions are about the same. So a path can have only 6 moves in it and represent a 30-40 ply search from the root. And it can transpose easily...
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Ok, Maybe I didn't make it clear in previous posts...

Post by Zenmastur »

bob wrote:Why are you stopping at ply=26? Feel free to go to ply 50.
What would be the point? The root move will not change after ply 26 if your q-search is working properly. The q-search should discover that the f5 pawn is free at ply 26, there is no larger piece that can be captured, and no earlier search can force the loss of a pawn. Therefore nothing but an erroneous search result can change the root move after ply 26.
bob wrote:Here are the actual counts.

ply=26 hash stores done = 36840 Positions overwritten = 2697
ply=30 stores=66755 overwrites=6817
Just as I suspected. At 26 plies you have 36,840 stores and only 2,697 of them are over-writes. This means that, AT MOST, the engine saw 6,793 positions (the 4,096 table entries plus the 2,697 overwrites). Not exactly what I would describe as a highly over-subscribed TT. In fact, most moves made during TCEC suffer from a greater over-subscription factor than this. (i.e. 6,793/4,096 = ~1.66)

It's actually very unlikely that the engine saw 6,793 unique nodes. It probably saw about 5,200, just like I said. The rest “appear” to be unique, as I explained here:
Zenmastur wrote:...Third, in an over-subscribed TT, entries will be overwritten. If the same position is searched again, a very likely proposition in an iterative deepening search, it will be written to the TT again, but it will appear as if it's unique data when in fact it isn't. This becomes a huge problem for over reporting unique writes...
It probably saw approximately 5,200, just like I said, but only had room for 4,096. About 1,100 or so were overwritten during the course of 26 iterations and any re-searches done due to a fail-high/low condition. When the nodes that were overwritten were seen again in the next iteration, they “appeared” to be unique data because they didn't match anything in the bucket they were being written to. Just like I said!
bob wrote:...(note, overwrites means storing on top of an entry where signatures do NOT match).

ply=35 stores=184757 overwrites=64631
ply=40 stores=1000538 overwrites=654789
ply=45 stores=4729806 overwrites=3698845
ply=50 stores=42M overwrites=36M

Does that prove a thing? Only that my tree is not blowing up, as the branching factor remains consistent. If the TT were getting swamped by shallow entries the search would die.
I highly doubt your last statement. As I recall, you have a triangular array that stores the PV at every iteration. This array is undoubtedly larger than a 4,096-entry TT. While I haven't looked to see exactly how and when it's used, my suspicion is that if the entire TT were cleared between iterations, the previous iteration's PV would still be in the triangular array to guide the search. This array constitutes a large secondary cache of useful data for the engine to draw upon.

After ply 26 nothing should ever cause it to change its root move. So as far as I'm concerned it makes no difference if you end the search at 26 plies or continue forever. The game should be decided by the q-search at ply 26, in which it should be determined that white wins a black pawn without compensation.
bob wrote:Somehow you are overlooking some SERIOUS math flaws.


I think it's more likely that you're misinterpreting the data you presented. My counts at 26 plies might not be perfect, but I don't think they're in serious error.
bob wrote:The shallowest positions are not "the fewest in number". This is an exponential tree. Those nodes near the tip are the huge majority of total positions searched. How does this slip by your analysis???
I think you are confusing a shallow position with a leaf node. A shallow position is near the root, not near a leaf. So in a search a shallow position would be from ply 1 to some number less than half the search depth. You seem to be thinking I meant a node near the leaf nodes.
bob wrote:My point is this, and only this. You fix the boundary between the two tables wherever you want. I claim, and can probably prove with some coding changes, that wherever you fix that boundary, you prevent transpositions from occurring across that boundary.
I don't want you to go to the trouble if you have other things you would rather attend to. Even so, I don't think it can be proven, but if you attempt to prove it I would love to see your data as well as the conditions under which the experiment(s) were performed.
bob wrote:And I will remind you these trees are NOT searched to a fixed depth. In Crafty, null-move R can range up to 8 or beyond, and the LMR reductions are about the same. So a path can have only 6 moves in it and represent a 30-40 ply search from the root. And it can transpose easily...
I've been aware since the very beginning that various reductions can greatly affect the depth of searches in different branches. This does not worry me in the slightest. I doubt that this will be a problem since these are reductions to the search depth. I believe this will be handled nicely, but just in case it's a problem, I have an easy fix worked out.

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.