Hi,
My engine is actually using a simple minimax (no AlphaBeta) function for the moment. I want to get it completely functional before I write the AB search and all the other improvements.
This week I implemented a transposition table, because I want to be able to retrieve the principal variation from it. So I implemented it into my Minimax function and I get confused by the results.
If I use a small hash table of 4Kb (256 entries), the search for my test positions takes 34.4s at 8.5M nps searching 293M nodes.
If I use a 32Mb hash table the search takes 22.6s at 5M nps searching 112M nodes
and if I use a 256Mb the search takes 25.2s at 4.3M nps searching 107M nodes.
I was expecting a drop in nodes per seconds, but it seems that with a 256Mb hash tables, the nps drop by about 50% making the search even slower than with a 32 Mo hash table.
However, when I profile my engine, the profiling code seems to use only <1% of the total cpu time. What does that means? Do I have a bug or is this due the cache miss the huge table will inevitably cause?
Transposition Table and nps drop
Moderators: hgm, Rebel, chrisw
-
- Posts: 287
- Joined: Mon Mar 13, 2006 5:23 pm
- Location: Québec
Transposition Table and nps drop
Mathieu Pagé
mathieu@mathieupage.com
mathieu@mathieupage.com
-
- Posts: 27811
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Transposition Table and nps drop
How do you count your nodes? If a move leads to a position that is found in the hash table, do you count that as a node, or do you only count nodes where you do move generation, or only count leave nodes where you do evaluation?
What is your eplacement algorithm? Do you replace everything, or do you make an attempt to spare deep eentries
How many test positions do you average (or sum) to get the quoted times?
What is your eplacement algorithm? Do you replace everything, or do you make an attempt to spare deep eentries
How many test positions do you average (or sum) to get the quoted times?
-
- Posts: 287
- Joined: Mon Mar 13, 2006 5:23 pm
- Location: Québec
Re: Transposition Table and nps drop
Hi,hgm wrote:How do you count your nodes? If a move leads to a position that is found in the hash table, do you count that as a node, or do you only count nodes where you do move generation, or only count leave nodes where you do evaluation?
What is your eplacement algorithm? Do you replace everything, or do you make an attempt to spare deep eentries
How many test positions do you average (or sum) to get the quoted times?
I count every calls to Minimax, so it counts internal nodes and leaf nodes. At a position if a move is found in the transposition table it's counted as 1 node.
I actually use a replace if deeper or equal scheme. I'll try to use a replace always one to see what happens.
The data I provided are the sum of 6 positions searched to depth 5 or 6. The positions are the ones used by the benchmark command of Crafty.
Thanks for your help.
Mathieu Pagé
mathieu@mathieupage.com
mathieu@mathieupage.com
-
- Posts: 287
- Joined: Mon Mar 13, 2006 5:23 pm
- Location: Québec
Re: Transposition Table and nps drop
Hi,
I just tried the sames test with a replace always scheme and the results are only slightly better.
Can all the caches miss on the TT probe add ups to 50% of the CPU time?
I just tried the sames test with a replace always scheme and the results are only slightly better.
Can all the caches miss on the TT probe add ups to 50% of the CPU time?
Mathieu Pagé
mathieu@mathieupage.com
mathieu@mathieupage.com
-
- Posts: 27811
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Transposition Table and nps drop
Yes, easily. To fetch randomly addressed data from memory takes 386 CPU clock cycles on my 1.3GHz Pentium M. On faster machines, it will take even more CPU clocks, as only the CPU will be faster, not the memory. And a CPU can do between 2 and 3 instructions per clock cycle. So doing a single probe costs you as much as ~1000 instructions.
I do not know what exactly you do in your leave nodes, but if you do one hash probe, and less than 1000 instructions of work, the TT probes take 50%.
I do not know what exactly you do in your leave nodes, but if you do one hash probe, and less than 1000 instructions of work, the TT probes take 50%.
-
- Posts: 892
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
Re: Transposition Table and nps drop
Without minimal positional evaluation, alpha-beta pruning and quiescence search your search trees are artificial. Optimizing performance in such unnatural conditions can make final code even worse.
Re: Transposition Table and nps drop
What are you doing with your table ?
Retrieving the pv for the search at the next depth is not enought.
Are you using it to do some cutoff ?
Are you using it to order the moves with IID ?
HJ.
Retrieving the pv for the search at the next depth is not enought.
Are you using it to do some cutoff ?
Are you using it to order the moves with IID ?
HJ.
-
- Posts: 27811
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Transposition Table and nps drop
Well, he does do cutoffs, or his node count couldn't go down dependent on hash size. Ordering the moves would have little effect without alpha-beta.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Transposition Table and nps drop
I have never seen such a drop in NPS, unless you are either (a) so small that everything fits into cache until you start probing the hash which causes things to get replaced frequently; (b) using a hash implementation that is too slow, perhaps using multiple probes or something...mathmoi wrote:Hi,
My engine is actually using a simple minimax (no AlphaBeta) function for the moment. I want to get it completely functional before I write the AB search and all the other improvements.
This week I implemented a transposition table, because I want to be able to retrieve the principal variation from it. So I implemented it into my Minimax function and I get confused by the results.
If I use a small hash table of 4Kb (256 entries), the search for my test positions takes 34.4s at 8.5M nps searching 293M nodes.
If I use a 32Mb hash table the search takes 22.6s at 5M nps searching 112M nodes
and if I use a 256Mb the search takes 25.2s at 4.3M nps searching 107M nodes.
I was expecting a drop in nodes per seconds, but it seems that with a 256Mb hash tables, the nps drop by about 50% making the search even slower than with a 32 Mo hash table.
However, when I profile my engine, the profiling code seems to use only <1% of the total cpu time. What does that means? Do I have a bug or is this due the cache miss the huge table will inevitably cause?
In crafty, nps is relatively stable regardless of hash size used, but the tree gets smaller (as yours apparently does) as the size is increased...
-
- Posts: 287
- Joined: Mon Mar 13, 2006 5:23 pm
- Location: Québec
Re: Transposition Table and nps drop
Hi H.G.Muller,hgm wrote:Yes, easily. To fetch randomly addressed data from memory takes 386 CPU clock cycles on my 1.3GHz Pentium M. On faster machines, it will take even more CPU clocks, as only the CPU will be faster, not the memory. And a CPU can do between 2 and 3 instructions per clock cycle. So doing a single probe costs you as much as ~1000 instructions.
I do not know what exactly you do in your leave nodes, but if you do one hash probe, and less than 1000 instructions of work, the TT probes take 50%.
You pin pointed the issue. Before I added the TT code in my Minimax function at depth=0, I was returning Eval() wich is really fast since it only return an incrementally updated value. When I added my TT code, I did a probe _before_ I checked for depth=0, So I was doing an extra probe at each leaf nodes.
Thanks for you inputs.
Mathieu Pagé
mathieu@mathieupage.com
mathieu@mathieupage.com