Page 1 of 2

Transposition Table and nps drop

Posted: Wed Feb 27, 2008 1:35 pm
by mathmoi
Hi,

My engine is actually using a simple minimax (no AlphaBeta) function for the moment. I want to get it completely functional before I write the AB search and all the other improvements.

This week I implemented a transposition table, because I want to be able to retrieve the principal variation from it. So I implemented it into my Minimax function and I get confused by the results.

If I use a small hash table of 4Kb (256 entries), the search for my test positions takes 34.4s at 8.5M nps searching 293M nodes.

If I use a 32Mb hash table the search takes 22.6s at 5M nps searching 112M nodes

and if I use a 256Mb the search takes 25.2s at 4.3M nps searching 107M nodes.

I was expecting a drop in nodes per seconds, but it seems that with a 256Mb hash tables, the nps drop by about 50% making the search even slower than with a 32 Mo hash table.

However, when I profile my engine, the profiling code seems to use only <1% of the total cpu time. What does that means? Do I have a bug or is this due the cache miss the huge table will inevitably cause?

Re: Transposition Table and nps drop

Posted: Wed Feb 27, 2008 2:53 pm
by hgm
How do you count your nodes? If a move leads to a position that is found in the hash table, do you count that as a node, or do you only count nodes where you do move generation, or only count leave nodes where you do evaluation?

What is your eplacement algorithm? Do you replace everything, or do you make an attempt to spare deep eentries

How many test positions do you average (or sum) to get the quoted times?

Re: Transposition Table and nps drop

Posted: Wed Feb 27, 2008 3:03 pm
by mathmoi
hgm wrote:How do you count your nodes? If a move leads to a position that is found in the hash table, do you count that as a node, or do you only count nodes where you do move generation, or only count leave nodes where you do evaluation?

What is your eplacement algorithm? Do you replace everything, or do you make an attempt to spare deep eentries

How many test positions do you average (or sum) to get the quoted times?
Hi,

I count every calls to Minimax, so it counts internal nodes and leaf nodes. At a position if a move is found in the transposition table it's counted as 1 node.

I actually use a replace if deeper or equal scheme. I'll try to use a replace always one to see what happens.

The data I provided are the sum of 6 positions searched to depth 5 or 6. The positions are the ones used by the benchmark command of Crafty.

Thanks for your help.

Re: Transposition Table and nps drop

Posted: Wed Feb 27, 2008 3:44 pm
by mathmoi
Hi,

I just tried the sames test with a replace always scheme and the results are only slightly better.

Can all the caches miss on the TT probe add ups to 50% of the CPU time?

Re: Transposition Table and nps drop

Posted: Wed Feb 27, 2008 4:14 pm
by hgm
Yes, easily. To fetch randomly addressed data from memory takes 386 CPU clock cycles on my 1.3GHz Pentium M. On faster machines, it will take even more CPU clocks, as only the CPU will be faster, not the memory. And a CPU can do between 2 and 3 instructions per clock cycle. So doing a single probe costs you as much as ~1000 instructions.

I do not know what exactly you do in your leave nodes, but if you do one hash probe, and less than 1000 instructions of work, the TT probes take 50%.

Re: Transposition Table and nps drop

Posted: Wed Feb 27, 2008 4:18 pm
by Aleks Peshkov
Without minimal positional evaluation, alpha-beta pruning and quiescence search your search trees are artificial. Optimizing performance in such unnatural conditions can make final code even worse.

Re: Transposition Table and nps drop

Posted: Wed Feb 27, 2008 5:28 pm
by Harald Johnsen
What are you doing with your table ?

Retrieving the pv for the search at the next depth is not enought.

Are you using it to do some cutoff ?
Are you using it to order the moves with IID ?

HJ.

Re: Transposition Table and nps drop

Posted: Wed Feb 27, 2008 5:33 pm
by hgm
Well, he does do cutoffs, or his node count couldn't go down dependent on hash size. Ordering the moves would have little effect without alpha-beta.

Re: Transposition Table and nps drop

Posted: Wed Feb 27, 2008 5:54 pm
by bob
mathmoi wrote:Hi,

My engine is actually using a simple minimax (no AlphaBeta) function for the moment. I want to get it completely functional before I write the AB search and all the other improvements.

This week I implemented a transposition table, because I want to be able to retrieve the principal variation from it. So I implemented it into my Minimax function and I get confused by the results.

If I use a small hash table of 4Kb (256 entries), the search for my test positions takes 34.4s at 8.5M nps searching 293M nodes.

If I use a 32Mb hash table the search takes 22.6s at 5M nps searching 112M nodes

and if I use a 256Mb the search takes 25.2s at 4.3M nps searching 107M nodes.

I was expecting a drop in nodes per seconds, but it seems that with a 256Mb hash tables, the nps drop by about 50% making the search even slower than with a 32 Mo hash table.

However, when I profile my engine, the profiling code seems to use only <1% of the total cpu time. What does that means? Do I have a bug or is this due the cache miss the huge table will inevitably cause?
I have never seen such a drop in NPS, unless you are either (a) so small that everything fits into cache until you start probing the hash which causes things to get replaced frequently; (b) using a hash implementation that is too slow, perhaps using multiple probes or something...

In crafty, nps is relatively stable regardless of hash size used, but the tree gets smaller (as yours apparently does) as the size is increased...

Re: Transposition Table and nps drop

Posted: Thu Feb 28, 2008 3:00 am
by mathmoi
hgm wrote:Yes, easily. To fetch randomly addressed data from memory takes 386 CPU clock cycles on my 1.3GHz Pentium M. On faster machines, it will take even more CPU clocks, as only the CPU will be faster, not the memory. And a CPU can do between 2 and 3 instructions per clock cycle. So doing a single probe costs you as much as ~1000 instructions.

I do not know what exactly you do in your leave nodes, but if you do one hash probe, and less than 1000 instructions of work, the TT probes take 50%.
Hi H.G.Muller,

You pin pointed the issue. Before I added the TT code in my Minimax function at depth=0, I was returning Eval() wich is really fast since it only return an incrementally updated value. When I added my TT code, I did a probe _before_ I checked for depth=0, So I was doing an extra probe at each leaf nodes.

Thanks for you inputs.