Measuring hash size versus "time to depth"


gordonr
Posts: 194
Joined: Thu Aug 06, 2009 8:04 pm
Location: UK

Measuring hash size versus "time to depth"

Post by gordonr »

I've automated the timing of how long it takes to reach a certain depth, averaged across multiple sample runs. I first wanted to compare the number of cores, and now I want to compare hash size. I'm using only the start position. I wonder if this is significant?!

So I tested my i7-4930 using 8 GB hash and got an average of 455 seconds across 50 runs (to reach depth 35 using Stockfish). I then tried 16 GB hash and the average was over 500 seconds. I guess this may be due to some TLB cache behaviour?! I then tried 4 GB, expecting it to be worse than 8 GB, but found it to be about 425 seconds.

Can I conclude that 4 GB is indeed best for my hardware? Or am I making some mistake in my sampling? Too few samples? Should I use different positions? Search to a deeper depth? (Though a depth of 35 is enough to fill even 16 GB of hash, I'd have thought: an average of 7 or 8 minutes of searching with 6 cores.)
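For what it's worth, a minimal sketch of the kind of harness I mean might look like this (assuming a UCI engine such as Stockfish on the PATH; the helper names and the final demo times are made up for illustration):

```python
# Time "go depth N" over several sample runs by speaking UCI to the engine.
import statistics
import subprocess
import time

ENGINE_PATH = "stockfish"  # assumption: engine binary on the PATH


def time_to_depth(depth, hash_mb, threads, fen=None):
    """Start the engine, set Hash/Threads, and time one search to `depth`."""
    p = subprocess.Popen([ENGINE_PATH], stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, text=True)

    def send(cmd):
        p.stdin.write(cmd + "\n")
        p.stdin.flush()

    send("uci")
    send(f"setoption name Hash value {hash_mb}")
    send(f"setoption name Threads value {threads}")
    send("position startpos" if fen is None else f"position fen {fen}")
    start = time.monotonic()
    send(f"go depth {depth}")
    for line in p.stdout:          # the search is done when bestmove arrives
        if line.startswith("bestmove"):
            break
    elapsed = time.monotonic() - start
    send("quit")
    p.wait()
    return elapsed


def summarize(times):
    """Mean and median of the sample run times (seconds)."""
    return statistics.mean(times), statistics.median(times)


# Real usage (needs the engine installed):
# samples = [time_to_depth(35, hash_mb=8192, threads=6) for _ in range(50)]
# print(summarize(samples))
print(summarize([455, 440, 470]))  # demo on made-up times
```

The same loop can then be repeated per hash size, writing each sample out so nothing is lost if a run is interrupted overnight.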
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Measuring hash size versus "time to depth"

Post by hgm »

Time to depth is very noisy, in the sense that on doubling the hash size many positions become slower, but even more become faster, and it is only the average that is relevant. So the result for a single position is meaningless. You should average over at least 100 positions, and preferably over 1000. (That is the equivalent of only 10 test games, however.)

The depth is also relevant: once the tree of the depth you test at fits entirely in the hash table, you won't see any effect of further increasing the size. At what size that happens depends on the depth. (Or, when you use positions of different complexity, on the search time rather than the depth.)
gordonr
Posts: 194
Joined: Thu Aug 06, 2009 8:04 pm
Location: UK

Re: Measuring hash size versus "time to depth"

Post by gordonr »

hgm wrote:on doubling the hash size many positions become slower, but even more become faster, and it is only the average that is relevant. So the result for a single position is meaningless.
Ok, I accept this. Any simple explanation for why bigger can be better in general but some positions are worse?
hgm wrote:You should average at least over 100, and preferably over 1000.
I'll look at using an EPD file with a range of positions. I avoided this because it makes it harder to judge what depth to use for timing purposes (deep enough to fill any hash size I use, but not so deep that doing enough samples becomes infeasible).
hgm wrote:The depth is also relevant: once the tree of the depth you test at fits entirely in the hash table, you won't see any effect of further increasing the size. At what size that happens depends on the depth. (Or, when you use positions of different complexity, on the search time rather than the depth.)
I'm interested in doing overnight analysis for many hours, so a relatively deep search. But for my tests I didn't want too deep a "time to depth", since that would make it infeasible to do many sample runs, and I believe that due to "MP search luck" I need a good number of samples to get a reliable average. Maybe I'm wrong to think I can do a reliable test run in under 24 hours; perhaps days are going to be required.

Thanks for your help.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Measuring hash size versus "time to depth"

Post by Adam Hair »

I have done some experimentation in the past with time to depth for an increasing number of cores. And as H.G. wrote, more positions are better. Time to depth for individual positions was affected in unpredictable ways, but the central tendency of the time to depth as the conditions changed could be clearly determined.

I do suggest using the median as the measure of central tendency rather than the average (mean). If you plot the time-to-depth measurements, you will see that the distribution is right-skewed (long tail to the right), which causes the computed average time to be higher than the time it takes for the majority of the positions. The times are grouped more closely around the median. At least this was the case for the lower depths that I used (depth 28 for pre-Stockfish 5).
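A quick toy example (the times are made up) of how a long right tail pulls the mean well above the median:

```python
# Right-skewed sample of time-to-depth measurements: a few runs that got
# "unlucky" with MP search inflate the mean, while the median stays put.
import statistics

times = [410, 420, 425, 430, 440, 455, 470, 900, 1500]  # seconds, illustrative
mean = statistics.mean(times)
median = statistics.median(times)
print(mean, median)  # the two slow outliers drag the mean far above the median
assert mean > median
```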
gordonr
Posts: 194
Joined: Thu Aug 06, 2009 8:04 pm
Location: UK

Re: Measuring hash size versus "time to depth"

Post by gordonr »

Adam Hair wrote:I do suggest to use the median as the measure of central tendency rather than the average (mean)
I will try that. Thanks for your advice.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Measuring hash size versus "time to depth"

Post by syzygy »

gordonr wrote:I guess this may be due to some TLB cache behaviour?
Look at the nps and total nodes searched.

Increasing hash size should, at least on average, always decrease the number of nodes searched to reach the same depth. The longer/deeper you search, the more impact increased hash size will have on this number.

Increasing hash size may lead to decreased nps. If the decrease is substantial, it may outweigh the benefit of the decreased number of nodes. This will depend on how long/deep you search.

The decrease in nps depends on the hardware used and on whether large/huge pages are enabled. On Windows large pages need special configuration. On Linux the OS will attempt to use large pages transparently to the user.
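To compare the two effects directly, one can pull the numbers out of the engine's UCI info lines. A small sketch (the field names are the standard UCI ones; the helper itself is illustrative):

```python
# Extract the numeric UCI fields from an engine "info" line, so nodes and
# nps (and hashfull, to see whether the table actually fills) can be logged
# per run and compared across hash sizes.
def parse_info(line):
    """Return {field: value} for the numeric UCI fields we care about."""
    tokens = line.split()
    out = {}
    for field in ("depth", "nodes", "nps", "hashfull"):
        if field in tokens:
            out[field] = int(tokens[tokens.index(field) + 1])
    return out


line = "info depth 35 seldepth 48 nodes 123456789 nps 7500000 hashfull 999"
print(parse_info(line))
```

Comparing the final "nodes" across hash sizes shows the tree-size effect, while "nps" shows the memory-system cost; together they explain the time-to-depth difference.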
gordonr
Posts: 194
Joined: Thu Aug 06, 2009 8:04 pm
Location: UK

Re: Measuring hash size versus "time to depth"

Post by gordonr »

syzygy wrote:
gordonr wrote:I guess this may be due to some TLB cache behaviour?
Look at the nps and total nodes searched.
I hadn't considered that - will do.

Thanks for all the help.