For the experiment, I started n = 1, 2, 4 and 8 independent processes of my engine Arminius, each searching the initial position. n-1 processes searched with no time limit to create background activity; the remaining process searched to a fixed depth to measure nps.
Hardware/software
Core i7 860 processor, 4 cores / 8 threads
32KB L1 data cache per core, 256KB L2 cache per core, 8MB L3 cache
8GB RAM in 4 modules for dual channel
2GB RAM in 1 module for single channel
Hyperthreading and Turbo Boost on
Windows 7 64 bit
Arminius (current development version) using 120MB main hash
Test results
processes   nps (single channel)   nps (dual channel)
1           2128K                  2118K
2           2014K                  2020K
4           1784K                  1717K
8           1086K                  1065K
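To make the scaling easier to read, here is a small sketch that turns the figures above into per-process speed relative to the 1-process run. The total throughput estimate is an assumption on my side: it supposes that every background process runs as fast as the measured one, which the test did not measure. The function names are made up for this illustration.

```python
# nps figures in thousands, copied from the results table.
# Each value is (single channel, dual channel).
NPS_K = {
    1: (2128, 2118),
    2: (2014, 2020),
    4: (1784, 1717),
    8: (1086, 1065),
}

SINGLE, DUAL = 0, 1


def relative_speed(processes, column):
    """Per-process nps as a fraction of the 1-process figure."""
    return NPS_K[processes][column] / NPS_K[1][column]


def estimated_total_k(processes, column):
    """Aggregate nps in thousands.

    ASSUMPTION: all processes run at the speed of the measured one.
    """
    return processes * NPS_K[processes][column]


if __name__ == "__main__":
    for n in sorted(NPS_K):
        for column, name in ((SINGLE, "single"), (DUAL, "dual")):
            print(
                f"{n} processes, {name} channel: "
                f"{relative_speed(n, column):.1%} of 1-process nps, "
                f"about {estimated_total_k(n, column)}K total (estimate)"
            )
```

With 8 processes, each process keeps roughly half of its 1-process speed in both memory configurations, and the difference between single and dual channel stays within a few percent in every row.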
Arminius does not use hashing in qsearch. The main hash table is not aligned to cache line boundaries. Each hash entry is 16 bytes. A board position can be stored in any one of four consecutive hash slots. These regions overlap: if one board position can be stored in slots 1..4, another one can be stored in slots 2..5.
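As a rough illustration of what the overlapping regions mean for the cache, the sketch below counts how many cache lines a region of four 16-byte entries touches. It is not Arminius code: the 64-byte line size, the `base_offset` parameter (where the table happens to start relative to a line boundary) and the idea that a probe reads all four entries are assumptions made for the example.

```python
CACHE_LINE = 64   # bytes; assumed cache line size
ENTRY_SIZE = 16   # bytes per hash entry, as stated above
WINDOW = 4        # entries in which one position may be stored


def lines_touched(base_offset, first_entry):
    """Number of cache lines covered by the 4-entry region starting at first_entry.

    base_offset is the (hypothetical) byte offset of the table start
    from a cache line boundary.
    """
    start = base_offset + first_entry * ENTRY_SIZE
    end = start + WINDOW * ENTRY_SIZE - 1  # last byte of the region
    return end // CACHE_LINE - start // CACHE_LINE + 1


def share_single_line(base_offset, entries=1024):
    """Fraction of regions that fit entirely into one cache line."""
    hits = sum(lines_touched(base_offset, i) == 1 for i in range(entries))
    return hits / entries


if __name__ == "__main__":
    for offset in (0, 8, 16):
        print(f"table offset {offset:2d}: "
              f"{share_single_line(offset):.0%} of regions fit in one line")
```

Under these assumptions, at most one region in four fits into a single cache line even when the table start is aligned, because each region begins 16 bytes after the previous one; with a start offset that is not a multiple of 16, every region spans two lines.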
Arminius is a magic bitboard engine with a 705KB lookup table (90232 entries of 8 bytes each).
Pawn hash size is 44KB: 512 entries of 88 bytes each. When searching from the initial position, as in this test, the hit rate is lower than normal.
Material hash table size is 128KB: 4096 entries of 32 bytes each.
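The table sizes can be checked against the cache sizes from the hardware list with a few lines of arithmetic. This is only a sketch: it looks at each table in isolation and ignores that all tables, the main hash and the other processes compete for the same caches. The dictionary and function names are made up for the example, and KB is taken as 1024 bytes.

```python
KIB = 1024

# Cache sizes from the hardware list, smallest first.
CACHES = {
    "L1 data (per core)": 32 * KIB,
    "L2 (per core)": 256 * KIB,
    "L3 (shared)": 8 * 1024 * KIB,
}

# (number of entries, bytes per entry) as given in the text.
TABLES = {
    "magic bitboard lookup": (90232, 8),
    "pawn hash": (512, 88),
    "material hash": (4096, 32),
}

MAIN_HASH_BYTES = 120 * 1024 * KIB  # 120MB main hash


def table_bytes(name):
    """Size of a table in bytes: entries times entry size."""
    entries, entry_size = TABLES[name]
    return entries * entry_size


def smallest_fitting_cache(nbytes):
    """Name of the smallest cache level that could hold nbytes alone, or None."""
    for cache_name, capacity in CACHES.items():
        if nbytes <= capacity:
            return cache_name
    return None


if __name__ == "__main__":
    for name in TABLES:
        size = table_bytes(name)
        print(f"{name}: {size} bytes ({size / KIB:.1f}KB), "
              f"fits alone in: {smallest_fitting_cache(size)}")
    print(f"main hash: fits alone in: {smallest_fitting_cache(MAIN_HASH_BYTES)}")
```

The products reproduce the sizes quoted above (about 705KB, 44KB and 128KB). None of the three tables fits into the 32KB L1 data cache, and the 120MB main hash is far larger than the 8MB L3.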
Finally, there are many other tables, such as the history table, bitboards of the squares between two given squares, moves along a single file/line/diagonal...