Cache-friendier material index

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Post Reply
User avatar
hgm
Posts: 23718
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Cache-friendier material index

Post by hgm » Wed Mar 31, 2010 9:56 am

Even when you ignore promotions, the number of material combinations that can result from the initial setup is quite large. (About a quarter million.) It wil fit in L2, but not in L1. The obvious way to calculate a material index, e.g. nWN + 3*nBN + 9*nWB + 27*nBB + 81*nWR + 243*nBR + 729*nWQ + 1458*nBQ + 2916*nWP + 26244*nBP, has the disadvantage that the set of accessible material combinations is not mapped very densely on the table quite early in the game. E.g. when yu trade nights, nWN and nBN now can only run upto 1, so of the original 9 possible combinations (for which the space is reserved in the table) you only use 4. Trade a Bishop next to it an you are down to using 16 out of 81 entries, (~20%), while trading the second Knight would put you at 1 out of 9 (~11%). For the pieces with weights < 64, the size of the cache line, this is bad, as the unused entries will be forced to be together in cache with the used entries in the cache ine they share. So having many holes in the small-scale structure of the mapping causes poor L1 efficiency.

In an attempt to improve on that, I am now using a material index

2*243*nWR - (2*243-81)*nBR + (243+27+9)*nWB - (243+27)*nBB + (243+3)*nWN - (243+1)*nBN + 729*nWQ + 1458*nBQ + 2916*nWP + 26244*nBP

(So the weights of Rooks, Bishops and Knights are altered.)

This still gives a unique mapping, as can be easily seen when expressing the weights as trinary numbers: the number of BN can be deduced from the lowest digit, as all other weights have that digit zero, the WN from the next digit, then the WB, then the BB, etc. (Although BB and WB both affect the '27' digit, you already know nWB from the '9' digit, and thus can subtract their contribution first.)

The advantage of this mapping is that common trades keep you in the same cache line. E.g. trading Knights would change the index by only 2, trading a white Knight for a black Bishop would change it by 24 trading a white Rook for the black Bishop pair would change it by 54. So approximately neutral trades would map close by, thus increasing the probe density in the material table, and hence L1 cache hits on it. Very unequal trades (i.e. giving away material) would map far away, and thus produce cache misses, but presumably the search would kill such branches pretty quickly, or even allow the trade to be completed. The side that is leading would start null-moving, which will not alter the material imbalance, and only get back into action when the other side catches up by capturing something back (which brings you back in the vicinity of where you were in the table).

Gerd Isenberg
Posts: 2127
Joined: Wed Mar 08, 2006 7:47 pm
Location: Hattingen, Germany

Re: Cache-friendier material index

Post by Gerd Isenberg » Thu Apr 01, 2010 6:57 am

hgm wrote:Even when you ignore promotions, the number of material combinations that can result from the initial setup is quite large. (About a quarter million.) It wil fit in L2, but not in L1. The obvious way to calculate a material index, e.g. nWN + 3*nBN + 9*nWB + 27*nBB + 81*nWR + 243*nBR + 729*nWQ + 1458*nBQ + 2916*nWP + 26244*nBP, has the disadvantage that the set of accessible material combinations is not mapped very densely on the table quite early in the game. E.g. when yu trade nights, nWN and nBN now can only run upto 1, so of the original 9 possible combinations (for which the space is reserved in the table) you only use 4. Trade a Bishop next to it an you are down to using 16 out of 81 entries, (~20%), while trading the second Knight would put you at 1 out of 9 (~11%). For the pieces with weights < 64, the size of the cache line, this is bad, as the unused entries will be forced to be together in cache with the used entries in the cache ine they share. So having many holes in the small-scale structure of the mapping causes poor L1 efficiency.

In an attempt to improve on that, I am now using a material index

2*243*nWR - (2*243-81)*nBR + (243+27+9)*nWB - (243+27)*nBB + (243+3)*nWN - (243+1)*nBN + 729*nWQ + 1458*nBQ + 2916*nWP + 26244*nBP

(So the weights of Rooks, Bishops and Knights are altered.)

This still gives a unique mapping, as can be easily seen when expressing the weights as trinary numbers: the number of BN can be deduced from the lowest digit, as all other weights have that digit zero, the WN from the next digit, then the WB, then the BB, etc. (Although BB and WB both affect the '27' digit, you already know nWB from the '9' digit, and thus can subtract their contribution first.)

The advantage of this mapping is that common trades keep you in the same cache line. E.g. trading Knights would change the index by only 2, trading a white Knight for a black Bishop would change it by 24 trading a white Rook for the black Bishop pair would change it by 54. So approximately neutral trades would map close by, thus increasing the probe density in the material table, and hence L1 cache hits on it. Very unequal trades (i.e. giving away material) would map far away, and thus produce cache misses, but presumably the search would kill such branches pretty quickly, or even allow the trade to be completed. The side that is leading would start null-moving, which will not alter the material imbalance, and only get back into action when the other side catches up by capturing something back (which brings you back in the vicinity of where you were in the table).
Very nice. I never used pre-calculated perfect hashing so far, but on the fly calculation cached in a material hash table, where index is a vector of piece counters (i. e. weights are power of two) modulo some prime number as tables size. When was this 2^2 * 3^10 scheme first used or proposed in computer chess?

User avatar
hgm
Posts: 23718
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: Cache-friendier material index

Post by hgm » Thu Apr 01, 2010 7:12 am

Rybka used the 2^2*3^10 scheme for its material table, but it is of course just normal multi-dimensional array indexing. The innovative feature in Rybka was that it made the ordering of the indices such that there would be no fatal crash on super-numerary promotions, by making Pawns correspond to the argest strides. So with multiple Queens the table-lookup would return garbage, but usually this happens in a stage where the game is already decided, so a noise of a few pawns on the eval in that case would not hurt.

A true modulo operation is extremely expensive, btw. Bob tried to use one for hash-tabe access in Crafty, and reported something like a 20% drop in NPS because of it. Better use prime weights and do modulo a power of two!

I thought about hashing material, but it seems I would just be doing what the cache algorithm does in hardware. So as long as the entire table fits in L2, and recalculation would take longer than an L2 read access, I guess there is no way hashing could ever compete.

grant
Posts: 67
Joined: Mon Aug 06, 2007 2:42 pm
Location: London, England

Re: Cache-friendier material index

Post by grant » Thu Apr 01, 2010 1:43 pm

Hi HG

I'm just a little bit puzzled by this.
The number of material combinations for
nWN + 3*nBN + 9*nWB + 27*nBB + 81*nWR + 243*nBR + 729*nWQ + 1458*nBQ + 2916*nWP + 26244*nBP
is 236196

The number of material combinations for
2*243*nWR - (2*243-81)*nBR + (243+27+9)*nWB - (243+27)*nBB + (243+3)*nWN - (243+1)*nBN + 729*nWQ + 1458*nBQ + 2916*nWP + 26244*nBP
is 235652

Grant

User avatar
hgm
Posts: 23718
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: Cache-friendier material index

Post by hgm » Thu Apr 01, 2010 2:57 pm

That can't be. The index can take negative values, and taken that into account there would actually be a little more, as at the ends of the table not every entry will be used. I guess I should have added a constant offset to avoid negative indexes.

Post Reply