Caching AttackSets

Desperado · Post by **Desperado** » Mon Nov 09, 2009 2:15 pm

Hello everyone,

fiew days ago i had the following idea:

caching the last computed attackSet and blockerSet.
==================================

Code: Select all


extern __inline BTB_T attackQ&#40;SQR_T sq,BTB_T occ&#41;
	&#123;
	 //initialize,so computation
	 //is done on the 1st run.

	 static BTB_T attOld&#91;ID64&#93; =
		&#123;
		 -1,-1,-1,-1,-1,-1,-1,-1,
		 -1,-1,-1,-1,-1,-1,-1,-1,
		 -1,-1,-1,-1,-1,-1,-1,-1,
		 -1,-1,-1,-1,-1,-1,-1,-1,
		 -1,-1,-1,-1,-1,-1,-1,-1,
		 -1,-1,-1,-1,-1,-1,-1,-1,
		 -1,-1,-1,-1,-1,-1,-1,-1,
		 -1,-1,-1,-1,-1,-1,-1,-1
		&#125;;
	 
	 static BTB_T bloOld&#91;ID64&#93;;

	 //-check if last computed attackset can be used again&#58; if not, then compute...
	 //-store additionally the blockerSet to be able to use the next condition.

	 if&#40;!(&#40;attOld&#91;sq&#93;&occ&#41; == bloOld&#91;sq&#93;))
		&#123;
       //compute
		 attOld&#91;sq&#93; = attackLine&#40;&lmsk&#91;sq&#93;&#91;DIAGDIR&#93;,occ&#41; |
					     attackLine&#40;&lmsk&#91;sq&#93;&#91;ANTIDIR&#93;,occ&#41; |
					     attackLine&#40;&lmsk&#91;sq&#93;&#91;FILEDIR&#93;,occ&#41; |
					     attackLine&#40;&lmsk&#91;sq&#93;&#91;RANKDIR&#93;,occ&#41; ;

		 bloOld&#91;sq&#93; = attOld&#91;sq&#93; & occ;
		&#125;

	 return&#40;attOld&#91;sq&#93;);
	&#125;

I tested this in a perftframe for 123 positions,with about 4500000000 nodes. The speedup for 32 bit version was about 16%, and 12% for 64bit
version.
Also the speedgain may be concealed somehow in a real engine.

Maybe this approach can be used for various attackGetters.

There are different scenarios, where i think the speedgain comes from:
==============================================

- unmoved pieces
- revisiting square(with the same blockers for the lines)
- multiple calls within a stage(not for my perftframe,only incheck,movegen issues)

...and so on.

formular:
=======

i think it is a simple(cheap) formular:

(lastAttacks[sq] & currentOccupancy) == lastBlockers[sq]

where no computation is needed again.

Some ideas on that, or anyone did something similar ?

Maybe tell me about the drawbacks, i dont think of at the moment...

Michael

michiguel · Post by **michiguel** » Mon Nov 09, 2009 2:32 pm

Desperado wrote:Hello everyone,

fiew days ago i had the following idea:

caching the last computed attackSet and blockerSet.
==================================
Code: Select all
extern __inline BTB_T attackQ&#40;SQR_T sq,BTB_T occ&#41;
	&#123;
	 //initialize,so computation
	 //is done on the 1st run.

	 static BTB_T attOld&#91;ID64&#93; =
		&#123;
		 -1,-1,-1,-1,-1,-1,-1,-1,
		 -1,-1,-1,-1,-1,-1,-1,-1,
		 -1,-1,-1,-1,-1,-1,-1,-1,
		 -1,-1,-1,-1,-1,-1,-1,-1,
		 -1,-1,-1,-1,-1,-1,-1,-1,
		 -1,-1,-1,-1,-1,-1,-1,-1,
		 -1,-1,-1,-1,-1,-1,-1,-1,
		 -1,-1,-1,-1,-1,-1,-1,-1
		&#125;;
	 
	 static BTB_T bloOld&#91;ID64&#93;;

	 //-check if last computed attackset can be used again&#58; if not, then compute...
	 //-store additionally the blockerSet to be able to use the next condition.

	 if&#40;!(&#40;attOld&#91;sq&#93;&occ&#41; == bloOld&#91;sq&#93;))
		&#123;
       //compute
		 attOld&#91;sq&#93; = attackLine&#40;&lmsk&#91;sq&#93;&#91;DIAGDIR&#93;,occ&#41; |
					     attackLine&#40;&lmsk&#91;sq&#93;&#91;ANTIDIR&#93;,occ&#41; |
					     attackLine&#40;&lmsk&#91;sq&#93;&#91;FILEDIR&#93;,occ&#41; |
					     attackLine&#40;&lmsk&#91;sq&#93;&#91;RANKDIR&#93;,occ&#41; ;

		 bloOld&#91;sq&#93; = attOld&#91;sq&#93; & occ;
		&#125;

	 return&#40;attOld&#91;sq&#93;);
	&#125;
I tested this in a perftframe for 123 positions,with about 4500000000 nodes. The speedup for 32 bit version was about 16%, and 12% for 64bit
version.
Also the speedgain may be concealed somehow in a real engine.

Maybe this approach can be used for various attackGetters.

There are different scenarios, where i think the speedgain comes from:
==============================================

- unmoved pieces
- revisiting square(with the same blockers for the lines)
- multiple calls within a stage(not for my perftframe,only incheck,movegen issues)

...and so on.

formular:
=======

i think it is a simple(cheap) formular:

(lastAttacks[sq] & currentOccupancy) == lastBlockers[sq]

where no computation is needed again.

Some ideas on that, or anyone did something similar ?

If I understand correctly, I have been doing exactly the same for a long time. I do not know now, but the speed up was noticeable when I implemented it. I do not think there is any drawback at all.

From the opening position, I just checked and the hit rate is 83% (see the line Attacks generation).

Code: Select all

sd 10
go
********* Starts iterative deepening, thread = 0
set timer to 1000000.000000 , max_ply 10, panic time&#58; 4999999.750000
         1   1       0.0    -0.09  1.f4
         2   1       0.0    +0.15  1.c4
         3   1       0.0    +0.66  1.e4
        11   1       0.0    +0.80  1.e3
        20   1&#58;      0.0    +0.80  1.e3
        22   2       0.0      &#58;-(  1.e3
        60   2       0.0      &#58;-(  
        89   2       0.0    +0.00  1.e3 e6
       129   2&#58;      0.0    +0.00  1.e3 e6
       177   3       0.0      &#58;-)  1.e3
       261   3       0.0    +0.65  1.e3 Nf6 2.Nc3
       333   3&#58;      0.0    +0.65  1.e3 Nf6 2.Nc3
       405   4       0.0      &#58;-(  1.e3
      1101   4       0.0      &#58;-(  
      1259   4       0.0    -0.00  1.e3 Nf6 2.Nc3 d5
      1927   4       0.0    +0.00  1.d4 d5 2.Qd3 Qd6
      2494   4&#58;      0.0    +0.00  1.d4 d5 2.Qd3 Qd6
      3957   5       0.0      &#58;-)  1.d4
      7104   5       0.0    +0.48  1.d4 Nf6 2.Qd3 d5 3.Nf3
      9751   5&#58;      0.0    +0.48  1.d4 Nf6 2.Qd3 d5 3.Nf3
     12215   6       0.1      &#58;-(  1.d4
     28184   6       0.1      &#58;-(  
     35798   6       0.2    +0.02  1.d4 Nf6 2.Qd3 d5 3.Qb5+ Nc6 4.Nf3
     51583   6       0.2    +0.02  1.Nc3 Nf6 2.e4 d5 3.Qf3 c6 4.exd5 cxd5
     51973   6&#58;      0.2    +0.02  1.Nc3 Nf6 2.e4 d5 3.Qf3 c6 4.exd5 cxd5
     81637   7       0.3      &#58;-)  1.Nc3
    128155   7       0.5    +0.52  1.Nc3 Nc6 2.Nf3 d5 3.d4 Qd6 4.Bg5
    152356   7&#58;      0.6    +0.52  1.Nc3 Nc6 2.Nf3 d5 3.d4 Qd6 4.Bg5
    174950   8       0.6      &#58;-(  1.Nc3
    379858   8       1.3      &#58;-(  
    434244   8       1.5    +0.00  1.Nc3 Nc6 2.Nf3 Nf6 3.d3 d6 4.Be3 Be6
    666010   8&#58;      2.1    +0.00  1.Nc3 Nc6 2.Nf3 Nf6 3.d3 d6 4.Be3 Be6
    953264   9       2.9    +0.25  1.Nc3 Nc6 2.d4 Nf6 3.Bf4 d6 4.e3 Bf5
                                   5.Bc4
   1543350   9       4.4    +0.25  1.d4 Nf6 2.Qd3 c6 3.Nf3 d6 4.a4 Be6
                                   5.a5
   1760679   9&#58;      4.9    +0.25  1.d4 Nf6 2.Qd3 c6 3.Nf3 d6 4.a4 Be6
                                   5.a5
   3209766  10       8.9    +0.14  1.d4 Nf6 2.Nc3 d6 3.e4 c6 4.e5 dxe5
                                   5.dxe5 Qxd1+ 6.Kxd1 Nd5 7.Nxd5 cxd5
   5108852  10&#58;     14.3    +0.14  1.d4 Nf6 2.Nc3 d6 3.e4 c6 4.e5 dxe5
                                   5.dxe5 Qxd1+ 6.Kxd1 Nd5 7.Nxd5 cxd5

Ply&#58; 10
d2-d4  Ng8-f6  Nb1-c3  d7-d6  e2-e4  c7-c6  e4-e5  d6xe5  
d4xe5  Qd8xd1  Ke1xd1  Nf6-d5  Nc3xd5  c6xd5  
Score&#58; 0.14 &#40;34&#41;  Evals&#58; 3920189   Time&#58; 14.3s   nps&#58; 357638  Q/all&#58; 0.34
-------------------------------------------------------------------
               nodes        cutoffs         missed       tree_exp
-------------------------------------------------------------------
path         3375517         529806          60767         1.11
quies        1733335         707829          33355         1.05
all          5108852        1237635          94122         1.08
-------------------------------------------------------------------
hashtable=  attempts&#58; 3375719   hits&#58; 33.0%   perfect&#58; 17.3%

Side-to-wait attack calls&#58; 26202 calls/node&#58; 0.513%
Lazy evals = low&#58;0  cutoff&#58;0  normal&#58;0
-------------------------------------------------------------------
                            hits         missed     efficiency
-------------------------------------------------------------------
 BB cached in eval        235828        3683926      0.060
  Checks Cheap/Exp       5039977          10694      0.998
InChecks Cheap/Exp       4775128         269077      0.947
Attacks generation      54999438       11536259      0.827
       Attacks SEE        366181         572683      0.390
         Pawn hash       3213378         706376      0.820
     Material hash       3916806           2948      0.999
 Make normal moves       5039957          10694      0.998
-------------------------------------------------------------------

Maybe tell me about the drawbacks, i dont think of at the moment...

Michael

Desperado · Post by **Desperado** » Mon Nov 09, 2009 3:00 pm

hi miguel,

i just measured my hitrate, in the perftframe which
is _only_ 49.1%.

But if on this low hitrate, a speedgain of 16% is done, i am optimistic
after other implementations like mobilityComputation,SEEComputations,
evaluation and so on, the hitrate will rise.

Also on a hitrate of 25-30%(which i guess is a minimum) it should not be worse than without caching. Of course this also depends on the speed of the attackGetter.

regards

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Nov 09, 2009 9:43 pm

Nice and simple. You may improve hitrate a bit if you consider relevant blockers only by one additional and.

Code: Select all

if&#40; &#40;attOld&#91;sq&#93; & occ & relevant4Q&#91;sq&#93;) != bloOld&#91;sq&#93;) 
&#123;
    attOld&#91;sq&#93; = ...
    bloOld&#91;sq&#93; = attOld&#91;sq&#93; & relevant4Q&#91;sq&#93; & occ; 
&#125;

for instance relevant4Q[d3] safes 8 outer squares, which occupancy may change without affecting the result.

Code: Select all

 . . . . . . . .
 . . . 1 . . . .
 . . . 1 . . 1 .
 . 1 . 1 . 1 . .
 . . 1 1 1 . . .
 . 1 1 . 1 1 1 .
 . . 1 1 1 . . .
 . . . . . . . .

Zach Wegner · Post by **Zach Wegner** » Mon Nov 09, 2009 9:52 pm

Gerd Isenberg wrote:Nice and simple. You may improve hitrate a bit if you consider relevant blockers only by one additional and.
Code: Select all
if&#40; &#40;attOld&#91;sq&#93; & occ & relevant4Q&#91;sq&#93;) != bloOld&#91;sq&#93;) 
&#123;
    attOld&#91;sq&#93; = ...
    bloOld&#91;sq&#93; = attOld&#91;sq&#93; & relevant4Q&#91;sq&#93; & occ; 
&#125;
for instance relevant4Q[d3] safes 8 outer squares, which occupancy may change without affecting the result.
Code: Select all
 . . . . . . . .
 . . . 1 . . . .
 . . . 1 . . 1 .
 . 1 . 1 . 1 . .
 . . 1 1 1 . . .
 . 1 1 . 1 1 1 .
 . . 1 1 1 . . .
 . . . . . . . .

...or even just a relevant[64] table, which for non-edge squares has the inner 6x6 bits all 1, and if the square is on an edge has the edge(s) filled in too.

Desperado · Post by **Desperado** » Mon Nov 09, 2009 11:28 pm

Hello everyone again.

@Gerd+Zach:

1:

i will test with the _relevevant_ bitsets tomorrow again.

2: "KILLERCACHE"

There is another idea based on this, but this maybe "too much"
of tuning.(simple has to be tested).

I thought of a "KillerCache"

.

The mechanics like killermoves are using, so
that would extend the datastructures above to

attOld[ID64][2]
bloOld[ID64][2]

this may lead to a very hi hitrate my _feeling_ tells me.
Tomorrow i will test this and report.
If not i will keep it simple and stupid, but maybe updated
with the _relevant_ statement.(have to think about it first)

3:

which attackGetter types may not profit from this ? (magics,rotated...)
and why?

Thx for your replies.

michiguel · Post by **michiguel** » Mon Nov 09, 2009 11:58 pm

Zach Wegner wrote:
Gerd Isenberg wrote:Nice and simple. You may improve hitrate a bit if you consider relevant blockers only by one additional and.
Code: Select all
if&#40; &#40;attOld&#91;sq&#93; & occ & relevant4Q&#91;sq&#93;) != bloOld&#91;sq&#93;) 
&#123;
    attOld&#91;sq&#93; = ...
    bloOld&#91;sq&#93; = attOld&#91;sq&#93; & relevant4Q&#91;sq&#93; & occ; 
&#125;
for instance relevant4Q[d3] safes 8 outer squares, which occupancy may change without affecting the result.
Code: Select all
 . . . . . . . .
 . . . 1 . . . .
 . . . 1 . . 1 .
 . 1 . 1 . 1 . .
 . . 1 1 1 . . .
 . 1 1 . 1 1 1 .
 . . 1 1 1 . . .
 . . . . . . . .
...or even just a relevant[64] table, which for non-edge squares has the inner 6x6 bits all 1, and if the square is on an edge has the edge(s) filled in too.

Correct. I seem to recall that there is no much gain with this. Not many pieces move in and out of the edges. I may want to retry this.

However, one thing that could be tuned is how the Q attacks are builded. The hit rate increases if attacks for B and R are kept in cache from the same square to build the Q attacks. More calculation is needed though, but it is very likely that at least one of them will be in cache (R or B).

Miguel

Zach Wegner · Post by **Zach Wegner** » Tue Nov 10, 2009 12:49 am

michiguel wrote:
Zach Wegner wrote:
Gerd Isenberg wrote:Nice and simple. You may improve hitrate a bit if you consider relevant blockers only by one additional and.
Code: Select all
if&#40; &#40;attOld&#91;sq&#93; & occ & relevant4Q&#91;sq&#93;) != bloOld&#91;sq&#93;) 
&#123;
    attOld&#91;sq&#93; = ...
    bloOld&#91;sq&#93; = attOld&#91;sq&#93; & relevant4Q&#91;sq&#93; & occ; 
&#125;
for instance relevant4Q[d3] safes 8 outer squares, which occupancy may change without affecting the result.
Code: Select all
 . . . . . . . .
 . . . 1 . . . .
 . . . 1 . . 1 .
 . 1 . 1 . 1 . .
 . . 1 1 1 . . .
 . 1 1 . 1 1 1 .
 . . 1 1 1 . . .
 . . . . . . . .
...or even just a relevant[64] table, which for non-edge squares has the inner 6x6 bits all 1, and if the square is on an edge has the edge(s) filled in too.
Correct. I seem to recall that there is no much gain with this. Not many pieces move in and out of the edges. I may want to retry this.

However, one thing that could be tuned is how the Q attacks are builded. The hit rate increases if attacks for B and R are kept in cache from the same square to build the Q attacks. More calculation is needed though, but it is very likely that at least one of them will be in cache (R or B).

Miguel

Yeah, it turns out that Gerd's way is the same as mine. Based on the "Q" in his array name, I thought his example was for queen attacks, and there would be two more arrays for bishops and rooks. Of course, this is for separate B/R attacks, where queen moves are computed as the union of the two. I guess that's what happens if you reply in 9 minutes.

Caching AttackSets

Caching AttackSets

Re: Caching AttackSets

Re: Caching AttackSets

Re: Caching AttackSets

Re: Caching AttackSets

Re: Caching AttackSets

Re: Caching AttackSets

Re: Caching AttackSets