doing undoing

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

LoopList

doing undoing

Post by LoopList »

Code: Select all

What will be faster when incremental doing/undoing moves:

1.
do_move(const int move) {
  int to_from;
  ...
  to_from = move & 0xFFF; 
  key ^= random_rook_key[to_from];
  bitboard1 ^= set_bits[to_from];
  bitboard2 ^= set_bits[to_from];
  ...
}

2.
do_move(const int move) {
  int to;
  int from;
  ...
  to = move & 0x3F;
  from = (move >> 6) & 0x3F;
  key ^= random_rook_key[to]; 
  key ^= random_rook_key[from];
  bitboard1 |= set_bit[to];
  bitboard1 &= clear_bit[from];
  bitboard2 |= set_bit[to];
  bitboard2 &= clear_bit[from];
  ...
}

In case 1. we have less bit operations but we need much larger look-up-tables (size = 4096 x 8 Bytes = 32 KB).

In case 2. we have more bit operations but we need smaller loop-up-tables (size = 64 x 8 Bytes = 0.5 KB).

User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: doing undoing

Post by sje »

Code both, then compare actual usage time results.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: doing undoing

Post by Gerd Isenberg »

Hi Fritz,

actually I like your first one better, but using simdwise sse2.
For approach 2, you may better use xor with set_bit always.

It depends on how many cachelines you can save in average with denser, but twice as much lookups. It depends not necessarily on the distribution of those cachelines in memory (but there are issues as well, pages, hardware pre-fetcher).

Memory versus computation is often hard to predict and will largely depend on your other memory and cache requirements. As long your L1 Datacache is not saturated, likely using more memory is faster. But if it becomes polluted it might be the other way around.

Gerd
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: doing undoing

Post by Gerd Isenberg »

Another idea for recent x64 cpus like core 2 duo is to use bittest intrinsics, to set, reset or complement bits by bitindex with one instruction. btx reg64, reg64 has a latency of one cycle on a core 2 duo.

http://msdn2.microsoft.com/en-us/library/azcs88h2.aspx

_bittestandset64
_bittestandreset64
_bittestandcomplement64

u64 key = ....;
_bittestandcomplement64(&key, from);
_bittestandcomplement64(&key, to);

Hopefully the compiler will use the btc reg64, reg64 here, since the btx-version with memory operand is much more expensive since it allows to process large bit-arrays and doesn't rely on mod 32/64...

In a perft- or "beancounter" framework, doMove- and undoMove-performance is more critical than in approaches where one usually spends much more time in leaf-nodes - also a futility-pruning versus lazy-eval issue.

Gerd
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: doing undoing

Post by Aleks Peshkov »

Gerd Isenberg wrote:...where one usually spends much more time in leaf-nodes - also a futility-pruning versus lazy-eval issue.
Please explain the last statement a bit.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: doing undoing

Post by Gerd Isenberg »

Aleks Peshkov wrote:
Gerd Isenberg wrote:...where one usually spends much more time in leaf-nodes - also a futility-pruning versus lazy-eval issue.
Please explain the last statement a bit.
If you prune forward, you probably don't do Make/Unmake. In contrast, if you have a lot of cheap leaves, make/Unmake becomes relative more expensive.