Discussion of chess software programming and technical issues.
Moderators: hgm, Harvey Williamson, bob
-
LoopList
Post
by LoopList » Mon May 14, 2007 3:16 pm
Which will be faster when incrementally doing/undoing moves:
1.
void do_move(const int move) {
int to_from;
...
to_from = move & 0xFFF;
key ^= random_rook_key[to_from];
bitboard1 ^= set_bits[to_from];
bitboard2 ^= set_bits[to_from];
...
}
2.
void do_move(const int move) {
int to;
int from;
...
to = move & 0x3F;
from = (move >> 6) & 0x3F;
key ^= random_rook_key[to];
key ^= random_rook_key[from];
bitboard1 |= set_bit[to];
bitboard1 &= clear_bit[from];
bitboard2 |= set_bit[to];
bitboard2 &= clear_bit[from];
...
}
In case 1 we have fewer bit operations but need much larger look-up tables (size = 4096 x 8 bytes = 32 KB).
In case 2 we have more bit operations but need smaller look-up tables (size = 64 x 8 bytes = 0.5 KB).
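As a sketch of how approach 1 relates to approach 2: the combined 4096-entry tables can be built once at startup from the per-square tables, indexed by (from << 6) | to to match the move decoding above (the init function name and the per-square key table name `random_rook_key_sq` are assumptions, not from the post):

```c
#include <stdint.h>

typedef uint64_t u64;

/* Per-square tables as in approach 2. */
u64 set_bit[64];
u64 random_rook_key_sq[64];   /* per-square Zobrist keys (hypothetical name) */

/* Combined 4096-entry tables as in approach 1, indexed by (from << 6) | to. */
u64 set_bits[4096];
u64 random_rook_key[4096];

void init_combined_tables(void) {
    for (int sq = 0; sq < 64; ++sq)
        set_bit[sq] = 1ULL << sq;
    for (int from = 0; from < 64; ++from)
        for (int to = 0; to < 64; ++to) {
            int to_from = (from << 6) | to;
            /* XOR of both square bits: one XOR in do_move then toggles
               the from- and to-bits at once. */
            set_bits[to_from] = set_bit[from] ^ set_bit[to];
            random_rook_key[to_from] =
                random_rook_key_sq[from] ^ random_rook_key_sq[to];
        }
}
```

The two tables together occupy 2 x 32 KB, which is why the question about cache footprint matters.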
-
sje
- Posts: 4675
- Joined: Mon Mar 13, 2006 6:43 pm
Post
by sje » Mon May 14, 2007 4:00 pm
Code both, then compare actual usage time results.
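sje's advice can be followed with a tiny timing harness; here is a sketch for variant 2 (all names are assumptions, and the Zobrist table is stubbed with set_bit; a real comparison should use the engine's actual tables and run both variants in the same loop):

```c
#include <stdint.h>
#include <time.h>

typedef uint64_t u64;

static u64 set_bit[64];
static u64 bitboard1, key;

/* Variant 2 core ops (Zobrist table stubbed with set_bit for the sketch). */
static inline void do_move_split(int move) {
    int to   = move & 0x3F;
    int from = (move >> 6) & 0x3F;
    key       ^= set_bit[to] ^ set_bit[from];
    bitboard1 ^= set_bit[to] ^ set_bit[from];
}

/* Time a fixed number of pseudo-moves; returns elapsed CPU seconds. */
double bench_split(u64 iterations) {
    for (int sq = 0; sq < 64; ++sq)
        set_bit[sq] = 1ULL << sq;
    clock_t t0 = clock();
    for (u64 i = 0; i < iterations; ++i)
        do_move_split((int)(i & 0xFFF));
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Beware that a synthetic loop like this keeps both tables hot in L1, so it understates the cache effects Gerd mentions below; measuring inside a real perft run is more honest.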
-
Gerd Isenberg
- Posts: 2105
- Joined: Wed Mar 08, 2006 7:47 pm
- Location: Hattingen, Germany
Post
by Gerd Isenberg » Mon May 14, 2007 6:24 pm
Hi Fritz,
Actually, I like your first one better, but done SIMD-wise with SSE2.
For approach 2, you'd better use XOR with set_bit throughout.
It depends on how many cache lines you can save on average with the denser, but twice as many, lookups. It does not necessarily depend on the distribution of those cache lines in memory (though there are issues there as well: pages, the hardware prefetcher).
Memory versus computation is often hard to predict and will largely depend on your other memory and cache requirements. As long as your L1 data cache is not saturated, using more memory is likely faster. But if it becomes polluted, it might be the other way around.
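Gerd's XOR-only suggestion for approach 2 can be sketched as follows (a minimal sketch with the thread's identifiers made self-contained; `random_rook_key64` and `init_bits` are assumed names). Since the from-square bit is set and the to-square bit is clear before the move, XOR both clears and sets, and the same routine undoes the move:

```c
#include <stdint.h>

typedef uint64_t u64;

u64 set_bit[64];
u64 random_rook_key64[64];   /* per-square Zobrist keys (hypothetical name) */
u64 bitboard1, bitboard2, key;

void init_bits(void) {
    for (int sq = 0; sq < 64; ++sq)
        set_bit[sq] = 1ULL << sq;
}

/* XOR toggles: it clears the (set) from-bit and sets the (empty) to-bit,
   so calling do_move again with the same move undoes it. */
void do_move(const int move) {
    int to   = move & 0x3F;
    int from = (move >> 6) & 0x3F;
    key       ^= random_rook_key64[to] ^ random_rook_key64[from];
    bitboard1 ^= set_bit[to] ^ set_bit[from];
    bitboard2 ^= set_bit[to] ^ set_bit[from];
}
```

Besides saving the clear_bit table, this makes do_move its own inverse, so undo_move can reuse it.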
Gerd
-
Gerd Isenberg
- Posts: 2105
- Joined: Wed Mar 08, 2006 7:47 pm
- Location: Hattingen, Germany
Post
by Gerd Isenberg » Tue May 15, 2007 9:06 am
Another idea for recent x64 CPUs like the Core 2 Duo is to use the bit-test intrinsics to set, reset, or complement bits by bit index with one instruction. btx (bts/btr/btc) reg64, reg64 has a latency of one cycle on a Core 2 Duo.
http://msdn2.microsoft.com/en-us/library/azcs88h2.aspx
_bittestandset64
_bittestandreset64
_bittestandcomplement64
u64 key = ....;
_bittestandcomplement64(&key, from);
_bittestandcomplement64(&key, to);
Hopefully the compiler will use btc reg64, reg64 here, since the btx version with a memory operand is much more expensive: it allows processing large bit arrays and therefore doesn't reduce the bit index mod 32/64...
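The intrinsics above are MSVC-specific. A plain shift-and-XOR expresses the same bit complement portably, and a compiler may lower it to btc reg64, reg64 (an assumption about code generation, worth verifying in the disassembly):

```c
#include <stdint.h>

typedef uint64_t u64;

/* Portable equivalent of _bittestandcomplement64 (ignoring the returned
   original bit): toggle one bit of *bb selected by square index. */
static inline void complement_bit(u64 *bb, int sq) {
    *bb ^= 1ULL << sq;
}
```

Usage, matching the snippet above: complement_bit(&key, from); complement_bit(&key, to);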
In a perft or "bean counter" framework, doMove/undoMove performance is more critical than in approaches where one usually spends much more time in the leaf nodes, which is also a futility-pruning versus lazy-eval issue.
Gerd
-
Aleks Peshkov
- Posts: 845
- Joined: Sun Nov 19, 2006 8:16 pm
- Location: Russia
Post
by Aleks Peshkov » Tue May 15, 2007 11:29 am
Gerd Isenberg wrote: ...where one usually spends much more time in the leaf nodes, which is also a futility-pruning versus lazy-eval issue.
Please explain the last statement a bit.
-
Gerd Isenberg
- Posts: 2105
- Joined: Wed Mar 08, 2006 7:47 pm
- Location: Hattingen, Germany
Post
by Gerd Isenberg » Tue May 15, 2007 11:46 am
Aleks Peshkov wrote: Gerd Isenberg wrote: ...where one usually spends much more time in the leaf nodes, which is also a futility-pruning versus lazy-eval issue.
Please explain the last statement a bit.
If you prune forward, you probably don't do make/unmake at all. In contrast, if you have a lot of cheap leaves, make/unmake becomes relatively more expensive.