Terje wrote: ↑Fri Apr 10, 2020 12:46 pm
The to^8 trick tested well in Weiss but not in Ethereal (likely due to factors discussed by hgm), which is why I was curious about what your profiler would say (and the actual performance).
It's not a dramatic difference, but "to ^ 8" is measurably slower than the "if us == WHITE {to - 8} else {to + 8}" construction.
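For reference, both versions compute the same square. On a board with A1 = 0 and H8 = 63, XOR-ing a double-push destination with 8 flips bit 3 of the square index, which lands on the en-passant square for either side. A minimal sketch (function names are mine, not Rustic's):

```rust
const WHITE: u8 = 0;
const BLACK: u8 = 1;

// XOR version: flipping bit 3 of the destination square yields the
// square "behind" it for both sides, because white double pushes land
// on rank 4 and black double pushes on rank 5.
fn behind_xor(to: u8) -> u8 {
    to ^ 8
}

// Branching version, which the post reports as measurably faster here.
fn behind_branch(to: u8, us: u8) -> u8 {
    if us == WHITE { to - 8 } else { to + 8 }
}

fn main() {
    // White double push e2-e4: to = 28 (e4), square behind = 20 (e3).
    assert_eq!(behind_xor(28), 20);
    assert_eq!(behind_branch(28, WHITE), 20);
    // Black double push e7-e5: to = 36 (e5), square behind = 44 (e6).
    assert_eq!(behind_xor(36), 44);
    assert_eq!(behind_branch(36, BLACK), 44);
    println!("ok");
}
```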
In another thread I posted that calculating the pawn attacks was faster than getting them from the attack-table.
Normally that call gives you the squares the pawns attack: get_pawn_attacks[WHITE][E2] returns d3 and f3 in a bitboard. For the square_attacked() function, however, I needed the squares the attacking pawns are ON, not the squares they attack. If us == WHITE and attacker == BLACK, and you want to know which squares attack E2, you can't just do get_pawn_attacks[attacker][square] as you do with other pieces: a black pawn on E2 attacks D1 and F1, but what you want are the squares where black pawns must BE to attack E2.
The trick is to flip the attacker and look up the pawn attacks from the other side: get_pawn_attacks[attacker ^ 1][E2] gives you D3 and F3, which are exactly the squares where an opponent pawn must stand to attack E2.
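A small self-contained illustration of the flip (the table is replaced by an on-the-fly pawn-attack function; names and layout are assumptions, not Rustic's actual code):

```rust
type Bitboard = u64;
const WHITE: usize = 0;
const BLACK: usize = 1;

// Squares attacked by a pawn of `side` standing on `sq` (0 = a1 .. 63 = h8).
fn pawn_attacks(side: usize, sq: usize) -> Bitboard {
    let bb = 1u64 << sq;
    let not_a: u64 = 0xfefe_fefe_fefe_fefe; // pawn not on the a-file
    let not_h: u64 = 0x7f7f_7f7f_7f7f_7f7f; // pawn not on the h-file
    if side == WHITE {
        ((bb & not_a) << 7) | ((bb & not_h) << 9)
    } else {
        ((bb & not_h) >> 7) | ((bb & not_a) >> 9)
    }
}

fn main() {
    let e2 = 12;
    // Which squares must a BLACK pawn occupy to attack e2?
    // Flip the side: the squares a WHITE pawn on e2 would attack.
    let black_attacker_squares = pawn_attacks(BLACK ^ 1, e2);
    let d3 = 1u64 << 19;
    let f3 = 1u64 << 21;
    assert_eq!(black_attacker_squares, d3 | f3);
    println!("ok");
}
```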
As this lookup has "attacker ^ 1" in it, and "to ^ 8" is also much slower than the if/else construction, I'm starting to believe the XOR operation is slow, at least in Rust. (I'll see if ! works in this context, as in !attacker; it could be that this only works for bools, though, and side is a u8.)
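For what it's worth, `!` does compile on integers in Rust, but it is bitwise NOT there, so on a u8 side it will not act as a 0/1 flip; `^ 1` remains the correct side flip for integer types:

```rust
fn main() {
    let side: u8 = 0;
    assert_eq!(side ^ 1, 1);   // flips 0 <-> 1: a valid side flip
    assert_eq!(!side, 255);    // bitwise NOT on u8: NOT a side flip
    let b = false;
    assert_eq!(!b, true);      // on bool, ! is logical NOT
    println!("ok");
}
```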
If you want to see Weiss speed without the eval/move order code you can comment out the lines marked as 'update material', 'update phase', and 'update various piece lists' (this comment needs updating lol) in the Add/Clear/MovePiece functions in makemove.c, and lines 67-75 in movegen.c. On my machine perft 7 drops from 104 to 88 seconds, a ~15% speedup.
I too only count leaf nodes so my perft output is a bit misleading. I'll update this to properly specify that it means leaves not all nodes, as well as leaves/s.
Don't need to do that; I also calculate hashes, piece-table updates and material updates.
Nice speedup again!
Thanks
I've switched to the GNU toolchain to test this again. It seems that HGM is right on this; earlier, the GNU toolchain had suddenly become much slower than MSVC. Now they are almost equal again, with GNU pulling ahead a little bit.
Rustic's perft 7 from starting position on this system is now 86.xx or 87.xx seconds:
Code:
MSVC toolchain:
===============
Benchmarking perft 1-7:
8 r n b q k b n r
7 i i i i i i i i
6 . . . . . . . .
5 . . . . . . . .
4 . . . . . . . .
3 . . . . . . . .
2 I I I I I I I I
1 R N B Q K B N R
A B C D E F G H
Zobrist key: 819aa694337673fb
Active Color: White
Castling: KQkq
En Passant: -
Half-move clock: 0
Full-move number: 1
Perft 1: 20 (0 ms, inf leaves/sec)
Perft 2: 400 (0 ms, inf leaves/sec)
Perft 3: 8902 (0 ms, inf leaves/sec)
Perft 4: 197281 (5 ms, 39456200 leaves/sec)
Perft 5: 4865609 (139 ms, 35004381 leaves/sec)
Perft 6: 119060324 (3340 ms, 35646803 leaves/sec)
Perft 7: 3195901860 (87941 ms, 36341431 leaves/sec)
Total time spent: 91425 ms
Execution speed: 36314294 leaves/second
GNU toolchain:
==============
Benchmarking perft 1-7:
8 r n b q k b n r
7 i i i i i i i i
6 . . . . . . . .
5 . . . . . . . .
4 . . . . . . . .
3 . . . . . . . .
2 I I I I I I I I
1 R N B Q K B N R
A B C D E F G H
Zobrist key: 819aa694337673fb
Active Color: White
Castling: KQkq
En Passant: -
Half-move clock: 0
Full-move number: 1
Perft 1: 20 (0 ms, inf leaves/sec)
Perft 2: 400 (0 ms, inf leaves/sec)
Perft 3: 8902 (0 ms, inf leaves/sec)
Perft 4: 197281 (5 ms, 39456200 leaves/sec)
Perft 5: 4865609 (133 ms, 36583526 leaves/sec)
Perft 6: 119060324 (3272 ms, 36387629 leaves/sec)
Perft 7: 3195901860 (86980 ms, 36742950 leaves/sec)
Total time spent: 90390 ms
Execution speed: 36730107 leaves/second
For my feeling, I've now "proven" that Rust can be as fast as C, not only in mini-benchmarks but also in non-trivial projects, though it does require a bit of special work. I have two arrays backed by a counter, in a struct, which implement the normal vector methods such as push, pop, clear, len, etc., just because this is faster than even a fixed-length vector. Also, they are created with uninitialized memory... so with some heart-ache, I had to use a line of unsafe code:
Code:
list: unsafe { mem::MaybeUninit::uninit().assume_init() },
(Where list is an array of 256 Move elements.)
This is basically the equivalent of C's malloc(); it creates a block of memory without initializing it. The ".assume_init()" tells the compiler to treat the memory as initialized, so Rust will drop it normally as soon as "list" (or in this case, the struct that "list" is in) goes out of scope. The memory is full of garbage, and it's my responsibility to fill it with usable data, just like in C. So, if I tried to read "list[0]" without ever putting something in the memory block first, I'd get some random number.
In this case, it doesn't matter; I create the MoveList struct, and then immediately fill it with a call to gen_moves(). The struct's counter points to the first element that is still free. (Exactly the way it'd be done in C.)
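Put together, a minimal sketch of such an array-backed list might look like this (the `Move` type, the 256-element capacity, and the method names are assumptions for illustration, not Rustic's actual code):

```rust
use std::mem::MaybeUninit;

#[derive(Copy, Clone)]
struct Move(u32); // placeholder move encoding

const MAX_MOVES: usize = 256;

struct MoveList {
    list: [Move; MAX_MOVES],
    count: usize, // index of the first free slot
}

impl MoveList {
    fn new() -> Self {
        Self {
            // Uninitialized backing storage (unsafe!): every slot below
            // `count` must be written before it is ever read.
            list: unsafe { MaybeUninit::uninit().assume_init() },
            count: 0,
        }
    }

    fn push(&mut self, m: Move) {
        self.list[self.count] = m;
        self.count += 1;
    }

    fn len(&self) -> usize {
        self.count
    }

    fn get(&self, i: usize) -> Move {
        debug_assert!(i < self.count); // never read past the counter
        self.list[i]
    }
}

fn main() {
    let mut ml = MoveList::new();
    ml.push(Move(1));
    ml.push(Move(2));
    assert_eq!(ml.len(), 2);
    assert_eq!(ml.get(0).0, 1);
    println!("ok");
}
```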
The move list pool worked as well, and was just as fast, but it required more code, and it's "not standard" with regard to chess programs. This approach is easier to understand.
As I can't optimize any further right now (every change I make degrades performance), I'm going to make a generic out of those two arrays to avoid code duplication, and call it quits on the speed optimizations. I'll start working on the search and the UCI protocol this weekend.