- apply the child moves
- calculate new hashes
- evaluate child positions
- detect legal moves for child positions
Then for the next (n-1) iterations through the loop, that work is already done. But if the loop exits early, some of that work is wasted: up to (n-1) child positions' worth. The idea is that (hopefully) the SIMD path is more efficient even after paying for the wasted calculations. So does it work?
Yes


I have only done a few minutes of testing, but comparing the scalar x64 path to the 2-way SSE path:
- the SSE path discards about 10% of its work because of early loop exits (or an odd move list length)
- even so, the search is about 10% faster overall
The next test will be 4-way SIMD using AVX2. I have no idea where this one will land.
If you'd like to see the (now uglier) code, it's on GitHub here. Look in src/engine.h, around line 450, the "while( movesTried < ..." loop in NegaMax().
(The Linux build is broken again, I have to go fiddle with some declarations)
Cheers,