Question to syzygy author

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Question to syzygy author

Post by mcostalba »

syzygy wrote: Suppose pawns are on d2, e2, h2.
The ptwist[] values are 11, 10 and 46.
The first sort gives 46, 11, 10, so h2, d2, e2. The leading pawn is h2.
Since we now know the leading file, we can extract the other pieces.
Next, we make sure that the leading pawn is in one of files a, b, c, d.
Since it is on h2, we horizontally flip the board. This gives a2,e2,d2.
The ptwist[] values are now 46,10,11. So this is not sorted anymore.
Thanks for your explanation. Now I understand.

I have fixed the code and verified against a hand-crafted test case position.
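The horizontal flip in the quoted example can be sketched with the usual 0..63 square encoding (a1 = 0, ..., h8 = 63), where mirroring the board left-right is an XOR of the three file bits. The helper names here are illustrative, not the actual tablebase code:

```cpp
#include <cassert>

// Squares encoded 0..63 with a1 = 0, b1 = 1, ..., h8 = 63.
// Mirroring the board left-right only changes the file bits,
// so it is a XOR with 7 (0b111).
constexpr int flip_file(int sq) { return sq ^ 7; }
constexpr int file_of(int sq)  { return sq & 7; }

// A leading pawn on files e..h (file index 4..7) triggers the flip,
// so that the leading pawn always ends up on files a..d.
constexpr bool needs_flip(int leading_pawn_sq) {
    return file_of(leading_pawn_sq) >= 4;
}
```

With this encoding, h2 = 15 maps to a2 = 8, e2 = 12 to d2 = 11, and d2 = 11 to e2 = 12, reproducing the a2, e2, d2 example in the quote.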
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Question to syzygy author

Post by Rein Halbersma »

syzygy wrote: I vaguely remember realising that I could compare to Binomial(sq, k) and skip the subtractions, but as long as I keep counting from k-1 and up it would not really matter. But that approach loses where it runs over the same squares repeatedly, which is avoided by starting from the end of the board. The only disadvantage of your approach is that you need to know where to start (but that information will normally be available).

Hmm, I only really lose if there are 3 or more like pieces, so generation time is unlikely to improve by a large factor ;-)
Here's some further optimized code that repeatedly scans the binomial table from below, does not need knowledge of N (board size) and does a minimal amount of subtractions of binomial coefficients.

Code: Select all

template<size_t K>
void comb_unrank(size_t index, size_t (&tup)[K])
{
    for (size_t k = K, n = k; k > 1; --k, n = k) {
        while (binom(n, k) <= index)
            ++n;
        tup[k - 1] = --n;
        index -= binom(n, k);
    }
    tup[0] = index;
    assert(std::is_sorted(std::begin(tup), std::end(tup)));
}
Test code here: http://coliru.stacked-crooked.com/a/05f3c3cb2b3af776
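To exercise the snippet above in isolation, here is a hedged stand-alone sketch: a hypothetical binom() (direct multiplicative evaluation, standing in for the table lookup the snippet assumes) plus a spot-check of the first and last of the Binom(64, 3) indices:

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the binomial table: exact integer evaluation of C(n, k).
size_t binom(size_t n, size_t k) {
    if (k > n) return 0;
    size_t r = 1;
    for (size_t i = 0; i < k; ++i)
        r = r * (n - i) / (i + 1);   // exact: r holds C(n, i) at each step
    return r;
}

// The upward-scanning unrank from the post, unchanged.
template<size_t K>
void comb_unrank(size_t index, size_t (&tup)[K]) {
    for (size_t k = K, n = k; k > 1; --k, n = k) {
        while (binom(n, k) <= index)
            ++n;
        tup[k - 1] = --n;
        index -= binom(n, k);
    }
    tup[0] = index;
}

// Spot-check the two extreme indices for 3 pieces on 64 squares.
bool demo() {
    size_t t[3];
    comb_unrank(0, t);                  // lowest index -> {0, 1, 2}
    if (t[0] != 0 || t[1] != 1 || t[2] != 2) return false;
    comb_unrank(41663, t);              // Binom(64, 3) - 1 -> {61, 62, 63}
    return t[0] == 61 && t[1] == 62 && t[2] == 63;
}
```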
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Question to syzygy author

Post by Michel »

Here is the proof of concept O(k) algorithm

Code: Select all

import scipy.special
import scipy.optimize

def unrank(index,k):
    l=[]
    while k>0:
        t=scipy.optimize.brentq(lambda x:scipy.special.binom(x,k)-index,0,index)
        sq=int(t)
        l.insert(0,sq)
        index-=scipy.special.binom(sq,k)
        k-=1
    return l
Of course

Code: Select all

t=scipy.optimize.brentq(lambda x:scipy.special.binom(x,k)-index,0,index)
must be replaced by something smarter.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Question to syzygy author

Post by syzygy »

Rein Halbersma wrote:Here's some further optimized code that repeatedly scans the binomial table from below, does not need knowledge of N (board size) and does a minimal amount of subtractions of binomial coefficients.
I think this is my "modified" approach. Unless I am mistaken, it takes more operations on average than the scanning from above approach. Since one cannot avoid extracting the highest square first, the "right" way to scan seems to be from the top down.

Even for K = 2, the highest square of the two will on average be closer to h8 than to a1, so searching for it from above is faster. The remaining square will on average be as close to a1 as to the highest square minus 1. (Of course the remaining square does not need to be searched for anymore, because idx equals it.)

A binary search for each next highest square would be possible, but is probably not worth it.
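For what it is worth, the binary search mentioned here could be sketched with std::upper_bound over a precomputed column of the binomial table. Names and layout are illustrative, not from the tablebase code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical binom(): exact integer evaluation of C(n, k).
size_t binom(size_t n, size_t k) {
    if (k > n) return 0;
    size_t r = 1;
    for (size_t i = 0; i < k; ++i)
        r = r * (n - i) / (i + 1);
    return r;
}

// Largest n < N with binom(n, k) <= index, found by binary search.
size_t highest_square(size_t index, size_t k, size_t N) {
    std::vector<size_t> col(N);          // the column binom(0..N-1, k)
    for (size_t n = 0; n < N; ++n)
        col[n] = binom(n, k);
    // col is non-decreasing, so upper_bound yields the first n with
    // binom(n, k) > index; the square we want is one below it.
    return std::upper_bound(col.begin(), col.end(), index) - col.begin() - 1;
}
```

In practice the column would of course be built once per k, not per call; the point is only that each scan becomes O(log N) comparisons.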
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Question to syzygy author

Post by syzygy »

Michel wrote:
Yours is linear in N and mine seems to be quadratic on average. Skipping the last loop is of course also possible in your approach.
It is possible to do the unmapping in O(k) since you know the inner while loop (in Rein's modified code) will exit when sq is of the order of (fact(k)*index)^(1/k). So you can do a small loop around that value (I haven't bothered to look for exact bounds).

Of course for small k the overhead will probably be too large.
True, in principle the value can be calculated with a formula. For large enough N that should be worth it, but for N < 64 probably not (but just a guess).

A shift followed by a lookup might be a good way to find an upper bound for the highest square.
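One hedged sketch of such a starting-point formula: since binom(n, k) grows like n^k / k!, the guess n0 = (k! * index)^(1/k) lands near the sought square, and only a short local scan remains. The helper names here are invented for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical binom(): exact integer evaluation of C(n, k).
size_t binom(size_t n, size_t k) {
    if (k > n) return 0;
    size_t r = 1;
    for (size_t i = 0; i < k; ++i)
        r = r * (n - i) / (i + 1);
    return r;
}

// Largest n with binom(n, k) <= index, starting from the analytic guess
// n0 = (k! * index)^(1/k) and correcting with a short local scan.
size_t highest_square_guess(size_t index, size_t k) {
    double fact = 1.0;
    for (size_t i = 2; i <= k; ++i) fact *= (double)i;
    size_t n = (size_t)std::pow(fact * (double)index, 1.0 / (double)k);
    if (n < k - 1) n = k - 1;            // binom(k-1, k) == 0 is the floor
    while (binom(n, k) > index) --n;     // the guess may overshoot...
    while (binom(n + 1, k) <= index) ++n; // ...or undershoot slightly
    return n;
}
```

For N < 64 the pow() call likely costs more than the handful of table scans it saves, in line with the reservation expressed above.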
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Question to syzygy author

Post by Rein Halbersma »

syzygy wrote:
Rein Halbersma wrote:Here's some further optimized code that repeatedly scans the binomial table from below, does not need knowledge of N (board size) and does a minimal amount of subtractions of binomial coefficients.
I think this is my "modified" approach. Unless I am mistaken, it takes more operations on average than the scanning from above approach. Since one cannot avoid extracting the highest square first, the "right" way to scan seems to be from the top down.
Why does your upward scanning code do the binom() subtraction from the index within the while() loop? That seems quite expensive.
Even for K = 2, the highest square of the two will on average be closer to h8 than to a1, so searching for it from above is faster. The remaining square will on average be as close to a1 as to the highest square minus 1. (Of course the remaining square does not need to be searched for anymore, because idx equals it.)

A binary search for each next highest square would be possible, but is probably not worth it.
The difference is quite large. I did some benchmarking with 2 optimized versions. I changed to using std::array so that I can return a tuple of squares; this makes composing ranking and unranking functions a bit easier. With C++11 move semantics, the returned array is guaranteed to be constructed in place (so no extra copying).

Code: Select all

#if IMP == 0

template<size_t N, size_t K>
std::array<size_t, K> comb_unrank(size_t index)
{
    std::array<size_t, K> a;
    for (size_t n = N, k = K; k > 1; --n, --k) {
        while (binom(n, k) > index)
            --n;
        a[k - 1] = n;
        index -= binom(n, k);
    }
    a[0] = index;
    assert(std::is_sorted(a.begin(), a.end()));
    return a;
}

#elif IMP == 1

template<size_t N, size_t K>
std::array<size_t, K> comb_unrank(size_t index)
{
    std::array<size_t, K> a;
    for (size_t n = K, k = K; k > 1; n = --k) {
        while (binom(n, k) <= index)
            ++n;
        a[k - 1] = --n;
        index -= binom(n, k, false);
    }
    a[0] = index;
    assert(std::is_sorted(a.begin(), a.end()));
    return a;
}

#endif
See the code here: http://coliru.stacked-crooked.com/a/29cec910f4783947

Click "Edit" and change the command line to -DIMP=1 to get a comparison between methods. I count calls to binom() for the 2 roundtrips of all Binom(64, 3) distinct ways to place 3 identical pieces on a chessboard. I assert that rank/unrank are each other's inverse for all indices and all tuples.

Method 0 (scanning down from above) takes 3.12 million calls to binom() and method 1 (scanning up from below) takes 6.85 million calls. Quite a difference!

On an *unrank per position* basis (there are Binom(64, 3) = 41664 positions), it is the difference between 35 and 80 calls to binom(). I corrected for the fact that there are 2 calls to unrank() and 2 calls to rank() in the loop. Each *rank per position* takes exactly 2 calls to binom().
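The ranking direction counted here is not shown in the thread, but it can be sketched as the standard colex rank of a sorted tuple: index = sum of binom(sq_i, i + 1). Since binom(sq, 1) == sq, a K-tuple costs K - 1 real lookups, matching the "exactly 2 calls to binom()" figure for K = 3. A self-contained sketch with a hypothetical binom():

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the binomial table.
size_t binom(size_t n, size_t k) {
    if (k > n) return 0;
    size_t r = 1;
    for (size_t i = 0; i < k; ++i)
        r = r * (n - i) / (i + 1);
    return r;
}

// Colex rank of a strictly increasing tuple of squares:
// index = binom(sq_0, 1) + binom(sq_1, 2) + ... + binom(sq_{K-1}, K).
template<size_t K>
size_t comb_rank(const size_t (&tup)[K]) {
    size_t index = tup[0];              // binom(sq, 1) == sq: no lookup
    for (size_t i = 1; i < K; ++i)
        index += binom(tup[i], i + 1);  // K - 1 real lookups
    return index;
}

// The extreme tuples for 3 pieces on 64 squares map to the extreme indices.
bool demo() {
    size_t lo[3] = {0, 1, 2};
    size_t hi[3] = {61, 62, 63};
    return comb_rank(lo) == 0 && comb_rank(hi) == 41663;
}
```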

Method 1 also does some extra work because it can overshoot its target (if the remaining index is equal to a binomial coefficient). The overhead is 2.5% extra calls to binom().
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Question to syzygy author

Post by syzygy »

Rein Halbersma wrote:
syzygy wrote:
Rein Halbersma wrote:Here's some further optimized code that repeatedly scans the binomial table from below, does not need knowledge of N (board size) and does a minimal amount of subtractions of binomial coefficients.
I think this is my "modified" approach. Unless I am mistaken, it takes more operations on average than the scanning from above approach. Since one cannot avoid extracting the highest square first, the "right" way to scan seems to be from the top down.
Why does your code do the binom() subtraction from the index within the while() loop? That seems quite expensive.
It is a remnant of an earlier version in which I did not use a table for the binomial coefficients but computed it on the fly (within my very specific indexing functions). My approach requires the calculation of a polynomial of one degree less. With a simple table lookup this advantage is of course gone. But still, a simple subtraction of a value that is in a register anyway is not expensive.
Method 0 (scanning down from above) takes 3.12 million calls to binom() and method 1 (scanning up from below) takes 6.85 million calls. Quite a difference!
Exactly. You should remove the "false" argument from binom(n, k, false), though ;-)

For K=4: 29,227,297 vs 74,338,993 calls
For K=2: 49,729 vs 89,337 calls
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Question to syzygy author

Post by Rein Halbersma »

syzygy wrote:
Rein Halbersma wrote: Method 0 (scanning down from above) takes 3.12 million calls to binom() and method 1 (scanning up from below) takes 6.85 million calls. Quite a difference!
Exactly. You should remove the "false" argument from binom(n, k, false), though ;-)

For K=4: 29,227,297 vs 74,338,993 calls
For K=2: 49,729 vs 89,337 calls
In a previous version I used a boolean flag to turn the call counting on/off :)
Now with nicer output such as binom() calls per rank/unrank: http://coliru.stacked-crooked.com/a/9e52bcd6f45f0cda

Note that in draughts/checkers there are both more pieces on the board and fewer distinct piece types, so you get e.g. choose(50,5) as ranges. It's not only that scanning downward is faster; it also scales much better with K. E.g. binom() calls per unrank for K=2 to 5 run from 19 to 39 for method 0 (roughly doubling), and from 34 to 113 for method 1 (tripling!).
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Question to syzygy author

Post by mcostalba »

Looking at the initialization of WDLEntry, I see that in the case of symmetric material distributions (same material key for both sides), the precomp struct is fully initialized only for one key, and during probing stm is always WHITE.

Nevertheless, even in this case, both precomp structs are allocated, and the second (useless?) one is initialized with some partial data (pieces, norms and factors).

Is there a reason for this?
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Question to syzygy author

Post by syzygy »

mcostalba wrote:Looking at the initialization of WDLEntry, I see that in the case of symmetric material distributions (same material key for both sides), the precomp struct is fully initialized only for one key, and during probing stm is always WHITE.

Nevertheless, even in this case, both precomp structs are allocated, and the second (useless?) one is initialized with some partial data (pieces, norms and factors).

Is there a reason for this?
Maybe I am misunderstanding you, but I think the original code only allocates one in that case.