Polyglot - merge-book oddity

hgm · Post by **hgm** » Sun Mar 21, 2021 7:45 pm

Right, that is why I said it was a dirty fix. It should not affect functionality, but it will bloat the merged book.

The point is that it doesn't matter if there are two entries for the same move in a certain position. The probing will treat them as if they were different moves, but if one had a weight that correspond to playing frequency 20%, and the other 10%, there still is a 30% probability that one of the two is selected.

It would subvert the tuning of variability vs strength, however. E.g. if you would set that for 'best-move only', it would of course misjudge what the best move was, if its score is scattered over many entries.

To do it properly, you would have to collect all moves with the same hash key in a buffer, and remove the duplicats. This would have been much easier if Polyglot books also required the entries to be sorted by move. But alas, they don't. And even making it so in the book creation (which would be easy), would not solve the problem for merging existing book files. So I guess the merge routine has to solve it the hard way.

hgm · Post by **hgm** » Sun Mar 21, 2021 8:43 pm

I now pushed a more thorough fix to my on-line repository. It first buffers all entries from book 1 with the same key as the upcoming one from book 2, and then reads all entries from book 2 with that key, testing whether the move already occurred in kook 1. If it did, it adds the weight to that of the existing entry. If not, it adds the entry from book 2 to the buffer. Finally it flushes the buffer to the merged book.

I had no books at hand to test it on. But this is the code for the equal-key case:

Code: Select all

            entry_t e[500];
            int i = 1, j, n;
            ASSERT(e1->key==e2->key);
            e[0] = e1[0]; i1++; // buffer entry from book 1
            while(i < 500-1 && read_entry(In1, e+i, i1) && e[i].key == e1->key) i++, i1++; // read all for this key
            n = i; // remember how many moves read from book 1
            e[i] = e2[0]; // consider entry from book 2
            do {
               for(j=0; j<n; j++) { // test against all entries from book 1
                  if(e[i].move == e[j].move) { // same move?
                     e[j].count += e[i--].count; // add count and pop
                     break;
                  }
               }
               i++; i2++; // accept entry from book 2
            } while(i < 500 && read_entry(In2, e+i, i2) && e[i].key == e1->key);
            for(j=0; j<i; j++) write_entry(Out,e+i); // flush
            if(i == 500) skip++;

Rebel · Post by **Rebel** » Mon Mar 22, 2021 7:26 pm

hgm wrote: ↑Sun Mar 21, 2021 8:43 pm I had no books at hand to test it on. But this is the code for the equal-key case:

Tried your code, didn't work here. The created book(s) did not show any move. Perhaps you can try a few yourself from - http://rebel13.nl/rebel13/pgn-annotator.html#gambits

I got most of the doubles now, but not those when a move in book-2 comes first. I see only 2 good solutions when keys are equal:

1. Sort both books first on key (already the case) but on moves thereafter, so on the first 10 bytes. Binary sort, a royal pain.

2. Do a regular polyglot book search, if the book-2 move exists skip it.

Seems to me option 2 is the more easy way out.

JohnWoe · Post by **JohnWoe** » Mon Mar 22, 2021 7:31 pm

You can't just merge PolyGlot books because then O(Log2(N)) binary search becomes smt retarded like O(N). They need to be sorted for binary search algorithm to work. Doesn't really matter with enough TC and CPU. But still.

mar · Post by **mar** » Mon Mar 22, 2021 8:54 pm

JohnWoe wrote: ↑Mon Mar 22, 2021 7:31 pm You can't just merge PolyGlot books because then O(Log2(N)) binary search becomes smt retarded like O(N). They need to be sorted for binary search algorithm to work. Doesn't really matter with enough TC and CPU. But still.

well you merge two books that are already sorted, so you don't have to sort/binary search anything, just keep reading from either one as long as the hash is smaller, merging along the way => this should be pretty efficient actually, plus you basically read sequentially

hgm · Post by **hgm** » Mon Mar 22, 2021 9:13 pm

OK, there is a bug there. The loop that saves saves the entry at e+i instead of e+j, while j is the loop index. When I changed this it seems to work:

Code: Select all

$ od -b b1.bin
0000000 106 073 226 030 026 221 374 234 003 034 000 001 000 000 000 000
0000020 106 073 226 030 026 221 374 234 002 232 000 002 000 000 000 000
0000040 106 073 226 030 026 221 374 234 001 225 000 003 000 000 000 000
0000060 202 074 233 120 375 021 101 226 016 152 000 001 000 000 000 000
0000100 312 030 011 074 125 236 127 233 015 044 000 001 000 000 000 000
0000120 377 377 377 377 377 377 377 377 000 000 000 000 000 000 000 000
0000140

Makro@Makro-PC /home/polyglot
$ od -b b2.bin
0000000 106 073 226 030 026 221 374 234 002 333 000 001 000 000 000 000
0000020 106 073 226 030 026 221 374 234 002 232 000 003 000 000 000 000
0000040 106 073 226 030 026 221 374 234 000 122 000 005 000 000 000 000
0000060 235 137 172 356 176 167 235 241 014 343 000 010 000 000 000 000
0000100 312 030 011 074 125 236 127 233 014 242 000 001 000 000 000 000
0000120 377 377 377 377 377 377 377 377 000 000 000 000 000 000 000 000
0000140

Makro@Makro-PC /home/polyglot
$ ./polyglot merge-book -in1 b1.bin -in2 b2.bin -out b12.bin
PolyGlot 2.0.4 by Fabien Letouzey.

Makro@Makro-PC /home/polyglot
$ od -b b12.bin
0000000 106 073 226 030 026 221 374 234 003 034 000 001 000 000 000 000
0000020 106 073 226 030 026 221 374 234 002 232 000 005 000 000 000 000
0000040 106 073 226 030 026 221 374 234 001 225 000 003 000 000 000 000
0000060 106 073 226 030 026 221 374 234 002 333 000 001 000 000 000 000
0000100 106 073 226 030 026 221 374 234 000 122 000 005 000 000 000 000
0000120 202 074 233 120 375 021 101 226 016 152 000 001 000 000 000 000
0000140 235 137 172 356 176 167 235 241 014 343 000 010 000 000 000 000
0000160 312 030 011 074 125 236 127 233 015 044 000 001 000 000 000 000
0000200 312 030 011 074 125 236 127 233 014 242 000 001 000 000 000 000
0000220 377 377 377 377 377 377 377 377 000 000 000 000 000 000 000 000
0000240

mar · Post by **mar** » Mon Mar 22, 2021 10:10 pm

hmm, just a nitpick: shouldn't you saturate when adding both counts? I don't know how likely it is to overflow the 16 bits, but doesn't seem impossible
when merging moves like 1.e4 or 1.d4
also I'm not sure how to handle cases where count is 0 in one one book and not in the other, a sum is probably natural, but if I'm not mistaken
count=0 is used to mark moves to avoid?

hgm · Post by **hgm** » Mon Mar 22, 2021 10:18 pm

Ah, yes, I assume it is better to saturate.

Polyglot itself will never put moves with weight 0 in the book. It is possible to edit the weights, though, and set them to 0 that way. The effect is just the same as not being in the book: the probing will never play them.

Rebel · Post by **Rebel** » Tue Mar 23, 2021 12:49 am

hgm wrote: ↑Mon Mar 22, 2021 9:13 pm OK, there is a bug there. The loop that saves saves the entry at e+i instead of e+j, while j is the loop index. When I changed this it seems to work:

Code: Select all

$ od -b b1.bin
0000000 106 073 226 030 026 221 374 234 003 034 000 001 000 000 000 000
0000020 106 073 226 030 026 221 374 234 002 232 000 002 000 000 000 000
0000040 106 073 226 030 026 221 374 234 001 225 000 003 000 000 000 000
0000060 202 074 233 120 375 021 101 226 016 152 000 001 000 000 000 000
0000100 312 030 011 074 125 236 127 233 015 044 000 001 000 000 000 000
0000120 377 377 377 377 377 377 377 377 000 000 000 000 000 000 000 000
0000140

Makro@Makro-PC /home/polyglot
$ od -b b2.bin
0000000 106 073 226 030 026 221 374 234 002 333 000 001 000 000 000 000
0000020 106 073 226 030 026 221 374 234 002 232 000 003 000 000 000 000
0000040 106 073 226 030 026 221 374 234 000 122 000 005 000 000 000 000
0000060 235 137 172 356 176 167 235 241 014 343 000 010 000 000 000 000
0000100 312 030 011 074 125 236 127 233 014 242 000 001 000 000 000 000
0000120 377 377 377 377 377 377 377 377 000 000 000 000 000 000 000 000
0000140

Makro@Makro-PC /home/polyglot
$ ./polyglot merge-book -in1 b1.bin -in2 b2.bin -out b12.bin
PolyGlot 2.0.4 by Fabien Letouzey.

Makro@Makro-PC /home/polyglot
$ od -b b12.bin
0000000 106 073 226 030 026 221 374 234 003 034 000 001 000 000 000 000
0000020 106 073 226 030 026 221 374 234 002 232 000 005 000 000 000 000
0000040 106 073 226 030 026 221 374 234 001 225 000 003 000 000 000 000
0000060 106 073 226 030 026 221 374 234 002 333 000 001 000 000 000 000
0000100 106 073 226 030 026 221 374 234 000 122 000 005 000 000 000 000
0000120 202 074 233 120 375 021 101 226 016 152 000 001 000 000 000 000
0000140 235 137 172 356 176 167 235 241 014 343 000 010 000 000 000 000
0000160 312 030 011 074 125 236 127 233 015 044 000 001 000 000 000 000
0000200 312 030 011 074 125 236 127 233 014 242 000 001 000 000 000 000
0000220 377 377 377 377 377 377 377 377 000 000 000 000 000 000 000 000
0000240

Patch seems to work, my compliments. Even on the 950 Mb book, but had to increase the buffer size from 500 mto 5000. I plan to make it a separate util (Merge-Books.exe) to keep the old familiar one. Little surprise, Fulvio was so kind to add the extended polyglot format (thus with depth, score and engines) to Scid allowing easy browsing to the database clicking on moves.

Expect this special SCID release in a couple of days. much more user friendly than ProDeo Knowledge.

hgm · Post by **hgm** » Tue Mar 23, 2021 9:46 am

I picked the buffer size under the assumption that no position would ever have more than 500 book moves. Which for orthodox Chess seems very generous, as the maximum number of legal moves even in the most artificial positions seems to be around 216. And even then most of those would be so poor that you wouldn't want them in a book.

So how is it possible that you needed a buffer of 5000? At least one of the books would need to have an enormous number of duplicate move entries to fill that buffer. The current code only checks for the second book duplicating moves of the first. It doesn't check whether first-book moves are duplicats, or whether second-book moves duplicate an earlier move from the second book that did not occur in the first book. Of course it would be easy to check for that too. But the normal procedure for creating a book should never cause such duplicats to be present at all.

Polyglot - merge-book oddity

Re: Polyglot - merge-book oddity

Re: Polyglot - merge-book oddity

Re: Polyglot - merge-book oddity

Re: Polyglot - merge-book oddity

Re: Polyglot - merge-book oddity

Re: Polyglot - merge-book oddity

Re: Polyglot - merge-book oddity

Re: Polyglot - merge-book oddity

Re: Polyglot - merge-book oddity

Re: Polyglot - merge-book oddity