Book learning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27793
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Book learning

Post by hgm »

Polyglot saves learning information into its .bin book when you have set the option BookLearn=true. But it does not use this information when probing the book.

What would be a good way to use the learn info (which consists of the number of times a move was played, and the number of half-points scored with it)? Would it make sense to multiply the move weights used to select the move by a learn-dependent factor, as follows:

usedWeight = bookWeight * (scoredHalfPoints + 10.) / (nrPlayed + 1.);

The idea would be to initially bias play towards moves that have not been played sufficiently often yet. (Moves that have not been played at all have scoredHalfPoints = nrPlayed = 0, so their weight would be multiplied by 10 compared to what is in the book.) By the time the moves have been played often enough to make the score statistically significant (i.e. >> 10 times), the multiplier would converge to the average result (scoredHalfPoints / nrPlayed), which can vary between 0 and 2.
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Book learning

Post by Gian-Carlo Pascutto »

Doesn't sound like a very good idea. You'll end up playing every move in book, no matter how crappy it is. You shouldn't totally discard the initial weights.

Re: Book learning

Post by hgm »

But this doesn't discard the initial weights; they are just modified by the multiplier. So if a certain move had only 5% (compared to two others at 50% and 45%), it could at most be scaled up to 50% (before renormalization), and that only if it has not been played at all while the others have been played infinitely many times with an average (50%) result.

Re: Book learning

Post by Gian-Carlo Pascutto »

Ah, right, then it's not so bad. But if the 5% move then scores 4 straight losses, it will still be at 10%, which doesn't look desirable to me.

I'm not sure why you want to force exploration of the moves. This is useful if there is no strong prior, but here there is good information already about what moves are likely to be good.

Re: Book learning

Post by hgm »

OK, so you would propose to scale the 10 in the numerator back to 1.

The point is that in a book compiled from a PGN collection, the prior might not be all that strong. A certain position deeper in the book might have been played only 5 times, with 3 moves carrying weights 3 / 2 / 1. In that case a bit of exploration does not seem like such a bad idea.

So perhaps it would make sense to replace the 10 by something that depends on the total of the weights, like (50. + totalWeight) / (5. + totalWeight), so that it starts near 10 for a prior based on almost nothing, is already down to about 3 when the prior is based on 20 games, and approaches 1 for a prior based on thousands of games.
User avatar
Eelco de Groot
Posts: 4565
Joined: Sun Mar 12, 2006 2:40 am

Re: Book learning

Post by Eelco de Groot »

hgm wrote:Polyglot saves learning information into its .bin book when you have set the option BookLearn=true. But it does not use this information when probing the book.

What would be a good way to use the learn info (which consists of the number of times a move was played, and the number of half-points scored with it)? Would it make sense to multiply the move weights used to select the move by a learn-dependent factor, as follows:

usedWeight = bookWeight * (scoredHalfPoints + 10.) / (nrPlayed + 1.);

The idea would be to initially bias play towards moves that have not been played sufficiently often yet. (Moves that have not been played at all have scoredHalfPoints = nrPlayed = 0, so their weight would be multiplied by 10 compared to what is in the book.) By the time the moves have been played often enough to make the score statistically significant (i.e. >> 10 times), the multiplier would converge to the average result (scoredHalfPoints / nrPlayed), which can vary between 0 and 2.
Because computers are not very good at finding the right opening moves by themselves, good opening percentages probably just mean your opponents have weaker books than your own, though that might change. Relying on good percentages acquired over many games does not help if your opponents suddenly get better books, and bad percentages probably mean your own book has bad lines. But if you have no alternative moves that are better in this position or earlier, changing the weights alone will not help you: you end up with a very narrow book and no more learning. A book is only as good as its moves, not its weights. Assuming the weights already in place are not so bad relative to your book, I think your approach could not hurt; it depends on what you start out with. That being said, I had not seen any of these possibilities for book-learning code yet, nor studied Polyglot, so it is very interesting to read about. Thanks Harm!

I think I would start with an alternative where the initial weights cannot go down to zero, because I would assume it is better to repair bad lines at the move level, and to keep the learning from making the book too narrow. So maybe something that keeps 90% of the initial weight intact, even if the results are bad. If the results are good, the weight could go up to 110% using your formula without the constant 10 added. To strengthen the learning, so that the weights go up more sharply when results are good, I could simply square the results; then the weights could vary from 90% to 130% after learning. Translating that into your formula, maybe something like this to start with?

usedWeight = (bookWeight * 90 + bookWeight * 10 * ((scoredHalfPoints + 1.) * (scoredHalfPoints + 1.))/((nrPlayed + 1.) * (nrPlayed + 1.)))/100;

Regards, Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan

Re: Book learning

Post by hgm »

You really seem to want to reduce the effect of book learning to almost nothing. Suppose there are two moves from a given position, with equal book weights. If line A resulted in a loss every single one of the 1000 times I played it, while line B scored 50% on average... you would still continue playing line A with 47.5% probability (usedWeight 90 vs 100).

Surely that can't be good. Why would you want to play a line that loses 1000 times out of 1000 at all? I would say that even a usedWeight of 0 would give that line much more than it deserves. Even at 100 losses out of 100 a weight of zero sounds quite justified, and at 10 losses out of 10 I would already start to frown upon that line...

Re: Book learning

Post by Eelco de Groot »

hgm wrote: You really seem to want to reduce the effect of book learning to almost nothing. Suppose there are two moves from a given position, with equal book weights. If line A resulted in a loss every single one of the 1000 times I played it, while line B scored 50% on average... you would still continue playing line A with 47.5% probability (usedWeight 90 vs 100).

Surely that can't be good. Why would you want to play a line that loses 1000 times out of 1000 at all? I would say that even a usedWeight of 0 would give that line much more than it deserves. Even at 100 losses out of 100 a weight of zero sounds quite justified, and at 10 losses out of 10 I would already start to frown upon that line...
Well, Harm, I suppose my thought was more that you can only do so much with book learning by adjusting the weights. If a move loses ten games out of ten, but it was a move from even one qualitatively good grandmaster game, it could be that your engine does not understand this type of position. But if you just avoid this opening altogether, you miss the opportunity to improve the engine; a good engine should be able to handle all kinds of positions.

My thought was that it is not very likely the fault of this move, even if ten games were lost; not if it came from a good-quality database, for instance. The problem could be in a single follow-up move that might be an error introduced by bad bookmaking, an included losing line perhaps that should never have gotten a weight for playing it. There could also be one critical good move down the line that is not played anymore because it had a very low, incorrect weight (maybe because something else went wrong with learning this weight before).

If learning is applied to all moves in a played line (I don't know that it is), probably only one of the moves in that line was the error; it takes only one bad opening move to lose a game. How is book learning, by decreasing all the weights of the played moves, going to find out which one was the error? Adjusting this particular weight down would serve little purpose if you don't look at where the actual problem is. I still think a book is only as good as its moves...

Regards, Eelco

Re: Book learning

Post by hgm »

Well, but this is exactly what book learning is expected to do: adapt the book to the engine. It is not a technique designed to make good general books. For that you can never base yourself on just a single player; you would need a large collection of games by many hundreds of different players, which is presumably what you did in the first place to obtain the book weights.

But then you are left with the problem that your engine might not understand some positions, and tends to lose them even though they are theoretically equal. Book learning is intended to effectively remove those lines from the book, as soon as you discover them.

When your purpose is to make a general engine that understands every position, you would of course not evaluate its performance with a book that avoids its weak spots. Books serve a purpose, and not every book can be used for every purpose. You have testing books that you use to evaluate your engine, and you have tournament books that you use to score as well as you can with a given engine in a particular tourney.

Exactly because of the reason you mention, I thought it was a good idea to promote diversity in the beginning, when you still have poor statistics, to prevent move weights being reduced to nearly zero before you are really sure they are no good. The formula I initially proposed is in fact very forgiving in this respect: even with 0 out of 10 it would multiply the bookWeight by 10/11, while competing moves that scored 5 out of 10 would get 20/11. So the relative weight is still only suppressed by a factor 2, which is hardly the same as the move never being played anymore. Even after the appalling score of 0 out of 100 it would get 10/101, and the competing 50%-scoring move would get 110/101, i.e. its relative weight is reduced by a factor 11, but its playing probability is still over 8% (if the book weights had been equal).
CRoberson
Posts: 2055
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Re: Book learning

Post by CRoberson »

I use the philosophy of "when I find a good move, I'll stick with it until it is proven bad".

There is a problem with that philosophy: what does it take for a move to be proven bad? With too many accumulated statistics, a move that has been good for many games could take too many games to prove bad. There are several ways to deal with that.