Position learning and opening books

syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Position learning and opening books

Post by syzygy »

Does it show a "negative attitude" when I assume that you are going to realise all these great plans? You just need to be single-minded about the goals and persistent in their pursuit.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Position learning and opening books

Post by Zenmastur »

syzygy wrote:Does it show a "negative attitude" when I assume that you are going to realise all these great plans?
It does when you interject comments with no apparent point into a conversation in which you otherwise have no involvement. If you were actually contributing something useful to the conversation, I might see it differently. You apparently have no intention of doing this, so I see your comments for what they are: a childish attempt to hide your true intentions behind duplicitous sarcasm. Nothing more.

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Position learning and opening books

Post by syzygy »

My point is that you are positing truths with little to back them up.
Forrest Hoch wrote:If all available information is used the amount of learning and the time that's required can be considerably reduced. i.e. by an order of magnitude as compared to more "standard" approaches.
You also get abusive quickly. That's another point.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Position learning and opening books

Post by syzygy »

mvk wrote:(And RdM is certainly right that objectively it is better to use the best engine for that instead of your own. A weak defence is that in my case, as white, I know for sure that I can't get out of book with a negative score and then go for a draw by repetition right after).
Unfortunately, that certainty also disappears as you improve your engine's search and evaluation.

But a defense for using your own engine is not needed, as it is simply more fun than "borrowing" SF for building your engine's book.
I'm pretty sure that learning only from public games is not the best use of resources. In other words, I wouldn't expect to be able to remove dropout expansion and private games and then learn faster. On the contrary. Not because opponents can see the public games and learn from them (I care a bit about that, but not too much), but because public games are a very limited resource. There are many more moves I can add in private than if I have to wait for them to appear online for the first time.
And the whole point of drop-out expansion is to expand the book there where it is likely to matter. Adding "random" games, even if they are of high quality, will not have much of an effect, as it would likely only add some moves to a line that you won't be playing anyway (or only with extremely small probability).
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Unused information

Post by sje »

There is plenty of unused information in PGN files which are used to create opening books. Maybe there are some book construction programs which use such, but the details aren't published.

1) The PGN Date tag pair. This is in the specification for more than just identification. Position W/L/D statistics taken from games played at a later date should have more weight than those played at an earlier date.

2) The PGN WhiteElo and BlackElo tag pairs. Position W/L/D statistics taken from games where the Result tag value is relatively surprising given the rating delta should have more weight than those where the result is less surprising.

3) The PGN PlyCount tag pair (value can be calculated if not present). Shorter games should have their position statistics given more weight than longer games.

4) The PGN White and Black tag pairs when combined with reliable player data. Games won by younger players during a period of rapid rating increase should be given more weight than games won by older players during a period of end-of-career rating decline. (Obviously, computer players are not included here.)

5) The PGN Event tag pair when combined with accurate event information. Games played in big money events should be given more weight than those from less impressive tournaments.

6) The PGN Site tag pair, when used to discriminate among correspondence events, ICS events, and OTB events. A game played by snail mail over a year or two should be given more weight than a quick ICS game.
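A minimal sketch of how items 1, 2, 3 and 6 could be turned into a per-game weight for the W/L/D counts. The multiplicative combination, the half-life, the ply-count reference and the site weights are hypothetical tuning choices; the post only gives the direction of each weighting. Items 4 and 5 are left out because they need external player and event data. The Elo expectation is the standard logistic formula.

```python
from dataclasses import dataclass

# Hypothetical tuning constants: the post gives the direction of each
# weighting, not its size.
DATE_HALF_LIFE_YEARS = 10.0   # weight halves for every 10 years of age
PLY_REFERENCE = 80            # a game of this length gets length weight 1.0
SITE_WEIGHTS = {"corr": 2.0, "otb": 1.0, "ics": 0.5}
SCORES = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}


@dataclass
class Game:
    year: int        # from the Date tag
    white_elo: int   # WhiteElo tag
    black_elo: int   # BlackElo tag
    result: str      # Result tag: "1-0", "0-1" or "1/2-1/2"
    ply_count: int   # PlyCount tag (or counted from the movetext)
    site_kind: str   # "corr", "otb" or "ics", classified from the Site tag


def expected_white_score(white_elo: int, black_elo: int) -> float:
    """Standard logistic Elo expectation for White."""
    return 1.0 / (1.0 + 10.0 ** ((black_elo - white_elo) / 400.0))


def game_weight(g: Game, current_year: int) -> float:
    # 1) Date: newer games count more (exponential decay with age).
    age = max(0, current_year - g.year)
    w_date = 0.5 ** (age / DATE_HALF_LIFE_YEARS)
    # 2) Elo: surprising results count more; surprise lies in [0, 1].
    surprise = abs(SCORES[g.result] - expected_white_score(g.white_elo, g.black_elo))
    w_elo = 1.0 + surprise
    # 3) PlyCount: shorter games count more, capped at 2x.
    w_len = min(2.0, PLY_REFERENCE / max(1, g.ply_count))
    # 6) Site: correspondence > over-the-board > internet chess server.
    w_site = SITE_WEIGHTS.get(g.site_kind, 1.0)
    return w_date * w_elo * w_len * w_site


def weighted_wdl(games: list[Game], current_year: int) -> dict[str, float]:
    """Weighted Result totals for all games passing through one book position."""
    totals = {result: 0.0 for result in SCORES}
    for g in games:
        totals[g.result] += game_weight(g, current_year)
    return totals
```

A 2200 player beating a 2600 player then contributes noticeably more to the "1-0" total than the reverse result would, while a ten-year-old game counts half as much as an otherwise identical current one.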
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: Position learning and opening books

Post by mvk »

syzygy wrote: And the whole point of drop-out expansion is to expand the book there where it is likely to matter. Adding "random" games, even if they are of high quality, will not have much of an effect, as it would likely only add some moves to a line that you won't be playing anyway (or only with extremely small probability).
It is where the resources are spent and what they bring.
1. With dropout expansion I essentially expand 1 step at a time across the whole repertoire's horizon. Very robust, but it can be slow to find simple things.
2. With game playing (using the own book) you expand a game length worth of plies, so you learn about deeper problems sooner, especially the specific ones your own program is sensitive to.
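To make item 1 concrete, here is a sketch of one common formulation of drop-out expansion (after Lincke); it is not mvk's implementation. Each node gets an expansion priority: an unexpanded leaf has priority 0, and an interior node takes the minimum over its book moves of 1 + the child's priority + OMEGA times how much worse that move is than the best one. The leaf reached along the minimum-priority path is expanded next. OMEGA and the Node layout are assumptions.

```python
from dataclasses import dataclass, field

OMEGA = 0.5  # hypothetical drop-out factor: priority cost per pawn of deviation


@dataclass
class Node:
    name: str
    value: float = 0.0                         # leaves: engine score in pawns, side to move
    children: list["Node"] = field(default_factory=list)


def annotate(node: Node) -> tuple[float, float, Node]:
    """Return (negamax value, expansion priority, leaf to expand next)."""
    if not node.children:
        return node.value, 0.0, node           # an unexpanded leaf has priority 0
    results = [annotate(child) for child in node.children]
    value = max(-v for v, _, _ in results)     # negamax over the book moves
    best_priority, best_leaf = float("inf"), node
    for v, child_priority, leaf in results:
        drop = value - (-v)                    # how much worse this move is than the best
        priority = 1.0 + child_priority + OMEGA * drop
        if priority < best_priority:
            best_priority, best_leaf = priority, leaf
    return value, best_priority, best_leaf


root = Node("root", children=[
    Node("A", children=[Node("A1", value=0.3)]),  # best move, already one ply deeper
    Node("B", value=0.1),                         # 0.4 pawns worse, still a leaf
])
value, priority, leaf = annotate(root)
```

Here the main line A has already been expanded once, so the next leaf to expand is B (priority 1.2 versus 2.0 for A1). With OMEGA above 2.5 the main line would be preferred again, which is the knob that trades breadth against depth.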

But ideas are cheap, and magical thinking even cheaper. Measurements count. I believe I have enough data to quantify the effect of the several expansion methods I have used; it just has to be extracted. There is a MAB problem in balancing resources spent on the various expansion algorithms, so this might be a fruitful exercise.

I recall that I did all kinds of statistics negamaxing in the previous version of my program, and I wrote something about it at the time. The fundamental concern I have with it is that the precision of the statistics only increases with the square root of the number of underlying games, and games are expensive. Therefore I consider searches more valuable near the tips, and closer to the root negamax does the job. Looking at statistics of arbitrary games feels like a very slow Monte Carlo experiment with an embedded data quality problem, especially with human games. It might be better to generate many fast games instead of relying on external PGNs for that. But then again, if you need all that to correct the engine's judgement, why not use the data to improve the evaluator, if that is the root problem? Just my thoughts. I'm not a statistician, so I'm happy to be proved wrong.
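The square-root point can be made concrete. A sketch, assuming a game score of 1 / 0.5 / 0 for win / draw / loss and the usual standard error of a mean, sqrt(variance / n): quadrupling the number of games only halves the error bar.

```python
import math


def score_and_stderr(wins: int, draws: int, losses: int) -> tuple[float, float]:
    """Mean score (win = 1, draw = 0.5, loss = 0) and its standard error."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score around the mean.
    variance = (wins * (1.0 - mean) ** 2
                + draws * (0.5 - mean) ** 2
                + losses * mean ** 2) / n
    return mean, math.sqrt(variance / n)


# 100 games versus 400 games with the same W/D/L proportions.
_, se_100 = score_and_stderr(40, 20, 40)
_, se_400 = score_and_stderr(160, 80, 160)
print(f"{se_100:.4f} {se_400:.4f}")  # prints 0.0447 0.0224
```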
[Account deleted]
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: Position learning and opening books

Post by mvk »

mvk wrote:There is a MAB problem in balancing resources spent on the various expansion algorithms
Oops. I just meant to write "optimisation problem" there, but the 15 minute limit doesn't allow me to correct that anymore.
[Account deleted]
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Position learning and opening books

Post by Zenmastur »

Forrest Hoch wrote:If all available information is used the amount of learning and the time that's required can be considerably reduced. i.e. by an order of magnitude as compared to more "standard" approaches.
syzygy wrote:My point is that you are positing truths with little to back them up.
A couple of examples might help:

1.) I already pointed out that playing games to collect statistical information is more time-consuming than using an engine search. Both can be used to uncover poor play. The difference is that doing it statistically requires many games (10 to 100's or more, depending on various factors), each of which requires many engine searches (more than 110 on average for computer games). So, of course, it will be orders of magnitude faster to do things with an engine search than by collecting game statistics.

2.) Adding a large number of positions to the book when it's created is much faster than adding them through game play. During creation, positions are added at well over 100K per second. When any "reasonable" criterion is used for adding positions from engine play, they are added at about 1 position per game. Even if only a small percentage of the positions a book is created with are useful, it's still orders of magnitude faster than adding those positions by playing games.
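The two examples can be put into a back-of-envelope script. The 110 searches per game, the 10 to 100 games, the 100K positions per second and the 1 position per game are figures from the post; SECONDS_PER_GAME is a hypothetical value added only so the two rates can be compared.

```python
import math

# Figures quoted in the post.
SEARCHES_PER_GAME = 110    # engine searches in an average computer game
CREATION_RATE = 100_000    # positions per second during book creation
POSITIONS_PER_GAME = 1     # positions added per game of engine play

# Hypothetical, not from the post: wall-clock time of one engine game.
SECONDS_PER_GAME = 60.0


def searches_for_statistics(games: int) -> int:
    """Searches spent to get statistics from `games` games (versus 1 direct search)."""
    return games * SEARCHES_PER_GAME


def creation_speedup(seconds_per_game: float = SECONDS_PER_GAME) -> float:
    """How many times faster book creation adds positions than game play does."""
    play_rate = POSITIONS_PER_GAME / seconds_per_game  # positions per second
    return CREATION_RATE / play_rate


for games in (10, 100):
    n = searches_for_statistics(games)
    print(f"{games} games: {n} searches, ~{math.log10(n):.1f} orders of magnitude")
print(f"creation vs. play: ~{math.log10(creation_speedup()):.1f} orders of magnitude")
```

With a 60-second game the second ratio is about 6 million; halving or doubling the assumed game length moves it by well under one order of magnitude.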

It takes a lot of good decisions to make a project work. A single poorly thought-out decision can ruin a project. Ill-considered comments like the following:
syzygy wrote: ... Adding "random" games, even if they are of high quality, will not have much of an effect, as it would likely only add some moves to a line that you won't be playing anyway (or only with extremely small probability).
Can easily be misread because of their dubious wording. For example, what is a high-quality "random" game, and what does it have to do with making an opening book? A comment like this could mislead someone into using overly restrictive rules for including a position in the book. What harm is done by having unused positions in an opening book? They are unlikely to slow down access to the good data, so what does it matter? On the other hand, leaving out 100,000 positions that the engine will use can cost 1,00 hours or more of additional playing time to get those positions added to the book.

As far as wasting disk space, I will simply note that book creation only happens once per book, and hard-disk space costs about $0.0004 per MB. The time lost recreating or adding positions that could have been included at creation is worth orders of magnitude more than the cost of the wasted disk space. I consider the 100's of hours of game play required to replace a few seconds of extra time during book creation a difference of at least 3 orders of magnitude.
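To put the disk-space figure in perspective, here is a sketch assuming the book is stored in the Polyglot format, whose entries are 16 bytes each (64-bit key, 16-bit move, 16-bit weight, 32-bit learn value). The thread doesn't say which book format is used; the price per MB is the one quoted in the post.

```python
import struct

# Assumption: a Polyglot-format book. Each entry is a big-endian
# 64-bit key, 16-bit move, 16-bit weight and 32-bit learn field.
ENTRY = struct.Struct(">QHHI")
COST_PER_MB = 0.0004  # USD per megabyte, the figure quoted in the post


def book_disk_cost(num_entries: int) -> tuple[float, float]:
    """Return (size in MB, cost in USD) for a book with num_entries entries."""
    megabytes = num_entries * ENTRY.size / 1_000_000
    return megabytes, megabytes * COST_PER_MB


size_mb, cost = book_disk_cost(100_000)
print(f"{size_mb:.1f} MB, ${cost:.5f}")  # prints 1.6 MB, $0.00064
```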

So I don't think I need to "prove" anything to justify my previous statements. I think they stand on their own merits.
syzygy wrote:You also get abusive quickly. That's another point.
That's a matter of opinion and your behavior is also subject to question. In any case, I suspect no more so than you!

I have no problem with anyone that wants to contribute ideas, previous experience, or constructive comments.

But that's not what you did.

sje wrote:There is plenty of unused information in PGN files which are used to create opening books. Maybe there are some book construction programs which use such, but the details aren't published.
Most of these suggestions seem to be an attempt to address game quality. I have thought about pre-processing PGN files prior to using them to make books. I don't like the idea in general because it takes time. On the other hand human game databases suffer from all manner of errors. Some of the errors aren't of great consequence. Other less apparent ones are.

One problem that concerns me about human game databases is the Result tag. Many games have results that aren't congruent with the terminal position. I've seen a lot of games that were drawn, but when the terminal position is analyzed it's winning for one side or the other. Another problem is when one side has a clear advantage out of the opening but blunders and loses the game. But I don't have a good solution that is also quick and easy to implement.

Computer games suffer from the screwed-up and inconsistent way the openings are handled. In many of the PGNs of these games there is no indication of how the opening was played. This is a problem when using these games to create books. If it weren't for these problems, this would be an excellent source of games.

Regards,

Zen
Ozymandias
Posts: 1535
Joined: Sun Oct 25, 2009 2:30 am

Re: Position learning and opening books

Post by Ozymandias »

Zenmastur wrote:There are many games that have results that aren't congruent with the terminal position. I've seen a lot of games that were drawn, but when the terminal position is analyzed it's winning for one side or the other.
It's worse with correspondence games. About 10% have a wrong result, although we aren't talking about draws here, but rather the winning side resigning, for any number of outside-the-board reasons.
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: Position learning and opening books

Post by mvk »

Zenmastur wrote:There are many games that have results that aren't congruent with the terminal position. I've seen a lot of games that were drawn, but when the terminal position is analyzed it's winning for one side or the other. An other problem is when one side has a clear advantage out of the opening but blunders and loses the game. But I don't have a good solution that is also quick and easy to implement.
I prescan foreign games to combat this. I scan the game from the start and, at each game position, perform both a shallow and a slightly deeper search, until at both depths abs(score) exceeds 2.5. For most positions the deeper search can be skipped thanks to short-circuit evaluation of the && condition, until near the end of the prospect line. The resulting line is then my line of interest, and I don't care what happens after that, including the game result. Secondly, there is a blunder filter in that screening, needed for PGNs of human games. If either side makes a "too big and unneeded error" (in quotes because there are some details here to get right), I throw away the line anyway.

This is only a vetting process, before a game can even be considered for further processing.
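A sketch of the vetting pass described above. The search(position, depth) callable, the two depths and the exact blunder rule are hypothetical: mvk explicitly leaves the details of the "too big and unneeded error" test open, so the version here (the side that moved loses more than BLUNDER_MARGIN pawns while it was not already losing) is only a placeholder. The 2.5 threshold and the short-circuited two-depth test are from the post.

```python
# The 2.5 threshold is from the post; the other settings are placeholders.
THRESHOLD = 2.5        # pawns; stop once both searches agree the game is decided
BLUNDER_MARGIN = 1.5   # hypothetical: pawns the mover may give away in one move


def vet_line(positions, search, shallow=4, deep=8,
             threshold=THRESHOLD, blunder_margin=BLUNDER_MARGIN):
    """Return the prefix of `positions` worth keeping, or None if the line is rejected.

    positions[0] is the start position with White to move; positions[i] is the
    position after ply i. search(position, depth) is a hypothetical engine call
    returning a score in pawns from White's point of view.
    """
    previous = None
    for i, pos in enumerate(positions):
        score = search(pos, shallow)
        if previous is not None:
            mover = 1 if i % 2 == 1 else -1    # +1: White made ply i, -1: Black did
            loss = mover * (previous - score)   # eval the mover gave away
            not_already_lost = mover * previous > -threshold
            if loss > blunder_margin and not_already_lost:
                return None                     # blunder filter: drop the whole line
        # `and` short-circuits, so the deeper search only runs once the
        # shallow search already sees abs(score) > threshold.
        if abs(score) > threshold and abs(search(pos, deep)) > threshold:
            return positions[: i + 1]           # the line of interest ends here
        previous = score
    return positions
```

Everything after the returned prefix, including the Result tag, is ignored, which is what sidesteps the wrong-result problem mentioned earlier in the thread.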
[Account deleted]