From PGN databases to a better evaluation

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Harald
Posts: 318
Joined: Thu Mar 09, 2006 1:07 am

From PGN databases to a better evaluation

Post by Harald »

Use a PGN database to improve the evaluation.

Larry Kaufman did it, Rybka does it, Strelka does it and I nearly did it.

No, not really. I never found the time to even begin with it, but I had
some ideas that I posted in a chat and a forum. Vasik's reaction was:
"It will not work that way." And that gave me hope. If I had found a way
to do it, my engine would probably have jumped from 2000 Elo to 2200. :-)

So how does it work? I do not like to reverse engineer the assembler code
of another engine. But I would like to discuss and exchange ideas about
the fine art of converting huge PGN game databases to a better evaluation.
Is there a way to use and improve the method that Kaufman used for his
imbalance and material evaluation, this time with the computer? A tool program could
scan the PGNs and write a database to be included in the chess engine.
What should this program do and what data should it produce?

My original idea was to build a key or footprint of a position, look it
up in an idea database or midgame database and find either some hints
for the move ordering or the evaluation function. Perhaps corrections
for material values or good squares for some pieces. I was also inspired
by some graphs from SCID where we can see the probability for a piece
to be on a square in games with certain attributes.

The tool program should scan through the PGNs, discard the bad ones
(too short, too weak players, no opening moves, no endgame moves)
and build the key/footprint of each position. From the game result
a bonus or score is derived that can be used in the database building
process.
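For example, the filter and the result scoring could look roughly like this
in C++ (a minimal sketch; GameHeader and the Elo and length thresholds are
placeholders I made up, not anything from an existing tool):

#include <string>

struct GameHeader {
    int whiteElo = 0;
    int blackElo = 0;
    int plyCount = 0;          // half-moves actually played
    std::string result;        // "1-0", "0-1" or "1/2-1/2"
};

const int MIN_ELO   = 2300;    // discard games of too weak players
const int MIN_PLIES = 40;      // discard games that are too short

bool passes_filter(const GameHeader& g)
{
    return g.whiteElo >= MIN_ELO && g.blackElo >= MIN_ELO
        && g.plyCount >= MIN_PLIES;
}

// Score from White's point of view; every position of an accepted game
// is later credited with this score when the database is built.
double result_score(const GameHeader& g)
{
    if (g.result == "1-0") return 1.0;
    if (g.result == "0-1") return 0.0;
    return 0.5;                // draw or unknown
}

The checks for missing opening or endgame moves would go in the same place.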

A possible key/footprint would be the material for both sides, or the
piece signature for both sides, or some bits that show the position of
important pieces (the king in some board area, the existence of passed
pawns), or a bit for each side, file and rank/4 that holds a pawn.
Do not forget the side to move. Just something descriptive in 32 to 64
(or even 128) bits.
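To make that concrete, one possible 64-bit footprint could be packed like
this (a sketch only; the minimal Position struct, the king areas and the
reading of "rank/4" as the lower or upper half of the board are my own
assumptions):

#include <cstdint>

typedef uint64_t Bitboard;

// Minimal stand-in for the engine's position data; only what the key needs.
struct Position {
    Bitboard pawns[2];         // [white, black] pawn bitboards
    Bitboard king[2];          // [white, black] king bitboards
    bool     passed_pawn[2];   // does this side have a passed pawn?
    int      side_to_move;     // 0 = white, 1 = black
};

uint64_t footprint(const Position& pos)
{
    // Hypothetical king areas: queenside, centre and kingside files.
    static const Bitboard KING_AREA[3] = {
        0x0707070707070707ULL, 0x1818181818181818ULL, 0xE0E0E0E0E0E0E0E0ULL
    };
    uint64_t key = 0;
    int bit = 0;

    key |= uint64_t(pos.side_to_move) << bit++;                      // 1 bit

    for (int side = 0; side < 2; ++side) {
        for (int area = 0; area < 3; ++area)                         // 3 bits per side
            key |= uint64_t((pos.king[side] & KING_AREA[area]) != 0) << bit++;

        key |= uint64_t(pos.passed_pawn[side]) << bit++;             // 1 bit per side

        // 16 bits per side: for each file and rank/4 (lower or upper
        // board half) a bit if this side has a pawn there.
        for (int file = 0; file < 8; ++file) {
            Bitboard file_bb = 0x0101010101010101ULL << file;
            Bitboard lower   = 0x00000000FFFFFFFFULL;                // ranks 1-4
            key |= uint64_t((pos.pawns[side] & file_bb & lower)  != 0) << bit++;
            key |= uint64_t((pos.pawns[side] & file_bb & ~lower) != 0) << bit++;
        }
    }
    return key;    // 1 + 2 * (3 + 1 + 16) = 41 bits used, room left in 64
}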

In the first step such information is collected and weighted by the
number of occurrences and the game result. We can even collect the moves from
there. This database may need a few hundred gigabytes. No problem.
Then we have to decide which positions, moves or info are important enough
for the midgame database. Build it and include it in the engine.
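The collection step itself is then little more than counting. A minimal
in-memory sketch, assuming the footprint() and result_score() from the
sketches above (in reality the table would live on disk):

#include <cstdint>
#include <unordered_map>

struct Stats {
    uint64_t count = 0;        // how often this footprint occurred
    double   score = 0.0;      // sum of game results, White's point of view
};

std::unordered_map<uint64_t, Stats> table;

void record_position(uint64_t key, double game_score)
{
    Stats& s = table[key];
    ++s.count;
    s.score += game_score;
}

// Later: keep only entries whose count is above some threshold and write
// them into the compact midgame database that is compiled into the engine.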

The easiest start would be to collect the piece signatures for both sides,
the number of positions with this material and the winning percentages.
From that an evaluation bonus could be derived. If this does not work,
then add a bit for passed pawns or exposed kings. And so on.
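One plausible (completely untested) way to turn such a winning percentage
into a bonus is the usual Elo-style logistic scale; the game-count
threshold, the clamping and the Elo-to-centipawn mapping below are all
just placeholders:

#include <algorithm>
#include <cmath>
#include <cstdint>

// score_sum = wins + 0.5 * draws over all games with this piece signature.
int signature_bonus_cp(double score_sum, uint64_t count)
{
    if (count < 1000) return 0;                 // too few games, no correction
    double p = std::clamp(score_sum / double(count), 0.05, 0.95);
    double elo = 400.0 * std::log10(p / (1.0 - p));
    return int(elo / 4.0);   // how Elo maps to centipawns is itself a tunable assumption
}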

Can we use the bitboard info?
if (bb_white_king & bb_some_area) set_a_bit_in_key();

Can we please discuss some of these ideas instead of clones?
What about engineering instead of reverse engineering?

Harald
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: From PGN databases to a better evaluation

Post by Michael Sherwin »

Here is my idea for the opening and early middlegame. Run a PGN database through a data collection utility that works like the following. In an extremely large table on the hard drive, store the following information for every position that has a certain number of pawns remaining on the board: a pawn hash signature, a material hash signature, and some statistics. Then reduce this down so that the data is manageable and the most relevant pawn formations and material balances are kept. Create an exe-internal hash to look up a small adjustment to the eval. Or something like that!
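A rough sketch of that collection step, assuming the engine already computes a pawn hash key and a material hash key, as most engines do (the names, thresholds and the final centipawn scaling are made up):

#include <cstdint>
#include <unordered_map>

struct Entry {
    uint64_t games = 0;        // positions seen with this pawn/material pair
    double   score = 0.0;      // summed game results from White's view
};

// The "extremely large table"; on disk in practice, a map here for brevity.
std::unordered_map<uint64_t, Entry> collection;

void collect(uint64_t pawn_key, uint64_t material_key, double game_result)
{
    // Fold the two signatures into one index; any decent mixing will do.
    uint64_t key = pawn_key ^ (material_key * 0x9E3779B97F4A7C15ULL);
    Entry& e = collection[key];
    ++e.games;
    e.score += game_result;
}

// Reduction step: keep only the frequent entries and store a small
// centipawn adjustment in the exe-internal hash that eval() probes.
int reduce_to_adjustment(const Entry& e)
{
    if (e.games < 5000) return 0;
    double p = e.score / double(e.games);
    return int((p - 0.5) * 100.0);             // at most +/- 50 cp, a placeholder
}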
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: From PGN databases to a better evaluation

Post by Tord Romstad »

Michael Sherwin wrote:Here is my idea for the opening and early middlegame. Run a PGN database through a data collection utility that works like the following. [...] Create an exe-internal hash to look up a small adjustment to the eval. Or something like that!
That's actually very similar to something I am working on at the moment. It's still far too early to say how well it will work.

By the way, in case you wonder: I got your e-mail - I'll try to reply to it tomorrow. :)

Tord
Eelco de Groot
Posts: 4565
Joined: Sun Mar 12, 2006 2:40 am

Re: From PGN databases to a better evaluation

Post by Eelco de Groot »

Hello guys,

Just a silly idea I had: instead of using the game results from human games, I thought you could get much more data, better suited to computer chess, simply from all those computer test games that are played. I suspect that something like this could be one of the things Vasik has been doing for a long time.

First rough sketch: every time there is a trade-down in material in a test game, you could compare the eval with the situation ten or fifteen moves later, whatever works best. Group the results by material on the board, and especially by the cases with resulting material imbalances. If the results after ten moves are much better than the initial eval, you can be more optimistic about the root trade; if the results are much worse, the engine should not trade down in this situation if it can avoid it.

Of course, after the classification by material, the positions could be further divided by pawn formation etc., and a database (eventually inside a chess engine) could compare a situation on the board with the most closely matching position with the same material, piece placement and pawn formation. Each of these positions in the database has to have its own adjustment factor, ultimately in the exe's data table, but you could start building it with the difference in eval, (move 10 - move 0)/10. The data table could be built starting with the cases with the greatest average difference found in the games database, because there the correction would have the greatest effect.
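A small sketch of that bookkeeping, just to illustrate the idea (the names, the per-move eval and material arrays and the fixed ten-move window are all invented):

#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Correction {
    long   samples = 0;
    double sum     = 0.0;      // accumulated (eval at move+10 - eval at move) / 10
};

std::unordered_map<uint64_t, Correction> corrections;

// evals[m]    = engine eval (centipawns) after move m of a test game
// material[m] = material signature after move m
void process_game(const std::vector<int>& evals,
                  const std::vector<uint64_t>& material)
{
    const std::size_t WINDOW = 10;   // "ten or fifteen, whatever works best"
    for (std::size_t m = 1; m + WINDOW < evals.size(); ++m) {
        if (material[m] == material[m - 1]) continue;        // no trade here
        double delta = (evals[m + WINDOW] - evals[m]) / double(WINDOW);
        Correction& c = corrections[material[m]];
        ++c.samples;
        c.sum += delta;
    }
}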

Of course all this would have to be sanity-checked, compared with the values as Larry Kaufman gave them, etc., and compared with human rules.

[Next refinement ideas: all the cases where the difference in eval was likely the result of positional or tactical advantages that were rarely present in the average case would have to be filtered out somehow. You could do that by playing more test games starting from the root position, the root trade. Ideally the law of large numbers should give you a decent correction factor, and then you could go on to the next case. Large eval differences because the opposing computer played badly, missing knowledge, are another factor; you could start hunting for those cases by comparing databases built with different computer opponents, I think.]

There had better not be huge jumps in the eval because of the introduced corrections, so I think you would also have to consider some kind of gradual increase or decrease. But the advantage, I think, is that the bonus or malus would only be introduced each time the engine can trade down material, and at such points a discontinuity in the eval is already taking place, because pieces are disappearing from the board.
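One simple way to get such a gradual increase, purely as an illustration: blend the correction in over a few plies after the trade instead of applying it all at once.

// full_correction_cp comes from the database; RAMP is an arbitrary length.
int blended_correction(int full_correction_cp, int plies_since_trade)
{
    const int RAMP = 8;                        // full strength after 8 plies
    if (plies_since_trade >= RAMP) return full_correction_cp;
    return full_correction_cp * plies_since_trade / RAMP;
}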

I don't think I'm describing anything radically new; it's just some loose thoughts I had just now. I hope Vasik does not have too much trouble with all these speculations and attempts at dissecting his poor Rybka... :?

Regards, Eelco

P.S. This kind of database construction can, I think, also help explain some of the basic quirks Rybka can have when there is an unusual material situation on the board, such as one or two extra rooks in the opening position. Such positions will not be found in the database, so from that point on the engine has to calculate and think on its own without the use of its tables, sometimes resulting in strange evals if this process is buggy, and I think in longer searches in general. But I only remember one or two cases where I saw this kind of thing happen myself, with Rybka 1.0 beta given an extra rook.
Eelco de Groot
Posts: 4565
Joined: Sun Mar 12, 2006 2:40 am

Re: From PGN databases to a better evaluation

Post by Eelco de Groot »

  • Next refinement idea: also introduce human games, but analyzed by your engine. If there are the same kinds of jumps or drops in the eval, this can point you to where your engine is missing knowledge, and as far as I can see it should fit neatly in with the whole database idea of generating correction factors for trades. :o