Larry Kaufman did it, Rybka does it, Strelka does it and I nearly did it.
No, not really. I never found the time to even begin with it, but I had
some ideas that I posted in a chat and a forum. Vasik's reaction was:
"It will not work that way." And that gave me hope. If I had found a way
to do it, my engine would probably have jumped from 2000 Elo to 2200.

So how does it work? I do not like to reverse engineer the assembler code
of another engine. But I would like to discuss and exchange ideas about
the fine art of converting huge PGN game databases to a better evaluation.
Is there a way to use and improve, with the computer, the method that
Kaufman used for his imbalance and material evaluation? A tool program could
scan the PGNs and write a database to be included in the chess engine.
What should this program do and what data should it produce?
My original idea was to build a key or footprint of a position, look it
up in an idea database or midgame database and find either some hints
for the move ordering or the evaluation function. Perhaps corrections
for material values or good squares for some pieces. I was also inspired
by some graphs from SCID where we can see the probability for a piece
to be on a square in a game with some attributes.
The tool program should scan through the PGNs, discard the bad ones
(too short, too weak players, no opening moves, no endgame moves)
and build the key/footprint of each position. From the game result
a bonus or score is derived that can be used in the database building
process.
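
To make the filtering step a bit more concrete, here is a minimal sketch
in C. Everything in it is my assumption: the GameInfo struct (which a PGN
parser would have to fill in), the thresholds MIN_PLIES and MIN_ELO, and
the result encoding in half points. The checks for missing opening or
endgame moves are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-game summary, filled in by a PGN parser (not shown). */
typedef struct {
    int  plies;      /* number of half-moves in the game */
    int  white_elo;  /* 0 if the rating is unknown */
    int  black_elo;
    char result;     /* 'W' = 1-0, 'B' = 0-1, 'D' = 1/2-1/2, '?' = unknown */
} GameInfo;

/* Assumed thresholds, to be tuned for the database at hand. */
enum { MIN_PLIES = 40, MIN_ELO = 2200 };

/* Returns true if the game is long enough, strong enough and decided. */
bool game_is_usable(const GameInfo *g)
{
    assert(g != NULL);
    if (g->plies < MIN_PLIES) return false;
    if (g->white_elo < MIN_ELO || g->black_elo < MIN_ELO) return false;
    if (g->result != 'W' && g->result != 'B' && g->result != 'D') return false;
    return true;
}

/* Result from White's point of view in half points:
   win = 2, draw = 1, loss = 0, unknown = -1. */
int result_half_points(char result)
{
    switch (result) {
    case 'W': return 2;
    case 'D': return 1;
    case 'B': return 0;
    default:  return -1;
    }
}
```

Half points keep everything in integers, so the later statistics need
no floating point.
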
A possible key/footprint would be the material for both sides, or the
piece signature for both sides, plus some bits for the position of
important pieces: the king being in certain board areas, the existence
of passed pawns, or one bit for each combination of side, pawn file
and rank/4. Do not forget the side to move. Just something
descriptive in 32 to 64 (or even 128) bits.
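
One possible bit layout for such a footprint, as a sketch only. The
PositionSummary struct, the function name make_footprint and the layout
itself (4 bits per piece count, then side to move, then two passed pawn
flags) are my assumptions, not something taken from an existing engine.

```c
#include <assert.h>
#include <stdint.h>

enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, PIECE_TYPES };
enum { WHITE, BLACK };

/* Hypothetical summary of one position, produced by the scanning tool. */
typedef struct {
    int count[2][PIECE_TYPES];  /* piece counts per side, kings excluded */
    int side_to_move;           /* WHITE or BLACK */
    int has_passed_pawn[2];     /* non-zero if that side has a passed pawn */
} PositionSummary;

/* Assumed layout:
     bits  0..19  white P, N, B, R, Q counts, 4 bits each
     bits 20..39  black P, N, B, R, Q counts, 4 bits each
     bit  40      side to move
     bit  41      white has a passed pawn
     bit  42      black has a passed pawn */
uint64_t make_footprint(const PositionSummary *p)
{
    uint64_t key = 0;
    int shift = 0;

    for (int side = WHITE; side <= BLACK; side++) {
        for (int pt = PAWN; pt < PIECE_TYPES; pt++) {
            int n = p->count[side][pt];
            assert(n >= 0 && n <= 15);  /* must fit into 4 bits */
            key |= (uint64_t)n << shift;
            shift += 4;
        }
    }
    key |= (uint64_t)(p->side_to_move & 1) << 40;
    key |= (uint64_t)(p->has_passed_pawn[WHITE] != 0) << 41;
    key |= (uint64_t)(p->has_passed_pawn[BLACK] != 0) << 42;
    return key;
}
```

That uses 43 bits, so there is room left in a 64 bit key for king areas
or the pawn file and rank/4 bits.
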
In the first step such information is collected and weighted by the
number of occurrences and the game result. We can even collect the moves
from there. This database may need a few hundred gigabytes. No problem.
Then we have to decide which position, move or info is important enough
for the midgame database. Build it and include it in the engine.
The easiest start would be to collect the piece signatures for both sides,
the number of positions with this material and the winning percentages.
From that an evaluation bonus could be derived. If this does not work
then add a bit for passed pawns or exposed kings. And so on.
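
A sketch of how the counting and the bonus could look for one piece
signature. The names SignatureStats, stats_add, score_permille and
material_bonus_cp are made up for illustration. The linear mapping from
winning percentage to centipawns, with the constants MIN_SAMPLES and
CP_PER_PERCENT, is only a placeholder assumption. I do not claim that
Kaufman or any engine does it this way.

```c
#include <assert.h>
#include <stdint.h>

/* Statistics collected for one piece signature. */
typedef struct {
    uint64_t positions;    /* how often this signature occurred */
    uint64_t half_points;  /* sum of results for White: win 2, draw 1, loss 0 */
} SignatureStats;

/* Record one position whose game ended with the given result for White. */
void stats_add(SignatureStats *s, int half_points)
{
    assert(half_points >= 0 && half_points <= 2);
    s->positions += 1;
    s->half_points += (uint64_t)half_points;
}

/* White's score in permille (0..1000); 500 means balanced. */
int score_permille(const SignatureStats *s)
{
    assert(s->positions > 0);
    return (int)(s->half_points * 500 / s->positions);
}

/* Assumed constants: ignore rare signatures, and give 7 centipawns
   for each percent of score above or below 50 percent. */
enum { MIN_SAMPLES = 100, CP_PER_PERCENT = 7 };

/* Evaluation bonus for White in centipawns, 0 if the sample is too small. */
int material_bonus_cp(const SignatureStats *s)
{
    if (s->positions < MIN_SAMPLES) return 0;
    return (score_permille(s) - 500) * CP_PER_PERCENT / 10;
}
```

For example, a signature seen 100 times with 60 wins, 20 draws and 20
losses for White scores 700 permille, and the bonus with these constants
is 140 centipawns.
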
Can we use the bitboard information for this? For example:
if (bb_white_king & bb_some_area) set_a_bit_in_key();
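
Written out a little more, that test could look like the sketch below. It
assumes the common square mapping bit 0 = a1, bit 7 = h1, bit 63 = h8.
The area masks and the key bit names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Bitboard;

/* Assumed mapping: file 0..7 = a..h, rank 0..7 = 1..8, bit = rank*8 + file. */
#define SQ(file, rank) ((Bitboard)1 << ((rank) * 8 + (file)))

/* Hypothetical areas for a castled white king. */
static const Bitboard BB_WHITE_KINGSIDE =
    SQ(5, 0) | SQ(6, 0) | SQ(7, 0) | SQ(6, 1) | SQ(7, 1);  /* f1 g1 h1 g2 h2 */
static const Bitboard BB_WHITE_QUEENSIDE =
    SQ(0, 0) | SQ(1, 0) | SQ(2, 0) | SQ(0, 1) | SQ(1, 1);  /* a1 b1 c1 a2 b2 */

/* Hypothetical key bits for the white king area. */
enum { KEY_WK_KINGSIDE = 1, KEY_WK_QUEENSIDE = 2 };

/* Returns the key bits describing where the white king stands. */
uint32_t king_area_bits(Bitboard bb_white_king)
{
    uint32_t bits = 0;

    /* Exactly one bit must be set: there is exactly one white king. */
    assert(bb_white_king != 0 && (bb_white_king & (bb_white_king - 1)) == 0);

    if (bb_white_king & BB_WHITE_KINGSIDE)  bits |= KEY_WK_KINGSIDE;
    if (bb_white_king & BB_WHITE_QUEENSIDE) bits |= KEY_WK_QUEENSIDE;
    return bits;
}
```

A king on g1 sets the kingside bit, a king on c1 the queenside bit, and
a king still on e1 sets neither.
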
Can we please discuss some of these ideas instead of clones?
What about engineering and not reverse engineering?
Harald