Improving your engine performance: the issue with ELO

Chessnut1071
Posts: 313
Joined: Tue Aug 03, 2021 2:41 pm
Full name: Bill Beame

Improving your engine performance: the issue with ELO

Post by Chessnut1071 »

For developers writing their own engines, the usual procedure for improving performance is to test modifications to the evaluation function by means of Elo scores, which requires playing each game out to a win, loss or draw. Here's the problem. The best programs today use databases for the opening and end game solutions. There are no chess tactics there, just raw hash tables. So only the middle game requires chess intelligence, or some form of A.I. that doesn't rely on large hash tables. If you develop a strong evaluation function but your opening and end game tables are weak, there's no way to score your middle game performance accurately using Elo, which requires a win, loss or draw. There is, however, a way to score middle game performance without using Elo, but with something very close to it: minimizing the move set from grandmaster games, i.e., minimizing the number of moves your evaluation function tries before it finds the grandmaster move.
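A minimal sketch of how such a score could be computed, assuming the engine can hand back a static score for every legal move in a position. The function names and the (move, score) input format are hypothetical, not taken from any particular engine:

```python
from typing import Dict, Sequence, Tuple


def gm_move_rank(scored_moves: Sequence[Tuple[str, float]], gm_move: str) -> int:
    """Return the 1-based rank of the grandmaster move when the engine's
    candidate moves are ordered best-first by score."""
    ordered = sorted(scored_moves, key=lambda ms: ms[1], reverse=True)
    for rank, (move, _score) in enumerate(ordered, start=1):
        if move == gm_move:
            return rank
    raise ValueError(f"{gm_move!r} is not among the scored moves")


def summarize_ranks(ranks: Sequence[int]) -> Dict[str, float]:
    """Aggregate per-position ranks into the figures one would try to minimize."""
    return {
        "positions": len(ranks),
        "first_choice": sum(1 for r in ranks if r == 1),
        "worst_rank": max(ranks),
        "mean_rank": sum(ranks) / len(ranks),
    }


# Hypothetical data: (engine scores for each legal move, move the grandmaster played)
positions = [
    ([("Nf3", 0.42), ("d4", 0.35), ("g3", 0.10)], "Nf3"),
    ([("Bg2", 0.20), ("e3", 0.31), ("b3", 0.05)], "Bg2"),
]
ranks = [gm_move_rank(scores, gm) for scores, gm in positions]
summary = summarize_ranks(ranks)
```

A rank of 1 means the engine's first choice matched the grandmaster move. The mean rank (or the worst rank) is then the quantity a tuner would try to push down.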

I tried this approach on two opening/defenses in the English Opening, 1. c4. Each opening ran for approximately 11 moves before it reached the middle game, where the engine had to find the best move. Optimizing 60 metrics (30 for White and 30 for Black) with a modified Hooke & Jeeves optimization, I found that this method got 38 out of 44 moves on the first move, and never needed more than 4 moves before it found the grandmaster move. When I used the same openings and end game tables, it won every game. Unfortunately, when I tried it against Stockfish & Fritz it went down in flames, not because of my engine, but because of my end game hash tables.
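For readers unfamiliar with Hooke & Jeeves, here is a sketch of the plain, textbook pattern search. The modifications mentioned in the post are not described, so none are assumed here. The objective `f` stands in for whatever is being minimized (for example, the mean rank of the grandmaster move over a set of positions), and the parameter names and defaults are illustrative only:

```python
from typing import Callable, List, Tuple


def hooke_jeeves(
    f: Callable[[List[float]], float],
    x0: List[float],
    step: float = 1.0,
    shrink: float = 0.5,
    tol: float = 1e-6,
    max_iter: int = 10_000,
) -> Tuple[List[float], float]:
    """Minimize f by derivative-free pattern search (unmodified Hooke-Jeeves)."""

    def explore(start: List[float], f_start: float, s: float) -> Tuple[List[float], float]:
        # Exploratory move: nudge one coordinate at a time by +s, then -s,
        # keeping any change that lowers the objective.
        x, fx = list(start), f_start
        for i in range(len(x)):
            for delta in (s, -s):
                trial = list(x)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    break
        return x, fx

    base, f_base = list(x0), f(list(x0))
    for _ in range(max_iter):
        if step < tol:
            break
        x, fx = explore(base, f_base, step)
        if fx >= f_base:
            step *= shrink  # nothing improved: search on a finer grid
            continue
        while True:
            # Pattern move: jump further along the direction that just helped,
            # then explore around the landing point.
            pattern = [2.0 * xi - bi for xi, bi in zip(x, base)]
            base, f_base = x, fx
            px, f_px = explore(pattern, f(pattern), step)
            if f_px < f_base:
                x, fx = px, f_px
            else:
                break
    return base, f_base


# Toy objective with its minimum at (3, -1); a real run would score evaluation weights.
best_x, best_f = hooke_jeeves(lambda v: (v[0] - 3.0) ** 2 + (v[1] + 1.0) ** 2, [0.0, 0.0])
```

With 60 weights, each exploratory pass costs up to 120 objective evaluations, so the cost of scoring one weight vector dominates the run time.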

I think I have a better middle game evaluation function, but no way to prove it without those end game hash tables. Does anybody know where to find them in the public domain?
Luecx
Posts: 138
Joined: Thu Jun 18, 2020 9:20 pm
Full name: Finn Eggers

Re: Improving your engine performance: the issue with ELO

Post by Luecx »

Elo is an infeasible loss function. I agree with you on that, but what is wrong with the classical approaches for tuning the evaluation function?
Especially taking FENs from games played by the engine itself, extracting a few quiet positions and fitting the evaluation, passed through a sigmoid, to the WDL (win/draw/loss) value of the game. That seems to perform very well across all engines and used to be the standard approach for tuning any HCE (hand-crafted evaluation). With that approach, we reached 3200 Elo.
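A rough sketch of the loss such a tuner minimizes, assuming one common form of the sigmoid (a logistic curve on a centipawn scale with a scaling constant `k`). The sample format, function names and numbers are made up for illustration and are not Koivisto's actual code:

```python
from typing import Iterable, Sequence, Tuple

# Each sample: (static evaluation in centipawns from White's point of view,
#               game result for White: 1.0 win, 0.5 draw, 0.0 loss)
Sample = Tuple[float, float]


def win_probability(eval_cp: float, k: float) -> float:
    """Map a centipawn evaluation to an expected score between 0 and 1."""
    return 1.0 / (1.0 + 10.0 ** (-k * eval_cp / 400.0))


def mean_squared_error(samples: Sequence[Sample], k: float) -> float:
    """Average squared gap between the predicted score and the actual game result."""
    return sum((result - win_probability(ev, k)) ** 2 for ev, result in samples) / len(samples)


def best_scaling(samples: Sequence[Sample], candidates: Iterable[float]) -> float:
    """Pick the scaling constant that best fits the current evaluation to the results."""
    return min(candidates, key=lambda k: mean_squared_error(samples, k))


# Hypothetical quiet positions taken from self-play games.
samples = [(120.0, 1.0), (-80.0, 0.0), (15.0, 0.5), (300.0, 1.0), (-10.0, 0.5)]
k = best_scaling(samples, [0.25 * i for i in range(1, 21)])
loss = mean_squared_error(samples, k)
```

Once `k` is fixed, the evaluation weights themselves are adjusted to lower the same loss, using a local search or gradient descent over many thousands of positions.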
The ability to speak does not make you intelligent. https://github.com/Luecx/Koivisto

brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: Improving your engine performance: the issue with ELO

Post by brianr »

The most commonly used endgame tablebases can be found here (scroll to the Downloads section):
https://syzygy-tables.info/

Within a chess programming context, the term "hash tables" is generally not used for EGTBs (endgame tablebases) or opening books.
"Hash tables" typically refers specifically to parts of saved search trees (and often pawn or king/pawn positions) in chess engines.
The opening books and EGTBs are generally considered databases.
Creating them is a highly specialized subdomain within the broader chess programming area.

Using grandmaster positions will provide minimal, if any, benefit.
There are far too few of them (on the order of a few thousand) to enable effective automated tuning.
Moreover, grandmaster positions are generally too weak to be helpful with engines, and have been for many years.
Minimizing the move set from grandmaster games is a variation on test set tuning, which can be marginally helpful.
However, playing engine matches (current vs. prior version in self-play, and vs. a pool of opponents, albeit less often) is the best indicator of Elo improvements.
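For reference, a match result converts to an Elo difference with the standard logistic Elo formula. This is a small sketch; the function name is just illustrative:

```python
import math


def elo_difference(wins: int, draws: int, losses: int) -> float:
    """Estimate the Elo difference implied by a match score (positive = first engine stronger)."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        # A 0% or 100% score corresponds to an unbounded Elo difference.
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Hypothetical match: new version vs. previous version.
diff = elo_difference(wins=120, draws=300, losses=80)
```

Small differences need a lot of games before they stand out from noise, which is why such matches are typically run at short time controls over thousands of games.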

Incidentally, engine strength is a function of the combination of both search and evaluation.
They combine to form a fairly unique ecosystem for each engine.
Improving both is necessary for the strongest results.