Evaluation reflections

eligolf
Posts: 114
Joined: Sat Nov 14, 2020 12:49 pm
Full name: Elias Nilsson

Evaluation reflections

Post by eligolf »

I finally managed to get my engine to use the UCI protocol, so happy :) And also so sad, since I can now compare my engine to other engines and see how badly it performs hehe.

Following previous discussions, I have so far only used PST tables and a tapered eval to interpolate between middlegame and endgame. I was discouraged from incorporating other factors alongside the PST tables because of the complicated process of optimizing all the parameters. The PST tables I use now were already optimized by someone (can't remember who now, but they are taken from the late videos in the Maksim YouTube playlist on BBC).
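As an illustration of the tapered eval mentioned above, here is a minimal sketch of the interpolation. The phase weights, `MAX_PHASE` and the function names are assumptions chosen for the example, not values taken from the engine discussed here.

```python
# Hypothetical phase weights per piece type (pawns and kings do not count).
PHASE_WEIGHTS = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24  # 4 knights + 4 bishops + 4 rooks * 2 + 2 queens * 4


def game_phase(piece_counts):
    """Return a phase from 0 (bare endgame) to MAX_PHASE (all pieces on the board).

    piece_counts maps a piece letter to the number of such pieces for both sides.
    """
    phase = sum(PHASE_WEIGHTS[p] * n for p, n in piece_counts.items() if p in PHASE_WEIGHTS)
    return min(phase, MAX_PHASE)  # extra promoted pieces must not exceed the maximum


def tapered_eval(mg_score, eg_score, phase):
    """Interpolate linearly between the middlegame and endgame scores (centipawns)."""
    return (mg_score * phase + eg_score * (MAX_PHASE - phase)) // MAX_PHASE
```

With all pieces on the board the middlegame score is used unchanged, with none left the endgame score is used, and in between the two are blended in proportion to the remaining material.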

After running some tests against e.g. TSCP and looking at the games, I noticed several easy positional mistakes from my engine. It often disregards bad doubled/isolated pawns and gladly recaptures on, say, c3 with the b-pawn instead of with an undeveloped knight on b1. It frequently (almost always) places a bishop on d3 or e3 in front of the undeveloped d- and e-pawns to protect a double-pushed pawn on d4 or e4, instead of just pushing the undeveloped e2/d2 pawn one step. In a position like the one below, it would most likely place the rook behind the pawn instead of keeping it active with Rc2 (maybe Re1 is sound in this exact position, but in general Rc2 would be better because it keeps the rook active).

[d]7k/4p3/4r3/8/8/8/4P3/2R4K w - - 0 1

So I was thinking of adding some simple things as complement to PST tables to get rid of these issues. Just something such as:

Code: Select all

# x is a small penalty in centipawns, still to be tuned
if opening_phase and bishop_on_d3 and own_pawn_on_d2:
    bishop_base_value -= x
And maybe also:

Code: Select all

if doubled_pawn:
    pawn_base_value -= x
Tuning this would be hard, since the PST tables are already tuned to stand on their own. Only testing will show whether there is a significant increase in strength or whether it just ruins things completely.

What are your thoughts on this? What would be the best way to get a better evaluation (since it is obviously not good at the moment)? What would be a good next step for a beginner chess programmer, and programmer in general?
Harald
Posts: 317
Joined: Thu Mar 09, 2006 1:07 am

Re: Evaluation reflections

Post by Harald »

In the beginning I would not use a heavily tuned piece-square table but a simpler one. It should give all your pawns and pieces a general direction and a frame where they belong. The differences between its square values and those of a tuned PST should then be explained and achieved, step by step, by the positional terms of other evaluation features. All values should be just big enough, and you should take the already existing terms into account. Do not use a high value near a full pawn (100cp) just to forbid your engine from making a stupid move. This would backfire in other positions. As soon as the positional values sum to more than a pawn (100cp), the engine may give up that pawn for them. Make sure it is worth it.

Instead try to see the bigger picture and find a more general evaluation feature. You could start with some simple and easy to write features from different perspectives.

A mobility term where you count every possible move of your pawns and pieces. Every move could be counted as 1cp, or you can use a different weight depending on: the piece; the centralisation or rank of the to-square (this already hints at the piece's possible future on a square with a better PST value); whether the to-square is free, occupied by your opponent (possible attacks) or occupied by your own pieces (possible defences); and whether the to-square is attacked or defended. Give a bonus for attacking weak pawns (see below). That will take some time and should start simple. It can be improved later.
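A simple starting point for such a mobility term could look like the sketch below. It assumes a board stored as a dict from `(file, rank)` tuples (0-based, so c2 is `(2, 1)`) to piece letters, uppercase for White, which is not the 10x12 layout used in the thread. The weights in `MOBILITY_WEIGHT` are made-up placeholders, and only knights and sliders are counted.

```python
KNIGHT_STEPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
ROOK_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
BISHOP_DIRS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
SLIDER_DIRS = {"R": ROOK_DIRS, "B": BISHOP_DIRS, "Q": ROOK_DIRS + BISHOP_DIRS}
MOBILITY_WEIGHT = {"N": 4, "B": 3, "R": 2, "Q": 1}  # hypothetical centipawns per move


def on_board(f, r):
    return 0 <= f < 8 and 0 <= r < 8


def count_moves(board, square):
    """Count pseudo-legal moves (empty squares and captures) of a knight or slider."""
    piece = board[square]
    kind, is_white = piece.upper(), piece.isupper()
    f0, r0 = square
    count = 0
    if kind == "N":
        for df, dr in KNIGHT_STEPS:
            target = (f0 + df, r0 + dr)
            if on_board(*target):
                other = board.get(target)
                if other is None or other.isupper() != is_white:
                    count += 1
        return count
    for df, dr in SLIDER_DIRS.get(kind, []):  # kings and pawns yield no directions
        f, r = f0 + df, r0 + dr
        while on_board(f, r):
            other = board.get((f, r))
            if other is None:
                count += 1
            else:
                if other.isupper() != is_white:
                    count += 1  # a capture counts as a move, then the ray stops
                break
            f, r = f + df, r + dr
    return count


def mobility(board, white):
    """Weighted move count for one side, in centipawns."""
    total = 0
    for square, piece in board.items():
        if piece.isupper() == white and piece.upper() in MOBILITY_WEIGHT:
            total += MOBILITY_WEIGHT[piece.upper()] * count_moves(board, square)
    return total
```

In the rook position from the first post, this counts 10 moves for a white rook on c2 but only 6 for a rook on e1, where it is boxed in by its own pawn on e2 and king on h1.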

A king safety term that just counts the pawns (and, if there are not enough, some of your own minor pieces) around the king (up to 8 squares around it) and the pawns on the three squares in front of this area. The king should stay in a corner behind its pawn shield in the middlegame, but that is already achieved by the PST. What the king safety evaluation adds is that the pawns are held back near the fixed king. In a more advanced evaluation you also count the attacks and hidden attacks on your king and its pawn shield by enemy pawns and pieces. But that is not so easy to implement for distant sliders. Beware of rooks on (half) open files leading to or next to your king. Give yourself a penalty or your opponent a bonus for that.
If you have more than one king safety penalty, give an extra penalty that increases with the number of these penalties.
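The simple pawn-counting version of this term might be sketched as follows. It assumes pawns are given as a set of 0-based `(file, rank)` tuples, and `bonus_per_pawn` and `max_pawns` are placeholder values, not tuned numbers. Minor pieces, attacks and open files are left out.

```python
def pawn_shield_count(king_sq, own_pawns, white):
    """Count own pawns on the 8 squares around the king plus the 3 squares in front of that area."""
    kf, kr = king_sq
    forward = 1 if white else -1
    count = 0
    for df in (-1, 0, 1):
        for dr in (-1, 0, 1):
            if (df, dr) != (0, 0) and (kf + df, kr + dr) in own_pawns:
                count += 1
        # One of the three squares two ranks ahead of the king.
        if (kf + df, kr + 2 * forward) in own_pawns:
            count += 1
    return count


def king_safety(king_sq, own_pawns, white, bonus_per_pawn=8, max_pawns=3):
    """Hypothetical bonus in centipawns, capped so extra pawns are not over-rewarded."""
    return bonus_per_pawn * min(pawn_shield_count(king_sq, own_pawns, white), max_pawns)
```

Off-board squares never appear in the pawn set, so a king on the edge needs no special handling here.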

Passed pawns are very dangerous and should get a high value, but not too high or too optimistic. Rather add some extra centipawns if the passers are defended by, or aligned with, another of your own pawns. Check if the square in front of the passer is free and not attacked. Two passers that are many files apart are also dangerous and should be awarded some points. Give a rook behind your passer a few points.
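Detecting a passer and adding the "defended or aligned" extra could be sketched like this. Pawns are again assumed to be 0-based `(file, rank)` tuples, and all bonus numbers are invented for the example.

```python
def is_passed(pawn, enemy_pawns, white):
    """A pawn is passed if no enemy pawn stands ahead of it on its own or an adjacent file."""
    f, r = pawn
    for ef, er in enemy_pawns:
        ahead = er > r if white else er < r
        if abs(ef - f) <= 1 and ahead:
            return False
    return True


def passed_pawn_bonus(pawn, own_pawns, enemy_pawns, white):
    """Hypothetical bonus in centipawns that grows as the passer advances."""
    if not is_passed(pawn, enemy_pawns, white):
        return 0
    f, r = pawn
    steps_advanced = r - 1 if white else 6 - r
    bonus = 10 + 10 * steps_advanced  # placeholder scale, to be tuned
    forward = 1 if white else -1
    # Extra points if an own pawn defends the passer or stands right beside it.
    supported = any(
        (f + df, r - forward) in own_pawns or (f + df, r) in own_pawns
        for df in (-1, 1)
    )
    if supported:
        bonus += 10
    return bonus
```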

Give some extra points to a player with both bishops.
Take some penalty points for doubled pawns. Increase this if in the center or in your king shield.
Take some penalty points for isolated pawns. Increase this if in the center or in your king shield.
Give some extra points for rooks on (half) open files.
Give some extra points for rooks on the 7th rank especially if there are enemy pawns on that rank and if the enemy king is still on its first two ranks.
Give a bonus for your knights depending on the distance to the enemy king.
Give a bonus for queens and bishops for being on a file or diagonal that leads to the enemy king. You can use precalculated tables for this or calculate real attack rays which is far more complicated.
Give a bonus for minor outposts (unattacked knight or bishop on ranks 4 to 7) especially on the opponent king side and if defended.
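Three of the cheapest items in the list above (bishop pair, doubled pawns, isolated pawns) can be computed from piece counts and pawn files alone. The sketch below assumes the pawns of one side are passed in as a list of file indices (0 = a, 7 = h), and the penalty and bonus sizes are placeholders.

```python
from collections import Counter


def pawn_structure_penalty(pawn_files, doubled=10, isolated=15):
    """Return the total penalty (centipawns) for doubled and isolated pawns of one side."""
    per_file = Counter(pawn_files)
    penalty = 0
    for f, n in per_file.items():
        if n > 1:
            penalty += doubled * (n - 1)  # every extra pawn on the file is penalised
        if per_file.get(f - 1, 0) == 0 and per_file.get(f + 1, 0) == 0:
            penalty += isolated * n  # no own pawn on either neighbouring file
    return penalty


def bishop_pair_bonus(bishop_count, bonus=30):
    """Simplification: two or more bishops are assumed to stand on opposite colours."""
    return bonus if bishop_count >= 2 else 0
```

A term like this is what would make the engine think twice about recapturing on c3 with the b-pawn: pawns on a2, c2 and c3 (`[0, 2, 2]`) cost 55 here, while a2, b2 and c2 (`[0, 1, 2]`) cost nothing.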

Have a look at your pawn structure.
Give penalties for weak pawns (google this specific pattern).
Give a bonus for pawns defending pawns and aligned pawns.
Give a bonus for advanced pawns near the opponent king but don't ruin your king's pawn shield (pawn storm).
Count the blocked pawns (rams) in the center and on both sides. These numbers (closed or open position) may have influence on the weight of other features. (http://www.fraserheightschess.com/Docum ... _Types.pdf)

Try to detect pins to the king or queen and give that a bonus.
Try to detect x-ray attacks on pieces through pawns or other pieces and give that a bonus.

In the opening give a penalty for undeveloped pieces (knights and bishops) on the first rank.
Give a penalty for these pieces in front of pawns on d2, e2.
Do not give a bonus for using up the castling rights (that is taken care of in king safety).
Do not give a bonus for keeping the castling rights (the engine would avoid castling).

In the endgame (when most pieces are gone) implement the rule of the square (king and enemy passers) with a high penalty.
If the opponent has one bishop only your king should avoid the corner that can be reached by this bishop.
Give a bonus if a piece or the king defends an own passer and the square in front of it.
Give a bonus if a piece or the king attacks an enemy passer and the square in front of it.
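The rule of the square mentioned above reduces to comparing two distances. The sketch below assumes 0-based `(file, rank)` tuples and a pure king-and-pawn ending, so it ignores other pieces and the pawn owner's king blocking the path. The function name and parameters are made up for the example.

```python
def pawn_is_unstoppable(pawn, enemy_king, pawn_is_white, defender_to_move):
    """Return True if the enemy king cannot reach the promotion square in time."""
    f, r = pawn
    promo_rank = 7 if pawn_is_white else 0
    start_rank = 1 if pawn_is_white else 6
    if r == start_rank:
        r += 1 if pawn_is_white else -1  # the double step saves one move
    pawn_dist = abs(promo_rank - r)
    kf, kr = enemy_king
    # King moves needed to reach the promotion square (Chebyshev distance).
    king_dist = max(abs(kf - f), abs(promo_rank - kr))
    if defender_to_move:
        king_dist -= 1  # the king gets a head start
    return king_dist > pawn_dist
```

For a white pawn on a5, a black king on d5 is inside the square and catches the pawn even with White to move, while a king on e5 only catches it if Black moves first.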

Maybe I forgot something, or not everything is right or important. It is your evaluation, and you can keep improving it.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Evaluation reflections

Post by Sven »

I would focus on getting a strong, bug-free search with a decent effective branching factor that lets you reach at least depth 10-12 in the middlegame. I don't know for sure whether your engine has already reached that state, but I vaguely remember it hasn't. With a strong search and a very simple material+PST eval you can already reach 2200 Elo @CCRL or more. Improving the evaluation early is tempting, but you won't get any convincing strength improvements that way while your engine blunders frequently due to its low search depth. Furthermore, a deeper search will itself avoid several positional errors by detecting that they lead to a bad position.

Once you start working on a better static eval you will notice that it requires a lot of tuning and testing. Each single eval change might require re-tuning all parameters. There seems to be no way to avoid that if you want to make progress.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
eligolf
Posts: 114
Joined: Sat Nov 14, 2020 12:49 pm
Full name: Elias Nilsson

Re: Evaluation reflections

Post by eligolf »

@Harald, thank you for the extensive input. There are many concepts which will be too time consuming for my engine (10x12 board representation), but it will surely be possible to implement variants of many of them.

@Sven, you are right, that is true. Right now with only alpha/beta, complete move ordering, check extensions and quiescence I reach a depth of 5 in say 3-5 seconds in a mid game position. The nodes/s is somewhere between 20-30k which is very low, but I can't reach much higher with the current setup (using Python is one of the drawbacks). Implementing nullmoves, TT and LMR will surely help though. I hope to reach at least depth 8 with those features included :)
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Evaluation reflections

Post by mvanthoor »

eligolf wrote: Wed Jan 20, 2021 7:28 am What are your thoughts on this? What would be the best way to get a better evaluation (since it is obviously not good at the moment)? What would be a good next step for a beginner chess programmer, and programmer in general?
The behavior you describe is normal for chess engines that are just starting out.

My engine also only has PST's and material knowledge, and it also drunkenly sways between Rc2 and Re1, depending on the depth. As soon as mobility is implemented, even if it is as simple as "how many squares are available to the piece after that move", the eval will see that the rook is more mobile on c2 than it is on e1, and thus choose Rc2.

In the end, I'd keep the PST's as general as possible. That is: don't use the PST to compensate for missing knowledge, as you'd have to undo that again later. (It doesn't really matter if you use a tuner; then it'll be done automatically.)
Author of Rustic, an engine written in Rust.
Ronald
Posts: 160
Joined: Tue Jan 23, 2018 10:18 am
Location: Rotterdam
Full name: Ronald Friederich

Re: Evaluation reflections

Post by Ronald »

I agree with Sven,

Using PST as only chess knowledge helped me to focus on getting a solid search engine first. After adding more chess knowledge the search parameters stayed pretty much the same, and changing them always led to worse play. The only changes I made to the search parameters concerned LMR/LMP, probably because a better evaluation also leads to a better history and thus a more aggressive LMR is possible.

In TCEC season 17, because one engine could/would not compete in the qualification league, there was a spot free for PeSTO, an "experimental" version of rofChade which only uses PST as chess knowledge, to see how well a PST-only engine could do. The current CCRL blitz rating is 2970 for the single thread version, so this shows that a good search engine is just as important as a good evaluation.
eligolf
Posts: 114
Joined: Sat Nov 14, 2020 12:49 pm
Full name: Elias Nilsson

Re: Evaluation reflections

Post by eligolf »

Yes, I read something about rofChade and its amazing rating with only PST tables. I almost have a hard time believing it, since I see the stupidity of my engine, making all sorts of beginner mistakes and choices. But I guess if you can search 50 moves ahead then it will matter less :)
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Evaluation reflections

Post by Dann Corbit »

If you would like to see a really simple evaluation that is effective, look at Olithink.
The whole program is about 50K and this list:
http://ccrl.chessdom.com/ccrl/4040/
has it ranked above programs like Nimzo and Yace.

The main ingredient of the Olithink evaluation is mobility, a term that I think is probably overlooked or under-valued in many evals.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Evaluation reflections

Post by Dann Corbit »

Ronald wrote: Wed Jan 20, 2021 3:00 pm I agree with Sven,

Using PST as only chess knowledge helped me to focus on getting a solid search engine first. After adding more chess knowledge the search parameters stayed pretty much the same, and changing them always led to worse play. The only changes I made to the search parameters concerned LMR/LMP, probably because a better evaluation also leads to a better history and thus a more aggressive LMR is possible.

In TCEC season 17, because one engine could/would not compete in the qualification league, there was a spot free for PeSTO, an "experimental" version of rofChade which only uses PST as chess knowledge, to see how well a PST-only engine could do. The current CCRL blitz rating is 2970 for the single thread version, so this shows that a good search engine is just as important as a good evaluation.
By PST only, do you mean that it had no other terms (not even wood-count)?
Ronald
Posts: 160
Joined: Tue Jan 23, 2018 10:18 am
Location: Rotterdam
Full name: Ronald Friederich

Re: Evaluation reflections

Post by Ronald »

It also has a separate "wood count", but you can put it in the PST by simply adding the piece value to every square of the corresponding PST. The only other element in PeSTO is a Tempo bonus.
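Folding the "wood count" into a table, as described above, is a one-line transformation. In the sketch below the table contents and the knight value of 320 are placeholders, not PeSTO's actual numbers.

```python
def fold_material_into_pst(pst, piece_value):
    """Return a new 64-entry table with the piece value added to every square."""
    return [square_bonus + piece_value for square_bonus in pst]


# Hypothetical knight table: a penalty everywhere except a bonus on one central square.
knight_pst = [-10] * 64
knight_pst[27] = 20
knight_table = fold_material_into_pst(knight_pst, 320)
```

After this step a single table lookup per piece yields material plus positional score, so the evaluation loop needs no separate material term.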