Improving engine evaluation

Discussion of chess software programming and technical issues.

Moderator: Ras

OliveriQ
Posts: 5
Joined: Sun Apr 03, 2022 6:28 pm
Full name: Lucas Oliveri

Improving engine evaluation

Post by OliveriQ »

So far I've focused mostly on improving my search function, and have implemented most of the standard pruning techniques. At this point, every other pruning technique I could implement is either too risky or simply not worth it. Hence, I want to improve my evaluation function, which at the moment consists of PSQTs + material and a tapered evaluation interpolating the MG and EG scores. Both the PSQTs and the material values are incrementally updated in make and unmake move. I also use mop-up evaluation for converting late endgames, but that isn't really relevant here.

So, what else can I add that is cheap to implement and can increase the playing strength of my engine to a reasonable degree? I've tried things like mobility, doubled-pawn penalties and open-file bonuses, but they all seemed to slow down the engine and didn't seem worth it after some testing. I also thought about implementing a NNUE, but I don't know much about the topic. The Elo estimate is around 2100-2200, leaning towards 2200. Thanks in advance!
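For reference, the tapered interpolation I mean is roughly this (a minimal C# sketch with illustrative names and phase weights, not my exact code):

Code: Select all

// mgScore/egScore are the incrementally updated PSQT+material sums;
// phase counts remaining non-pawn material (e.g. N/B = 1, R = 2, Q = 4 per piece).
const int MaxPhase = 24;

static int TaperedEval(int mgScore, int egScore, int phase)
{
    if (phase > MaxPhase) phase = MaxPhase;   // promotions can push the phase past the cap
    return (mgScore * phase + egScore * (MaxPhase - phase)) / MaxPhase;
}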
JVMerlino
Posts: 1397
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: Improving engine evaluation

Post by JVMerlino »

None of the things you mentioned, by themselves, will "increase the strength of [your] engine to a reasonable degree", at least not what *I* consider "reasonable". Probably an average of 5 Elo at most for each one? One you didn't mention that is super easy is an MG/EG bonus for the bishop pair. But all of the simpler things (your list plus the bishop pair plus the many possible pawn-structure terms) do add up to a nice bonus. Still, IMHO the biggest boost comes from a good king-safety implementation, which is not cheap or easy. :)
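The bishop-pair term really is just a couple of lines, something like this (a rough sketch in C#; the bonus values are placeholders to tune, not recommendations):

Code: Select all

using System.Numerics;

// Returns the white-minus-black bishop-pair contribution for both game phases.
static (int mg, int eg) BishopPair(ulong whiteBishops, ulong blackBishops)
{
    const int BonusMg = 25, BonusEg = 50;   // illustrative numbers only
    int mg = 0, eg = 0;
    if (BitOperations.PopCount(whiteBishops) >= 2) { mg += BonusMg; eg += BonusEg; }
    if (BitOperations.PopCount(blackBishops) >= 2) { mg -= BonusMg; eg -= BonusEg; }
    return (mg, eg);
}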
But the good news is that the expense of adding more eval terms can be offset a fair bit by adding eval and pawn hash tables, the latter of which you don't need at all until your pawn structure evaluation becomes more complex.
Then you can Texel-tune all of those numbers and get a shocking boost. I worked on my engine for more than ten years before I bit that bullet, tuning everything except the PSTs, and got +130 elo.
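The error you minimize in Texel tuning is basically this (a sketch: evals are your engine's static scores for a big set of quiet positions, results come from the games those positions were taken from, and the scaling constant K is fitted to the data first):

Code: Select all

// Maps a centipawn score to an expected game result in [0, 1]; k is fitted to the data set.
static double Sigmoid(double eval, double k) =>
    1.0 / (1.0 + Math.Pow(10.0, -k * eval / 400.0));

// Mean squared error over the training set: results are 1 = win, 0.5 = draw, 0 = loss.
static double Error(double[] evals, double[] results, double k)
{
    double e = 0.0;
    for (int i = 0; i < evals.Length; i++)
    {
        double diff = results[i] - Sigmoid(evals[i], k);
        e += diff * diff;
    }
    return e / evals.Length;
}

You then nudge each eval parameter up or down and keep the change whenever the error drops.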
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Improving engine evaluation

Post by hgm »

In Fairy-Max I observed a gain of about 50 Elo by recognizing passers and scoring their advance (as I reported in another recent discussion here), although this was on a larger board with faster pawns.

The point is that you should do it through a Pawn hash table, or it will slow down the engine so much that the benefit is offset. With a Pawn hash table it should not slow you down much; you just have to update a Pawn hash key. In Fairy-Max I even used a 32-bit key for that, although that should not have any advantage if you are running on an x64 architecture.
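The incremental update is just a pair of XORs in make/unmake, something like this (a sketch with assumed names; the random numbers are per color and square, like the ordinary Zobrist key):

Code: Select all

// One random 64-bit number per [color, square], filled once at startup.
static readonly ulong[,] PawnZobrist = new ulong[2, 64];

// Called when a pawn of 'color' moves from 'from' to 'to'.
// XOR is its own inverse, so the same call undoes the move in unmake.
static ulong UpdatePawnKey(ulong pawnKey, int color, int from, int to)
    => pawnKey ^ PawnZobrist[color, from] ^ PawnZobrist[color, to];

// Pawn captures and promotions need one extra XOR for the pawn that disappears;
// moves of other pieces leave the key untouched.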
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Improving engine evaluation

Post by lithander »

hgm wrote: Sun May 22, 2022 9:38 pm With a Pawn hash table it should not slow you down much; you just have to update a Pawn hash key. In Fairy-Max I even used a 32-bit key for that, although that should not have any advantage if you are running on an x64 architecture.
With a bitboard engine you can just use the pawn bitboard itself as the hash key. You need additional information in the entry to ensure equality, but it makes the lookup into the hash table super cheap, as you save the overhead of updating a dedicated pawn hash key.
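Roughly like this (a sketch with made-up names, not the actual Leorik code):

Code: Select all

struct PawnHashEntry
{
    public ulong WhitePawns, BlackPawns;   // verification data: the pawn position itself
    public int MgScore, EgScore;           // cached pawn-structure evaluation
}

static int IndexOf(ulong whitePawns, ulong blackPawns, int tableSize)
{
    ulong key = whitePawns ^ blackPawns;    // the "dumb" key: just the pawn occupancy
    return (int)(key % (ulong)tableSize);   // tableSize is a prime, see below
}

A probe only counts as a hit if both stored bitboards match the current position.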
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Improving engine evaluation

Post by hgm »

You would still have to distinguish the pawns by color. Also, these boards would not be very good keys if you use the standard way of deriving the table index by masking out a subset of the bits. That would ignore part of the board, with the result that many pawn structures map to the same entry while many entries never get populated.

You could get a good key by calculating C1*((pawns & white) >> 8) + C2*((pawns & black) >> 8) with randomly initialized constants C1 and C2, provided you use the high bits for the index. This is not so cheap anymore, though.
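In code that would look something like this (a sketch; the constants and the index width are illustrative):

Code: Select all

const ulong C1 = 0x9E3779B97F4A7C15UL;   // two "random" odd 64-bit constants
const ulong C2 = 0xC2B2AE3D27D4EB4FUL;

static int PawnIndex(ulong whitePawns, ulong blackPawns, int indexBits)
{
    // >> 8 drops the always-empty first rank; the multiplications mix the bits
    ulong key = C1 * (whitePawns >> 8) + C2 * (blackPawns >> 8);
    return (int)(key >> (64 - indexBits));   // use the high bits as the index
}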

Btw, did anyone experiment with least-recently-used replacement in the pawn table? Like one uses in the killer slots? I only ever tried always-replace, but I can imagine that LRU would drive up the hit rate. (Or allow a smaller table.)
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Improving engine evaluation

Post by lithander »

hgm wrote: Mon May 23, 2022 7:22 am You would still have to distinguish the pawns by color. Also, these boards would not be very good keys if you use the standard way of deriving the table index by masking out a subset of the bits. That would ignore part of the board, with the result that many pawn structures map to the same entry while many entries never get populated.
I modulo with the number of hash-table entries to get an index into the array. If the size of the table is a prime, all bits of the hash should get utilized. And I have an average hit rate of over 90% with a table size of 4999 entries, which is much better than if the table size is, for example, a power of two.

Might be that your approach with the constants would drive the hit-rate up even more but I thought everything above 90% on such a small table is "good enough"? If you expect much better results with a better hash I should probably investigate again...
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Improving engine evaluation

Post by hgm »

The problem is that modulo is an extremely slow operation (which could take some 35 clock cycles). I remember Bob Hyatt reporting that by using modulo instead of masking (and a power-of-2 hash size), and no other changes in Crafty, he saw a speed drop of 30%.

This applies in particular to modulo by a variable that is unknown at compile time. (And I suppose the hash size is, as it should be adjustable through an engine option.) For modulo by a constant the compiler can use a trick: it can calculate the inverse of the denominator at compile time (or (1LL<<64)/N to keep it an integer) and then multiply by that (which only takes a single cycle) to mimic a division. After truncation to the integer part it can then multiply back by the denominator and take the difference with the numerator to get the modulo.

I suppose you could program it that way yourself, calculating the (scaled) inverse at the time the hash-table size is changed. I don't think the optimizer would be smart enough to do that, because it has no clue that you probe the table much more often than you change its size.
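Something along these lines, for example (a sketch; it assumes .NET 5+ for Math.BigMul and a table size that is not a power of two):

Code: Select all

ulong size;   // number of entries, set through an engine option
ulong inv;    // scaled reciprocal, recomputed only when the size changes

void Resize(ulong newSize)
{
    size = newSize;
    inv = ulong.MaxValue / size;   // ~ floor(2^64 / size)
}

ulong Index(ulong key)
{
    ulong q = Math.BigMul(key, inv, out _);  // high 64 bits of key * inv, i.e. ~ key / size
    ulong r = key - q * size;                // candidate remainder, may overshoot by at most size
    if (r >= size) r -= size;                // one correction step gives 0 <= r < size
    return r;
}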

With the table size a prime your index calculation is probably OK (i.e. gives an even distribution over the table entries). But just because of the modulo it is probably much more expensive than incrementally updating a second Zobrist key.
lithander wrote: Might be that your approach with the constants would drive the hit-rate up even more but I thought everything above 90% on such a small table is "good enough"? If you expect much better results with a better hash I should probably investigate again...
It depends on how expensive the from-scratch calculation is for the stuff you want to store. If that would involve running a neural net with a million cells, even having to do it only in 1% of the probes might make it the bottleneck of your engine's speed. For (large) mailbox boards identifying passers is not as cheap as with a small bitboard. And the more info you want to store in the table, the more calculation it will require in case of a miss.
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Improving engine evaluation

Post by lithander »

hgm wrote: Mon May 23, 2022 7:22 am You could get a good key by calculating C1*((pawns & white) >> 8) + C2*((pawns & black) >> 8) with randomly initialized constants C1 and C2, provided you use the high bits for the index. This is not so cheap anymore, though.
93.05% hit rate with 4999 table size
...and with your C1 + C2 way of computing the hash:
94.49% hit rate with 4999 table size

I can also still improve the hit rate by just doubling the table size:
93.98% hit rate with 9931 table size
...and with your C1 + C2 way of computing the hash:
95.55% hit rate with 9931 table size

Or with 10x the table size:
95.23% hit rate with a table size of 54667
...and with your C1 + C2 way of computing the hash:
97.04% hit rate with a table size of 54667

So the "dumb" hash of just using the pawn bitboard directly seems to lose me 1.5% of the potential hits. The limited number of table entries is losing the rest. And of course 100% is impossible. The table needs to be filled first before you can reuse a position.
hgm wrote: Mon May 23, 2022 7:22 am ...the standard way of deriving the table index by masking out a subset of the bits.
I don't really understand how that would support sizes other than powers of two for the hash table? It's not important for the pawn table, but for the transposition table the user could expect to be able to set 100 MB of hash size, and then what should I do? Just use the power of two that gets me closest to 100 MB total? Modulo is much more flexible...

So I tried to synthetically "benchmark" the operations on my PC:

Code: Select all

const int HASH_TABLE_SIZE = 4999;
ulong[] N = new ulong[100];
int X = 0;   // accumulate into X so the loop body cannot be optimized away

var sw = System.Diagnostics.Stopwatch.StartNew();
for (int j = 0; j < 5000000; j++)
    for (int i = 0; i < N.Length; i++)
    {
        // swap the * for /, % or + to time the different operators
        X += (int)(N[i] * HASH_TABLE_SIZE);
    }
Console.WriteLine($"{sw.Elapsed.TotalSeconds:0.00} seconds (X = {X})");
And I used different operators in place of the multiplication.

DIV 0.35 seconds!
REM 0.40 seconds!
MUL 0.28 seconds!
ADD 0.25 seconds!

So the Modulo-Operator is expensive but not quite expensive enough to go out of your way to avoid it. I just like simple more than fast, I guess! :)
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Improving engine evaluation

Post by hgm »

This still does not refute what I said: you divide by a constant that is known at compile time. Have you looked at the assembly code to make sure a div instruction was actually used?

You should try x += HASH_TABLE_SIZE / N[i]; to force the optimizer to actually divide in the loop, or it will optimize by calculating 1/HASH_TABLE_SIZE outside the loop.
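Or, closer to the actual use case in the engine, make the divisor a runtime value, so the JIT cannot turn the modulo into a multiplication (a sketch assuming the same setup as your benchmark):

Code: Select all

ulong tableSize = ulong.Parse(args[0]);   // e.g. pass 4999 on the command line
ulong[] N = new ulong[100];
ulong x = 0;

for (int j = 0; j < 5000000; j++)
    for (int i = 0; i < N.Length; i++)
        x += N[i] % tableSize;            // a genuine runtime division now

Console.WriteLine(x);   // keep the result alive so the loops are not removed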

The hit rate was a concern only with power-of-two table sizes. So I am not surprised that for the table sizes you use it is OK.

Btw, rather than testing the effect of the hitrate, you could (for testing purposes) put a counter in each entry to count the stores in it, to see if they are evenly distributed through the table.

What I am most curious about is whether LRU replacement in buckets of 2 could drive up the hit rate.
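For buckets of 2 I am thinking of something like this (a sketch with made-up names; slot 0 holds the most recently used entry, a hit in slot 1 swaps the pair, and a store always evicts slot 1):

Code: Select all

struct PawnEntry { public ulong Key; public int MgScore, EgScore; }

PawnEntry[] entries;   // 2 slots per bucket, stored back to back

bool Probe(ulong key, int bucket, out PawnEntry hit)
{
    int i = 2 * bucket;
    if (entries[i].Key == key) { hit = entries[i]; return true; }
    if (entries[i + 1].Key == key)
    {
        // promote the older slot so it becomes the most recently used
        (entries[i], entries[i + 1]) = (entries[i + 1], entries[i]);
        hit = entries[i];
        return true;
    }
    hit = default;
    return false;
}

void Store(int bucket, PawnEntry e)
{
    int i = 2 * bucket;
    entries[i + 1] = entries[i];   // demote the current most-recently-used entry
    entries[i] = e;                // the new entry takes its place
}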
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Improving engine evaluation

Post by dangi12012 »

lithander wrote: Mon May 23, 2022 11:49 pm And I used different operators in place of the multiplication.

DIV 0.35 seconds!
REM 0.40 seconds!
MUL 0.28 seconds!
ADD 0.25 seconds!

So the Modulo-Operator is expensive but not quite expensive enough to go out of your way to avoid it. I just like simple more than fast, I guess! :)
Hey can you do Ildasm on that?
https://docs.microsoft.com/en-us/dotnet ... sassembler
Or ILSpy:
https://github.com/icsharpcode/ILSpy

I'm not sure you are measuring what you think you are :mrgreen:
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer