Thoughts on eval terms

tpetzke · Post by **tpetzke** » Tue Apr 01, 2014 9:52 am

Hi Fermin,

I agree for most features. The most important thing is to have the term at all; its actual value matters less unless it is far off.

I experienced this when I added a few terms with guessed weights, which made the engine a bit stronger. I then expected to gain some additional strength by tuning them, but I did not. The tuned weights were different, but overall the engine played at the same level.

A counterexample was my weight for doubled pawns. The tuned value seemed too low to me, so I increased it a bit. This was a clear Elo loss.

Thomas...

tpetzke · Post by **tpetzke** » Tue Apr 01, 2014 9:59 am

But if I add a mobility term my chess program seems to play even worse. My piece square table only contains an estimate of a mobility factor.

This is actually only half of the story.

The value of a piece is the sum of actual and potential (on an empty board) mobility. The PSQ table addresses (among some other things like centralization) the potential mobility. You can safely add the material value of the piece here because it is the same concept.

The actual mobility is the mobility the piece really has on a given board. You can't capture that with PSQ, you have to take the board setup into account.
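The distinction can be illustrated with a minimal sketch for a knight. The square numbering (a1 = 0, h8 = 63), the function names and the set-based board are assumptions made for this illustration, not taken from any engine in the thread. Potential mobility depends only on the square, so it can be baked into a PSQ table; actual mobility needs the current occupancy.

```python
# Knight move offsets as (file delta, rank delta).
KNIGHT_STEPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def knight_targets(square):
    """All squares a knight on `square` could reach on an empty board."""
    file, rank = square % 8, square // 8
    return [
        (rank + dr) * 8 + (file + df)
        for df, dr in KNIGHT_STEPS
        if 0 <= file + df < 8 and 0 <= rank + dr < 8
    ]

def knight_potential_mobility(square):
    """Empty-board mobility: a pure function of the square (PSQ material)."""
    return len(knight_targets(square))

def knight_actual_mobility(square, own_occupied):
    """Mobility on a real board: targets not blocked by own pieces."""
    return sum(1 for t in knight_targets(square) if t not in own_occupied)

# Potential mobility could be precomputed once into a 64-entry table.
KNIGHT_POTENTIAL = [knight_potential_mobility(sq) for sq in range(64)]
```

For a knight on g1 in the starting position the table entry is 3, while the actual mobility is only 2 because e2 is occupied by an own pawn.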

Thomas...

Henk · Post by **Henk** » Tue Apr 01, 2014 10:16 am

tpetzke wrote:
But if I add a mobility term my chess program seems to play even worse. My piece square table only contains an estimate of a mobility factor.
This is actually only half of the story.

The value of a piece is the sum of actual and potential (on an empty board) mobility. The PSQ table addresses (among some other things like centralization) the potential mobility. You can safely add the material value of the piece here because it is the same concept.

The actual mobility is the mobility the piece really has on a given board. You can't capture that with PSQ, you have to take the board setup into account.

Thomas...

Maybe the weights of my mobility term in eval were no good. I don't know; maybe I have to try once again (but with automated tuning). In quiescence I still have a move-related operation, because I have to collect the captures, but only for white or black and not both. Actually I am not sure whether I lose a ply when adding mobility, but when I tested it, it played worse rather than better, and search depth looked lower than normal.

Sven · Post by **Sven** » Tue Apr 01, 2014 11:18 pm

Henk wrote:
Stan Arts wrote:
Henk wrote: If I compute mobility in eval I lose at least one ply. And nothing beats an extra ply, they say. So for instance my eval would be 0.1 pawns more accurate, but in some positions it wouldn't see that it loses a piece or gets checkmated.
Losing a ply is a 200-300% slowdown. That sounds a bit off.
I can calculate mobility ten times and raytrace a dancing dinosaur before that happens.
If I have 16 pieces and say 40 moves. I get a factor 40/16. That's more than 200%.

Your calculation based on "a factor 40/16" is incorrect, even as a very rough estimate. Let's say you have two program versions A and B:

A has static eval based on PST only, and the evaluation takes P% of the whole runtime on average. Speed is NPS_A.
B is the same as A with the addition of mobility in static eval. Evaluation with mobility included costs F * the cost of the evaluation with PST only (e.g. your F=40/16).

Now what is a hypothetical NPS_B? (Ignoring cache effects etc. of course)

On average A needs 1/NPS_A seconds per node, of which it spends P/(100*NPS_A) seconds for static eval and (100-P)/(100*NPS_A) seconds for other parts of the program.
B spends P*F/(100*NPS_A) seconds per node for static eval and (100-P)/(100*NPS_A) seconds for other parts.

So the slowdown ratio S is (P*F + (100-P)) / 100 = 1 + P*(F-1)/100 and NPS_B = NPS_A / S.

For your F=40/16 and maybe P=5 (so a PST-only eval takes 5% of all runtime) you get a hypothetical slowdown ratio for mobility of 1.075, i.e. 107.5% of the original time per node. If we fix P at 10 (which is unrealistic; it might even be P=1 or less with an incremental approach) then you need F=11 for a slowdown ratio of 2.0 (200%); with P=5 you would need F=21.

These numbers may be quite artificial of course, but they should show you that you really need a *very* slow mobility implementation to get close to a 200% slowdown of the engine. And even 200% is far from losing a full ply (initially you even mentioned "two plies"), unless you already have a super-strong effective branching factor of 2.0 like top class engines.
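The formula above is easy to play with in a few lines. This is only a sketch of the arithmetic: the function names are made up for illustration, and the conversion from slowdown to lost depth assumes that each extra ply multiplies search time by the effective branching factor, which is a common rough model rather than something stated in the post.

```python
import math

def slowdown_ratio(p_percent, f):
    """S = 1 + P*(F-1)/100: time per node of B relative to A."""
    return 1.0 + p_percent * (f - 1.0) / 100.0

def required_factor(p_percent, target_slowdown):
    """Solve S = 1 + P*(F-1)/100 for F."""
    return 1.0 + 100.0 * (target_slowdown - 1.0) / p_percent

def plies_lost(slowdown, ebf):
    """Depth lost at fixed time, assuming time grows as ebf**depth."""
    return math.log(slowdown) / math.log(ebf)

s = slowdown_ratio(5, 40 / 16)
print(f"slowdown ratio: {s:.3f}")
print(f"plies lost at EBF 2.0: {plies_lost(s, 2.0):.2f}")
print(f"plies lost at EBF 3.0: {plies_lost(s, 3.0):.2f}")
```

With P=5 and F=40/16 the ratio is 1.075, which under this model costs roughly a tenth of a ply even at an effective branching factor of 2.0.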

Kempelen · Post by **Kempelen** » Tue Apr 01, 2014 11:19 pm

Ferdy wrote: There should not be a problem with this. If in check, you really need the move; the same holds if not in check, since the engine most often needs the move to get ahead with active piece placement, unless the position is a zugzwang. Did you brutally test this with 30k games or more?

My initial thought is to not give the bonus when in check, so you avoid checks in eval, which I think would be more problematic than having the initiative. Anyway, it needs lots of games to test, I suspect. Different engines could value that bonus differently.

Kempelen · Post by **Kempelen** » Tue Apr 01, 2014 11:20 pm

Stan Arts wrote:Sad truth is, (finding this out again currently as lately I've finally regained some interest in computerchess and writing something again.) search is what wins the games and 2-3 real ply extra with a completely empty eval besides a single PSQT for all the pieces seems about enough to overcome a maturely developed evaluation. Search just fills a ton of common knowledge gaps.

So a lot of evaluation code is special case. The stuff that short-term search doesn't really fill. You watch it play, you notice a gap, you fix it. But the pattern may only happen once every 10, 30 or 100 games. It likely then doesn't translate directly to huge Elo gain but you still need it. Can you measure an Elo gain for trapped bishop code? If so it must be rather small but it's pretty important to have in a practical sense. Because it'll happen exactly at important games.
I'm sure though all that code combined DOES lead to a rather substantial Elo gain right. Right..?

What you say is something I also suspect. The conclusion would be that more games are needed to test such small changes (currently I test with 15,000 games; maybe 30,000 or more would be needed in my case).
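To get a feeling for how many games such small changes need, here is a rough sketch of the 95% error margin of a match result. It assumes two engines of equal strength, independent games and a draw ratio of 30%; the draw ratio and the function names are assumptions chosen for illustration, not figures from this thread.

```python
import math

def score_to_elo(score):
    """Logistic Elo difference for an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin(n_games, draw_ratio=0.3, z=1.96):
    """Approximate +/- Elo margin around 0 after n_games.

    Assumes an expected score of 0.5 and the given draw ratio.
    """
    win = 0.5 - draw_ratio / 2.0
    # Per-game variance of the score: E[x^2] - E[x]^2 with x in {0, 0.5, 1}.
    variance = win * 1.0 + draw_ratio * 0.25 - 0.25
    std_error = math.sqrt(variance / n_games)
    return score_to_elo(0.5 + z * std_error)

for n in (15000, 30000):
    print(n, round(elo_margin(n), 1))
```

Under these assumptions 15,000 games give a margin of a bit under 5 Elo and doubling the games only shrinks it by a factor of about 1.4, so a term worth one or two Elo stays inside the noise either way.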

Ferdy · Post by **Ferdy** » Wed Apr 02, 2014 11:26 am

Kempelen wrote:
Ferdy wrote: There should not be a problem with this. If in check, you really need the move; the same holds if not in check, since the engine most often needs the move to get ahead with active piece placement, unless the position is a zugzwang. Did you brutally test this with 30k games or more?
My initial thought is to not give the bonus when in check, so you avoid checks in eval, which I think would be more problematic than having the initiative. Anyway, it needs lots of games to test, I suspect. Different engines could value that bonus differently.

Deep in qsearch I use the eval even when in check, a trade-off for speed against accuracy. But I search non-capture check moves at early plies in qsearch before going deeper.

Henk · Post by **Henk** » Wed Apr 02, 2014 12:27 pm

Sven Schüle wrote:
Henk wrote:
Stan Arts wrote:
Henk wrote: If I compute mobility in eval I lose at least one ply. And nothing beats an extra ply, they say. So for instance my eval would be 0.1 pawns more accurate, but in some positions it wouldn't see that it loses a piece or gets checkmated.
Losing a ply is a 200-300% slowdown. That sounds a bit off.
I can calculate mobility ten times and raytrace a dancing dinosaur before that happens.
If I have 16 pieces and say 40 moves. I get a factor 40/16. That's more than 200%.
Your calculation based on "a factor 40/16" is incorrect, even as a very rough estimate. Let's say you have two program versions A and B:

A has static eval based on PST only, and the evaluation takes P% of the whole runtime on average. Speed is NPS_A.
B is the same as A with the addition of mobility in static eval. Evaluation with mobility included costs F * the cost of the evaluation with PST only (e.g. your F=40/16).

Now what is a hypothetical NPS_B? (Ignoring cache effects etc. of course)

On average A needs 1/NPS_A seconds per node, of which it spends P/(100*NPS_A) seconds for static eval and (100-P)/(100*NPS_A) seconds for other parts of the program.
B spends P*F/(100*NPS_A) seconds per node for static eval and (100-P)/(100*NPS_A) seconds for other parts.

So the slowdown ratio S is (P*F + (100-P)) / 100 = 1 + P*(F-1)/100 and NPS_B = NPS_A / S.

For your F=40/16 and maybe P=5 (so a PST-only eval takes 5% of all runtime) you get a hypothetical slowdown ratio for mobility of 1.075, i.e. 107.5% of the original time per node. If we fix P at 10 (which is unrealistic; it might even be P=1 or less with an incremental approach) then you need F=11 for a slowdown ratio of 2.0 (200%); with P=5 you would need F=21.

These numbers may be quite artificial of course, but they should show you that you really need a *very* slow mobility implementation to get close to a 200% slowdown of the engine. And even 200% is far from losing a full ply (initially you even mentioned "two plies"), unless you already have a super-strong effective branching factor of 2.0 like top class engines.

I would not be surprised if P were much bigger, say > 50%. My evaluation only sums up piece values. Computing a piece value is no more than material + PST value plus some bitboard operations that tell whether a pawn is passed, blocked, isolated, weak or backward. If the piece value also computed mobility, that would certainly become the bottleneck of the piece value computation.

If there are no captures or checks, LMR degrades to a bad PV extension, which gives a small branching factor (I guess). So a low branching factor is possible for bad engines too.

lucasart · Post by **lucasart** » Wed Apr 02, 2014 3:36 pm

Henk wrote:
Stan Arts wrote:
Henk wrote: If I compute mobility in eval I lose at least one ply. And nothing beats an extra ply, they say. So for instance my eval would be 0.1 pawns more accurate, but in some positions it wouldn't see that it loses a piece or gets checkmated.
Losing a ply is a 200-300% slowdown. That sounds a bit off.
I can calculate mobility ten times and raytrace a dancing dinosaur before that happens.
If I have 16 pieces and say 40 moves. I get a factor 40/16. That's more than 200%.

How does your mobility work? You must be doing it very wrong for it to be so costly. Granted, mobility is costly, but the Elo gain more than compensates for the slowdown in my experience. Mobility is worth a lot of Elo in DiscoCheck (and I'm sure in most decent engines).

Perhaps you're trying something too complex and too costly to compute, such as safe mobility. I've done a fair amount of testing, and what works best for me is simply to count the squares attacked by each piece, excluding the ones occupied by own pawns or king, as well as those attacked by enemy pawns.

Also, I let orthogonal mobility see through rooks and diagonal mobility see through bishops. It's an elegant way to avoid adding a new eval term to reward rook batteries or Q+B batteries. It's all handled naturally, in a unified way.
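The counting rule described above can be sketched as follows. This is not DiscoCheck's code: the dict-based board (square index to piece letter, a1 = 0, uppercase for White), the function names and the choice to see through own rooks and bishops only are all assumptions made for this illustration, and only sliding pieces are handled. A real engine would do the same thing with bitboards.

```python
ROOK_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
BISHOP_DIRS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def pawn_attacks(board, white):
    """Squares attacked by the pawns of one side."""
    pawn = "P" if white else "p"
    dr = 1 if white else -1
    attacked = set()
    for sq, piece in board.items():
        if piece == pawn:
            f, r = sq % 8, sq // 8
            for df in (-1, 1):
                if 0 <= f + df < 8 and 0 <= r + dr < 8:
                    attacked.add((r + dr) * 8 + f + df)
    return attacked

def slider_attacks(board, sq, dirs, see_through):
    """Walk each ray; stop at the first piece not listed in see_through."""
    attacked = set()
    f0, r0 = sq % 8, sq // 8
    for df, dr in dirs:
        f, r = f0 + df, r0 + dr
        while 0 <= f < 8 and 0 <= r < 8:
            target = r * 8 + f
            attacked.add(target)
            blocker = board.get(target)
            if blocker is not None and blocker not in see_through:
                break
            f, r = f + df, r + dr
    return attacked

def mobility(board, sq):
    """Attacked squares, minus own pawns/king and enemy pawn attacks."""
    piece = board[sq]
    white = piece.isupper()
    side = str.upper if white else str.lower  # letter case of own pieces
    kind = piece.upper()
    attacked = set()
    if kind in "RQ":  # orthogonal rays see through own rooks
        attacked |= slider_attacks(board, sq, ROOK_DIRS, {side("r")})
    if kind in "BQ":  # diagonal rays see through own bishops
        attacked |= slider_attacks(board, sq, BISHOP_DIRS, {side("b")})
    excluded = {s for s, p in board.items() if p in (side("p"), side("k"))}
    excluded |= pawn_attacks(board, not white)
    return len(attacked - excluded)
```

With white rooks on a1 and a4, a white pawn on a7 and a black pawn on c2, `mobility(board, 0)` counts the a-file beyond the second rook, which is how a battery gets its bonus without a separate eval term.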

Kempelen · Post by **Kempelen** » Wed Apr 02, 2014 9:52 pm

Ferdy wrote:
Kempelen wrote:
Ferdy wrote: There should not be a problem with this. If in check, you really need the move; the same holds if not in check, since the engine most often needs the move to get ahead with active piece placement, unless the position is a zugzwang. Did you brutally test this with 30k games or more?
My initial thought is to not give the bonus when in check, so you avoid checks in eval, which I think would be more problematic than having the initiative. Anyway, it needs lots of games to test, I suspect. Different engines could value that bonus differently.
Deep in qsearch I use the eval even when in check, a trade-off for speed against accuracy. But I search non-capture check moves at early plies in qsearch before going deeper.

All the tests I did that introduced check moves in qsearch (early or late) did badly for me. I don't know why. Maybe having a special check-escape move generation routine would help a lot.
How do you generate non-capture check moves? You have to make them to know whether they give check, don't you?
