Score Inaccuracy: An Engine Weakening Algorithm

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

syzygy
Posts: 5647
Joined: Tue Feb 28, 2012 11:56 pm

Re: Score Inaccuracy: An Engine Weakening Algorithm

Post by syzygy »

bob wrote:I don't like just adding a fixed bonus for each root move. If you make the bonus too large, play becomes random...
But isn't that the point here? The engine shouldn't be weakened down to completely random play, so don't make the random bonus / error term too high.
User avatar
Rebel
Posts: 7208
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Score Inaccuracy: An Engine Weakening Algorithm

Post by Rebel »

syzygy wrote:
bob wrote:I don't like just adding a fixed bonus for each root move. If you make the bonus too large, play becomes random...
But isn't that the point here? The engine shouldn't be weakened down to completely random play, so don't make the random bonus / error term too high.
For each move the "Club Player" option in mine creates a random value for each root move between 0.00 - 2.00. The effect is surprising, it still plays its normal style but so now and then (2-3 times in a game) it blunders. Just like the average chess player.
User avatar
emadsen
Posts: 439
Joined: Thu Apr 26, 2012 1:51 am
Location: Oak Park, IL, USA
Full name: Erik Madsen

Re: Score Inaccuracy: An Engine Weakening Algorithm

Post by emadsen »

First of all I'd like to thank all of you for your responses. It's quite amazing that I can ask an obscure technical question on this forum and within a couple days get advice from experts in the field. Many of you have been at this hobby for years, so I appreciate you taking some time to help a newcomer like myself. :)
If all you do is add some sort of random bias to your scores, and don't change anything else, you will have a "floor" on your performance that is much higher than you might imagine. Don Beal tested a program with a purely random eval and found it did not play foolishly at all.
Bob, I remember reading a post of yours a while back that explained this. It makes sense to me. Piece mobility is desired by an engine. Mobility = greater number of moves. Greater number of moves = greater chance a good move is selected, even with a random error applied. That's why I rejected the technique of added error to evaluation scores. Instead I add error only to root moves.
And I also limit speed as Bob does: because at normal search speed you will get very high depths and if your score inaccuracy is not very large then the program will still find deep tactical shots that win material.
Bob and Jon both mentioned the importance of slowing down the search. I have a parameter in MadChess called Evaluation Delay that does that. It enforces a delay every time the evaluation function is called. I have not spent any time calibrating this parameter to ELO. I'm hoping the combination of Score Inaccuracy and Evaluation Delay can create fairly believable weak chess personalities.
For each move the "Club Player" option in mine creates a random value for each root move between 0.00 - 2.00. The effect is surprising, it still plays its normal style but so now and then (2-3 times in a game) it blunders. Just like the average chess player.
Ed, this sounds very similar to what I'm doing in my engine. I'll check out the personalities in ProDeo, maybe play a few games between ProDeo personalities and MadChess personalities. Though I'd have to weaken your engine much more than mine! ProDeo is much faster and stronger than my amateur engine. :wink:
Erik Madsen | My C# chess engine: https://www.madchess.net
syzygy
Posts: 5647
Joined: Tue Feb 28, 2012 11:56 pm

Re: Score Inaccuracy: An Engine Weakening Algorithm

Post by syzygy »

emadsen wrote:Bob and Jon both mentioned the importance of slowing down the search. I have a parameter in MadChess called Evaluation Delay that does that. It enforces a delay every time the evaluation function is called. I have not spent any time calibrating this parameter to ELO. I'm hoping the combination of Score Inaccuracy and Evaluation Delay can create fairly believable weak chess personalities.
Slowing down the search could also be done at the root: let the engine search for a fraction of the allocated time, then wait until the allocated time runs out.
PK
Posts: 895
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: Score Inaccuracy: An Engine Weakening Algorithm

Post by PK »

I went a different path in Rodent:

there is a random factor in leaf eval, there is a slowdown at the root, but the following function constitutes the core of my weakening algorithm. Basically it tells the search whether a move should be omitted.

Code: Select all

int sSearcher::Blunder(sPosition *p, int ply, int depth, int flag, int move, int lastMove, int flagInCheck) 
{

	// we're playing the best game we can, so don't bother with weakening
	// (speed optimization)
        // MAX_ELO set at 2500
	if (Data.elo == MAX_ELO || !Data.useWeakening ) return 0;
	
    // Weaker levels progressively use more heuristics that disallow
	// forgetting about certain moves. 

	// try all root moves
	if (!ply) return 0;

	// try to capture the last piece that moved
	if (Tsq(move) == Tsq(lastMove) && Data.elo > 999) return 0;

	// don't miss captures by a pawn
	if ( p->pc[Tsq(move)] == P 
	&& File(Tsq(move)) != File(Fsq(move)) 
	&& Data.elo > 1199) return 0; 

	// don't miss short captures
	if ( Data.distance[Fsq(move)][Tsq(move)] > 9
	&& flag > FLAG_KILLER_MOVE 
	&& flag != FLAG_HASH_MOVE 
	&& Data.elo > 1299 ) return 0;

    // don't miss a queen promotion
	if ( MoveType(move) == Q_PROM && Data.elo > 1399) return 0;

	// don't miss check evasions
	if (flagInCheck && Data.elo > 1499) return 0;

	// Stronger levels progressively push blunders away from the root
	// ( extra randomness is used to scale this property as smoothly as possible)

	if ( (Data.elo > 1599 && ply < 3)
	||   (Data.elo > 1899 && ply < 4) 
	||   (Data.elo > 2299 && ply < 5) ) return 0;

    if (Data.elo > 1300 && Data.elo <= 1600 && ply < 3)
	{
		int saveRate = p->hashKey % 300;
		if (saveRate > 1600 - Data.elo) return 0;
	}

    if (Data.elo > 1600 && Data.elo <= 1900 && ply < 4 )
	{
        int saveRate = p->hashKey % 300;
		if (saveRate > 1900 - Data.elo) return 0;
	}

    if (Data.elo > 1900 && Data.elo <= 2300 && ply < 5 )
	{
        int saveRate = p->hashKey % 300;
		if (saveRate > 2300 - Data.elo) return 0;
	}

	int blunderRate = p->hashKey % Data.elo;

	if (flag == FLAG_KILLER_MOVE 
	|| flag == FLAG_HASH_MOVE)
	{
       blunderRate *= 2;
	   blunderRate *= Data.elo;
	   blunderRate /= MAX_ELO;
	}
	
	// calculate blunder probability (less blunders near the root in deeper search)
// depth is counted so that ONE_PLY == 4
	if ( blunderRate + depth < MAX_ELO - Data.elo) return 1;

	return 0;
	
}
In short, stronger levels make less blunders and push them further away from the root. Also it is not ok to forget about specific moves. Not missing check evasions is extremely big contributor (levels that can miss it tend to play unsound sacrificial attacks).

I managed to tune this algorithm to perform like 2000 and 2200 Elo (CCRL scale, first data point requires internal Elo of about 1850). At first I thought that with enough data points some curve fit would be possible.

The real problem is that I am unable to tune it in such a way that strength becomes independent from evaluation style. In short, at weaker levels evaluation functions containing significant bias (i.e. geared towards King attack or King defense) clearly outperform their balanced counterparts.
Alexander Schmidt
Posts: 1235
Joined: Thu May 10, 2007 2:49 pm

Re: Score Inaccuracy: An Engine Weakening Algorithm

Post by Alexander Schmidt »

I added the limit_ELO feature to some Engines. All I did is to reduce the nodes/second with this kind of formula:

npslimit = x * pow(2,(EloLimit-999)/65)

and

while (nps > npslimit) Sleep 10

This doubles the nps for each 65 ELO increase.

You have to tune the forumla a little bit for different engines, fast timecontrols and very low nps.

After much tuning I got the following nps values for SlowChess compared to ssdf ELO:

1000: 0,2nps
1200: 12nps
1500: 150nps
2000: 3200nps
2500: 313.000nps

And it seems to work pretty good. Here some NUNN matches vs. Dedicated machines with the same ELO as adjusted:

Code: Select all

                                Rtng    Score     12345678901234567890123456789012345678901234567890   Perf Chg
----------------------------------------------------------------------------------------------------------------
 1: Mephisto III                1464  27,5 / 50   11011111011=10=11001=100=00=10011=1=0=10010001=011   1500 +25
 2: Mysticum SlowChess ELO 1464 1464  22,5 / 50   00100000100=01=00110=011=11=01100=0=1=01101110=100   1428 -25
----------------------------------------------------------------------------------------------------------------
50 games: +24 =9 -17

                                Rtng    Score     12345678901234567890123456789012345678901234567890   Perf Chg
----------------------------------------------------------------------------------------------------------------
 1: Mephisto MM IV              1904  26,5 / 50   0010===10011=011011011=011101==01001001=1011=01==0   1925 +15
 2: Mysticum SlowChess ELO 1904 1904  23,5 / 50   1101===01100=100100100=100010==10110110=0100=10==1   1883 -15
----------------------------------------------------------------------------------------------------------------
50 games: +22 =11 -17

                                Rtng    Score     12345678901234567890123456789012345678901234567890   Perf Chg
----------------------------------------------------------------------------------------------------------------
 1: Mysticum SlowChess ELO 1927 1927  26,5 / 50   0011100=01=11100001=0=11==1===1==11=001=1====01001   1948 +15
 2: Mephisto Amsterdam          1927  23,5 / 50   1100011=10=00011110=1=00==0===0==00=110=0====10110   1906 -15
----------------------------------------------------------------------------------------------------------------
50 games: +22 =17 -11

                                Rtng    Score     12345678901234567890123456789012345678901234567890   Perf Chg
----------------------------------------------------------------------------------------------------------------
 1: Mephisto Dallas 16Bit       1971  25,0 / 50   ===11==110=1=1111=01==0=100100000011=0=011=0000110   1971  +0
 2: Mysticum SlowChess ELO 1971 1971  25,0 / 50   ===00==001=0=0000=10==1=011011111100=1=100=1111001   1971  +0
----------------------------------------------------------------------------------------------------------------
50 games: +25 =14 -11

                                Rtng    Score     1234567890123456789012345678901234567890123   Perf Chg
---------------------------------------------------------------------------------------------------------
 1: Mephisto MM V               1974  22,0 / 43   1=1011101011=001010011=0110=0=000==11=00011   1981  +4
 2: Mysticum SlowChess ELO 1974 1974  21,0 / 43   0=0100010100=110101100=1001=1=111==00=11100   1967  -4
---------------------------------------------------------------------------------------------------------
43 games: +18 =8 -17

                                Rtng    Score     12345678901234567890123456789012345678901234567890   Perf Chg
----------------------------------------------------------------------------------------------------------------
 1: Mysticum SlowChess ELO 2030 2030  29,0 / 50   ==10011=1011=0=01=1=101001=1111=1=1=1=0=11=0001010   2087 +40
 2: Mephisto Roma 32Bit         2030  21,0 / 50   ==01100=0100=1=10=0=010110=0000=0=0=0=1=00=1110101   1973 -40
----------------------------------------------------------------------------------------------------------------
50 games: +16 =14 -20

                                Rtng    Score     12345678901234567890123456789012345678901234567890   Perf Chg
----------------------------------------------------------------------------------------------------------------
 1: Mysticum SlowChess ELO 2011 2011  25,5 / 50   0110==010===1=11010=010==1100101010100110=10=10110   2037  +5
 2: Mephisto Roma 32Bit         2030  24,5 / 50   1001==101===0=00101=101==0011010101011001=01=01001   2023  -5
----------------------------------------------------------------------------------------------------------------
50 games: +22 =11 -17

                                   Rtng    Score     12345678901234567890123456789012345678901234567890   Perf Chg
-------------------------------------------------------------------------------------------------------------------
 1: Mysticum SlowChess ELO 2445    2445  26,0 / 50   1001==01010=10=00110==0011=1=1=110==101=0=10=11==0   2459 +10
 2: Resurrection Fruit 2.1 203 MHz 2445  24,0 / 50   0110==10101=01=11001==1100=0=0=001==010=1=01=00==1   2431 -10
-------------------------------------------------------------------------------------------------------------------
50 games: +22 =16 -12
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Score Inaccuracy: An Engine Weakening Algorithm

Post by bob »

syzygy wrote:
bob wrote:I don't like just adding a fixed bonus for each root move. If you make the bonus too large, play becomes random...
But isn't that the point here? The engine shouldn't be weakened down to completely random play, so don't make the random bonus / error term too high.
How do you define "too high"? Do you taper the random bonus so that moves ordered later are less likely to become the best move?

This is a REALLY coarse adjustment. I looked for an approach that provides a fairly scalable performance that doesn't feel out of balance. I don't like positional idiot / tactical genius. I don't like tactical idiot / positional genius. I wanted both "components" to match up somewhat reasonably, so that the thing just feels like a weaker program that still plays normal-looking chess moves.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Score Inaccuracy: An Engine Weakening Algorithm

Post by bob »

Rebel wrote:
syzygy wrote:
bob wrote:I don't like just adding a fixed bonus for each root move. If you make the bonus too large, play becomes random...
But isn't that the point here? The engine shouldn't be weakened down to completely random play, so don't make the random bonus / error term too high.
For each move the "Club Player" option in mine creates a random value for each root move between 0.00 - 2.00. The effect is surprising, it still plays its normal style but so now and then (2-3 times in a game) it blunders. Just like the average chess player.
Don't you see the effect where it plays several moves like a GM, then one like a patzer? That was what I worked so hard to avoid.

Humans don't really play like that.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Score Inaccuracy: An Engine Weakening Algorithm

Post by bob »

emadsen wrote:First of all I'd like to thank all of you for your responses. It's quite amazing that I can ask an obscure technical question on this forum and within a couple days get advice from experts in the field. Many of you have been at this hobby for years, so I appreciate you taking some time to help a newcomer like myself. :)
If all you do is add some sort of random bias to your scores, and don't change anything else, you will have a "floor" on your performance that is much higher than you might imagine. Don Beal tested a program with a purely random eval and found it did not play foolishly at all.
Bob, I remember reading a post of yours a while back that explained this. It makes sense to me. Piece mobility is desired by an engine. Mobility = greater number of moves. Greater number of moves = greater chance a good move is selected, even with a random error applied. That's why I rejected the technique of added error to evaluation scores. Instead I add error only to root moves.
And I also limit speed as Bob does: because at normal search speed you will get very high depths and if your score inaccuracy is not very large then the program will still find deep tactical shots that win material.
Bob and Jon both mentioned the importance of slowing down the search. I have a parameter in MadChess called Evaluation Delay that does that. It enforces a delay every time the evaluation function is called. I have not spent any time calibrating this parameter to ELO. I'm hoping the combination of Score Inaccuracy and Evaluation Delay can create fairly believable weak chess personalities.
For each move the "Club Player" option in mine creates a random value for each root move between 0.00 - 2.00. The effect is surprising, it still plays its normal style but so now and then (2-3 times in a game) it blunders. Just like the average chess player.
Ed, this sounds very similar to what I'm doing in my engine. I'll check out the personalities in ProDeo, maybe play a few games between ProDeo personalities and MadChess personalities. Though I'd have to weaken your engine much more than mine! ProDeo is much faster and stronger than my amateur engine. :wink:
Slowing things down makes your approach sound more reasonable, because you have to reduce tactical acuity, and the only sensible way is to simply reduce the search depth. I also tune back (or eventually completely turn off) the other tricks in the search (LMR, null-move, forward pruning, which is another effective way of slowing things down.)
Uri Blass
Posts: 10486
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Score Inaccuracy: An Engine Weakening Algorithm

Post by Uri Blass »

bob wrote:
Rebel wrote:
syzygy wrote:
bob wrote:I don't like just adding a fixed bonus for each root move. If you make the bonus too large, play becomes random...
But isn't that the point here? The engine shouldn't be weakened down to completely random play, so don't make the random bonus / error term too high.
For each move the "Club Player" option in mine creates a random value for each root move between 0.00 - 2.00. The effect is surprising, it still plays its normal style but so now and then (2-3 times in a game) it blunders. Just like the average chess player.
Don't you see the effect where it plays several moves like a GM, then one like a patzer? That was what I worked so hard to avoid.

Humans don't really play like that.
I disagree that humans don't play like that when we talk about the first part(several moves like a GM).

I think that majority of the moves that I play in games is at GM level.
The minority of the moves that I do not play at GM level is the reason that I have a rating near 2000 and not a rating near 2600.

Note that it is rare that I play in games even one move
like a stupid patzer.

I may do a tactical mistake that lose material or the game but usually it is not a mistake that the computer can see that it is a mistake based on one ply search.