Score Inaccuracy: An Engine Weakening Algorithm

bob · Post by **bob** » Wed Nov 07, 2012 3:38 pm

Uri Blass wrote:
bob wrote:
Rebel wrote:
syzygy wrote:
bob wrote:I don't like just adding a fixed bonus for each root move. If you make the bonus too large, play becomes random...
But isn't that the point here? The engine shouldn't be weakened down to completely random play, so don't make the random bonus / error term too high.
The "Club Player" option in mine creates a random value between 0.00 and 2.00 for each root move. The effect is surprising: it still plays its normal style, but every now and then (2-3 times in a game) it blunders, just like the average chess player.
Don't you see the effect where it plays several moves like a GM, then one like a patzer? That was what I worked so hard to avoid.

Humans don't really play like that.
I disagree that humans don't play like that, at least regarding the first part (several moves like a GM).

I think that the majority of the moves I play in games are at GM level.
The minority of moves that I do not play at GM level are the reason that I have a rating near 2000 and not near 2600.

Note that it is rare that I play even one move in a game like a stupid patzer.

I may make a tactical mistake that loses material or the game, but usually it is not a mistake that the computer can recognize as one based on a one-ply search.

This was my implication:

A 1700 player does NOT play like a GM for several moves, then make a blunder.

Yes, a 1700 player will make blunders. They will also make anti-positional moves, and essentially play like a 1700 player.

I suspect we agree on this, overall. That's why it is important, IMHO, that when weakening an engine, the weakening occurs across the evaluation AND across the search, so that positional and tactical skills drop off somewhat proportionally...

syzygy · Post by **syzygy** » Wed Nov 07, 2012 9:58 pm

bob wrote:
syzygy wrote:
bob wrote:I don't like just adding a fixed bonus for each root move. If you make the bonus too large, play becomes random...
But isn't that the point here? The engine shouldn't be weakened down to completely random play, so don't make the random bonus / error term too high.
How do you define "too high"?

I define "too high" as whatever you meant by "too large". It'll need to be tuned, yes, like any method for weakening an engine.

Do you taper the random bonus so that moves ordered later are less likely to become the best move?

I don't see why that would be needed. At the start of each search, each root move gets a random bonus or error term which the root move will keep throughout that search. If two moves A and B have bonus -0.2 and 0.2, and an unmodified search would have scored them as +0.4 and +0.3, then move B will be played (since 0.3+0.2 = 0.5 > 0.2 = 0.4 - 0.2).

What might need some consideration is the distribution from which the random terms are drawn. It might be that a Gaussian distribution is "better" than a uniform distribution, for example. It might also be a good idea to increase the standard deviation of the distribution if the number of legal moves is large (to give relatively worse moves a chance to end up on top).

This is a REALLY coarse adjustment. I looked for an approach that provides a fairly scalable performance that doesn't feel out of balance. I don't like positional idiot / tactical genius. I don't like tactical idiot / positional genius. I wanted both "components" to match up somewhat reasonably, so that the thing just feels like a weaker program that still plays normal-looking chess moves.

I don't know why this would be a coarse adjustment. Moves will be played that are both tactically and positionally suboptimal (relative to what the engine at full strength would choose), but that seems just fine to me.

This method is certainly perfectly scalable: just decrease/increase the standard deviation of the distribution of random error terms.
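To make the root-move idea above concrete, here is a minimal Python sketch. It assumes the unmodified search has already produced a score in pawns for each root move and simply adds one error term per move before picking the best; in a real engine the term would stay attached to the root move for the whole search. The name `pick_weakened_move`, the dict-based interface and the Gaussian draw are illustrative assumptions, not code from any engine discussed here.

```python
import random


def pick_weakened_move(root_scores, sigma, rng=None):
    """Return the root move that is best after adding a random error term.

    root_scores: dict of move -> score in pawns (hypothetical interface).
    sigma: standard deviation of the Gaussian error term, the tuning knob.
    """
    rng = rng or random.Random()
    # Draw each move's error exactly once so it stays fixed for this search.
    errors = {move: rng.gauss(0.0, sigma) for move in root_scores}
    noisy = {move: score + errors[move] for move, score in root_scores.items()}
    return max(noisy, key=noisy.get)


# Worked example from the post: A scores +0.4, B scores +0.3,
# and the drawn error terms happen to be -0.2 and +0.2.
scores = {"A": 0.4, "B": 0.3}
fixed_errors = {"A": -0.2, "B": 0.2}
noisy = {m: round(scores[m] + fixed_errors[m], 2) for m in scores}
best = max(noisy, key=noisy.get)  # B wins with 0.5 against 0.2
```

With `sigma` set to 0 the unmodified best move is always chosen; raising `sigma` is the scaling knob described above.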

bob · Post by **bob** » Thu Nov 08, 2012 11:11 pm

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:I don't like just adding a fixed bonus for each root move. If you make the bonus too large, play becomes random...
But isn't that the point here? The engine shouldn't be weakened down to completely random play, so don't make the random bonus / error term too high.
How do you define "too high"?
I define "too high" as whatever you meant by "too large". It'll need to be tuned, yes, like any method for weakening an engine.

Do you taper the random bonus so that moves ordered later are less likely to become the best move?
I don't see why that would be needed. At the start of each search, each root move gets a random bonus or error term which the root move will keep throughout that search. If two moves A and B have bonus -0.2 and 0.2, and an unmodified search would have scored them as +0.4 and +0.3, then move B will be played (since 0.3+0.2 = 0.5 > 0.2 = 0.4 - 0.2).

What might need some consideration is the distribution from which the random terms are drawn. It might be that a Gaussian distribution is "better" than a uniform distribution, for example. It might also be a good idea to increase the standard deviation of the distribution if the number of legal moves is large (to give relatively worse moves a chance to end up on top).

This is a REALLY coarse adjustment. I looked for an approach that provides a fairly scalable performance that doesn't feel out of balance. I don't like positional idiot / tactical genius. I don't like tactical idiot / positional genius. I wanted both "components" to match up somewhat reasonably, so that the thing just feels like a weaker program that still plays normal-looking chess moves.
I don't know why this would be a coarse adjustment. Moves will be played that are both tactically and positionally suboptimal (relative to what the engine at full strength would choose), but that seems just fine to me.

This method is certainly perfectly scalable: just decrease/increase the standard deviation of the distribution of random error terms.

The problem with +/- 0.2 is blunders. Or the lack of blunders. A 1600 player is going to make both positional and tactical mistakes. And not just mistakes that lose a pawn. So you need large delta values, because with small ones you really have not hurt the tactical skill of the machine very much; you have just muddled its positional judgement.
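The size of this effect can be estimated for the simplest case. The sketch below assumes only two root moves, each given an independent bonus drawn uniformly from an interval of total width `width` (so +/- 0.2 means a width of 0.4, and a 0.00 - 2.00 range means a width of 2.0). The difference of two such bonuses has a triangular distribution, so the chance that a move `gap` pawns worse ends up on top is (width - gap)^2 / (2 * width^2) for gaps up to `width`, and zero beyond that. The function name is hypothetical, and a real position has many more than two candidate moves.

```python
def overtake_probability(gap, width):
    """Chance that a move `gap` pawns worse outscores the best move when both
    receive independent uniform bonuses from an interval of size `width`."""
    if gap < 0 or width <= 0:
        raise ValueError("gap must be non-negative and width positive")
    if gap >= width:
        # The bonus difference can never exceed the interval width.
        return 0.0
    return (width - gap) ** 2 / (2 * width ** 2)


# +/- 0.2 (width 0.4): a move a full pawn worse is never selected.
small = overtake_probability(1.0, 0.4)
# 0.00 - 2.00 (width 2.0): the same move now wins 1 time in 8.
large = overtake_probability(1.0, 2.0)
```

Under these assumptions a narrow range only reshuffles moves that were already close in score, which is the "lack of blunders" point; a move that drops a whole piece (gap 3.0) stays impossible even at width 2.0.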

IMHO, "at the root" is too coarse. My goal is to be able to enter a command like this:

skill 2500

and have it play like a 2500-level human regardless of the hardware, unless (obviously) the hardware is too slow...

Not easy at all, but a goal...

bob · Post by **bob** » Fri Nov 09, 2012 5:29 pm

Alexander Schmidt wrote: I added the limit_ELO feature to some engines. All I did was reduce the nodes/second with this kind of formula:

npslimit = x * pow(2.0, (EloLimit - 999) / 65.0)

and

while (nps > npslimit) Sleep(10);

This doubles the nps for each 65 ELO increase.
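As a rough illustration of the two quoted lines, here is a small Python sketch. The post does not say how `nps` is measured, so this version assumes it is the average node rate since the search started; `x` is the engine-specific constant from the formula, and the function names and the injectable `now` / `sleep` parameters are assumptions made so the loop can be tried without a real engine.

```python
import time


def nps_limit(elo_limit, x):
    """Node-rate cap from the quoted formula: doubles for every 65 Elo."""
    return x * 2.0 ** ((elo_limit - 999) / 65)


def throttle(nodes_searched, started_at, limit,
             now=time.monotonic, sleep=time.sleep):
    """Sleep in 10 ms steps until the average node rate is at or below limit."""
    # max(...) guards against a zero elapsed time right after the start.
    while nodes_searched / max(now() - started_at, 1e-9) > limit:
        sleep(0.010)
```

An engine would call something like `throttle(...)` every few nodes inside its search loop. At very low limits the 10 ms granularity becomes coarse relative to the node rate, which may be one reason the formula needs retuning there.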

You have to tune the formula a little bit for different engines, fast time controls and very low nps.

After much tuning I got the following nps values for SlowChess compared to ssdf ELO:

1000: 0.2 nps
1200: 12 nps
1500: 150 nps
2000: 3200 nps
2500: 313,000 nps

And it seems to work pretty well. Here are some NUNN matches vs. dedicated machines with the same ELO as adjusted:

                                Rtng    Score     12345678901234567890123456789012345678901234567890   Perf Chg
----------------------------------------------------------------------------------------------------------------
 1: Mephisto III                1464  27,5 / 50   11011111011=10=11001=100=00=10011=1=0=10010001=011   1500 +25
 2: Mysticum SlowChess ELO 1464 1464  22,5 / 50   00100000100=01=00110=011=11=01100=0=1=01101110=100   1428 -25
----------------------------------------------------------------------------------------------------------------
50 games: +24 =9 -17

                                Rtng    Score     12345678901234567890123456789012345678901234567890   Perf Chg
----------------------------------------------------------------------------------------------------------------
 1: Mephisto MM IV              1904  26,5 / 50   0010===10011=011011011=011101==01001001=1011=01==0   1925 +15
 2: Mysticum SlowChess ELO 1904 1904  23,5 / 50   1101===01100=100100100=100010==10110110=0100=10==1   1883 -15
----------------------------------------------------------------------------------------------------------------
50 games: +22 =11 -17

                                Rtng    Score     12345678901234567890123456789012345678901234567890   Perf Chg
----------------------------------------------------------------------------------------------------------------
 1: Mysticum SlowChess ELO 1927 1927  26,5 / 50   0011100=01=11100001=0=11==1===1==11=001=1====01001   1948 +15
 2: Mephisto Amsterdam          1927  23,5 / 50   1100011=10=00011110=1=00==0===0==00=110=0====10110   1906 -15
----------------------------------------------------------------------------------------------------------------
50 games: +22 =17 -11

                                Rtng    Score     12345678901234567890123456789012345678901234567890   Perf Chg
----------------------------------------------------------------------------------------------------------------
 1: Mephisto Dallas 16Bit       1971  25,0 / 50   ===11==110=1=1111=01==0=100100000011=0=011=0000110   1971  +0
 2: Mysticum SlowChess ELO 1971 1971  25,0 / 50   ===00==001=0=0000=10==1=011011111100=1=100=1111001   1971  +0
----------------------------------------------------------------------------------------------------------------
50 games: +25 =14 -11

                                Rtng    Score     1234567890123456789012345678901234567890123   Perf Chg
---------------------------------------------------------------------------------------------------------
 1: Mephisto MM V               1974  22,0 / 43   1=1011101011=001010011=0110=0=000==11=00011   1981  +4
 2: Mysticum SlowChess ELO 1974 1974  21,0 / 43   0=0100010100=110101100=1001=1=111==00=11100   1967  -4
---------------------------------------------------------------------------------------------------------
43 games: +18 =8 -17

                                Rtng    Score     12345678901234567890123456789012345678901234567890   Perf Chg
----------------------------------------------------------------------------------------------------------------
 1: Mysticum SlowChess ELO 2030 2030  29,0 / 50   ==10011=1011=0=01=1=101001=1111=1=1=1=0=11=0001010   2087 +40
 2: Mephisto Roma 32Bit         2030  21,0 / 50   ==01100=0100=1=10=0=010110=0000=0=0=0=1=00=1110101   1973 -40
----------------------------------------------------------------------------------------------------------------
50 games: +16 =14 -20

                                Rtng    Score     12345678901234567890123456789012345678901234567890   Perf Chg
----------------------------------------------------------------------------------------------------------------
 1: Mysticum SlowChess ELO 2011 2011  25,5 / 50   0110==010===1=11010=010==1100101010100110=10=10110   2037  +5
 2: Mephisto Roma 32Bit         2030  24,5 / 50   1001==101===0=00101=101==0011010101011001=01=01001   2023  -5
----------------------------------------------------------------------------------------------------------------
50 games: +22 =11 -17

                                   Rtng    Score     12345678901234567890123456789012345678901234567890   Perf Chg
-------------------------------------------------------------------------------------------------------------------
 1: Mysticum SlowChess ELO 2445    2445  26,0 / 50   1001==01010=10=00110==0011=1=1=110==101=0=10=11==0   2459 +10
 2: Resurrection Fruit 2.1 203 MHz 2445  24,0 / 50   0110==10101=01=11001==1100=0=0=001==010=1=01=00==1   2431 -10
-------------------------------------------------------------------------------------------------------------------
50 games: +22 =16 -12
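For readers who want to sanity-check the Perf column, here is a small Python sketch using the common logistic Elo approximation, where the rating difference is 400 * log10(p / (1 - p)) for a score fraction p. This is an assumption: the tool that produced the tables is not named and may use a lookup table instead, so results can differ by a point or two. The function name is hypothetical.

```python
import math


def performance_rating(opponent_rating, score, games):
    """Approximate performance rating against a single opponent."""
    p = score / games
    if not 0.0 < p < 1.0:
        raise ValueError("undefined for 0% or 100% scores")
    # Logistic Elo model: rating difference implied by the score fraction.
    return opponent_rating + 400.0 * math.log10(p / (1.0 - p))


# Mephisto MM IV scored 26.5 / 50 against a 1904-rated opponent.
mm4 = round(performance_rating(1904, 26.5, 50))
# An even 25 / 50 score leaves the performance at the opponent's rating.
even = performance_rating(1971, 25.0, 50)
```

With decimal points in place of the tables' decimal commas, this reproduces the listed values for the matches checked in the block above.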

This was my first take on this as well. But after playing against Crafty, my description was "this feels like a positional GM, but a tactical patzer" when I watered it down that way. Even shallow searches won't wreck the pawn structure or commit gross positional gaffes, whereas a real 1500 player doesn't know very much about subtle positional chess. My intent was to dumb it down so that, like a 1500 player, it would not know anything about pawn majorities or weaknesses (subtle ones anyway).

Yes, you can make a program play like a 1500-level player just by slowing it down. But you are averaging two terms (tactical and positional) where you are not affecting the positional judgement very much, but killing the tactical skills to get the average down to 1500.

Just my $.02. And no, I have not yet gotten the "optimal" skill approach done. At present I whack search speed and evaluation accuracy pretty much equally, but I can't accurately predict how strongly it will play, although that is my ultimate goal for this command (skill).
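The "whack both equally" idea can be sketched in a few lines of Python. Everything here is an assumption for illustration and not Crafty's actual code: a single skill percentage scales the node rate down and scales an evaluation noise amplitude up by the same proportion, and the names `weakening_settings`, `noisy_eval` and `max_eval_noise` are hypothetical.

```python
import random


def weakening_settings(skill_percent, full_nps, max_eval_noise=1.0):
    """Derive a node-rate cap and an eval noise amplitude from one knob.

    skill_percent: 100 means full strength; lower values weaken both parts.
    max_eval_noise: noise amplitude in pawns as skill approaches zero.
    """
    if not 0 < skill_percent <= 100:
        raise ValueError("skill_percent must be in (0, 100]")
    fraction = skill_percent / 100.0
    return {
        "nps_limit": full_nps * fraction,                  # slower search
        "eval_noise": max_eval_noise * (1.0 - fraction),   # fuzzier eval
    }


def noisy_eval(true_eval, eval_noise, rng):
    """Blur a static evaluation (in pawns) by a uniform random amount."""
    return true_eval + rng.uniform(-eval_noise, eval_noise)


settings = weakening_settings(50, 1_000_000)
```

A linear mapping like this makes no claim about the resulting rating. Turning a target such as `skill 2500` into the right percentage is exactly the calibration problem described above.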

Alexander Schmidt · Post by **Alexander Schmidt** » Sat Nov 10, 2012 12:40 pm

bob wrote: But after playing against Crafty, my description was "this feels like a positional GM, but a tactical patzer" when I watered it down that way. Even shallow searches won't wreck the pawn structure or commit gross positional gaffes, whereas a real 1500 player doesn't know very much about subtle positional chess. My intent was to dumb it down so that, like a 1500 player, it would not know anything about pawn majorities or weaknesses (subtle ones anyway).

I have a totally different feeling when I watch the games. Maybe it is true for Crafty, but I don't think it is for SlowChess. I think it was Kasparov who said playing against a computer is like playing against an ELO 2400 player who doesn't make mistakes. Computers are stronger at tactics, humans at positional play. To make a computer play "human-like" you have to limit the tactical abilities more than the positional ones. But reducing the speed also results in positional errors. And don't underestimate the knowledge of an ELO 1500 player: he knows a lot about pawn structures and good/bad pieces. He just can't play it out correctly because of tactical weakness.

Compared to dedicated machines, the weak SlowChess plays much more human-like.

If someone wants to try it:

http://www.mediafire.com/?0cur55wqcpbbwbb
