Artificial stupidity - making a program play badly

Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Artificial stupidity - making a program play badly

Post by Tord Romstad »

The last few days, I've been working on the most important missing feature in my chess program: Adjustable playing strength. Strange though it might seem, making my program play very badly is by far the most difficult and frustrating thing I have attempted to do in computer chess, and I am now close to giving up in despair.

At ratings above 2200, I achieve limited strength simply by reducing the speed of calculation. This works fairly well, as one would expect. Below 2200, I try to emulate typical human blunders and tactical mistakes. This is where the problems begin. My approach seems very reasonable to me: I just prune random moves everywhere in the tree, and the probability that a move is pruned depends on how hard the move would be to see for a human player. Underpromotions and moves with negative SEE value are pruned with very high probability, long diagonal moves also have quite high probability, obvious recaptures have very low probability of being pruned, and so on. Finally, the frequency of pruning of course depends on the playing strength.
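Schematically, the decision in the move loop looks something like this (just a sketch; the helper names are invented for illustration, and the multipliers are only examples, not my actual tuning):

#include <algorithm>  // std::min

// Decide pseudo-randomly whether a human of the given strength would
// overlook this move. base_probability(), see(), is_underpromotion() etc.
// are placeholders for the engine's own routines.
double blunder_probability(const Position& pos, Move m, int strength) {
    double p = base_probability(strength);             // larger at weaker settings
    if (is_underpromotion(m))               p *= 8.0;  // almost always missed
    else if (see(pos, m) < 0)               p *= 6.0;  // losing captures are easy to overlook
    else if (is_long_diagonal_move(m))      p *= 3.0;  // long diagonal moves are often missed
    else if (is_obvious_recapture(pos, m))  p *= 0.05; // nearly everyone sees these
    return std::min(p, 1.0);
}

// In the search, a move is simply skipped when the dice say so:
bool prune_for_weakness(const Position& pos, Move m, int strength) {
    return random_double() < blunder_probability(pos, m, strength);
}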

Tuning this turned out to be much trickier than I thought. I used TSCP as my sparring partner. The simple task of adjusting the blunder frequency so that my program scored somewhere above 0% and below 100% took a lot of time. After days of work, I finally began to hit close to the mark. I managed to find various settings which scored around 10%, 25%, 50%, 75% and 90% against TSCP. I was also quite pleased with the look of the games: Glaurung played positionally stronger than TSCP, but lost by making quite human-looking blunders. Many of the games looked almost like I would expect a game between TSCP and a similarly rated human to look.

Proud and happy with my work, I started an account on the ICC last night, in order to test against human players. I started with the settings which scored 50% against TSCP, which I thought (based on the WBEC ratings) should correspond to a strength of around 1700. At this level, the program plays positionally ugly chess and makes plenty of tactical blunders, but rarely hangs a piece or fails to capture a hanging piece. The result was terribly disappointing: Glaurung played about a dozen games against players rated around 1900-2100, and won all of them except for a single draw. Apparently, 2000 rated players on the ICC make elementary tactical blunders all the time.

I then adjusted the rating down to 1300, and tried again. At this level, the program drops a piece about once or twice per game, on average (at blitz time controls). It turned out that this hardly made any difference: Glaurung still scored close to 100%. Glaurung was frequently hanging pieces, but half the time the human opponents didn't see it, and half the time they quickly paid back the favor by blundering a piece themselves. With a blitz rating of around 2200, I gave up in disgust, logged off and went to bed.

Today, I logged on with the strength set to 1000 -- the lowest implemented level, which scores 0% against TSCP. Glaurung makes several horrible blunders in every single game. It is painful to watch, and it is difficult to imagine how it could play much weaker without playing completely random moves. To my immense frustration, Glaurung still wins most of its games. The current blitz rating, after 37 games, is 2098.

How is this possible? TSCP is rated around 1700, and even when I make my program weak enough to lose every single game against TSCP, it still wins easily against most human players on the ICC. Are the ICC ratings 1000 points too high, or something? How can I make my program lose against average human players without having it play completely random moves?

I'm not sure what the purpose of this post is, apart from venting my frustration, but any advice about how to achieve weak, but realistic-looking play by a computer program would be welcome.

Tord
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Artificial stupidity - making a program play badly

Post by Zach Wegner »

I have basically zero experience on _trying_ to make an engine weak, but it comes naturally to me. ;)

Really though, this is a very interesting topic, and your story is quite amusing.

One idea that I read about a while back from Mike Byrne is to randomize the search depth. If on one move you search 10 plies, and 2 plies the next, it seems that it should approximate human play reasonably well. This also would be much easier than the way you describe.
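Something like this, say (the bounds and names are made up):

#include <cstdlib>  // rand()

// Pick a fresh nominal depth before every move instead of a fixed one.
int pick_depth(int min_depth, int max_depth) {
    return min_depth + rand() % (max_depth - min_depth + 1);
}
// e.g. pick_depth(2, 10): 10 plies on one move, 2 plies on the next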

Also, ICC ratings are pretty inflated, and not really balanced. The highest blitz ratings are about 3500...
Roman Hartmann
Posts: 295
Joined: Wed Mar 08, 2006 8:29 pm

Re: Artificial stupidity - making a program play badly

Post by Roman Hartmann »

Hi Tord,
just some ideas to dumb the engine down:

-disable QS
-disable extensions (in check, recaptures)
-disable move ordering, or search the moves in reverse order
-disable cut-offs, making it a plain minimax searcher

Any of these suggestions, or a combination of them, should reduce the playing strength quite a bit (a rough sketch of the last one follows below).
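For that last one, a bare negamax without cut-offs (and without QS) would be roughly this (sketch only; evaluate(), generate_moves(), do_move()/undo_move() and INFINITE_SCORE stand in for the engine's own code):

#include <algorithm>  // std::max

// Full-width negamax: no alpha-beta, no quiescence search.
int minimax(Position& pos, int depth) {
    if (depth == 0)
        return evaluate(pos);  // no QS, so horizon-effect blunders at full strength
    int best = -INFINITE_SCORE;
    for (Move m : generate_moves(pos)) {
        pos.do_move(m);
        best = std::max(best, -minimax(pos, depth - 1));  // no cut-offs: every move searched
        pos.undo_move(m);
    }
    return best;
}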

Regarding the ratings on ICC, I don't think they are of much use for comparison with engine ratings.
Even very weak engines that don't win a single game against TSCP, but don't throw away pieces on purpose, search at least 5 plies and have a QS, tend to achieve a rating >2000 on ICC. One of my very first versions of roce had no QS at all, but still climbed to 2500 in bullet, although it played almost random moves when under time pressure due to the horizon effect (it never won against any comp on ICC, though).

best regards
Roman

EDIT: my memory failed me, it climbed only to 2500 in bullet, not to 2600 ...
Last edited by Roman Hartmann on Tue May 20, 2008 8:35 pm, edited 3 times in total.
Carey
Posts: 313
Joined: Wed Mar 08, 2006 8:18 pm

Re: Artificial stupidity - making a program play badly

Post by Carey »

Interesting....

Are you also cutting the search depth? If not, it's still going to make some really good tactical moves.

A blunder can be saved by a brilliant tactical move.

Also, are you increasing the number (and significance) of blunders closer to the root? Deep blunders can easily be irrelevant except against non-humans: they are tactical, and humans don't play deep tactics.

I think the problem is that you are tuning against a program (which is poor positionally but will never make a tactical blunder) but playing against humans (who do miss tactical shots but are good positionally). It's just a totally different situation.


I wonder how well the classic selective-search programs would do against them... Too bad I never could find the source for MacHack. AWIT and CHAOS would be stronger, though. (I really wonder just how well they'd do against modern, program-savvy humans when running on modern hardware...)
Dann Corbit
Posts: 12545
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Artificial stupidity - making a program play badly

Post by Dann Corbit »

How about:

if (rand() % 17 == 0) eval = -eval;  /* flip the sign of the score about one time in 17 */

I guess it will uncork a real funny one once in a while.
I didn't bother to try it.

Colin put a lot of effort into Beowulf to make it play at different levels. Have you looked at what he did?
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: Artificial stupidity - making a program play badly

Post by Aleks Peshkov »

I think that a 1-ply search with the usual extensions and quiescence search can be a good approximation of a human in blitz.
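Roughly like this (a sketch; qsearch(), generate_moves(), gives_check() and INFINITE_SCORE stand in for the engine's own code):

#include <algorithm>  // std::max

// Nominal depth 1, but checks still extend and the leaves drop into
// quiescence search, so short forcing lines are still resolved.
int shallow_search(Position& pos, int depth) {
    if (depth <= 0)
        return qsearch(pos);
    int best = -INFINITE_SCORE;
    for (Move m : generate_moves(pos)) {
        int ext = gives_check(pos, m) ? 1 : 0;  // the usual check extension
        pos.do_move(m);
        best = std::max(best, -shallow_search(pos, depth - 1 + ext));
        pos.undo_move(m);
    }
    return best;
}
// called from the root with depth = 1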
hgm
Posts: 27829
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Artificial stupidity - making a program play badly

Post by hgm »

Tord Romstad wrote:Underpromotions and moves with negative SEE value are pruned with very high probability, long diagonal moves also have quite high probability, obvious recaptures have very low probability of being pruned, and so on. Finally, the frequency of pruning of course depends on the playing strength.
Pruning apparently bad moves will probably not have much effect against humans, as they prune those moves too. So you make the program more human-like by it, and vulnerable against full-width searchers which do find non-obvious tactics, but you don't weaken it against humans.

I think it is very important that your prunings are correlated. If you decide independently in each branch whether the same move is pruned, it will be pruned in some branches but not in others, and the search will simply seek out the branches where it is not pruned. And search is very clever at that.

Humans err very differently: if they miss a move (e.g., that a Knight fork against KQ is possible), they miss it in all branches. So their search will not work around it.

I think you have to decide in the root which moves you are going to prune, or at least decide globally, e.g. through some kind of history table. The first time you encounter a (Piece, From, To) combination, you decide whether you will prune that move or not, with a probability that depends on how far from the root the move becomes possible (e.g. how many moves the piece needs to make before it gets there, or how many pieces block that move in the root position). If you decide to prune it, you should then always prune that move, everywhere in the tree.
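In sketch form (all names invented here; the table is indexed like a history table):

// One global verdict per (piece, from, to) combination, made the first
// time the move is encountered and then reused everywhere in the tree.
enum PruneState { UNDECIDED, KEEP, PRUNE };
PruneState prune_table[PIECE_NB][SQUARE_NB][SQUARE_NB];  // reset before every root search

bool is_pruned_for_weakness(const Position& pos, Move m, int distance_from_root) {
    PruneState& s = prune_table[pos.piece_on(from_sq(m))][from_sq(m)][to_sq(m)];
    if (s == UNDECIDED)
        s = (random_double() < miss_probability(distance_from_root)) ? PRUNE : KEEP;
    return s == PRUNE;  // the same verdict in every branch, so the search
                        // cannot steer around the blind spot
}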

See if you can tune it such that you manage to lose to NEG 0.3d! That is an engine that prunes every move! :lol:
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Artificial stupidity - making a program play badly

Post by Tord Romstad »

Zach Wegner wrote:I have basically zero experience on _trying_ to make an engine weak, but it comes naturally to me. ;)
Trust me -- you wouldn't say that if you had tried to lose against 1900 rated humans on the ICC.
:wink:
Really though, this is a very interesting topic, and your story is quite amusing.
It feels quite absurd to watch the games while they are played. I am nervous and excited, and always cheer for the opponent and hope my program will lose. Ultimately, I almost always end up disappointed. Here's a typical scenario:
  1. Glaurung hangs a rook.
  2. The opponent thinks for a few seconds, while I wait nervously and hope he will notice the hanging rook.
  3. The opponent ignores the hanging rook, and makes an irrelevant move somewhere else on the board.
  4. Glaurung thinks for a while, while I follow its PV, noticing that it still hasn't realized its rook is hanging, and hoping that it won't discover it at the last moment.
  5. Phew. Glaurung decides to leave the rook hanging.
  6. My relief is short-lived, because the opponent moves instantly, leaving the rook untouched.
  7. Repeat of step 4.
  8. Repeat of step 5. There is still hope. :)
  9. The opponent thinks for a very long time. I'm beginning to feel sure that he has finally spotted the hanging rook, and is just making sure there is no trap before he takes it.
  10. He finally moves, doesn't take the rook, but instead walks into an instant back-rank mate. Game over.
One idea that I read about a while back from Mike Byrne is to randomize the search depth. If on one move you search 10 plies, and 2 plies the next, it seems that it should approximate human play reasonably well. This also would be much easier than the way you describe.
I think my approach should simulate human play a bit better, but admittedly it doesn't seem very successful so far. In fact I do something slightly similar to what Mike Byrne does, just in a more complex way: Exactly how erroneous the search should be is decided by chance before every move.
Also, ICC ratings are pretty inflated, and not really balanced. The highest blitz ratings are about 3500...
That's some relief. Glaurung's ICC rating when playing at an Elo setting of 1000 seems to have stabilized around 2100. Right now, it's 2094, after 44 games (+33,-10,=1). It has a lost position against a 1400 rated player in the currently running game, but I have lost all faith in human players and expect Glaurung to win in the end.

Tord
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Artificial stupidity - making a program play badly

Post by Tord Romstad »

Hi Roman!
Roman Hartmann wrote:Hi Tord,
just some ideas to dumb the engine down:

-disable QS
-disable extensions (in check, recaptures)
-disable move ordering, or search the moves in reverse order
-disable cut-offs, making it a plain minimax searcher

Any of these suggestions, or a combination of them, should reduce the playing strength quite a bit.
Sure they will, but I think they will result in artificial and not very human-like play. Disabling the QS will lead to very weird horizon effect blunders very different from human mistakes (no human would come up with a line like 1. e4 e5 2. Qh5 Nc6 3. Qxh7 from the opening position, for instance).

Your other suggestions essentially all come down to the same thing: Limiting the search to a very shallow depth. This gives the typical computer behavior where the program sees everything up to a certain shallow depth, and absolutely nothing beyond it. This is very different from a weak human player, who often makes very shallow blunders, but occasionally spots a combination several moves long, and sometimes also spots a "combination" that doesn't work because he missed some obvious defensive move somewhere along the line.

That's the kind of play I want to emulate. It seemed to work well when I tested against TSCP, but failed miserably against humans on the ICC.
Regarding the ratings on ICC, I don't think they are of much use for comparison with engine ratings.
Even very weak engines that don't win a single game against TSCP, but don't throw away pieces on purpose, search at least 5 plies and have a QS, tend to achieve a rating >2000 on ICC. One of my very first versions of roce had no QS at all, but still climbed to 2500 in bullet, although it played almost random moves when under time pressure due to the horizon effect (it never won against any comp on ICC, though).
After my own experiences over the last two days, I can easily believe that.

Tord
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Artificial stupidity - making a program play badly

Post by Tord Romstad »

Carey wrote:Interesting....

Are you also cutting the search depth? If not, it's still going to make some really good tactical moves.
Yes, it occasionally finds a good tactical move. I think it should: even very weak players have an occasional bright moment.
Also, are you increasing the number (and significance) of blunders closer to the root? Deep blunders can easily be irrelevant except against non-humans: they are tactical, and humans don't play deep tactics.
I do the opposite: I decrease the number of blunders close to the root. Once again, this is an attempt to emulate human play. Humans are less likely to miss a move at ply 2 than at ply 5.

Nevertheless, at the lowest levels the probability of blunders is very high even near the root, as is evident from the fact that Glaurung frequently hangs pieces.
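In sketch form, the miss probability is an increasing function of the distance from the root, something like this (the scaling here is invented for illustration, not my actual formula):

#include <algorithm>  // std::min

// The chance of overlooking a move grows with the ply, saturating at 1.
double miss_probability(int ply, double base) {
    return std::min(1.0, base * (1.0 + 0.5 * ply));
}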
I think the problem is that you are tuning against a program (which is poor positionally but will never make a tactical blunder) but playing against humans (who do miss tactical shots but are good positionally). It's just a totally different situation.
Yes, it is -- but after playing against TSCP, I thought I had gotten the "bad at tactics, but good at positional play" thing right. In the games against TSCP, Glaurung always played positionally stronger, usually secured the advantage, but lost lots of games because of elementary tactical mistakes.

Tord