Artificial stupidity - making a program play badly

Discussion of chess software programming and technical issues.

Moderator: Ras

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Artificial stupidity - making a program play badly

Post by Michael Sherwin »

Hi Tord;

In the search you could try something like:

if (eval + margin >= beta) return beta;

Then the bigger the margin, the weaker the play.

The margin can be varied with depth, so you can control the tactics separately from the strategy.
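For example (just a sketch with invented names; the weakness setting and the depth scaling are assumptions, not tested values):

    /* bigger margin near the leaves = more tactical blindness */
    int weakness_margin(int depth, int weakness)
    {
        int d = 8 - depth;
        return weakness * (d > 0 ? d : 0);
    }

    /* in search(), before the move loop: return beta when this fires */
    int prune_here(int eval, int beta, int depth, int weakness)
    {
        return eval + weakness_margin(depth, weakness) >= beta;
    }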

Just a vague idea.

Anyway, I could not help make Glaurung any stronger with my ideas.

Maybe I can help make it weaker! :lol:
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Artificial stupidity - making a program play badly

Post by Zach Wegner »

Tord Romstad wrote:Strange. Glaurung instantly became extremely popular among human players after I registered last night. There are rarely more than a few seconds of idle time between the games. Perhaps having "Glaurung 080519, Elo = 1000" in the finger notes helps to attract players, but I doubt they are fooled when the program's ICC rating is around 2100. :)
After reading this today, I decided to give it another try. It went pretty well this time: I have consistently gotten games, many against humans. Better still, ZCT has won every game except for the two it played against Glaurung (our first matchup, it seems) and a draw against Symbolic. Most of the players were sub-2000, but at least my rating goes up. :lol:

ZCT is playing against Glaurung now, too. Maybe it won't be a catastrophe; it's an almost symmetrical position. The games are pretty interesting, though.
Uri Blass
Posts: 10787
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Artificial stupidity - making a program play badly

Post by Uri Blass »

Tord Romstad wrote:The last few days, I've been working on the most important missing feature in my chess program: Adjustable playing strength. Strange though it might seem, making my program play very badly is by far the most difficult and frustrating thing I have attempted to do in computer chess, and I am now close to giving up in despair.

At ratings above 2200, I achieve limited strength simply by reducing the speed of calculation. This works fairly well, as one would expect. Below 2200, I try to emulate typical human blunders and tactical mistakes. This is where the problems begin. My approach seems very reasonable to me: I just prune random moves everywhere in the tree, and the probability that a move is pruned depends on how hard the move would be to see for a human player. Underpromotions and moves with negative SEE value are pruned with very high probability, long diagonal moves also have quite high probability, obvious recaptures have very low probability of being pruned, and so on. Finally, the frequency of pruning of course depends on the playing strength.

Tuning this turned out to be much trickier than I thought. I used TSCP as my sparring partner. The simple task of adjusting the blunder frequency so that my program scored somewhere above 0% and below 100% took a lot of time. After days of work, I finally began to hit close to the mark. I managed to find various settings which scored around 10%, 25%, 50%, 75% and 90% against TSCP. I was also quite pleased with the look of the games: Glaurung played positionally stronger than TSCP, but lost by making quite human-looking blunders. Many of the games looked almost like I would expect a game between TSCP and a similarly rated human to look.

Proud and happy with my work, I started an account on the ICC last night, in order to test against human players. I started with the settings which scored 50% against TSCP, which I thought (based on the WBEC ratings) should have a strength around 1700. At this level, the program plays positionally ugly chess, and makes plenty of tactical blunders, but rarely hangs a piece, or fails to capture a hanging piece. The result was terribly disappointing: Glaurung played about a dozen games against players around 1900-2100, and won all games except for a single draw. Apparently, 2000 rated players on the ICC make elementary tactical blunders all the time.

I then adjusted the rating down to 1300, and tried again. At this level, the program drops a piece about once or twice per game, on average (at blitz time controls). It turned out that this hardly made any difference: Glaurung still scored close to 100%. Glaurung was frequently hanging pieces, but half the time the human opponents didn't see it, and half the time they quickly paid back the favor by blundering a piece themselves. With a blitz rating of around 2200, I gave up in disgust, logged off and went to bed.

Today, I logged on with the strength set to 1000 -- the lowest implemented level, which scores 0% against TSCP. Glaurung makes several horrible blunders in every single game. It is painful to watch, and it is difficult to imagine how it is possible to play much weaker without playing completely random moves. To my immense frustration, Glaurung still wins most of its games. The current blitz rating, after 37 games, is 2098.

How is this possible? TSCP is rated around 1700, and even when I make my program weak enough to lose every single game against TSCP, it still wins easily against most human players on the ICC. Are the ICC ratings 1000 points too high, or something? How do I manage to lose against average human players, without playing completely random moves?

I'm not sure what the purpose of this post is, apart from venting my frustration, but any advice about how to achieve weak, but realistic-looking play by a computer program would be welcome.

Tord
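
A rough sketch of the kind of per-move-type pruning probabilities Tord describes might look like this (the move classes and all the numbers are invented for illustration, not Glaurung's actual code):

    #include <stdlib.h>

    enum move_class { UNDERPROMOTION, BAD_SEE, LONG_DIAGONAL, QUIET, OBVIOUS_RECAPTURE };

    /* chance (in percent) that a move of this class is pruned */
    static int prune_percent(enum move_class mc, int elo)
    {
        int base;
        switch (mc) {
        case UNDERPROMOTION:    base = 90; break;
        case BAD_SEE:           base = 80; break;
        case LONG_DIAGONAL:     base = 40; break;
        case OBVIOUS_RECAPTURE: base = 2;  break;
        default:                base = 15; break;  /* ordinary quiet moves */
        }
        return base * (2400 - elo) / 1400;  /* full effect at 1000, none at 2400 */
    }

    /* in the move loop: skip the move entirely with this probability */
    static int prune_move(enum move_class mc, int elo)
    {
        return rand() % 100 < prune_percent(mc, elo);
    }
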
1) TSCP is clearly better than 2000 at blitz.

2) I think you are wrong if you assume that humans are positionally better than computers.

Here is an example of a positional error that cannot happen to a computer, and which happened to me in my last tournament game (90+30 time control).

I simply did not pay attention to the fact that my opponent had a passed pawn, and I considered the pawn on d4 merely a weak pawn when it was both a weak pawn and a passed pawn.

The position is equal, but I evaluated Black as better because I did not see that d4 is a passed pawn; only some moves later in the game did I suddenly notice it.

Analysis shows that this oversight did not cause me to make mistakes in the game in question (my mistakes had other causes), but this type of oversight can certainly also cause positional mistakes.

[d]r2r2k1/pp3pp1/4bn1p/3q4/2pP4/6NP/PPBQ1PP1/3RR1K1 b - - 0 1

Analysis by Rybka 2.3.2a 32-bit :

20...Qd5-b5 21.Ng3-e4
= (0.00) Depth: 5 00:00:00
20...Qd5-b5 21.Ng3-e4 Nf6xe4
= (-0.06) Depth: 6 00:00:00 7kN
20...Qd5-b5 21.Ng3-e4 Nf6xe4 22.Bc2xe4 Rd8-d7
= (0.06) Depth: 7 00:00:00 10kN
20...Qd5-b5 21.Ng3-e4 Nf6xe4 22.Bc2xe4 Rd8-d7 23.Qd2-c3
= (0.01) Depth: 8 00:00:00 26kN
20...Qd5-b5 21.Ng3-e4 Nf6xe4 22.Bc2xe4 Be6-d5 23.Be4xd5 Rd8xd5 24.Re1-e7 Ra8-e8
= (0.05) Depth: 9 00:00:00 44kN
20...Qd5-b5 21.Re1-e5 Nf6-d5 22.Bc2-e4 Qb5-b6 23.Ng3-f5 Nd5-f6
= (0.07) Depth: 10 00:00:03 225kN
20...Qd5-c6 21.Ng3-e2 Nf6-d5 22.Ne2-f4 Rd8-e8 23.Re1-e5 f7-f6
= (-0.01) Depth: 10 00:00:07 524kN
20...Qd5-c6 21.Ng3-e2 Nf6-d5 22.Ne2-f4 Rd8-e8 23.Re1-e5 Nd5xf4 24.Qd2xf4 Be6-d5
= (-0.03) Depth: 11 00:00:08 593kN
20...Qd5-c6 21.Ng3-e4 Nf6xe4 22.Bc2xe4 Be6-d5 23.Qd2-e3 Rd8-d6 24.Qe3-f4 Rd6-f6 25.Qf4-e5
= (-0.07) Depth: 12 00:00:18 1274kN
20...Qd5-c6 21.Ng3-e4 Nf6xe4 22.Bc2xe4 Be6-d5 23.Qd2-e3 Rd8-d6 24.Qe3-f4 Bd5xe4 25.Re1xe4 Rd6-f6 26.Qf4-e3 Ra8-d8
= (-0.06) Depth: 13 00:00:23 1635kN
20...Qd5-c6 21.Ng3-e4 Nf6-d5 22.Ne4-c5 Be6-c8 23.Nc5-a4 Bc8-e6 24.Na4-c5 Be6-c8 25.Nc5-a4 Bc8-e6 26.Na4-c5 Be6-c8 27.Nc5-a4
= (0.00) Depth: 14 00:00:39 2739kN
20...Qd5-c6 21.Ng3-e4 Nf6-d5 22.Ne4-c3 b7-b5 23.Bc2-e4 Ra8-b8 24.a2-a3 Qc6-b6 25.Be4xd5 Be6xd5 26.Qd2-f4 Qb6-b7
= (0.07) Depth: 15 00:01:05 4314kN
20...Qd5-b5 21.Re1-e5 Qb5-b6 22.Ng3-e4 Nf6xe4 23.Bc2xe4 f7-f6 24.Re5-c5 Rd8-d7 25.Qd2-c3 Ra8-d8 26.g2-g3 Be6-f7
= (0.04) Depth: 15 00:01:33 6528kN
20...Qd5-b5 21.Re1-e5 Qb5-b6 22.Bc2-f5 Be6-d5 23.Qd2-c3 g7-g6 24.Bf5-c2 Qb6-c6 25.Rd1-e1 Rd8-e8 26.f2-f3 Ra8-d8
= (0.01) Depth: 16 00:03:38 14184kN
20...Qd5-b5 21.Re1-e5 Qb5-b6 22.Bc2-f5 Be6-d5 23.Qd2-c3 g7-g6 24.Bf5-c2 Qb6-c6 25.Rd1-e1 Rd8-e8 26.f2-f3 Ra8-d8 27.Ng3-e2
= (0.05) Depth: 17 00:04:32 17302kN
20...Qd5-b5 21.Re1-e5 Qb5-b6 22.Bc2-f5 Be6-d5 23.Ng3-e4 Bd5xe4 24.Bf5xe4 Nf6xe4 25.Re5xe4 Rd8-e8 26.Rd1-e1 Re8xe4 27.Re1xe4
= (0.05) Depth: 18 00:07:37 29141kN
20...Qd5-c6 21.Ng3-e4 Be6-d5 22.Ne4xf6+ Qc6xf6 23.Re1-e5 Qf6-c6 24.f2-f3 f7-f6 25.Re5-e1 Rd8-e8 26.Qd2-b4 a7-a5 27.Qb4-a3
= (0.00) Depth: 18 00:10:49 44655kN

(so k, 21.05.2008)
Uri Blass
Posts: 10787
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Artificial stupidity - making a program play badly

Post by Uri Blass »

I can add that the position 2 plies earlier was

[d]r2r2k1/pp3pp1/4bn1p/2bq4/2pB4/2P3NP/PPBQ1PP1/3RR1K1 b - - 0 19

Bxd4 is probably the best move, in spite of the fact that White gets a passed pawn on d4.

I believe a typical human mistake is failing to pay attention to positional factors that were not present in the root position (though later in the game the human may notice them).

Uri
PK
Posts: 904
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: Artificial stupidity - making a program play badly

Post by PK »

Perhaps finding a really bad way of implementing late move reductions would help in achieving the "dark tunnel effect" of too-narrow tactical vision. For example, reducing checks, forks and passed pawn pushes might do the trick. The same goes for reducing everything that might possibly be a sacrifice.
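
For instance (a sketch with invented names; the caller would subtract the result from the remaining depth):

    /* deliberately backwards LMR: reduce exactly the forcing moves a
       strong engine would never reduce */
    int tunnel_vision_reduction(int gives_check, int is_fork,
                                int pushes_passed_pawn, int looks_like_sacrifice)
    {
        int r = 0;
        if (gives_check)          r += 1;  /* checks drop off the horizon  */
        if (is_fork)              r += 1;  /* forks too                    */
        if (pushes_passed_pawn)   r += 1;  /* and passed pawn pushes       */
        if (looks_like_sacrifice) r += 2;  /* "sacrifices must be unsound" */
        return r;
    }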

As far as eval is concerned, there is a lot of stuff designed explicitly to avoid blunders, like trapped-piece detection. IMHO all of that should be turned off.

But in the first place, what is bad play about? I can think of a couple of approximations, the most promising being "ignore what the opponent does". This can be reflected in eval by decreasing the opponent's mobility and king-attack values, but it would breed only a mad-cow variety of weak player.

Also, bad play is about basing a game on unfounded beliefs. So perhaps eval ought to have a "quirk mode", with false heuristics turned on randomly at the beginning of the game. A couple of examples, most of them based on my family and club opponents from my school days:

- "delay castling, as it is showing your opponent where to attack"
- "try to castle long as Black, it will disrupt Your opponent's plans"
- "major pieces are overrated, go for imbalances just for the fun of it"
- "in semi-open games, white always goes for the kingside pawn storm"
- "a3 and h3 should be played just in case"
- "knights are better than bishops"
- "exchange pieces just in case, less wood = less errors"

In short: not a planless game, but playing with the wrong plan.
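
A sketch of what such a quirk mode could look like (the quirks, names and bonuses are all invented; only a couple of the beliefs are shown being applied):

    #include <stdlib.h>
    #include <time.h>

    enum quirk {
        DELAY_CASTLING, CASTLE_LONG_AS_BLACK, LOVE_IMBALANCES,
        ALWAYS_PAWN_STORM, PROPHYLACTIC_A3_H3, KNIGHTS_OVER_BISHOPS,
        TRADE_EVERYTHING, NUM_QUIRKS
    };

    static enum quirk game_quirk;

    /* call once at the start of each game */
    void pick_quirk(void)
    {
        srand((unsigned)time(NULL));
        game_quirk = (enum quirk)(rand() % NUM_QUIRKS);
    }

    /* called from the eval; bonuses in centipawns */
    int quirk_bonus(int has_castled, int num_pieces_traded)
    {
        switch (game_quirk) {
        case DELAY_CASTLING:   return has_castled ? -30 : 10;
        case TRADE_EVERYTHING: return 15 * num_pieces_traded;
        default:               return 0;  /* the other beliefs would hook in elsewhere */
        }
    }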
Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Re: Artificial stupidity - making a program play badly

Post by Edsel Apostol »

Hi Tord,

Maybe you should randomize your eval factors. For example, disable king safety in one search, disable pawn structure in the next, and so on. Or raise the weight of one eval factor in a given search and of another factor in the next search. That is how I played when I was young and a weak player: just shuffling the pieces around without much of a plan, looking for two or three plies of tactics to win a pawn or a piece.

Another idea: adjust the eval factors according to the rating you want to achieve. At lower ratings, reduce the weights for king safety, passed pawns and mobility, and give a higher weight to material. Adjust accordingly as the rating goes up.

This covers only the positional aspects, though; you still need to limit the search to lower the tactical strength.
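
A sketch of what such rating-scaled weights might look like (the struct, function and percentages are all invented for illustration):

    /* material dominates at low ratings; positional terms fade in as the
       target rating rises */
    struct eval_weights {
        int material, king_safety, passed_pawns, mobility;  /* in percent */
    };

    struct eval_weights weights_for_rating(int elo)
    {
        struct eval_weights w;
        int t = elo <= 1000 ? 0 : elo >= 2400 ? 100 : (elo - 1000) * 100 / 1400;
        w.material     = 200 - t;   /* overvalue material when weak */
        w.king_safety  = t;         /* no king safety at all at Elo 1000 */
        w.passed_pawns = t;
        w.mobility     = t;
        return w;
    }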

By the way, I think that if you supported a feature like the personalities in the Chessmaster software, your engine would become extremely popular.
Kempelen
Posts: 620
Joined: Fri Feb 08, 2008 10:44 am
Location: Madrid - Spain

Re: Artificial stupidity - making a program play badly

Post by Kempelen »

Hi Tord,

Maybe you would find it interesting to look at "non-standard blitz" play. In the book I have in mind you will find certain rules for playing blitz against humans that would be "weak rules" at standard time controls. It is a very good, non-standard chess book whose main aim is playing good blitz, not a perfect chess game. Of course, it might not perform the same way against computers...
If you are interested, maybe I could give you more details about the book by private mail.
Best regards,
FS
yoshiharu
Posts: 56
Joined: Sat Nov 11, 2006 11:14 pm

Re: Artificial stupidity - making a program play badly

Post by yoshiharu »

Tord Romstad wrote: That's some relief. Glaurung's ICC rating when playing at an Elo setting of 1000 seems to have stabilized around 2100. Right now, it's 2094, after 44 games (+33,-10,=1). It has a lost position against a 1400 rated player in the currently running game, but I have lost all faith in human players and expect Glaurung to win in the end.
First of all, with all due respect to your frustration, I must admit this thread is one of the funniest I've ever read :-)

Then, I am curious about the "game profile" of crippled-Glaurung games versus humans: I understand that there are many middlegame positions in your games where the human has a winning advantage and nonetheless manages to quickly lose the game; but leaving such games aside, what are the stats when the game stays balanced as the middlegame winds down? How do the endgames of these games go?

BTW, it is probably not so frequent that a club-level player drops a piece; often weak play is the result of positional misjudgement. To make an engine weaker against humans I would, as somebody else already said, limit the deep tactics (therefore limit extensions) and somehow "detune" the positional scores (and the material ones, maybe to a lesser extent). But also, since part of the positional eval is provided by search, one could try to mix the approach described by HGM with yours, so that some moves get consistently pruned throughout the search, _and_ all the others have some pruning probability, with the blend of the two depending on depth and the kind of position (closed, open, etc.). That would simulate not only generic blurred vision but also some blind spots, so to speak. Is it worth a try? Well, at least I don't think it would make your engine any stronger ;-)
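
A sketch of that blend (names and constants invented): a move that hashes into the "blind spot" set is pruned everywhere in the tree for the whole game, while the rest are pruned with an ordinary random probability:

    #include <stdlib.h>

    static unsigned game_seed;  /* fixed once at game start */

    /* the same move always hashes the same way -> a consistent blind spot */
    static int is_blind_spot(unsigned move_key, unsigned blind_percent)
    {
        return ((move_key ^ game_seed) * 2654435761u) % 100 < blind_percent;
    }

    static int prune_move(unsigned move_key, unsigned blind_percent, int blur_percent)
    {
        if (is_blind_spot(move_key, blind_percent))
            return 1;                        /* never sees this move at all   */
        return rand() % 100 < blur_percent;  /* otherwise just blurred vision */
    }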

Cheers, Mauro
Tony

Re: Artificial stupidity - making a program play badly

Post by Tony »

I've been working on this now for a while.

I wanted to make a nice engine for my pupils. Unfortunately, they start at 800 Elo.

I found that at least 4 things are important.

1) Search depth. Limit the search.

2) Amount of eval. My youngest pupils haven't got a clue what a passed pawn is, so I shouldn't score passed pawns (except on the 7th rank, because they do know what promotion is). Throw a lot of stuff out.

3) Unbalanced evaluation. When is an advantage worth the disadvantage? Add (or subtract) a random percentage (based on the level) to every score to mess it up; see the sketch after this list.

4) Smooth/gradual transitions from one level to the next. It is no fun if you beat the engine, move up a level, get killed, drop a level, win easily, and so on.
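
A sketch of point 3 (the constants are invented): perturb every score by a random percentage that shrinks as the level rises, so judgement degrades smoothly from one level to the next:

    #include <stdlib.h>

    int noisy_eval(int eval, int level, int max_level)   /* assumes max_level > 0 */
    {
        int max_pct = 50 * (max_level - level) / max_level;  /* 50% noise at level 0 */
        int pct = (rand() % (2 * max_pct + 1)) - max_pct;    /* -max_pct..+max_pct   */
        return eval + eval * pct / 100;
    }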


This messed-up engine is capable of amusing my 6-year-old son as well as a 2200 Elo club player. (At different levels, that is: my son plays at level 7, the club player at level 1200.)

Writing this weak engine has given me more fun in the last few months than trying to write a strong one gave me all last year.

Tony
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Artificial stupidity - making a program play badly

Post by bob »

Marek Soszynski wrote:Tord,

Here's my idea...
Randomly exclude a number of moves from consideration altogether. For very weak play, half the number of possible moves (or more) could be excluded. For a very slight reduction in playing level, a single move (selected at random) could be excluded on every other turn. In addition, for a realistically reduced level of play, the engine should start its calculations after an interval.
This is _very_ difficult to make work. The trees are huge, and just choosing random moves not to search doesn't work very well for weakening the engine. When I started trying to do this, that was my first idea: weaken the search and leave everything else alone. I turned off all extensions/reductions, and then started to just pick a random point at each node and say "no more moves are available". And I could not get it to play (and look) like a 1200 or 1400 player. Beginners will push pawns without regard to creating weak pawns, weak squares, or weak king safety, etc. I could not get below 2000 in testing on my cluster or ICC, and watching the games, they just didn't look right. I began to realize that a crippled search would affect one aspect of the game (tactics) but do nothing for positional judgement. Hence my later effort at crippling the evaluation. But just toning down the positional scoring was not enough without factoring in the random value I ended up adding.
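
That first idea might be sketched like this (invented names; this is the version that did not work, shown only to make it concrete):

    #include <stdlib.h>

    /* at each node, sometimes pick a random cutoff in the move list and
       pretend the moves after it do not exist */
    int moves_to_search(int num_moves, int weakness_percent)
    {
        if (num_moves > 1 && rand() % 100 < weakness_percent)
            return 1 + rand() % num_moves;  /* keep only a random prefix */
        return num_moves;
    }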

This is a lot harder to do than it appears...

And tuning against computers is not a good idea. It doesn't take much of a tactical weakness to allow chess engines to win 99% of the games, but humans won't come anywhere near that...