Contempt and the ELO model.

Discussion of chess software programming and technical issues.


Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Contempt and the ELO model.

Post by Daniel Shawul »

Michel wrote: What is CG? Probably obvious but I feel a bit stupid right now...
My fault, I should learn to use fewer abbreviations. CG = conjugate gradient, MM = minorization-maximization.
Having an extra parameter is a nuisance, but unavoidable I think.

It would be nice if the model would quantify that certain engines play "drawish" or "aggressive".
The way I have it in the code assumes agg1 & agg2 to be known parameters (constants) to be provided with each player. So given Houdini's rating with high contempt (20 centipawns?), we might be able to answer questions like what its rating would have been without it. I reckon that may be interesting to some here :). But I don't know if we can make contempt a real parameter and figure it out for all players from game results. An engine performing much better against a certain opponent could be either due to luck or use of contempt and it seems hard to distinguish...
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Contempt and the ELO model.

Post by Michel »

agg1 & agg2 to be known parameters (constants) to be provided with each player.
A player would only have "agg" (besides elo). I used the notation agg1 and agg2 to refer to the first and second player in a game.
An engine performing much better against a certain opponent could be either due to luck or use of contempt and it seems hard to distinguish...
Well the same goes for elo. Naturally you would have to assume the same contempt against all opponents.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Contempt and the ELO model.

Post by Daniel Shawul »

A player would only have "agg" (besides elo). I used the notation agg1 and agg2 to refer to the first and second player in a game.
That was a typo; one agg parameter per player is what I had in mind. So the function I posted would have taken agg1 and agg2 alongside pelo and oelo.

I guess it may be possible to figure it out by looking at the results of an engine against all players (not just one), and finding a pattern of extremely good performance against lower-ranked engines and bad results against equally ranked ones. The best scenario would be a round-robin tournament. Say agg=1 for all players at the start of the iterations; what is the constraint equation to be used to update the agg parameter at each iteration? Edit: Maybe it is not necessary, and we can just move it along the gradient of the likelihood function, like I did when I added slopes for drawelo/home advantage.
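To make the gradient idea concrete, something like this rough, untested sketch is what I have in mind; the log-likelihood is left as a placeholder callable since the exact augmented model is still open, and only a forward-difference step on one player's agg is shown (step sizes are arbitrary):

#include <cstddef>
#include <functional>
#include <vector>

struct Player { double elo; double agg; };

// Placeholder: log-likelihood of the (yet to be fixed) augmented model
// over all games, as a function of the current player parameters.
using LogLik = std::function<double(const std::vector<Player>&)>;

// One forward-difference gradient step on player i's agg, with all other
// parameters held fixed.
void updateAgg(std::vector<Player>& players, std::size_t i,
               const LogLik& logLik, double step = 0.01, double eps = 1e-4)
{
    const double base = logLik(players);
    players[i].agg += eps;
    const double grad = (logLik(players) - base) / eps;
    players[i].agg -= eps;
    players[i].agg += step * grad;   // move agg along the likelihood gradient
}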
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Contempt and the ELO model.

Post by Michel »

Well, to solve the equations it is perhaps best to introduce

ELO = agg*elo

Then the formula becomes

agg1*ELO2 - agg2*ELO1

(this is a nice determinant!).

So I think you could first update ELO1, then agg1 with the MM algorithm, then go to the second player, third player, and continue cyclically until convergence.

EDIT: Although nice, it is perhaps not necessary to introduce ELO. The iterative method might work just as well with the original "elo".
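A rough, untested sketch of the sweep order (the logistic expected-score curve and the crude numerical ascent step are stand-ins for illustration only, draws are simply counted as half points, and the step sizes are arbitrary; the only things taken from the above are the determinant agg1*ELO2 - agg2*ELO1 and the order of updates):

#include <cmath>
#include <vector>

struct Player { double ELO; double agg; };            // ELO = agg * elo
struct Game   { int white, black; double score; };    // white's score: 1, 0.5 or 0

// Effective difference for player a against player b: the determinant above.
double effDiff(const Player& a, const Player& b)
{
    return a.agg * b.ELO - b.agg * a.ELO;
}

// Plain logistic expected score; the real model would also handle drawelo etc.
double expectedScore(const Player& a, const Player& b)
{
    return 1.0 / (1.0 + std::pow(10.0, effDiff(a, b) / 400.0));
}

double logLik(const std::vector<Player>& p, const std::vector<Game>& games)
{
    double ll = 0.0;
    for (const Game& g : games) {
        const double e = expectedScore(p[g.white], p[g.black]);
        ll += g.score * std::log(e) + (1.0 - g.score) * std::log(1.0 - e);
    }
    return ll;
}

// Cyclic sweeps: update ELO, then agg, player by player, until convergence
// (a fixed number of sweeps here; a real MM step would replace 'ascend').
void fit(std::vector<Player>& p, const std::vector<Game>& games,
         int sweeps = 100, double step = 1.0, double eps = 1e-3)
{
    auto ascend = [&](double& x) {
        const double base = logLik(p, games);
        x += eps;
        const double grad = (logLik(p, games) - base) / eps;
        x -= eps;
        x += step * grad;
    };
    for (int s = 0; s < sweeps; ++s)
        for (Player& pl : p) { ascend(pl.ELO); ascend(pl.agg); }
}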
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Contempt and the ELO model.

Post by Don »

Uri Blass wrote:
hgm wrote: Setting contempt would simply decrease the DrawElo parameter of the model, right? You would get fewer draws, and force them to become either wins or losses. What they become depends (statistically) on the Elo difference with your opponent; you are not going to play any better just because you refuse to take draws.

To use this in an analysis you should allow a two-parameter description of players (strength, predisposition for contempt), where the DrawElo should be a function of the rating difference and predisposition for contempt of both players.
I think that things are not so simple.

I can imagine that contempt that is too high may increase the number of draws against some weaker opponent.

Imagine that you play against some opponent and you need to choose between move A and move B.

After move A the opponent can force a draw and you see it, but the opponent does not see the draw or evaluates some alternative as better than the draw.
After move B the opponent has an advantage of 0.5 pawns but no forced draw that you can see.

With very high contempt you may prefer move B and later fight for a draw, and get the draw, when practically you could have won with move A, because after move A the opponent would choose not to force the draw but to play a losing move.
It's not clear whether your reasoning is sound or not. Essentially, there is an ideal contempt factor to set for a given pairing which says: if the score gets this bad, the chances are exactly even. If you make the contempt higher than this, you are instructing the program not to take a draw even though its chances of winning are less than 50%. You have basically "fixed" the odds against this program. And yet you are saying this will cause the program to draw more, because the stronger program is now going to fight for a draw even though we instructed it to reject draws? The issues are complex and I'm not trying to refute the concept, but I do not follow the reasoning.

I am always very suspicious of any chain of reasoning based on constructed scenarios. For example, you seem to assume that the opponent "does not see the draw" and the rest of your argument flows from that.

Most fallacious reasoning (if that is what this is) is based on an obsession with a subset of possible scenarios that "might" happen. I think I do this myself sometimes. The individual's thoughts are dominated by those scenarios and they are given too much weight. Because they can imagine it happening they develop a fixation - for some reason the possibility appeals to them. That is how you get unreasonable conspiracy theories, for example. The idea of a conspiracy is sometimes more appealing than rational thought.

I am sort of an advocate of Occam's Razor. I do not religiously subscribe to it, but my very first pass at a problem is very simple, because I don't try too hard to envision every possible scenario that "might" invalidate my test setup and I don't initially obsess over trying to cover them all. I have seen such tests performed where every superstition the tester has is covered at great complexity, most of them quite silly. I would go absolutely insane if I tried to do that. I would basically have to run every test hundreds of times - obsessing over the book (maybe the book is "favorable" to one change over the other?), the time control, lack of transitivity (maybe the change works against version X but not version Y?), and the list goes on and on. Maybe it has something to do with the order I configured the players, or the computer the test was run on? But over time things sometimes come to light that I now know I must take into consideration, and if I suspect something I will test it in order to put it to rest.

The things that really matter will generally show themselves through conflicting results, and you can then analyze them with hypothesis testing. We have discovered several things of that nature and address them as they happen. One example is that we can over-provision our tester if we are running Komodo but not if we are running against Windows programs - the test will be highly favorable to Komodo and therefore misleading.

Anecdote alert:

Here is something that happened to me. As the stronger player, I once played someone and left my queen hanging by accident - and the opponent did NOT take the queen - so I was off the hook. When I later asked him why he didn't take my queen, I discovered that he had way too much respect for me. His response was that if I left my queen hanging it must have been a trap of some kind and he did not want to take the chance!!! It's a bit humorous, but unfortunately this is how the person thought all the time - and yes, he was big on conspiracy theories. The first thought that came into his head was the one he "followed" and built a belief system around.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Contempt and the ELO model.

Post by Daniel Shawul »

Why don't we just do the following? I guess that the reason we needed to incorporate 'contempt' into ratings is to figure out the 'real' performance of an engine against equally ranked opponents. As it is, Houdini's rating is correct for the given conditions, so there is nothing to correct; but now we have moved our interest to ratings against equally ranked opponents. So instead of trying to find out the contempt of each player, why not use one parameter for all players to weigh good scores against close opponents as more important? Let's call it 'contempt' again, for lack of a better term. Then taking out the bottom 100 opponents brings down Houdini's rating while increasing the Stockfish/Komodo ratings, and so on. So the new contempt parameter we introduced describes what kind of rating we want to see. This is much easier to program and achieves the same goal, so far as I understand...
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Contempt and the ELO model.

Post by Daniel Shawul »

I found a better name for this: "importance factor", I. This would be a function of the Elo difference, where I = 1 for, say, abs(elo_diff) <= 50, and then tapers off for larger differences (like a Gaussian curve). This shape can be specified for the rating calculation, and we get different ratings based on the shape of I. Right now I = 1 for all rating differences, which can be considered the default.
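As a concrete example of one such shape (the window and taper width here are arbitrary choices; window -> infinity gives the current default of I = 1 everywhere):

#include <cmath>

// Importance factor I as a function of the Elo difference between the
// two players: full weight inside the window, Gaussian taper outside.
double importance(double eloDiff, double window = 50.0, double width = 200.0)
{
    const double d = std::fabs(eloDiff);
    if (d <= window)
        return 1.0;
    const double t = (d - window) / width;
    return std::exp(-0.5 * t * t);
}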
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Contempt and the ELO model.

Post by Laskos »

Daniel Shawul wrote: I found a better name for this: "importance factor", I. This would be a function of the Elo difference, where I = 1 for, say, abs(elo_diff) <= 50, and then tapers off for larger differences (like a Gaussian curve). This shape can be specified for the rating calculation, and we get different ratings based on the shape of I. Right now I = 1 for all rating differences, which can be considered the default.
A first step would be to use a rectangular function for I :P
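That is, something like this (the +/- 50 window is again just an arbitrary choice):

#include <cmath>

// Rectangular ("box") importance factor: only games between closely
// rated opponents count at all.
double importanceBox(double eloDiff, double window = 50.0)
{
    return std::fabs(eloDiff) <= window ? 1.0 : 0.0;
}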
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Contempt and the ELO model.

Post by Daniel Shawul »

Indeed. It has a curious analogy in turbulence modeling, where a certain cutoff frequency is used to separate the important eddies from the rest. Incidentally, there is a 'box' filter and a 'Gaussian' filter there as well, besides many others!
Anyway, how do you implement this idea? I am thinking first to produce a preliminary rating with I = 1 (the default). Then go to the game results between two individuals and modify the percentage scores by multiplying with I, which is now known since we have ratings. Then calculate the final ratings. So basically a two-step process; any ideas?
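In skeleton form the two passes might look like this (untested; computeRatings stands for whatever rating routine is already in place, importance is the I function from above, and each pairing is weighted by I here rather than the raw percentage score being multiplied, but the structure is the same):

#include <functional>
#include <vector>

struct GameResult { int a, b; double score; double weight = 1.0; };

using RatingFn     = std::function<std::vector<double>(const std::vector<GameResult>&)>;
using ImportanceFn = std::function<double(double)>;

std::vector<double> twoPassRatings(std::vector<GameResult> games,
                                   const RatingFn& computeRatings,
                                   const ImportanceFn& importance)
{
    // Pass 1: preliminary ratings with every game at full weight (I = 1).
    const std::vector<double> prelim = computeRatings(games);

    // Reweight each pairing by I of the preliminary rating difference.
    for (GameResult& g : games)
        g.weight = importance(prelim[g.a] - prelim[g.b]);

    // Pass 2: final ratings from the reweighted games.
    return computeRatings(games);
}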
Also, I am not clear about the discussion of contempt here. Is it something inherent that changes the rating calculation, or simply a description of our wish to see a certain kind of rating? I am more inclined to the latter now...
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Contempt and the ELO model.

Post by Michel »

Why don't we just do the following? I guess that the reason we needed to incorporate 'contempt' into ratings is to figure out the 'real' performance of an engine against equally ranked opponents. As it is, Houdini's rating is correct for the given conditions, so there is nothing to correct; but now we have moved our interest to ratings against equally ranked opponents. So instead of trying to find out the contempt of each player, why not use one parameter for all players to weigh good scores against close opponents as more important? Let's call it 'contempt' again, for lack of a better term. Then taking out the bottom 100 opponents brings down Houdini's rating while increasing the Stockfish/Komodo ratings, and so on. So the new contempt parameter we introduced describes what kind of rating we want to see. This is much easier to program and achieves the same goal, so far as I understand...
Well this is not my motivation... I would like to resolve the incompatibility of contempt with the elo model by suitably augmenting the elo model.

What you are proposing is different: it is to change the standard maximum likelihood estimator to one which is _less sensitive_ to contempt settings in engines (if I understand correctly you are proposing some kind of weighted maximum likelihood estimator).
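Spelled out, my reading of it is: instead of maximizing the sum over all games of log P(result), one would maximize the sum of w * log P(result), where the weight w of each game is some function of the (estimated) rating difference between the two engines. Taking w = 1 for every game recovers the ordinary estimator.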