"Intrinsic Chess Ratings" by Regan, Haworth -- seq

modolief · Post by **modolief** » Mon Nov 20, 2017 12:53 am

Is there any more recent version of this paper, hopefully one that would use a modern open source engine stronger than Rybka 3?

https://www.cse.buffalo.edu//~regan/pap ... eHa11c.pdf

Abstract:

This paper develops and tests formulas for representing playing strength at chess by the quality of moves played, rather than by the results of games. Intrinsic quality is estimated via evaluations given by computer chess programs run to high depth, ideally so that their playing strength is sufficiently far ahead of the best human players as to be a ‘relatively omniscient’ guide. Several formulas, each having intrinsic skill parameters s for sensitivity and c for consistency, are argued theoretically and tested by regression on large sets of tournament games played by humans of varying strength as measured by the internationally standard Elo rating system. This establishes a correspondence between Elo rating and the parameters. A smooth correspondence is shown between statistical results and the century points on the Elo scale, and ratings are shown to have stayed quite constant over time. That is, there has been little or no ‘rating inflation’. The theory and empirical results are transferable to other rational choice settings in which the alternatives have well-defined utilities, but in which complexity and bounded information constrain the perception of the utility values.

walter koroljow · Post by **walter koroljow** » Mon Nov 20, 2017 3:41 pm

There is an "update", and Stockfish is involved: https://content.iospress.com/articles/i ... al/icg0012

Good Luck!

KWRegan · Post by **KWRegan** » Mon Nov 20, 2017 6:26 pm

Jean-Marc Alliot's paper linked by Walter Koroljow is a good one, especially for the idea of applying a Markov process. What he calls "ponderated conformance" is the same basic idea as what my papers call "scaled difference"; the main technical difference is that I treat essentially what he calls 1/(1+v_b/k_1) as a differential to be integrated rather than a direct multiplier. The basic point about scaling is made in my weblog article "When Data Serves Turkey" (https://rjlipton.wordpress.com/2016/11/ ... es-turkey/) a year ago. The followup https://rjlipton.wordpress.com/2016/12/ ... y-grinder/ is about issues with the win-expectation curve, which provides a natural way to scale the effect of mistakes that is closely aligned with what Alliot says about his Markov method at the end of the paper's section 3.2.3. The paper also uses Diogo Ferreira's notion of "gain" which I believe really needs to be treated as a quantity across depths, not just one reference depth such as 20.
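To make the multiplier-versus-differential distinction above concrete, here is a sketch under assumed notation: $v_b \ge v_p \ge 0$ are the engine values of the best and the played move, $k > 0$ is a scale constant, and the weight is $1/(1 + v/k)$. These symbols and the exact form of the weight are illustrative assumptions, not the definitions used in either paper.

```latex
\[
\delta_{\text{mult}} = \frac{v_b - v_p}{1 + v_b/k},
\qquad
\delta_{\text{int}} = \int_{v_p}^{v_b} \frac{dv}{1 + v/k}
  = k \,\ln\!\frac{1 + v_b/k}{1 + v_p/k}.
\]
```

Under these assumptions the two quantities agree to first order when $v_b - v_p$ is small, and $\delta_{\text{int}} \ge \delta_{\text{mult}}$ in general, because the integrand is never smaller than its value at $v_b$ on the interval of integration.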

There are multiple updates to my 2011 paper with Guy Haworth on my publications page https://www.cse.buffalo.edu//~regan/pub ... html#chess --- some also involving Stockfish and Komodo and Houdini. I can draw special attention to the papers with my student Tamal Biswas. One of them is linked and described in my popular article https://rjlipton.wordpress.com/2015/10/ ... tisficing/, among whose implications is that we can estimate your rating while looking only at moves where you made big mistakes as judged by engines. My page has full citation information.

What most notably hasn't appeared yet is an update to my "Intrinsic Ratings Compendium" https://www.cse.buffalo.edu//~regan/pap ... 12IPRs.pdf using the newer engines. This is pending resolution of issues which I described earlier this year at https://rjlipton.wordpress.com/2017/05/ ... analytics/ on the same blog. The main point is that one wants a model that gives reliable projections for all the moves in a position, not just the best move(s) (versus the rest lumped together) as currently. The latter suffices for cheating detection but gives only partial answers to questions such as how traps really work and whether we all tend to move Knights more often than we "should"---or would be expected to if moves were based purely on perceptions of value.

Vinvin · Post by **Vinvin** » Mon Nov 20, 2017 8:03 pm

Some papers on related subjects:
http://www.chessanalysis.ee/a%20study%2 ... rength.pdf
https://hal.inria.fr/hal-01307091/file/RT-479.pdf
https://ailab.si/matej/doc/Computer_Ana ... mpions.pdf

modolief · Post by **modolief** » Mon Nov 20, 2017 10:17 pm

Wow guys, thanks so much!! I was seeing some discussion of the "rating inflation" question over on the Chessbrah forum (on a "Discord" server). Some basic googling led me to the 2011 paper but nothing more recent. Your feedback (particularly from an author!) is much appreciated!

--Kai M.

KWRegan · Post by **KWRegan** » Tue Nov 21, 2017 1:54 pm

On the specific topic of rating inflation, the paper you want is actually the 2011 Regan-Haworth followup in which GM Bartlomiej Macieja joined us: https://www.cse.buffalo.edu/~regan/pape ... RMH11b.pdf Plus this chart on my main page: https://www.cse.buffalo.edu//~regan/che ... reg4yr.jpg But as we say in the paper, it depends on what one means by "inflation".

nimh · Post by **nimh** » Fri Nov 24, 2017 12:05 am

Vinvin wrote: Some papers on close subjects:
http://www.chessanalysis.ee/a%20study%2 ... rength.pdf
https://hal.inria.fr/hal-01307091/file/RT-479.pdf
https://ailab.si/matej/doc/Computer_Ana ... mpions.pdf

The study of mine that you refer to is outdated by now. The updated version can be seen here: http://www.chessanalysis.ee/Quality%20o ... suring.pdf

While Regan finds that there has been no rating inflation, my conclusion is different: there has been inflation of 5 points per decade in FIDE ratings (due to skills evolving more slowly than ratings) and 38 points per decade in Chessmetrics ratings (top ratings relatively stable, skills steadily rising).

KWRegan · Post by **KWRegan** » Tue Nov 28, 2017 9:02 pm

Erik, it is interesting that your Graph 24 in section 4.3 shows the same general shape as mine at https://www.cse.buffalo.edu/~regan/ches ... reg4yr.jpg which is for the bellwether 2600 level only. That is, it shows a fairly smooth downward trend from the late 1980s through about 2002, then an uptick for about 8 years. My graph does not have the final movement down, but it is a 4-year moving average ending with 2010-2013.
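As a small illustration of why a 4-year moving average can mute a recent turn in the data, here is a minimal Python sketch of a trailing moving average. The function name and the yearly values are hypothetical, and the actual chart may pool games differently (for example by game rather than by yearly mean).

```python
def trailing_moving_average(values, window=4):
    """Average each value with the (window - 1) values before it.

    Returns one average per position that has a full window behind it,
    so the result is shorter than the input by (window - 1) entries.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


if __name__ == "__main__":
    # Made-up yearly figures, purely for illustration: a rise, then a drop
    # in the final year. The last smoothed point still mixes in three
    # earlier years, so the drop shows up only partially.
    yearly = [10.0, 11.0, 12.0, 13.0, 9.0]
    print(trailing_moving_average(yearly, window=4))
```

The last smoothed value always blends the final year with the three before it, so a change in the final year or two is only partly visible.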

Both graphs are significant as evidence against the particular commonly voiced hypothesis of 100--150 points of inflation since about 1980. However, your study does not give enough information about the error bars of the regression line shown in your diagram to say whether the 5 points per decade (about 20 points of inflation overall) is significant. Your regression package should give you coefficients which determine the error of the slope---if that error is bigger than the slope itself then your results are consistent with zero inflation. Moreover, the shape of the graph itself suggests that a simple linear model may not be appropriate. Both of our graphs agree with one and possibly two systematic factors:

(1) The elimination of adjournments and the wider movement after 1990 toward the faster G/90+30"/move time controls, especially in Swisses, from which much of our data comes (along with Olympiads, World Cups, and similar events).

(2) The greater use of computers in preparation, mostly since 2000 on a wide scale.

The question is whether to try to correct for (1) & (2) before applying the regression. This feeds into a larger point: it is not clear from my reading how the data in your sets is controlled. I gather from other charts that Rapid and Blitz games were segregated at least in the cohort for 2700, but even that should be clearer. My chart uses all available games with both players rated between 2590 and 2610 (widened to +-15 or +- 20 for some early years) in round-robin and small Swiss (<= 64 players over 9+ rounds) events at standard time controls.
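The slope-versus-error check mentioned above can be sketched in plain Python. This is a minimal illustration using ordinary least squares with the textbook standard error of the slope; the function names and the sample numbers are hypothetical and are not taken from either study.

```python
import math


def slope_with_error(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, standard_error_of_slope).
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, std_err


def consistent_with_zero(slope, std_err):
    """The rule of thumb from the post: error at least as big as the slope."""
    return abs(slope) <= std_err


if __name__ == "__main__":
    # Made-up points (decades since a start year, rating offset), for
    # illustration only.
    decades = [0.0, 1.0, 2.0, 3.0, 4.0]
    offsets = [0.0, 12.0, -3.0, 20.0, 15.0]
    slope, intercept, err = slope_with_error(decades, offsets)
    print(slope, err, consistent_with_zero(slope, err))
```

A stricter test would compare the slope with about two standard errors (or use a t-distribution), so the rule above is the most lenient reading of the criterion.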

althus · Post by **althus** » Wed Nov 29, 2017 3:17 am

Ken, do you address anywhere the thoughts of John Nunn on the performance of past players, as in his chapter "Test of Time"? You two are about the only people to have written about the topic intelligently. It would be interesting to put you in the same room and see what came of it. Apart from a lot of math I wouldn't comprehend.

nimh · Post by **nimh** » Sun Dec 03, 2017 11:42 pm

Since our graphs represent absolutely different things, it's perhaps nothing but a coincidence.

I agree with you that 100-150 points of inflation over 4 decades is an exaggeration. There are people who hold that chess skills have not improved much over time. And they see that Karpov was rated near 2700 for most of the 1970s... Erroneous conclusions are easy to draw.

I didn't add error bars, because I did not want it to look too complicated. It is, after all, just amateur research, a hobby. My knowledge of mathematical statistics is at beginner level. No doubt this affects the clarity of what I want to convey in my paper. I must say that I have never read your papers on chess thoroughly, just dipped into them, because the math is too difficult for me to understand.

A linear trend has the best correlation, which is why I chose it. BTW, if I removed the data from 1970-1974, the linear trend would indicate 14.4 Elo points of inflation per decade. Perhaps that is even more credible?

I tried to correct for varying time controls and adjournments, though I dealt with the latter a bit arbitrarily: I just added 1 hr of thinking time to each adjourned game. See the ELO + time bars on graph 19.