"Intrinsic Chess Ratings" by Regan, Haworth  seq
Is there any more recent version of this paper, hopefully one that would use a modern open source engine stronger than Rybka 3?
https://www.cse.buffalo.edu//~regan/pap ... eHa11c.pdf
Abstract:
This paper develops and tests formulas for representing playing strength at chess by the quality of moves played, rather than by the results of games. Intrinsic quality is estimated via evaluations given by computer chess programs run to high depth, ideally so that their playing strength is sufficiently far ahead of the best human players as to be a ‘relatively omniscient’ guide. Several formulas, each having intrinsic skill parameters s for sensitivity and c for consistency, are argued theoretically and tested by regression on large sets of tournament games played by humans of varying strength as measured by the internationally standard Elo rating system. This establishes a correspondence between Elo rating and the parameters. A smooth correspondence is shown between statistical results and the century points on the Elo scale, and ratings are shown to have stayed quite constant over time. That is, there has been little or no ‘rating inflation’. The theory and empirical results are transferable to other rational choice settings in which the alternatives have well-defined utilities, but in which complexity and bounded information constrain the perception of the utility values.

Re: "Intrinsic Chess Ratings" by Regan, Haworth 
There is an "update", and Stockfish is involved: https://content.iospress.com/articles/i ... al/icg0012
Good Luck!
Re: "Intrinsic Chess Ratings" by Regan, Haworth 
Jean-Marc Alliot's paper linked by Walter Koroljow is a good one, especially for the idea of applying a Markov process. What he calls "ponderated conformance" is the same basic idea as what my papers call "scaled difference"; the main technical difference is that I treat essentially what he calls 1/(1+v_b/k_1) as a differential to be integrated rather than a direct multiplier. The basic point about scaling is made in my weblog article "When Data Serves Turkey" (https://rjlipton.wordpress.com/2016/11/ ... esturkey/) a year ago. The follow-up https://rjlipton.wordpress.com/2016/12/ ... ygrinder/ is about issues with the win-expectation curve, which provides a natural way to scale the effect of mistakes that is closely aligned with what Alliot says about his Markov method at the end of the paper's section 3.2.3. The paper also uses Diogo Ferreira's notion of "gain", which I believe really needs to be treated as a quantity across depths, not just one reference depth such as 20.
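To make the multiplier-versus-differential distinction concrete, here is a minimal sketch. It is not taken from either author's paper: it assumes v_b is the engine's value for the best move in pawn units, uses a hypothetical scale constant k in place of k_1, and takes the differential to be dx / (1 + |x|/k). The function names are invented for illustration.

```python
import math

def direct_multiplier_loss(v_best, v_played, k=1.0):
    """Raw evaluation loss times one factor fixed at the best move's value.

    Assumption: the factor is 1 / (1 + |v_best| / k), with k hypothetical.
    """
    return (v_best - v_played) / (1.0 + abs(v_best) / k)

def integrated_loss(v_best, v_played, k=1.0):
    """Integrate dx / (1 + |x| / k) from v_played up to v_best.

    The weight changes continuously along the interval instead of being
    frozen at one endpoint.
    """
    def antiderivative(x):
        # sign(x) * k * ln(1 + |x|/k) has derivative 1 / (1 + |x|/k).
        return math.copysign(k * math.log1p(abs(x) / k), x)
    return antiderivative(v_best) - antiderivative(v_played)

if __name__ == "__main__":
    # The same one-pawn raw loss, in a balanced and in a lopsided position.
    for v_best, v_played in [(0.5, -0.5), (5.0, 4.0)]:
        print(v_best, v_played,
              round(direct_multiplier_loss(v_best, v_played), 4),
              round(integrated_loss(v_best, v_played), 4))
```

Under both variants a one-pawn slip counts for less when the position is already lopsided. The integrated form differs in that it also accounts for how the weight changes between the played move's value and the best move's value.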
There are multiple updates to my 2011 paper with Guy Haworth on my publications page https://www.cse.buffalo.edu//~regan/pub ... html#chess  some also involving Stockfish and Komodo and Houdini. I can draw special attention to the papers with my student Tamal Biswas. One of them is linked and described in my popular article https://rjlipton.wordpress.com/2015/10/ ... tisficing/, among whose implications is that we can estimate your rating while looking only at moves where you made big mistakes as judged by engines. My page has full citation information.
What most notably hasn't appeared yet is an update to my "Intrinsic Ratings Compendium" https://www.cse.buffalo.edu//~regan/pap ... 12IPRs.pdf using the newer engines. This is pending resolution of issues which I described earlier this year at https://rjlipton.wordpress.com/2017/05/ ... analytics/ on the same blog. The main point is that one wants a model that gives reliable projections for all the moves in a position, not just the best move(s) (versus the rest lumped together) as currently. The latter suffices for cheating detection but gives only partial answers to questions such as how traps really work and whether we all tend to move Knights more often than we "should", or would be expected to if moves were based purely on perceptions of value.
Re: "Intrinsic Chess Ratings" by Regan, Haworth 
Wow guys, thanks so much!! I was seeing some discussion of the "rating inflation" question over on the Chessbrah forum (on a "Discord" server). Some basic googling led me to the 2011 paper but nothing more recent. Your feedback (particularly from an author!) is much appreciated!
Kai M.
Re: "Intrinsic Chess Ratings" by Regan, Haworth 
On the specific topic of rating inflation, the paper you want is actually the 2011 Regan-Haworth follow-up in which GM Bartlomiej Macieja joined us: https://www.cse.buffalo.edu/~regan/pape ... RMH11b.pdf Plus this chart on my main page: https://www.cse.buffalo.edu//~regan/che ... reg4yr.jpg But as we say in the paper, it depends on what one means by "inflation".
Re: "Intrinsic Chess Ratings" by Regan, Haworth 
Vinvin wrote: Some papers on close subjects:
http://www.chessanalysis.ee/a%20study%2 ... rength.pdf
https://hal.inria.fr/hal01307091/file/RT479.pdf
https://ailab.si/matej/doc/Computer_Ana ... mpions.pdf
The study of mine that you refer to is outdated by now. The updated version can be seen here: http://www.chessanalysis.ee/Quality%20o ... suring.pdf
While Regan finds that there has been no rating inflation, my conclusion is different: there has been inflation of 5 points per decade in FIDE ratings (due to skills evolving more slowly than ratings) and 38 points per decade in Chessmetrics ratings (top ratings relatively stable, skills steadily rising).
Re: "Intrinsic Chess Ratings" by Regan, Haworth 
Erik, it is interesting that your Graph 24 in section 4.3 shows the same general shape as mine at https://www.cse.buffalo.edu/~regan/ches ... reg4yr.jpg which is for the bellwether 2600 level only. That is, it shows a fairly smooth downward movement from the late 1980s through about 2002, then a perk up for about 8 years. My graph does not have the final movement down, but it is a 4-year moving average ending with 2010-2013.
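A small sketch of why a 4-year moving average can hide a late movement: each plotted point averages the current year with the three before it, so a drop in the last year or two is diluted. The per-year numbers below are made-up placeholders, not data from either study, and the function name is hypothetical.

```python
def trailing_moving_average(values_by_year, window=4):
    """Average each year with the (window - 1) years before it.

    Assumes the years are consecutive; the first (window - 1) years
    produce no point because their window is incomplete.
    """
    years = sorted(values_by_year)
    averaged = {}
    for i in range(window - 1, len(years)):
        span = years[i - window + 1 : i + 1]
        averaged[years[i]] = sum(values_by_year[y] for y in span) / window
    return averaged

if __name__ == "__main__":
    # Placeholder yearly values, for illustration only.
    yearly = {2008: 2600, 2009: 2604, 2010: 2608, 2011: 2612,
              2012: 2610, 2013: 2596}
    for year, avg in trailing_moving_average(yearly).items():
        print(year, avg)
```

In the placeholder series the raw value falls by 14 between 2012 and 2013, while the smoothed value moves far less, which is the effect described above.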
Both graphs are significant as evidence against the particular commonly voiced hypothesis of 100-150 points of inflation since about 1980. However, your study does not give enough information about the error bars of the regression line shown in your diagram to say whether the 5 points per decade (about 20 points of inflation overall) is significant. Your regression package should give you coefficients which determine the error of the slope; if that error is bigger than the slope itself, then your results are consistent with zero inflation. Moreover, the shape of the graph itself suggests that a simple linear model may not be appropriate. Both of our graphs agree with one and possibly two systematic factors:
(1) The elimination of adjournments and the wider movement after 1990 toward the faster G/90+30"/move time controls, especially in Swisses, from which much of our data comes (along with Olympiads, World Cups, and similar events).
(2) The greater use of computers in preparation, mostly since 2000 on a wide scale.
The question is whether to try to correct for (1) & (2) before applying the regression. This feeds into a larger point: it is not clear from my reading how the data in your sets is controlled. I gather from other charts that Rapid and Blitz games were segregated at least in the cohort for 2700, but even that should be clearer. My chart uses all available games with both players rated between 2590 and 2610 (widened to ±15 or ±20 for some early years) in round-robin and small Swiss (<= 64 players over 9+ rounds) events at standard time controls.
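The slope check described above can be sketched in a few lines. This is a generic ordinary-least-squares calculation in plain Python, not the method of either study; the function names are hypothetical and the example numbers are synthetic placeholders. The rule coded here is the one stated in the post (error larger than the slope); a conventional 95% interval is stricter, at roughly two standard errors.

```python
import math

def slope_with_stderr(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (b, stderr of b)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the slope error")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope, with n - 2 degrees of freedom.
    stderr = math.sqrt(rss / (n - 2) / sxx)
    return slope, stderr

def consistent_with_zero(slope, stderr):
    """Rule of thumb from the post: error at least as large as the slope."""
    return stderr >= abs(slope)

if __name__ == "__main__":
    # Synthetic placeholder data: decades since 1970 vs. a made-up
    # "rating minus intrinsic skill" gap. Not taken from any study.
    decades = [0.0, 1.0, 2.0, 3.0, 4.0]
    gap = [0.0, 22.0, -8.0, 30.0, 10.0]
    slope, stderr = slope_with_stderr(decades, gap)
    print(slope, stderr, consistent_with_zero(slope, stderr))
```

If the data points are themselves averages with different sample sizes, a weighted fit would be more appropriate; that refinement is left out of this sketch.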
Re: "Intrinsic Chess Ratings" by Regan, Haworth 
Ken, do you anywhere address the thoughts of John Nunn on the performance of past players, as in his chapter "Test of Time"? You two are about the only people to have written about the topic intelligently. It would be interesting to put you in the same room and see what came of it. Apart from a lot of math I wouldn't comprehend.
Re: "Intrinsic Chess Ratings" by Regan, Haworth 
Since our graphs represent absolutely different things, it's perhaps nothing but a coincidence. I agree with you that 100-150 points of inflation over 4 decades is an exaggeration. There are people who hold that chess skills have not improved much over time. And they see that Karpov was rated near 2700 for most of the '70s... Erroneous conclusions are easy to draw.
I didn't add error bars because I did not want it to look too complicated. It is, after all, just amateur research, a hobby of sorts. My knowledge of mathematical statistics is at beginner level. No doubt this affects the clarity of what I want to convey in my paper. I must say that I have never read your papers on chess thoroughly, just dipped into them, because the math is too difficult for me to understand.
A linear trend has the best correlation; that's why I chose it. BTW, if I removed the data from 1970-1974, the linear trend would indicate 14.4 Elo points of inflation per decade. Perhaps that is even more credible?
I tried to correct for varying time controls and adjournments, though I dealt with the latter a bit arbitrarily: I just added 1 hr of thinking time to each adjourned game. See the ELO + time bars on graph 19.