Questions regarding rating systems of humans and engines

Ozymandias
Posts: 1535
Joined: Sun Oct 25, 2009 2:30 am

Re: Questions regarding rating systems of humans and engines

Post by Ozymandias »

If I read your tables correctly, the weakest engine on CCRL 40/40 (Ziggurat 0.22) should have a FIDE rating in excess of 2400. After playing a few blitz games against it, I can assure you that's not the case (2000, best case).
PK
Posts: 893
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: Questions regarding rating systems of humans and engines

Post by PK »

As Juan Molina said, your method can be faulty empirically. A 1900 Elo player should, under normal conditions, beat a 1600 CCRL engine rather convincingly without resorting to anti-computer strategies. These little things aren't exactly tactical monsters. They don't hang pieces, which is already a lot, since a 1900 Elo player is far from blunder-free, but that should not bring them to a human 2300. On the other hand, they lack a lot of important knowledge (reading TSCP's eval function can teach you a lot about these holes).

Speaking of TSCP: it has 1704 Elo on the CCRL 40/4 list. Norman Blais engineered a version with null move in 2003, which should bring another 100 Elo or so. How does it play?

[pgn]
[Event "Computer chess game"]
[Site "home"]
[Date "2014.12.08"]
[Round "?"]
[White "Pawel"]
[Black "Tscp181null"]
[Result "1-0"]
[BlackElo "2200"]
[ECO "D11"]
[Opening "Slav"]
[Time "12:37:43"]
[Variation "3.Nf3 dxc4 4.e3 b5 5.a4 e6"]
[WhiteElo "2400"]
[TimeControl "900"]
[Termination "normal"]
[PlyCount "81"]
[WhiteType "human"]
[BlackType "program"]

1. d4 d5 {(d7d5 e2e3 g8f6 g1f3 e7e6 b1c3 b8c6 f1d3) -0.25/8 44} 2. c4 dxc4
{(d5c4 e2e3 e7e5 f1c4 e5d4 d1d4 b8c6 d4d8 c6d8) +0.05/8 42} 3. Nf3 Nf6
{(g8f6 d1a4 b8c6 e2e3 c8d7 a4c4 d7e6 c4d3) +0.22/7 40} 4. e3 b5 {(b7b5 b1c3
c8d7 f1e2 e7e6 e1g1 b5b4 c3a4) +0.11/8 38} 5. a4 c6 {(c7c6 b1c3 c8d7 f1e2
e7e6 e1g1 b5b4 c3a2) +0.20/8 36} 6. axb5 cxb5 {(c6b5 b1c3 c8d7 f1e2 e7e6
e1g1 b5b4 c3a4) +0.39/8 34} 7. b3 a5 {(a7a5 b3c4 b5b4 f1d3 e7e6 e1g1 b8c6)
+0.26/7 33} 8. bxc4 b4 {(b5b4 b1d2 c8g4 f1d3 e7e6 e1g1 b8c6 c1b2 g4f3 d2f3)
0.00/8 31} 9. Bd3 Bb7 {(c8b7 b1d2 e7e6 e1g1 b8c6 c1b2 f8d6) +0.21/7 29} 10.
O-O e6 {(e7e6 b1d2 b7f3 d2f3 b8c6 c4c5 f8e7) +0.17/7 28} 11. Re1 Nc6 {(b8c6
b1d2 f8e7 c1b2 e8g8 e3e4 a8c8) +0.48/7 26} 12. Bb2 Be7 {(f8e7 b1d2 e8g8
e3e4 a8c8 d4d5 c6a7) +0.37/7 25} 13. e4 O-O {(e8g8 e4e5 f6g4 d1c2 g7g6 b1d2
a8c8) +0.15/7 24} 14. d5 exd5 {(e6d5 e4d5 c6a7 d1a4 a7c8 a4c2 h7h6 b2f6
e7f6) -0.28/8 23} 15. exd5 Na7 {(c6a7 b2e5 a7c8 d1c2 h7h6 b1d2 f6d5 c4d5
b7d5) -0.30/7 21} 16. Nbd2 Nc8 {(a7c8 d1c2 h7h6 b2f6 e7f6 d3h7 g8h8 f3e5
b7d5 c4d5 d8d5) -0.32/7 20} 17. Ng5 Bxd5 {(b7d5 c4d5 h7h6 g5e4 f6d5 d1g4
e7g5) -0.25/7 19} 18. cxd5 h6 {(h7h6 g5f3 f6d5 d2c4 e7f6 b2f6 d5f6 a1c1)
-0.34/8 18} 19. Nge4 Nxd5 {(f6d5 d2c4 f8e8 b2d4 d5f4 d3c2 f4d5) -0.20/7 17}
20. Qg4 g6 {(g7g6 d3b5 c8b6 b5c6 a8c8 a1c1) -0.21/6 16} 21. Bc4 a4 {(a5a4
g4h3 h6h5 b2d4 f7f5 e4c5 e7c5 d4c5) -0.16/7 16} 22. Nf3 Ncb6 {(c8b6 c4d5
b6d5 b2d4 b4b3 g4h3 f7f5) +0.09/7 15} 23. Bxd5 Nxd5 {(b6d5 f3e5 a8a6 g4e2
d8a8 b2d4 f8e8) 0.00/7 14} 24. Ne5 h5 {(h6h5 g4d1 a4a3 e5c6 a3b2 a1a8 d8a8
d1d5) -0.03/6 13} 25. Qf3 a3 {(a4a3 b2d4 a3a2 e5c6 d8d7 c6e5 d7a4) +0.11/7
13} 26. Bd4 Rc8 {(a8c8 a1a2 h5h4 a2e2 e7d6 e5d3) +0.15/6 12} 27. Qb3 Rc7
{(c8c7 a1d1 d8a8 d4a1 f8d8 a1d4) +0.15/6 11} 28. Rad1 Qa8 {(d8a8 e5f3 f8e8
d4e5 c7d7 f3d4) +0.15/6 11} 29. Ba1 Rd8 {(f8d8 a1d4 e7d6 e5c4 d6f4 b3c2)
+0.25/6 10} 30. Qg3 h4 {(h5h4 g3f3 f7f5 e4d2 d8d6 a1d4) +0.27/6 10} 31. Qg4
Rc2 {(b4b3 g4f3 e7b4 e1e2 f7f5 e5g6 f5e4 f3b3) +0.61/6 9} 32. Nxg6 Bg5
{(e7g5 g6h4 b4b3 g4g5 g8f8 d1d3 a8a5) -4.58/6 9} 33. Nxh4 Rb2 {(c2b2 g4g5
g8f8 a1b2 a3b2 d1d2 f7f6 e4f6 d5f6 g5f6) -8.48/6 8} 34. Qxg5+ Kf8 {(g8f8
e4f6 d5f6 g5f6 a8g2 h4g2 d8d1 e1d1) -12.77/6 8} 35. Nf6 Nxf6 {(d5f6 g5f6
f8g8 d1d8 a8d8 f6d8 g8h7 a1b2 a3b2) -16.27/7 7} 36. Qxf6 Kg8 {(f8g8 d1d8
a8d8 f6d8 g8h7 d8f6 h7g8 e1e8 g8h7 a1b2 a3b2 f6b2) -18.17/7 7} 37. Rxd8+
Qxd8 {(a8d8 f6d8 g8g7 d8g5 g7h7 h4f5 b2f2 g5g7) -M4/8 4} 38. Qxd8+ Kh7
{(g8h7 d8f6 b4b3 f6f7 h7h8 e1e8) -M3/7 5} 39. Qg5 b3 {(b4b3 g5f6 h7g8 e1e8
g8h7 f6h8) -M3/6 0} 40. Nf5 Rc2 {(b2c2 g5g7) -M1/3} 41. Qg7# 1-0
[/pgn]
nimh
Posts: 46
Joined: Sun Nov 30, 2014 12:06 am

Re: Questions regarding rating systems of humans and engines

Post by nimh »

The fact that actual results may differ from the ideal prediction is already acknowledged in my study:

Page 17:
Note:
comparisons were made on the assumption that humans play against engines as they would against other humans; i. e.
not using any anti-computer strategies. Unfortunately there is not yet a reliable way to emulate anti-computer play and
its effects.
In your game you played a closed opening and had already built up a winning advantage before the position became open on move 18. In other words, a good example of anti-computer strategy.

To get a balanced view, why not play several games as black in the Muzio Gambit against the same engine, for example? :)
PK
Posts: 893
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: Questions regarding rating systems of humans and engines

Post by PK »

Muzio is not exactly to my taste, but I will try the black side of the Evans Gambit and/or the Danish Gambit.

Anyhow, here comes a hypothesis: average error might give skewed predictions at lower levels because of "the comfort zone". Human players, especially at the weaker levels, have relatively narrow sets of positions which they play markedly better than their Elo indicates. They try to steer into that zone, just as I did in the given game. Your notion of complexity and the player's reaction to it captures part of that phenomenon (minus the fact that a human player is usually better at handling some subclass of complex positions than at handling them in general). Anyhow, error rate is an average of a player's performance across positions both inside and outside the comfort zone. The difference increases as the player's skill goes down, and the comfort zone grows smaller as well.
PK
Posts: 893
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: Questions regarding rating systems of humans and engines

Post by PK »

Evans goes to the machine, as it doesn't suffer from time trouble.

[pgn]
[Event "Computer chess game"]
[Site "home"]
[Date "2014.12.08"]
[Round "?"]
[White "Tscp181null"]
[Black "Pawel"]
[Result "1-0"]
[BlackElo "2400"]
[ECO "C51"]
[Opening "Evans Gambit"]
[Time "16:57:52"]
[Variation "Cordel Variation"]
[WhiteElo "2200"]
[TimeControl "900"]
[Termination "unterminated"]
[PlyCount "141"]
[WhiteType "program"]
[BlackType "human"]

1. e4 e5 2. Nf3 Nc6 3. Bc4 Bc5 4. b4 Bxb4 5. c3 Be7 6. Qb3 {(d1b3 g8h6 d2d4
c6a5 b3a4 a5c4 a4c4 d7d5 e4d5) -0.38/8 44} Nh6 7. d4 {(d2d4 c6a5 b3b5 a5c4
c1h6 c4d6 b5e5 f7f6 e5h5) +0.08/9 42} Na5 8. Qb5 {(b3b5 a5c4 c1h6 c4d6 b5e5
f7f6 e5h5 g7g6 h5g4 f6f5) +0.11/9 40} Nxc4 9. Bxh6 {(c1h6 c4d6 b5e5 f7f6
e5h5 g7g6 h5g4 d6f7 h6f4 d7d5) -0.14/9 38} Nd6 10. Qxe5 {(b5e5 f7f6 e5h5
e8f8 h6f4 d6e4 f3h4 h8g8 h5h7 d7d5) -0.09/9 36} f6 11. Qh5+ {(e5h5 e8f8
e4e5 d6e8 h6f4 d7d6 e1g1 f8g8 b1d2 d6e5 d4e5 f6e5 f3e5) +0.12/8 34} Kf8 12.
e5 {(e4e5 d6e8 h6f4 d7d6 e1g1 d6e5 d4e5 f8g8) -0.13/8 33} Nf7 13. Be3
{(h6e3 f6e5 f3e5 f7e5 d4e5 d7d6 b1d2 f8g8) -0.12/8 31} d6 14. exf6 {(e5f6
e7f6 b1d2 f8g8 e1g1 d6d5 f3e5 c8e6 f1e1) -0.01/9 29} Bxf6 15. Nbd2 {(b1d2
f8g8 e1g1 d6d5 f3e5 c8e6 f1e1) -0.01/7 28} g6 16. Qd5 {(h5d5 c7c6 d5b3 f8g8
e1g1 d6d5 a1e1 f7d6) +0.26/8 26} c6 17. Qb3 {(d5b3 d6d5 e1g1 f8g8 f1e1 b7b6
c3c4 c8e6) +0.23/8 25} Qe7 18. Rb1 {(a1b1 d6d5 e1g1 f8g8 a2a4 f7d6 f1e1)
+0.38/7 24} b6 19. O-O {(b3a3 d6d5 a3e7 f8e7 e1g1 c8f5 b1e1) +0.37/6 23}
Bf5 20. Rb2 {(b1b2 f8g8 c3c4 e7b7 f1e1 d6d5 c4d5 c6d5) +0.30/7 21} Kg7 21.
Qa4 {(b3a4 f5d3 f1e1 e7b7 c3c4 g7g8) +0.51/6 20} Bd7 22. Re1 {(f1e1 c6c5
a4a3 d7e6 d2e4 h8e8) +0.93/6 19} c5 23. Qb3 {(a4b3 g7g8 e3f4 e7f8 d2e4 f6e7
c3c4 c5d4 f3d4) +0.71/7 18} Qd8 24. Qd5 {(b3d5 h8f8 d2e4 d7g4 e4f6 g4f3)
+0.86/6 17} Qc7 25. Ne4 {(d2e4 d7c6 d5e6 c6e4 e6e4 a8e8) +0.81/6 16} Bc6
26. Qe6 {(d5e6 c6e4 e6e4 a8e8 e4f4 c7d8 e3d2 d6d5) +0.50/8 16} Bxe4 27.
Qxe4 {(e6e4 a8e8 e4f4 c7d8 d4c5 d6c5 b2d2) +0.48/7 15} Rae8 28. Qf4 {(e4f4
c7d8 b2e2 g7g8 c3c4 a7a6 d4d5) +0.37/7 14} Re7 29. Rf1 {(b2e2 d6d5 f4c7
e7c7 e3f4 c7c8) +0.54/6 13} Rhe8 30. Rfb1 {(f1b1 c5d4 c3d4 e7e4 f4g3 g7g8)
+0.17/6 13} Re4 31. Qg3 {(f4g3 c5d4 c3d4 g7g8 b2b3 c7d7 f3d2 e4e7) +0.12/8
12} Qd7 32. Nd2 {(f3d2 e4g4 g3h3 g7g8 f2f3 c5d4 c3d4) +0.26/6 11} R4e7 33.
Qf3 {(g3f3 d6d5 b2b3 f6g5 d2f1 g5e3 f1e3) +0.28/6 11} Ng5 34. Bxg5 {(e3g5
f6g5 f3d5 g5d2 b2d2 e7e2 d2b2) +0.22/7 10} Bxg5 35. Nf1 {(d2f1 g5f6 f3d3
c5d4 c3d4 g7g8 f1e3) +0.11/7 10} Rf8 36. Qd5 {(f3d5 f8f5 d5b3 f5f7 f1d2
g7g8 d2c4) +0.18/7 9} Rf5 37. Qb3 {(d5b3 g7h8 f1g3 f5f4 b3d5 g5f6 g3e2)
+0.21/7 9} Bf4 38. g3 {(g2g3 f4g5 h2h4 g5h6 g3g4 f5f7) +0.25/6 8} Bh6 39.
f4 {(f2f4 g7h8 f1d2 h6g7 d4d5 e7e2) -0.04/6 8} Qc6 40. Nd2 {(f1d2 g7h8 b3b5
c6b5 b2b5 e7e2) -0.03/6 7} d5 41. Qa3 {(b3a3 g7g8 d4c5 b6c5 b2b5 e7c7 d2f3)
+0.07/7 7} Rff7 42. Nf3 {(d2f3 c5d4 c3d4 c6c4 b2b4 c4c2 f3e5) +0.45/7 7}
Qc7 43. Ne5 {(f3e5 f7f6 b2b5 c5d4 c3d4 c7d6 a3b2) +0.45/7 6} Rf5 44. dxc5
{(d4c5 c7c5 a3c5 b6c5 b1d1 g6g5 d1d5 g5f4) +0.83/7 6} bxc5 45. Rb5 {(b2b5
h6f4 g3f4 f5f4 e5d3 f4g4 g1h1 e7e2) +1.06/7 6} Rfxe5 46. fxe5 {(f4e5 h6e3
g1g2 c7e5 b5b7 e5e4 g2h3 e4f5 h3g2 f5f2) +1.12/7 5} Qxe5 47. Qb2 {(a3b2
h6e3 g1h1 d5d4 c3d4 e5e4 b2g2 e4g2 h1g2 c5d4) +0.48/6 5} Be3+ 48. Kh1
{(g1h1 d5d4 c3d4 e5e4 b2g2 e4g2 h1g2 c5d4 b5b7) +0.68/6 5} d4 49. c4 {(c3c4
e5e4 b2g2 e4g2 h1g2 d4d3 b5b7 e7b7 b1b7) +0.12/7 4} Rf7 50. Rb7 {(b2g2 d4d3
b5b7 d3d2 b7f7 g7f7 b1b7) +0.10/6 4} Rxb7 51. Qxb7+ {(b2b7 g7h6 h1g2 e5e6
b7b3 e6e4 g2h3 d4d3 g3g4) +0.45/7 4} Kh6 52. Kg2 {(h1g2 e5e6 b7b3 e6e4 g2h3
d4d3 g3g4) +0.45/6 4} Qf5 53. Rb2 {(b1b2 f5e6 b7b3 e6e4 g2h3 d4d3 a2a4)
+0.45/6 4} d3 54. Qe7 {(b7e7 e3g5 e7e8 d3d2 e8e2 f5d7 b2b1) +0.24/7 3} Bg5
55. Qe8 {(e7e8 d3d2 e8e2 f5d7 b2b1 d7e7 e2d1) +0.24/7 3} d2 56. Qe2 {(e8e2
f5d7 b2b1 d7c6 g2f2 c6f6 f2g1 f6e7 e2f2) +0.14/7 3} Qf6 57. Rb1 {(b2b1 h6g7
h2h4 f6e7 e2d1 g5e3) +0.12/6 3} Qc3 58. h4 {(h2h4 g5e3 g2f3 e3d4 f3g4 c3c2
b1d1) +0.31/6 3} Be3 59. g4 {(g3g4 c3d4 g4g5 h6g7 b1b7 g7f8 b7b8 f8e7 b8b7)
+0.32/6 2} Qd4 60. Kg3 {(g2g3 h6g7 a2a4 d4f4 g3h3 g7g8 b1b7) +0.18/6 2}
Qf4+ 61. Kh3 {(g3h3 f4f2 e2d1 f2f4 b1b3 a7a5 a2a4) +0.31/6 2} Qf2 62. Qd1
{(e2d1 f2f7 b1b3 e3f4 d1f1 h6g7) +0.22/6 2} Bf4 63. g5+ {(g4g5 h6g7 d1b3
f2e3 h3g4 e3b3 a2b3 f4e5) +0.08/6 2} Kg7 64. Qb3 {(d1b3 f4e5 b3b7 f2f7 a2a4
g7g8 b7c8) -0.11/6 2} Be5 65. Qb7+ {(b3b7 f2f7 a2a4 f7b7 b1b7 g7f8 b7d7)
-0.14/5 2} Qf7 66. Kg2 {(a2a4 f7b7 b1b7 g7f8 b7b1 f8e7 b1b7) -0.29/6 2}
Qxb7+ 67. Rxb7+ {(b1b7 g7f8 b7b1 f8f7 g2f3 f7e6 f3e2 e5c3 b1b7) -0.16/7 1}
Kf8 68. Rb1 {(b7b1 f8e8 g2f3 e5c3 b1d1 e8e7 f3e3 e7e6 d1d2 c3d2 e3d2)
-0.04/8 1} Ke8 69. Kf3 {(g2f3 e5c3 f3e2 c3b4 b1b4 d2d1q e2d1 c5b4) +0.34/7
1} Bc3 70. Ke2 {(f3e2 c3b4 a2a3 b4a3 e2d2 a7a5 b1b7 a3b4) +1.19/8 1} Kd7
71. Rb7+ {(b1b7 d7d6 b7a7 d6e6 a7h7 e6e5 h4h5 g6h5 h7h5) +1.94/8 1} 1-0

[/pgn]
PK
Posts: 893
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: Questions regarding rating systems of humans and engines

Post by PK »

Open Sicilian (from TSCP's book) - the engine eats the poisoned pawn and chokes on it. This is not 2300, not even 1700. Compared with the previous game, the comfort zone theory seems to hold.

[pgn]
[Event "Computer chess game"]
[Site "home"]
[Date "2014.12.08"]
[Round "?"]
[White "Pawel"]
[Black "Tscp181null"]
[Result "1-0"]
[BlackElo "2200"]
[ECO "B57"]
[Opening "Sicilian"]
[Time "17:35:03"]
[Variation "Sozin, Benko Variation, 6.Bc4 Qb6"]
[WhiteElo "1900"]
[TimeControl "900"]
[Termination "unterminated"]
[PlyCount "108"]
[WhiteType "human"]
[BlackType "program"]

1. e4 c5 2. Nf3 Nc6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 d6 6. Bc4 Nxd4 {(c6d4
d1d4 e7e5 d4d3 f8e7 c1e3 e8g8 e1c1) -0.51/8 44} 7. Qxd4 e5 {(e7e5 d4d3 f8e7
c1e3 e8g8 e1c1 c8d7) -0.31/7 42} 8. Qd3 Qb6 {(c8e6 c4e6 f7e6 d3b5 d8d7 b5d7
e8d7 c1e3 d7c8) -0.41/7 40} 9. Bg5 Qxb2 {(b6b2 a1b1 b2a3 g5f6 g7f6 c4b5
e8e7 e1g1) -0.22/7 38} 10. Rb1 Qa3 {(b2a3 g5f6 g7f6 c4d5 h8g8 e1g1 a8b8
f1d1) -0.16/8 36} 11. Bxf6 gxf6 {(g7f6 c3d5 a3a5 d3d2 a5d2 e1d2 e8d8 d5f6
f8e7 f6d5) -0.51/8 34} 12. O-O Be7 {(f8e7 b1b3 a3c5 f1b1 b7b6 d3d5 c5d5
e4d5) -0.15/7 33} 13. Rb3 Qc5 {(a3c5 c3d5 a7a6 d5e7 e8e7 c4d5 a8b8) -0.09/7
31} 14. Nd5 Qa5 {(c5a5 b3a3 a5c5 d3b3 e7d8 c4b5 e8f8 f1d1) -0.37/7 29} 15.
Qf3 Qd8 {(a5d8 c4b5 e8f8 f3e3 f8g8 d5e7 d8e7 e3g3 g8f8 f1d1) -0.19/7 28}
16. Bb5+ Kf8 {(e8f8 f3e3 f8g8 b3a3 f6f5 e4f5 c8f5 a3a7 a8a7 e3a7 f5c2 a7b7)
-0.15/7 26} 17. Qh5 h6 {(h7h6 b5c4 h8h7 f1b1 f8g8 b3g3 g8h8 g3b3) -0.17/7
25} 18. Bc4 Rh7 {(h8h7 f1b1 f8e8 c4b5 e8f8 b3g3 c8e6 g3b3) -0.19/7 24} 19.
Rfb1 Kg8 {(f8g8 b3g3 g8h8 d5c3 d8e8 c4d5 a8b8 g3d3) -0.17/7 23} 20. f4 b6
{(b7b6 b3g3 g8h8 f4f5 c8b7 d5c3 d8e8 g3d3) +0.08/7 21} 21. Rg3+ Kh8 {(g8h8
c2c3 c8d7 g3d3 a8c8 c4b5 d7e6) +0.16/6 20} 22. f5 Bb7 {(c8b7 c4b3 a8c8 c2c4
b7c6 a2a4 a7a6 g3d3) +0.24/8 19} 23. Qe2 Bxd5 {(b7d5 c4d5 a8c8 c2c4 d8c7
e2f2 c7c5 g3e3) +0.25/8 18} 24. Bxd5 Rc8 {(a8c8 c2c3 d8c7 g3e3 c7c5 b1b5
c5a3 c3c4) +0.25/8 17} 25. c4 Qc7 {(d8c7 g3d3 c7c5 g1h1 c8g8 b1b5 c5c8 h1g1
h7g7) +0.25/8 16} 26. Qh5 Qc5+ {(c7c5 g1f1 c5d4 h5e2 c8g8 g3d3 d4c5 f1e1
h7g7) +0.35/8 16} 27. Kh1 Qf2 {(c5f2 g3a3 c8c7 h5f3 f2f3 a3f3 h7g7 f3b3)
+0.42/8 15} 28. Bxf7 Qxa2 {(f2a2 g3b3 a2c2 f7d5 c2f2 b1b2 f2e1) +0.43/7 14}
29. Rf1 a5 {(a7a5 h1g1 a2d2 f1d1 d2b4 f7e6 c8c7) +0.87/7 13} 30. Be6 Rb8
{(c8b8 h5d1 a2b2 e6d5 b2d4 g3d3 d4b2) +0.85/7 13} 31. Rg6 Qd2 {(a2d2 f1d1
d2f4 h5e2 f4h4 g6g4 h4h5 h1g1) +0.77/8 12} 32. Qg4 Qg5 {(d2g5 g6g5 h6g5
f1a1 h7h4 g4e2 g5g4) -2.76/7 11} 33. Rxg5 hxg5 {(h6g5 g4e2 h7h4 f1a1 b8a8
e6d5 a8c8 a1b1 e7d8) -2.85/9 11} 34. Qd1 Rh4 {(h7h4 h1g1 h8g7 d1c2 b8h8
h2h3 g5g4 f1b1) -2.85/8 10} 35. Bd5 Bd8 {(e7d8 h1g1 h4h6 d1c2 b8c8 f1d1
b6b5) -2.83/7 10} 36. g3 Rh7 {(h4h7 h1g1 d8c7 d1g4 b6b5 c4b5 b8b5 g4e2
c7b6) -2.86/9 9} 37. Kg2 Bc7 {(d8c7 d1g4 b8c8 d5e6 c8d8 h2h4 g5h4 g3h4)
-2.85/8 9} 38. Qa4 Rc8 {(b8c8 f1d1 g5g4 g2g1 c8d8 d5e6 h7h6 e6d5) -2.79/8
8} 39. Rh1 g4 {(g5g4 g2g1 h7g7 h2h4 g4h3 h1h3 g7h7 h3h4 c8d8) -2.82/8 8}
40. h3 Rd8 {(c8d8 a4c6 h7e7 h3g4 h8g7 g4g5 f6g5 h1h5 g5g4 g2g1) -4.11/9 7}
41. hxg4 Rxh1 {(h7h1 g2h1 b6b5 c4b5 h8g7 h1g2 g7h6 a4c4 d8c8 b5b6) -5.35/10
7} 42. Kxh1 b5 {(b6b5 c4b5 h8g7 a4c4 d8c8 g4g5 f6g5 b5b6 c8h8) -5.30/9 7}
43. cxb5 Kg7 {(h8g7 a4c4 d8h8 h1g2 c7d8 c4c6 g7h6 c6d6 h6g5 g2f3) -5.95/9
6} 44. Qc4 Rh8+ {(d8h8 h1g2 c7d8 g4g5 f6g5 c4c8 g7f6 c8e6 f6g7 e6g6 g7f8
g6d6) -6.30/9 6} 45. Kg2 Bb8 {(c7b8 g2f3 h8h7 b5b6 h7h2 c4a4 h2b2 a4a5
b2c2) -6.77/9 6} 46. b6 Kh7 {(g7h7 g2f3 h7h6 c4a2 h8c8 a2a5 c8c5 a5a4)
-6.97/8 5} 47. Qb5 Rf8 {(h7h6 b5a5 h6g5 g2f3 h8h2 a5c3 h2h3 b6b7) -6.99/8
5} 48. Qxa5 Kh8 {(h7g7 a5d2 f8c8 g4g5 c8h8 g5g6 h8c8 b6b7) -8.11/8 5} 49.
Qb5 Rd8 {(f8d8 g4g5 f6g5 b5e2 h8g7 e2h5 d8c8 h5g5 g7f8 f5f6) -8.29/9 4} 50.
Qc6 Rf8 {(d8f8 g2f3 f8d8 c6b7 d8e8 b7f7 e8c8 f7f6 h8h7 g4g5) -10.02/9 4}
51. Qb7 Rg8 {(f8g8 d5g8 h8g8 b7b8 g8g7 b8d6 g7f7 b6b7 f7g7 b7b8q g7f7)
-20.39/10 4} 52. Qf7 Rxg4 {(g8g4 f7f6 h8h7 g2f3 g4g3 f3g3 b8a7 f6g6 h7h8
g6g8) -M5/9 3} 53. Qh5+ Kg7 {(h8g7 h5g4 g7f8 g4h5 f8e7 h5f7 e7d8 f7f6 d8d7
f6f7 d7d8 f7b7 d8e8 b7b8) -16.01/8 4} 54. Qxg4+ Kf8 {(g7f8 g4g8 f8e7 g8f7
e7d8 d5c6 b8a7 f7e8) -M4/8 2} 1-0

[/pgn]
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: Questions regarding rating systems of humans and engines

Post by carldaman »

nimh wrote:The fact that actual results may differ from the ideal prediction is already acknowledged in my study:

Page 17:
Note:
comparisons were made on the assumption that humans play against engines as they would against other humans; i. e.
not using any anti-computer strategies. Unfortunately there is not yet a reliable way to emulate anti-computer play and
its effects.
In your game you played a closed opening and had already built up a winning advantage before the position became open on move 18. In other words, a good example of anti-computer strategy.

To get a balanced view, why not play several games as black in the Muzio Gambit against the same engine, for example? :)
In my experience, I do well to only draw against engines like Mint (1700+ CCRL), even though I'm close to 400 points above it on the human scale.
I'm naturally a speculative player and always make-believe I'm playing another human, taking risks rather than employing anti-computer strategies (unless the specific opening played is in itself conducive to that).

My performance against such engines suggests Erik's results are valid. I think the underlying assumption for this whole experiment should be that the human should NOT know that (s)he's playing an engine. In that case, we should expect similar results for most players, confirming Erik's conclusions on the whole.

CL
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Questions regarding rating systems of humans and engines

Post by Uri Blass »

Looking at the games that you won, TSCP suffered from too short an opening book in the second game and from having no book in the first game.

Humans know that you do not try to defend the c4 pawn in the Queen's Gambit, and in the Open Sicilian game 6...Nxd4 (the first move out of book) is the first mistake; humans know that you do not play like that.

You cannot have a book that covers everything, but I think that positions humans have already played thousands of times should be in the book. Chesstempo suggests that humans reached the position after Bc4 10448 times and played Nxd4 in only 11 of those games (it is 1 out of 6818 when I look at games where both players were rated above 2200).
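To put those counts in perspective, here is a minimal sketch that turns them into percentages. The counts are the ones quoted above; the helper name `share` is only illustrative.

```python
def share(count: int, total: int) -> float:
    """Return count/total as a fraction of all games."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total

# Counts quoted in the post: all games reaching the position after 6.Bc4.
all_games = share(11, 10448)
# Games where both players were rated above 2200: 1 out of 6818.
strong_games = share(1, 6818)

print(f"all games:  {all_games:.4%}")
print(f"2200+ only: {strong_games:.4%}")
```

That is roughly 0.1% of all games and about 0.015% of games between 2200+ players, so the move is rare at every level and rarer still among strong players.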
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Questions regarding rating systems of humans and engines

Post by lkaufman »

nimh wrote:Perhaps against humans who employ anti-computer strategy gains would be much more than indicated in that study, but unfortunately I know no methods how to take that into account. That may explain differences in our conclusions.

Could I have a link to your studies?

How can a method be faulty, if it compares the relationship between ratings and accuracy in both systems, and determining the accuracy of
play is rigorously conducted in the same manner for all games?
I've now taken the time to read your paper in full, and I do feel that it is a real contribution to the field. I find your historical comparisons of players to be pretty believable and your methodology in this regard to be reasonable. There are some things I would do differently; for example, instead of using raw score differences in centipawns I would convert them to expected scores and use the differences in expected scores. This would probably not change your broad conclusions, it would just improve accuracy. I think your estimates of how the top players from past eras would rate on today's Elo scale are pretty reasonable, and while I was of the same opinion as Ken Regan that there has been no real inflation of FIDE ratings in terms of true strength, I wouldn't argue with your estimate of 5 Elo points per decade; I doubt that either your methodology or his can be more accurate than that.
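A minimal sketch of what scoring errors in expected-score terms could look like. The logistic curve and the scale constant of 400 are assumptions made purely for illustration; neither comes from the paper or from this post, and a real study would have to fit the mapping to game data.

```python
def expected_score(cp: float, scale: float = 400.0) -> float:
    """Map a centipawn evaluation to an expected score in [0, 1].

    ASSUMPTION: logistic curve with an illustrative scale constant.
    """
    return 1.0 / (1.0 + 10.0 ** (-cp / scale))


def error_in_expected_score(best_cp: float, played_cp: float,
                            scale: float = 400.0) -> float:
    """Error of a move measured as lost expected score, not lost centipawns."""
    return expected_score(best_cp, scale) - expected_score(played_cp, scale)


# The same 100 cp slip in two different situations (hypothetical evals).
near_equal = error_in_expected_score(50, -50)        # roughly level position
already_winning = error_in_expected_score(500, 400)  # already clearly winning

print(f"100 cp lost near equality:    {near_equal:.3f}")
print(f"100 cp lost when winning big: {already_winning:.3f}")
```

Under these assumptions the same 100 centipawn slip costs far more expected score in a level position than in one that is already winning, which is the effect raw centipawn differences cannot capture.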
Where I strongly disagree is with your comparison of FIDE and CCRL ratings. There are huge problems with your methodology here. While I find the curve fitted to the CCRL data to be more or less reasonable, the curve you fit to the FIDE data is quite absurd. First of all, it seems that you drew conclusions about the shape of the curve from a mere handful of data points. Second, since your data is based on scoring by an obsolete engine (Rybka 3) that was only around 3000 FIDE Elo in strength (in my opinion, as co-author and having run many handicap matches vs. GMs), it is simply not strong enough to use for measuring errors of players approaching that level, as you acknowledge earlier in the article. So a curve that "projects" an error rate dropping to zero at 2950 or so is simply wrong; I think this is called "overfitting". Furthermore, as an experienced GM and chess teacher, I can say with certainty that the error rate for human players climbs with declining rating, as in the exponential curve shown for engines. If you have ever watched the games of kids with ratings below 1000 (OK, these are USCF ratings, but I'm sure the same would be true if FIDE rated beginners), they blunder pieces left and right; I can normally give queen odds to players with ratings around 1000, who themselves can give queen odds to players rated around 500 or so. Maybe the curve is not exactly like the one for engines, but certainly it slopes upward to the right like the engine curve does. I don't think it is possible to prove this with just a handful of data points, but it should be pretty obviously true. The weaker a player is, engine or human, the bigger the mistakes they make and the greater the chance for an upset when a mistake does occur.
If you force-fit an exponential curve to the human data, I think you can determine a reasonably valid FIDE to CCRL formula.
My own study was never published, but basically using data from SSDF over many years and comparing results of engines that obtained ratings in human events with their SSDF ratings, I concluded that 400 elo on SSDF scale is about 300 on FIDE scale. Probably CCRL ratings are a bit more stretched than SSDF due to shorter time limit, so perhaps in this case a ratio of 300 CCRL points = 200 FIDE points is a better estimate.
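The ratios above only say how rating differences compress from one scale to the other; no anchor point is given, so the sketch below converts gaps, not absolute ratings. The function name is hypothetical.

```python
def scale_difference(diff: float, src_points: float, dst_points: float) -> float:
    """Convert a rating difference between scales using a fixed ratio.

    src_points on the source scale are taken to equal dst_points
    on the destination scale (e.g. 400 SSDF = 300 FIDE).
    """
    return diff * dst_points / src_points


# Ratios quoted in the post.
print(scale_difference(400, 400, 300))  # 400 SSDF points -> 300.0 FIDE points
print(scale_difference(300, 300, 200))  # 300 CCRL points -> 200.0 FIDE points

# Example: a 600-point gap between two engines on CCRL.
print(scale_difference(600, 300, 200))  # -> 400.0 on the FIDE scale
```

Turning this into an absolute FIDE estimate would additionally need a rating at which the two scales are assumed to coincide, which the post does not specify.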
Finally, as an estimate of the rating of a recent SF version under tournament conditions, we have only to look at the 3 to 1 victory over near-2800 rated Nakamura despite the huge handicaps of a pawn in two games and the use of Rybka 3 in the other two. Surely this shows that the rating of the top two engines on the CCRL 40/40 list (Komodo followed by Stockfish) should be far above 3000 on the FIDE scale, perhaps 3200 or more, depending on the number of cores. After all, 75% vs. 2800 is already a 3000 rating, without even considering the huge handicaps. I know this is a tiny sample, but the many handicap matches I played with Rybka 3 vs. GMs clearly implied a human rating of around 3000, and Stockfish proved strong enough to beat the combination of human 2800 plus Rybka 3 in that match.
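The "75% vs. 2800 is already a 3000 rating" arithmetic can be checked with the standard logistic Elo expectation formula, inverted to give a rating gap from a score. This is a sketch only: FIDE's official performance calculation uses a lookup table that differs by a point or two, and the function names are illustrative.

```python
import math


def rating_gap_from_score(score: float) -> float:
    """Invert the logistic Elo expectation: score -> rating difference."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def performance_rating(opponent_rating: float, points: float, games: int) -> float:
    """Performance rating against a single opponent rating."""
    return opponent_rating + rating_gap_from_score(points / games)


# 3 points out of 4 against a roughly 2800-rated opponent.
perf = performance_rating(2800, 3, 4)
print(round(perf))  # 2991
```

A 75% score corresponds to a gap of about 191 points, so the result is just under 3000 before any allowance for the handicaps.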
Despite the above issue, thanks for your excellent contribution to our understanding of how human players have improved over the years. Congratulations for a great article! Maybe you can update it using the next version of Komodo or Stockfish for the evaluations.
Komodo rules!
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Questions regarding rating systems of humans and engines

Post by Uri Blass »

Note that in comp-comp games the computers do not play for a draw, and top humans earn money from beating other humans, so they have no reason to prepare an opening repertoire that helps them draw with white, because such a repertoire is not going to help them against other humans who are not stronger than they are.

It is possible that, with the right preparation, some human could usually get draws with white against computers, simply because chess is not complex enough for the computer to win with black (assuming the human does not try to win). It would be interesting if somebody offered an open challenge with big prize money for humans (at least 1 million dollars), where the target is simply to score at least 2.5 out of 6 against computers at a 90+30 time control, with the human getting the white pieces in all games (of course, the computers are free to use an opening book).
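To get a feel for how demanding 2.5 out of 6 is, here is a rough sketch under strong simplifying assumptions: the human never wins, every game is independently either drawn or lost, and the per-game draw probabilities tried below are hypothetical values, not estimates. Under those assumptions 2.5 points means at least 5 draws in 6 games.

```python
from math import comb


def prob_at_least(n_games: int, min_draws: int, draw_prob: float) -> float:
    """P(at least min_draws draws in n_games), games assumed independent."""
    return sum(
        comb(n_games, k) * draw_prob**k * (1.0 - draw_prob) ** (n_games - k)
        for k in range(min_draws, n_games + 1)
    )


# ASSUMPTION: no human wins, so 2.5/6 requires 5 or 6 draws.
for d in (0.5, 0.7, 0.9):  # hypothetical per-game draw probabilities
    print(f"draw prob {d:.1f}: P(>= 2.5/6) = {prob_at_least(6, 5, d):.3f}")
```

Even a human who holds the draw in half of the games would meet the target only about one time in nine, so the challenge mostly tests whether the per-game draw rate can be pushed very high.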

It is possible that top humans with the right preparation can do it not because computers are not strong enough but because chess is not complex enough.