Stockfish evaluation elo rating

Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Stockfish evaluation elo rating

Post by Michel »

What is the play level of "easy" mode?
You can just try it out....

I am quite a weak chess player and I can win against it if I pay some attention. Still, the engine is not making obvious blunders like dropping pieces or such. It seems to play quite reasonably.
How do you expect that to work?
I don't know.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Stockfish evaluation elo rating

Post by Daniel Shawul »

Michel wrote:
What is the play level of "easy" mode?
You can just try it out....

I am quite a weak chess player and I can win against it if I pay some attention. Still, the engine is not making obvious blunders like dropping pieces or such. It seems to play quite reasonably.
But it just did, on massive hardware like TCEC's 43-core, in a super hard mode. The tactics coming from a neural network can maybe identify quiescence-level stuff. Maybe it can identify general rules like not leaving your pieces hanging, or moving your attacked pieces to safety, but do you expect it to solve precise tactics? You are on a sinking ship, Michel :)
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Stockfish evaluation elo rating

Post by Milos »

Michel wrote:
But even LC0 doesn't just evaluate a non-quiescent position and use that eval to choose moves. My understanding is it does a form of Monte Carlo tree search and that does not prematurely terminate at a non-quiescent position.
I don't think so since that would mean using domain knowledge.

Easy mode on play.lczero.org means doing a _single_ playout per move. So in fact it is simply picking the move suggested by its policy head which is part of the NN.
Just try playing a few matches against SFdev depth 1 and you'll see how crappy LC0's NN policy and value networks are.
And then please come back once it actually manages to beat SF depth 1 search in a match. Just to give you a heads up, this won't happen any time soon unless the size of the NN is increased ;).
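A minimal sketch of what "picking the move suggested by the policy head" could look like, assuming the network returns one logit per move. The function names and the logit values are made up for illustration; this is not LC0's actual code or API.

```python
import math

def softmax(logits):
    """Turn raw policy logits into prior probabilities."""
    top = max(logits.values())
    exps = {move: math.exp(value - top) for move, value in logits.items()}
    total = sum(exps.values())
    return {move: e / total for move, e in exps.items()}

def pick_policy_move(policy_logits, legal_moves):
    """Return the legal move with the highest prior, with no search at all."""
    legal = {m: policy_logits[m] for m in legal_moves if m in policy_logits}
    if not legal:
        raise ValueError("no legal move has a policy output")
    priors = softmax(legal)
    return max(priors, key=priors.get)

# Hypothetical policy output for the starting position (numbers invented).
example_logits = {"e2e4": 1.2, "d2d4": 1.0, "g1f3": 0.4, "a2a3": -2.0}
best = pick_policy_move(example_logits, list(example_logits))
print(best)
```

With no tree behind it, the choice is only as good as the priors: a tactical refutation one ply deeper is invisible unless the network has learned to down-weight the move.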
fierz
Posts: 72
Joined: Mon Mar 07, 2016 4:41 pm
Location: Zürich, Switzerland

Re: Stockfish evaluation elo rating

Post by fierz »

jdart wrote:
So in fact it is simply picking the move suggested by its policy head which is part of the NN
How do you expect that to work? What is the play level of "easy" mode?

--Jon
It's actually surprisingly strong. I'm a semi-retired FM at chess, and I managed to lose a couple of games on the easy level (I also did win some). I don't know what exactly it is doing on that easy level, but it is far better than what I would have expected (I played against ID180).

Unfortunately, there seems to be no possibility to export games, or to see what L0 was doing/thinking

best regards
Martin
sovaz1997
Posts: 261
Joined: Sun Nov 13, 2016 10:37 am

Re: Stockfish evaluation elo rating

Post by sovaz1997 »

"In easy mode, Leela is not doing any calculations and is playing purely by insight, just taking the board and evaluating what looks interesting."
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish evaluation elo rating

Post by Uri Blass »

Milos wrote:
Michel wrote:
But even LC0 doesn't just evaluate a non-quiescent position and use that eval to choose moves. My understanding is it does a form of Monte Carlo tree search and that does not prematurely terminate at a non-quiescent position.
I don't think so since that would mean using domain knowledge.

Easy mode on play.lczero.org means doing a _single_ playout per move. So in fact it is simply picking the move suggested by its policy head which is part of the NN.
Just try playing a few matches against SFdev depth 1 and you'll see how crappy LC0's NN policy and value networks are.
And then please come back once it actually manages to beat SF depth 1 search in a match. Just to give you a heads up, this won't happen any time soon unless the size of the NN is increased ;).
I can confirm that SF depth 1 won 1.5-0.5 in a match that I played manually.

[pgn][Event "PGN Import"]
[Site "?"]
[Date "?"]
[Round "?"]
[White "Stockfish"]
[Black "LCzero181"]
[Result "*"]
[ECO "B21"]
[Opening "Sicilian"]
[Variation "Smith-Morra, 1.e4 c5 2.d4 cxd4"]
[Termination "unterminated"]
[PlyCount "148"]
[WhiteType "human"]
[BlackType "human"]

1. e4 c5 2. d4 cxd4 3. Qxd4 Nc6 4. Qc3 Nf6 5. e5 Nd5 6. Qd2 e6 7. Nf3 d6 8.
c4 Nb6 9. Nc3 d5 10. cxd5 Nxd5 11. Bb5 Be7 12. O-O O-O 13. Bd3 f6 14. Nxd5
exd5 15. exf6 Bxf6 16. Qc2 Bg4 17. Bxh7+ Kh8 18. h3 Bxf3 19. gxf3 Nd4 20.
Qd3 Qd7 21. Kg2 Rae8 22. Be3 Nxf3 23. Bc5 Nh4+ 24. Kh1 Rf7 25. Rab1 g6 26.
Bxg6 Nxg6 27. Rg1 Nf4 28. Qf3 Qxh3+ 29. Qxh3+ Nxh3 30. Rg4 Rh7 31. Bxa7
Nxf2+ 32. Kg2 Nxg4 33. Rd1 Re2+ 34. Kg3 Rg7 35. Rh1+ Nh6+ 36. Kf3 Rgg2 37.
Rxh6+ Kg7 38. Rh3 Ref2+ 39. Bxf2 Rxf2+ 40. Kxf2 Bxb2 41. Ke3 Kf6 42. Rh2
Be5 43. Rh7 b5 44. Rh5 b4 45. Kd3 Ke6 46. Rh6+ Bf6 47. Rg6 Kf5 48. Rh6 Ke5
49. Rh5+ Kd6 50. Rh6 Ke5 51. Rg6 Kf5 52. Rg8 Ke6 53. Rg2 Kd6 54. Rg6 Ke6
55. Kc2 Kf5 56. Rh6 Kg5 57. Rh1 Bc3 58. a4 bxa3 59. Kxc3 a2 60. Kd4 Kf4 61.
Kxd5 Ke3 62. Kc4 Kd2 63. Rh2+ Ke3 64. Rxa2 Ke4 65. Ra1 Ke5 66. Rb1 Kd6 67.
Ra1 Ke5 68. Rb1 Kf5 69. Kd5 Kf4 70. Ra1 Ke3 71. Rb1 Kd3 72. Ra1 Ke3 73. Rb1
Kd3 74. Ra1 Ke3 *
[/pgn]

[pgn][Event "PGN Import"]
[Site "?"]
[Date "?"]
[Round "?"]
[White "LCzero181"]
[Black "Stockfish"]
[Result "*"]
[ECO "C00"]
[Opening "French"]
[Variation "2.d4"]
[Termination "unterminated"]
[PlyCount "112"]
[WhiteType "human"]
[BlackType "human"]

1. e4 e6 2. d4 Qh4 3. Nc3 Bb4 4. Bd3 Nc6 5. Nf3 Qg4 6. h3 Qxg2 7. Rg1 Qxh3
8. Rg3 Qh1+ 9. Rg1 Qh5 10. d5 exd5 11. exd5 Nce7 12. Qe2 Nf6 13. Bg5 Nxd5
14. Bd2 Bxc3 15. bxc3 g6 16. c4 Nb6 17. Bc3 Rg8 18. Bf6 Qc5 19. Rg5 d5 20.
Re5 Be6 21. Ng5 Nd7 22. Nxe6 fxe6 23. Rxe6 Nxf6 24. Rxf6 Qd4 25. Re6 Qxa1+
26. Kd2 Rg7 27. cxd5 Qxa2 28. Bb5+ Kf8 29. Qf3+ Nf5 30. Rf6+ Kg8 31. Bd3
Nd4 32. Qf4 Qxd5 33. Be4 Qb5 34. Bd3 Qa5+ 35. Kd1 Qb4 36. Bc4+ Qxc4 37. Qe4
Rd8 38. Kc1 Ne2+ 39. Qxe2 Qxe2 40. Kb2 Qe5+ 41. Kb3 Qxf6 42. Ka2 Qxf2 43.
Kb3 Qxc2+ 44. Kxc2 Kf7 45. Kc1 Kf6 46. Kc2 Ke5 47. Kc3 Rd4 48. Kb2 g5 49.
Kc3 g4 50. Kb3 Kd5 51. Kc3 g3 52. Kb3 Ke5 53. Kc3 g2 54. Kb3 g1=Q 55. Kc3
Rg2 56. Kb3 Qe3# *
[/pgn]
fierz
Posts: 72
Joined: Mon Mar 07, 2016 4:41 pm
Location: Zürich, Switzerland

Re: Stockfish evaluation elo rating

Post by fierz »

Those two games actually show nicely how "unhuman" SF's moves are (2...Qh4 and also Qxd4-c3 in the white game). When playing L0 on easy level, I was really impressed by the moves that it chose, which look much better than 2...Qh4!
jorose
Posts: 358
Joined: Thu Jan 22, 2015 3:21 pm
Location: Zurich, Switzerland
Full name: Jonathan Rosenthal

Re: Stockfish evaluation elo rating

Post by jorose »

I don't think that it is fair to look that early in the game. Stockfish has no prior knowledge of that position whereas LC0 has spent training iterations on that exact position.

In general I am rather skeptical about any of these super low depth comparisons. It seems like a very arbitrary thing to compare and it is unclear to me what information you could hope to gain out of it.
fierz
Posts: 72
Joined: Mon Mar 07, 2016 4:41 pm
Location: Zürich, Switzerland

Re: Stockfish evaluation elo rating

Post by fierz »

Perhaps it's true that the training of the NN is biased because the starting position and early positions occur more often in the games; and perhaps that is a huge part of the reason that AlphaZero won against Stockfish (because its NN is basically an opening book)?

The super-low-depth comparisons are very interesting though. I remember that Ed Schröder once made the comment that as an engine developer you should play your engine at depth 0 or 1, because that basically shows you what the move ordering is doing or what your evaluation is doing, so simply by playing a couple of games you can see if your evaluation has big holes. The problem is that a good search on top of a bad or buggy evaluation will still likely produce GM-level performance, so you never notice.
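To make the depth-1 idea concrete, here is a rough sketch of a one-ply search: try every legal move, run the static evaluation on the resulting position, and keep the best. The three callbacks and the toy material-only "game" are hypothetical stand-ins, not code from any real engine.

```python
def depth1_move(position, legal_moves, apply_move, evaluate):
    """Pick the move whose resulting position has the best static eval.

    `evaluate` is assumed to score a position from the point of view of
    the side that just moved (higher is better for the mover).
    """
    best_move, best_score = None, float("-inf")
    for move in legal_moves(position):
        child = apply_move(position, move)
        score = evaluate(child)
        if score > best_score:
            best_move, best_score = move, score
    return best_move, best_score

# Toy stand-in: a "position" is just a material balance in centipawns,
# and each move label maps to the material it wins immediately.
TOY_MOVES = {"Qxg2": 100, "Nf6": 0, "d5": 0}

def toy_legal_moves(position):
    return list(TOY_MOVES)

def toy_apply(position, move):
    return position + TOY_MOVES[move]

def toy_evaluate(position):
    return position  # material only, a deliberately crude evaluation

move, score = depth1_move(0, toy_legal_moves, toy_apply, toy_evaluate)
print(move, score)
```

With a material-only evaluation the one-ply search simply grabs the pawn, which is the kind of hole that a few depth-1 games make visible and that a deeper search would normally hide.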

In the same sense it's interesting to see what the NN does when deprived of its search. In the game as Black, LC0 was totally winning but blundered by giving a rook check next to the king (Rf2+) on a square that was protected. It certainly makes a lot of sense to check this move if you have a search available (which would resolve that it's a bad move), but it also shows that the NN of LC0, at least at the moment, can make huge tactical blunders; and that's what this discussion is partly about: can the NN resolve shallow tactics or not?