Example of RL in action for programmers

Discussion of chess software programming and technical issues.


Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Example of RL in action for programmers

Post by Michael Sherwin

I trained RomiChess for 2000 games using the Hert500.pgn test suite, then for another 2000 games using the Noomen 3-move Testsuite.pgn. Then I logged onto HGM's chess server and challenged rpiStockfish to two games. I know nothing about rpiStockfish except that it says, "This is Stockfish 030914". It is rated 1738. By comparison, Floyd is rated 1593.

[pgn][Event "ICS rated blitz match"]
[Site "winboard.nl"]
[Date "2017.12.20"]
[Round "-"]
[White "RomiChess"]
[Black "rpiStockfish"]
[Result "1/2-1/2"]
[BlackElo "1738"]
[ECO "A55"]
[Opening "Old Indian"]
[Variation "5.e4 c6 6.Be2"]
[WhiteElo "1531"]
[TimeControl "300+1"]
[Termination "normal"]
[PlyCount "103"]
[WhiteType "human"]
[BlackType "human"]

1. e4 e5 2. Nf3 Nf6 3. c4 Nc6 4. Nc3 Bc5 5. Nxe5 Nxe5 6. d4 Bxd4 7. Qxd4 d6
8. Bg5 O-O 9. O-O-O Bg4 10. f3 Nc6 11. Bxf6 gxf6 12. Qf2 Bd7 13. c5 dxc5
14. Qxc5 a6 15. f4 Kh8 16. Rd2 Rc8 17. Be2 Nb8 18. Bg4 b6 19. Qa3 Qe8 20.
Bxd7 Nxd7 21. Qxa6 Qe6 22. Qd3 Nc5 23. Qd4 Rcd8 24. Qxd8 Rxd8 25. Rxd8+ Kg7
26. f5 Qe5 27. Kc2 c6 28. Rd2 b5 29. a3 Na4 30. Re2 Qc5 31. Ra1 h5 32. g3
Kh7 33. h4 Kh6 34. Rd1 Nxc3 35. bxc3 Qxa3 36. Rd3 c5 37. Rde3 c4 38. Kd1
Qa1+ 39. Kd2 Qb1 40. Rf3 Qa1 41. Ree3 Kg7 42. Ke2 Qb2+ 43. Ke1 Qh2 44. Rf2
Qh1+ 45. Rf1 Qg2 46. Rff3 Qg1+ 47. Ke2 Qg2+ 48. Rf2 Qg1 49. Rf1 Qg2+ 50.
Rf2 Qg1 51. Rf1 Qg2+ 52. Rf2 1/2-1/2[/pgn]

[pgn][Event "ICS rated blitz match"]
[Site "winboard.nl"]
[Date "2017.12.20"]
[Round "-"]
[White "rpiStockfish"]
[Black "RomiChess"]
[Result "1/2-1/2"]
[BlackElo "1536"]
[ECO "B51"]
[Opening "Sicilian"]
[Variation "3.Bb5+ Nd7 4.O-O Nf6"]
[WhiteElo "1736"]
[TimeControl "300+1"]
[Termination "normal"]
[PlyCount "123"]
[WhiteType "human"]
[BlackType "human"]

1. e4 c5 2. Nf3 d6 3. Bb5+ Bd7 4. Nc3 Bxb5 5. Nxb5 e5 6. d3 Nf6 7. Bg5 Qb6
8. a4 Nbd7 9. Nd2 Be7 10. O-O Qc6 11. Nc4 h6 12. Bxf6 Nxf6 13. Nc3 Rb8 14.
b3 b6 15. Ne3 Qd7 16. Qf3 Rb7 17. Ncd5 O-O 18. Nxe7+ Qxe7 19. Nf5 Qe6 20.
Qg3 Ne8 21. a5 b5 22. f4 a6 23. h3 b4 24. Kh2 Rb8 25. Qh4 Rb7 26. fxe5
Qxe5+ 27. Qf4 Rd7 28. g3 Kh8 29. Rad1 Nc7 30. d4 Qxf4 31. gxf4 Rfd8 32. Kg3
Nb5 33. dxc5 dxc5 34. Rxd7 Rxd7 35. e5 Kh7 36. Ne3 Rd2 37. Rf2 Rxf2 38.
Kxf2 Nd4 39. Kg3 Kg6 40. Kg4 h5+ 41. Kg3 f6 42. Kh4 fxe5 43. fxe5 Kh6 44.
Kg3 g5 45. Nd5 Nxc2 46. e6 Kg6 47. Nc7 Nd4 48. e7 Nf5+ 49. Kf3 Nxe7 50.
Nxa6 Nc6 51. Nxc5 Nxa5 52. Ke4 h4 53. Nd3 Nxb3 54. Nxb4 g4 55. hxg4 Kg5 56.
Nd5 Nd2+ 57. Ke3 h3 58. Kf2 Kxg4 59. Ne3+ Kh4 60. Nf5+ Kg5 61. Kg3 Ne4+ 62.
Kxh3 {Insufficient material} 1/2-1/2[/pgn]
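For a sense of scale, the standard Elo expected-score formula, E = 1 / (1 + 10^((R_opp - R) / 400)), can be applied to the server ratings recorded in the two PGN headers. The short Python sketch below does only that arithmetic. The ratings are winboard.nl blitz ratings, and two games are far too few to support any conclusion about playing strength.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Standard Elo expected score (0..1) for a player rated `rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


# (RomiChess rating, rpiStockfish rating) taken from the two PGN headers above.
games = [(1531, 1738), (1536, 1736)]

expected = sum(expected_score(r, opp) for r, opp in games)
actual = 0.5 + 0.5  # both games were drawn

print(f"expected {expected:.2f} / 2, actual {actual:.1f} / 2")
# prints: expected 0.47 / 2, actual 1.0 / 2
```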
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
giovanni
Posts: 142
Joined: Wed Jul 08, 2015 12:30 pm

Re: Example of RL in action for programmers

Post by giovanni

Hi Mike. Could you tell us a little more about the learning phase? Were they speed games? Also, is the learn.dat file available to outside users? If so, I would love to get a copy.

CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: Example of RL in action for programmers

Post by CheckersGuy

Isn't this just book learning? Or does RomiChess learn to generalize?
Michael Sherwin

Re: Example of RL in action for programmers

Post by Michael Sherwin

CheckersGuy wrote: Isn't this just book learning? Or does RomiChess learn to generalize?
It is experiential data, modified by reinforcement-learning values, that is loaded into the hash table. There it affects the values of the nodes in the alpha-beta search, and those node values in turn produce a feedback value to the evaluator's piece-square tables, which leaves a lingering, generalized influence.
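As a rough illustration of the first part of that mechanism (game results reinforcing stored positions, which are then loaded into the hash table), here is a minimal Python sketch. It is a toy under stated assumptions, not RomiChess's actual code: the table layout, the `LEARN_DELTA` reward size, string position keys (a real engine would use 64-bit Zobrist keys), and the helper names `LearnTable` and `seed_hash` are all hypothetical. The feedback into the piece-square tables is not modeled here.

```python
"""Toy sketch of game-result reinforcement stored per position.

Hypothetical: names, reward size and table layout are illustrative
assumptions, not RomiChess's actual implementation.
"""
from dataclasses import dataclass, field
from typing import Dict, List

LEARN_DELTA = 10  # hypothetical reward per game, in centipawns


@dataclass
class LearnTable:
    # Position key -> accumulated learned score, stored from the point of
    # view of the side that moved INTO that position.
    scores: Dict[str, int] = field(default_factory=dict)

    def update(self, position_keys: List[str], result: int) -> None:
        """Reinforce one finished game.

        position_keys: position after each ply, in order (index 0 is the
        position after White's first move).
        result: +1 White won, -1 Black won, 0 draw.
        """
        if result == 0:
            return  # this sketch leaves draws unrewarded
        for ply, key in enumerate(position_keys):
            mover = 1 if ply % 2 == 0 else -1  # +1: White moved, -1: Black moved
            # Winner's moves get a bonus, loser's moves get a penalty.
            self.scores[key] = self.scores.get(key, 0) + LEARN_DELTA * result * mover


def seed_hash(learn: LearnTable, hash_table: Dict[str, int],
              keys: List[str]) -> Dict[str, int]:
    """Copy learned scores for the given positions into a transposition table
    (a plain dict here) so a search could read them as score adjustments."""
    for key in keys:
        if key in learn.scores:
            hash_table[key] = learn.scores[key]
    return hash_table


if __name__ == "__main__":
    learn = LearnTable()
    learn.update(["p1", "p2", "p3"], result=1)  # White won this game
    print(learn.scores)  # {'p1': 10, 'p2': -10, 'p3': 10}
    print(seed_hash(learn, {}, ["p2", "p9"]))  # {'p2': -10}
```

Because the bonus is stored per position rather than per opening line, any search that transposes into a learned position picks up the adjustment, which is one way stored game experience can reach beyond a plain opening book.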
Michael Sherwin

Re: Example of RL in action for programmers

Post by Michael Sherwin

giovanni wrote: Hi Mike. Could you tell us a little more about the learning phase? Were they speed games? Also, is the learn.dat file available to outside users? If so, I would love to get a copy.
Hi Giovanni. The training games were played at 10+1, and the example games at 300+1. Some of the training games were self-play; some were against SF 8.

If you send me a PM with your email address, I can send you the learn file. The test currently running has just over 8,000 games to go. Do you want the learn file with the two additional example games in it, the learn file after training is complete, or both?
flok

Re: Example of RL in action for programmers

Post by flok

That is Stockfish 5.0.dd+git20140823-1 on a Raspberry Pi 1.
Michael Sherwin

Re: Example of RL in action for programmers

Post by Michael Sherwin

flok wrote: That is Stockfish 5.0.dd+git20140823-1 on a Raspberry Pi 1.
Thanks. :D

Approximate CCRL equivalent rating?