Deep Learning Chess Engine ?

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: Deep Learning Chess Engine ?

Post by brtzsnr »

Geneva was the first version to use TensorFlow. Glarus and the current development branch improved the evaluation quite a lot using this (e.g. backward pawns, king safety, knight & bishop PSQTs).


The NN I use is simply:

Code: Select all

import tensorflow as tf  # TF1-style API, as in the rest of the snippet

# x_data (feature matrix), p_data (game phase per position) and
# y_data (training targets, e.g. game results) are assumed to be
# loaded beforehand.
WM = tf.Variable(tf.random_uniform([len(x_data[0]), 1]))  # midgame weights
WE = tf.Variable(tf.random_uniform([len(x_data[0]), 1]))  # endgame weights

xm = tf.matmul(x_data, WM)  # midgame score per position
xe = tf.matmul(x_data, WE)  # endgame score per position

# Blend the two scores by game phase P, then squash the result
# into a (0, 1) prediction with a sigmoid.
P = tf.constant(p_data)
y = xm * (1 - P) + xe * P
y = tf.sigmoid(y / 2)

# Mean squared error against the targets, plus an L1 penalty on the
# weights to keep them small and sparse.
loss = tf.reduce_mean(tf.square(y - y_data)) \
       + 1e-4 * tf.reduce_mean(tf.abs(WM) + tf.abs(WE))
optimizer = tf.train.AdamOptimizer(learning_rate=0.1)
train = optimizer.minimize(loss)
It takes about 10 minutes for the weights to converge, and I need to train it twice: once with search disabled, and once with quiescence search enabled. Before this I had implemented a general hill-climbing algorithm, but it converged very slowly (about a day) and the results were not always very good.
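
For reference, the optimization itself is just the standard TF1 training loop; a sketch (my data loading and batching are omitted):

Code: Select all

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for step in range(20000):            # iteration count is illustrative
    sess.run(train)
    if step % 1000 == 0:
        print(step, sess.run(loss))  # watch the loss converge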

Training this way is much faster than playing 100k games for SPSA, but it has the disadvantage that it somewhat limits the set of usable features: the NN has to compute the same value as your evaluation function (without the sigmoid). For example, a linear NN cannot express value comparisons like the following code from Stockfish; for that you probably need a deeper NN.

Code: Select all

        else if (    abs(eg) <= BishopValueEg
                 &&  ei.pi->pawn_span(strongSide) <= 1
                 && !pos.pawn_passed(~strongSide, pos.square<KING>(~strongSide)))
            sf = ei.pi->pawn_span(strongSide) ? ScaleFactor(51) : ScaleFactor(37);
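
To see why a linear model falls short here: comparisons and thresholds like the above are piecewise-linear functions, which no stack of purely linear layers can represent, but a single ReLU layer already can. A minimal NumPy sketch (illustrative only, not from Stockfish or my engine):

Code: Select all

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# min(a, b) = b - relu(b - a): exact with one ReLU nonlinearity,
# impossible with any purely linear network.
def min_via_relu(a, b):
    return b - relu(b - a)

assert min_via_relu(3.0, 5.0) == 3.0
assert min_via_relu(5.0, 3.0) == 3.0

# Likewise abs(x) = relu(x) + relu(-x), the building block behind
# threshold tests such as abs(eg) <= BishopValueEg.
assert relu(7.0) + relu(-7.0) == 7.0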
thomasahle
Posts: 94
Joined: Thu Feb 27, 2014 8:19 pm

Re: Deep Learning Chess Engine ?

Post by thomasahle »

There is also Spawkfish: http://spawk.fish
Unfortunately it's not open source, but the author wrote a few posts on how it works.

It's interesting because it doesn't try to learn an evaluation; it tries to learn the correct move directly, similar to the policy network in AlphaGo. It also uses a quite deep network.
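
A policy network in this sense is just a classifier over moves: board features in, a softmax over a fixed move vocabulary out. A rough sketch in the same TF1 style as the snippet above (all sizes and names are made up; this is not how Spawkfish actually encodes anything):

Code: Select all

import tensorflow as tf

N_FEATURES = 384   # made-up input encoding size
N_MOVES = 4096     # e.g. 64 from-squares x 64 to-squares

boards = tf.placeholder(tf.float32, [None, N_FEATURES])
played = tf.placeholder(tf.int32, [None])  # index of the move played

W1 = tf.Variable(tf.random_uniform([N_FEATURES, 256], -0.1, 0.1))
b1 = tf.Variable(tf.zeros([256]))
h = tf.nn.relu(tf.matmul(boards, W1) + b1)

W2 = tf.Variable(tf.random_uniform([256, N_MOVES], -0.1, 0.1))
b2 = tf.Variable(tf.zeros([N_MOVES]))
logits = tf.matmul(h, W2) + b2
policy = tf.nn.softmax(logits)  # probability for every move

# Train to predict the move that was actually played.
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=played, logits=logits))
train = tf.train.AdamOptimizer(1e-3).minimize(loss)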
ZirconiumX
Posts: 1346
Joined: Sun Jul 17, 2011 11:14 am
Full name: Hannah Ravensloft

Re: Deep Learning Chess Engine ?

Post by ZirconiumX »

thomasahle wrote:There is also Spawkfish: http://spawk.fish
Unfortunately it's not open source, but the author wrote a few posts on how it works.

It's interesting because it doesn't try to learn an evaluation; it tries to learn the correct move directly, similar to the policy network in AlphaGo. It also uses a quite deep network.
Spawkfish is quite interesting. Very weak, though: even Dorpsgek (~1500 Elo) beats it.

tu ne cede malis, sed contra audentior ito
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Deep Learning Chess Engine ?

Post by matthewlai »

brtzsnr wrote:Training this way is much faster than playing 100k games for SPSA, but it has the disadvantage that it somewhat limits the set of usable features: the NN has to compute the same value as your evaluation function (without the sigmoid). For example, a linear NN cannot express value comparisons like the following code from Stockfish; for that you probably need a deeper NN.
With no non-linearity, each layer is just a matrix multiplication, so you can collapse all the layers into one and get an equivalent linear function. Like you said, there are many things that cannot be modeled by a linear function.
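
The collapse is easy to check numerically (NumPy, toy sizes):

Code: Select all

import numpy as np

np.random.seed(0)
x  = np.random.randn(8, 10)   # batch of 8 inputs, 10 features each
W1 = np.random.randn(10, 5)   # "layer 1"
W2 = np.random.randn(5, 1)    # "layer 2"

# Two linear layers in sequence are one linear layer whose weight
# matrix is the product of the two: (x W1) W2 == x (W1 W2).
assert np.allclose(x.dot(W1).dot(W2), x.dot(W1.dot(W2)))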

With ReLU activations I found the sweet spot for Giraffe to be about 3 hidden layers. Training took roughly 72 hours to converge, but reached a pretty good level within 24-48 hours.
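
In the same TF1 style as the snippet earlier in the thread, such a 3-hidden-layer ReLU evaluation net looks roughly like this (sizes are made up and this is not Giraffe's actual implementation):

Code: Select all

import tensorflow as tf

N_FEATURES, H = 363, 64   # illustrative sizes only

x = tf.placeholder(tf.float32, [None, N_FEATURES])

def layer(inp, n_in, n_out, act=None):
    W = tf.Variable(tf.random_uniform([n_in, n_out], -0.1, 0.1))
    b = tf.Variable(tf.zeros([n_out]))
    out = tf.matmul(inp, W) + b
    return act(out) if act is not None else out

h1 = layer(x,  N_FEATURES, H, tf.nn.relu)
h2 = layer(h1, H, H, tf.nn.relu)
h3 = layer(h2, H, H, tf.nn.relu)
score = layer(h3, H, 1)   # scalar evaluation, linear output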
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.