## tensorflow

brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 3:02 pm

### tensorflow

Google has just released a new ML library called TensorFlow (http://tensorflow.org/get_started). I decided to test it because I've been looking to improve my evaluation function. I adapted the example to optimize y = sigmoid((1-p)*w_m*x + p*w_e*x), where w_m and w_e are the midgame and endgame weights and p interpolates between them. I extracted ~125 features (the example below uses only the piece values, for illustration) and ran it.

```python
import tensorflow as tf
import numpy as np

# Each line of 'testdata': target y, phase p, then the feature values.
x_data, y_data, p_data = [], [], []
with open('testdata') as f:
    for l in f:
        a = [float(e) for e in l.split()]
        y_data.append(a[:1])
        p_data.append(a[1:2])
        x_data.append(a[2:9])

print("read %d (%d) records" % (len(x_data), len(y_data)))
print("read %d inputs" % len(x_data[0]))

# One midgame and one endgame weight per feature.
WM = tf.Variable(tf.random_uniform([len(x_data[0]), 1]))
WE = tf.Variable(tf.random_uniform([len(x_data[0]), 1]))

xm = tf.matmul(x_data, WM)
xe = tf.matmul(x_data, WE)

# Interpolate between the midgame and endgame scores by phase.
P = tf.constant(p_data)
y = xm*(1-P)+xe*P
y = tf.sigmoid(y/2)

loss = tf.reduce_mean(tf.square(y - y_data))
# The posted code never defined `optimizer`. Adam is the optimizer reported
# to work best; using its default settings here is an assumption.
optimizer = tf.train.AdamOptimizer()
train = optimizer.minimize(loss)

init = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init)

# Training loop: print the loss every 10 steps and the weights every 100.
for step in range(1000000):
    sess.run(train)
    if step % 10 == 0:
        print("%d %s" % (step, sess.run(loss)))
    if step % 100 == 0:
        l, m, e = [], sess.run(WM), sess.run(WE)
        for i in range(len(m)):
            l.append((int(m[i][0]*100), int(e[i][0]*100)))
        print("%d %s" % (step, l))
```
For 700,000 positions this converges fast to

```
1000 0.116354
1000 [(27, 19), (73, 188), (353, 407), (379, 452), (535, 738), (1212, 1379), (10, 58)]
```
That is a list of pairs of piece values in midgame and endgame.

Pawn = 73/188
Knight = 353/407
Bishop = 379/452
Rook = 535/738
Queen = 1212/1379

Looks very promising. With all features enabled I can get the loss down to 0.109577. This is impressive considering I spent a total of 2 hours learning the framework and trying different functions to optimize. The best algorithm was AdamOptimizer, which converges wicked fast.

I tried using more than one layer, but the loss was not better than 0.109577.

brtzsnr

### Re: tensorflow

2000 hyper-bullet games show basically no change:

Score of zurichess vs basic: 791 - 794 - 415 [0.499] 2000
ELO difference: -1
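
For reference, the Elo difference above can be recomputed from the win/loss/draw counts with the standard logistic Elo formula. This is only a sketch: the function name `elo_from_wld` is made up for illustration and is not part of the match tool used here.

```python
import math

def elo_from_wld(wins, losses, draws):
    """Return (score, elo_diff) from the first engine's point of view."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Logistic Elo model: score = 1 / (1 + 10 ** (-elo / 400))
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return score, elo

score, elo = elo_from_wld(791, 794, 415)
print("score %.3f, Elo difference %.1f" % (score, elo))
```

A score just under 50% maps to a difference of about half an Elo point, which agrees with the reported -1 once rounded.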

Steve Maughan
Posts: 1055
Joined: Wed Mar 08, 2006 7:28 pm
Location: Florida, USA

### Re: tensorflow

So this was effectively a one-layer (i.e. linear) logistic regression?

- Steve
http://www.chessprogramming.net - Maverick Chess Engine

brtzsnr

### Re: tensorflow

Yes. I used TensorFlow to tune the weights in zurichess. Most engines' evaluation functions can be modeled as a single-layer NN (y = w·x), so the idea should apply easily to other chess engines. I tried using a 2-layer NN with ReLU as the activation function of the hidden layer (as in Giraffe), but the minimum final loss was the same.
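
To make the "single layer" point concrete, here is a plain-Python sketch of the model from the first post: two dot products blended by phase, then a sigmoid of half the result. The function names and the example feature vector (material difference for pawn, knight, bishop, rook, queen) are assumptions for illustration. The weights are taken from the printed output earlier in the thread, divided by 100.

```python
import math

def tapered_eval(features, w_mid, w_end, phase):
    """Blend midgame and endgame scores; phase 0.0 uses w_mid, 1.0 uses w_end."""
    mid = sum(w * x for w, x in zip(w_mid, features))
    end = sum(w * x for w, x in zip(w_end, features))
    return (1.0 - phase) * mid + phase * end

def predicted_score(features, w_mid, w_end, phase):
    """Squash the evaluation into (0, 1), matching sigmoid(y / 2) in the post."""
    y = tapered_eval(features, w_mid, w_end, phase)
    return 1.0 / (1.0 + math.exp(-y / 2.0))

# Weights from the printed output above (pawn..queen), divided by 100.
w_mid = [0.73, 3.53, 3.79, 5.35, 12.12]
w_end = [1.88, 4.07, 4.52, 7.38, 13.79]
features = [1, 0, 0, 0, 0]  # hypothetical position: one side is a pawn up

print(predicted_score(features, w_mid, w_end, phase=0.0))
print(predicted_score(features, w_mid, w_end, phase=1.0))
```

Because the endgame pawn weight is larger, the same extra pawn yields a higher predicted score at phase 1.0 than at phase 0.0.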

jdart
Posts: 3720
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

### Re: tensorflow

I have used AdaGrad with some success. It is also fast to converge and is very simple to code.

See https://github.com/jdart1/arasan-chess/ ... /tuner.cpp.

--Jon
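
As an illustration of how little code AdaGrad needs, here is a minimal sketch in plain Python. It is not taken from the Arasan tuner linked above. The function name, the hyperparameters, and the toy objective (w - 3)^2 are assumptions chosen only to keep the example self-contained.

```python
import math

def adagrad_step(weights, grads, accum, lr=0.1, eps=1e-8):
    """Apply one AdaGrad update in place and return the weights."""
    for i, g in enumerate(grads):
        accum[i] += g * g  # running sum of squared gradients
        weights[i] -= lr * g / (math.sqrt(accum[i]) + eps)
    return weights

# Toy problem (assumed for illustration): minimize (w - 3)^2.
w, acc = [0.0], [0.0]
for _ in range(2000):
    grad = [2.0 * (w[0] - 3.0)]
    adagrad_step(w, grad, acc, lr=0.5)
print(w[0])
```

Each weight gets its own effective learning rate, which shrinks as that weight's squared gradients accumulate.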

matthewlai
Posts: 789
Joined: Sun Aug 03, 2014 2:48 am
Location: London, UK

### Re: tensorflow

> jdart wrote: I have used AdaGrad with some success. It is also fast to converge and is very simple to code.
>
> See https://github.com/jdart1/arasan-chess/ ... /tuner.cpp.
>
> --Jon

There is a newer sub-gradient algorithm called AdaDelta. It's an improvement on AdaGrad, and seems to perform a little bit better in most applications. It's what I use in Giraffe. I have also implemented AdaGrad.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
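
For comparison, a minimal AdaDelta sketch in plain Python, following the update rule from Zeiler's AdaDelta paper. This is not Giraffe's implementation. The function name, the `rho` and `eps` values, and the toy objective (w - 3)^2 are assumptions for illustration.

```python
import math

def adadelta_step(weights, grads, avg_sq_grad, avg_sq_delta, rho=0.95, eps=1e-6):
    """Apply one AdaDelta update in place and return the weights."""
    for i, g in enumerate(grads):
        # Decayed average of squared gradients (replaces AdaGrad's running sum).
        avg_sq_grad[i] = rho * avg_sq_grad[i] + (1.0 - rho) * g * g
        # Step scaled by RMS of past updates over RMS of past gradients.
        delta = -math.sqrt(avg_sq_delta[i] + eps) / math.sqrt(avg_sq_grad[i] + eps) * g
        # Decayed average of squared updates.
        avg_sq_delta[i] = rho * avg_sq_delta[i] + (1.0 - rho) * delta * delta
        weights[i] += delta
    return weights

# Toy problem (assumed for illustration): minimize (w - 3)^2.
w, eg, ed = [0.0], [0.0], [0.0]
for _ in range(10000):
    grad = [2.0 * (w[0] - 3.0)]
    adadelta_step(w, grad, eg, ed)
print(w[0])
```

Note that there is no learning rate to set: the only constants are the decay `rho` and the small `eps`.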

jdart

### Re: tensorflow

Some reports I have read indicate AdaDelta is not better than AdaGrad. Or at least, which one is better may depend on the problem. See for example https://www.quora.com/Why-is-AdaDelta-n ... D-variants.

--Jon

matthewlai

### Re: tensorflow

> jdart wrote: Some reports I have read indicate AdaDelta is not better than AdaGrad. Or at least, which one is better may depend on the problem. See for example https://www.quora.com/Why-is-AdaDelta-n ... D-variants.
>
> --Jon

It depends on the problem. For example, in reinforcement learning, AdaDelta is much better than AdaGrad because reinforcement learning is about chasing a moving minimum, and step size shouldn't decrease as training goes on.

In any situation where the minimum moves, AdaDelta will be much better.

AdaDelta also has fewer constants that require tuning (and the constants aren't really important anyways).

In my experience, well-tuned AdaGrad performs about the same as AdaDelta for stationary tasks, but AdaDelta is always at least almost as good, and is pretty much foolproof without any tuning.
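
The point about step sizes can be seen from the denominators alone. The sketch below compares the scale factor AdaGrad applies (one over the root of a running sum of squared gradients) with one based on an exponentially decayed average, which is what AdaDelta uses in its denominator. It leaves out AdaDelta's numerator term, and the function names and the constant-gradient input are assumptions for illustration.

```python
import math

def adagrad_scale(grads, eps=1e-8):
    """Per-step multiplier 1 / sqrt(sum of squared gradients so far)."""
    total, scales = 0.0, []
    for g in grads:
        total += g * g
        scales.append(1.0 / (math.sqrt(total) + eps))
    return scales

def decayed_scale(grads, rho=0.95, eps=1e-8):
    """Per-step multiplier 1 / sqrt(decayed average of squared gradients)."""
    avg, scales = 0.0, []
    for g in grads:
        avg = rho * avg + (1.0 - rho) * g * g
        scales.append(1.0 / (math.sqrt(avg) + eps))
    return scales

# Gradients that never shrink: a stand-in for a minimum that keeps moving.
grads = [1.0] * 1000
a = adagrad_scale(grads)
d = decayed_scale(grads)
print("AdaGrad: step 1 %.3f, step 1000 %.3f" % (a[0], a[-1]))
print("decayed: step 1 %.3f, step 1000 %.3f" % (d[0], d[-1]))
```

With gradients of constant size, the AdaGrad factor keeps falling as 1/sqrt(t), while the decayed-average factor levels off.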

brtzsnr

### Re: tensorflow

FWIW, here I used Adam (http://arxiv.org/pdf/1412.6980v8.pdf), which is inspired by AdaGrad and RMSProp. Compared with AdaGrad it converges a lot faster. TensorFlow doesn't provide an implementation of AdaDelta.
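
For readers who have not seen it, the Adam update from the linked paper can be sketched in a few lines of plain Python. This is not TensorFlow's implementation. The function name and the toy objective (w - 3)^2 are assumptions, and the default hyperparameters are the ones suggested in the paper.

```python
import math

def adam_step(weights, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place; t is the 1-based step count."""
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1.0 - beta1) * g      # first-moment estimate
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g  # second-moment estimate
        m_hat = m[i] / (1.0 - beta1 ** t)            # bias correction
        v_hat = v[i] / (1.0 - beta2 ** t)
        weights[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return weights

# Toy problem (assumed for illustration): minimize (w - 3)^2.
w, m, v = [0.0], [0.0], [0.0]
for t in range(1, 2001):
    grad = [2.0 * (w[0] - 3.0)]
    adam_step(w, grad, m, v, t, lr=0.05)
print(w[0])
```

The momentum-like first moment comes on top of the RMSProp-style scaling, and the bias correction makes the very first step have a magnitude of roughly `lr`.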

matthewlai

### Re: tensorflow

> brtzsnr wrote: FWIW, here I used Adam (http://arxiv.org/pdf/1412.6980v8.pdf), which is inspired by AdaGrad and RMSProp. Compared with AdaGrad it converges a lot faster. TensorFlow doesn't provide an implementation of AdaDelta.

Adam looked interesting. I've always wanted to give it a try.