Ah! scipy does not have anything like the steepest-descent algorithm, which is first order; I will test those later.

AlvaroBegue wrote: I think of conjugate gradient as a second-order method. I am talking about simpler algorithms than that. Adam is very popular these days (perhaps because it is the default learning algorithm in TensorFlow), but even plain gradient descent works well.

Daniel Shawul wrote: Using a first-order method such as conjugate gradient seems to suffer from the same problem. I think there is something in the way mini-batches are processed in SGD that I am not doing here...
Troubles with Texel Tuning
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Troubles with Texel Tuning
-
- Posts: 1754
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: Troubles with Texel Tuning
Here is what I get from your positions:
Tests with these values (generated with all params starting at 0) showed a loss of ~30 Elo. The values I got when starting from my current ones also showed a loss of ~20 Elo.
Tuning seems futile at this point
Code: Select all
Material : { 66, 106}, { 259, 222}, { 245, 220}, { 284, 421}, { 446, 802}, { 0, 0},
PawnPSQT :
{ 0, 0}, { 0, 0}, { 0, 0}, { 0, 0},
{ -10, -9}, { -3, -7}, { 3, -9}, { -3, -4},
{ -8, -9}, { -2, -8}, { -2, -10}, { 0, -4},
{ -11, -8}, { -9, -6}, { -3, -10}, { -2, -12},
{ -4, -1}, { -3, 0}, { -3, -7}, { -3, -12},
{ 9, 6}, { 14, 6}, { 4, 1}, { 9, -2},
{ 34, 52}, { 24, 57}, { 14, 55}, { 24, 52},
{ 0, 0}, { 0, 0}, { 0, 0}, { 0, 0},
KnightPSQT:
{ -33, -26}, { -25, -17}, { -14, -11}, { -11, -3},
{ -25, -18}, { -17, 2}, { 22, 5}, { 12, 15},
{ -18, -5}, { 8, 13}, { 20, 16}, { 26, 41},
{ -8, -13}, { 18, 15}, { 32, 29}, { 34, 34},
{ 8, 2}, { 30, 17}, { 38, 28}, { 50, 33},
{ 3, 5}, { 18, 11}, { 23, 30}, { 60, 30},
{ 3, 1}, { 13, 0}, { 16, 17}, { 19, 26},
{ -38, -43}, { -7, -14}, { 0, 1}, { -2, -3},
Have the other PSQTs...
Isolated Pawn : { -10, -7},
Stacked Pawn : { 0, -19},
Mobility Knight:
{ -17, -28}, { -15, -37}, { -11, -34},
{ -7, -38}, { -16, -32}, { -17, -31},
{ -29, -3}, { 0, 0}, { 0, 0},
Mobility Bishop:
{ -15, -10}, { -28, -14}, { -8, -9}, { 4, 8},
{ 11, 16}, { 20, 18}, { 27, 24}, { 30, 27},
{ 35, 31}, { 38, 28}, { 41, 36}, { 51, 18},
{ 24, 40}, { 13, 5},
Mobility Rook:
{ -6, -2}, { -32, -12}, { 7, -14}, { 15, -3}, { 25, 9},
{ 26, 13},{ 22, 29}, { 24, 31}, { 27, 37}, { 29, 42},
{ 32, 47}, { 34, 52}, { 34, 59}, { 28, 65}, { 13, 64},
Mobility Queen:
{ 0, 0}, { 0, 0}, { -1, 0}, { -6, -1}, { -20, -5}, { -19, -2}, { -14, 0},
{ -1, 3}, { 0, 14},{ 7, 13}, { 12, 23}, { 16, 24}, { 19, 29}, { 24, 29},
{ 24, 38}, { 29, 42}, { 35, 43}, { 34, 54}, { 37, 58}, { 38, 68}, { 41, 73},
{ 48, 73}, { 43, 67}, { 40, 63}, { 29, 46}, { 17, 28}, { 6, 9},{ 3, 6},
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
-
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: Troubles with Texel Tuning
Some obvious issues with those values: the MG values for the Rook and Queen are far too low (I had the same issue; it goes away with more knowledge), and the value of the minors goes down in the endgame (which leads to bad play). The mobility tables look very noisy; you may want to verify that each value actually occurs in the test positions (I suspect the high Queen-mobility entries are not represented, for instance). A better idea than tuning individual values is probably to tune a formula that builds up the table.
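The "tune a formula instead of individual entries" idea can be sketched as follows. This is an illustrative assumption, not any engine's actual code: the function name gen_mobility, the arctan shape, and the parameter values are all made up; the point is only that three tunable parameters generate a whole monotone table, so sparsely represented entries cannot go off on their own.

```python
# Hypothetical sketch: instead of tuning 28 individual queen-mobility
# entries, tune the 3 parameters of a curve that generates the table.
# gen_mobility and the chosen constants are illustrative, not from
# any real engine.
import math

def gen_mobility(scale, shift, steepness, max_moves):
    """Build a mobility table from a shifted arctan curve.

    scale     -- overall magnitude of the bonus/penalty
    shift     -- move count at which the bonus crosses zero
    steepness -- how quickly the curve saturates
    """
    return [int(scale * math.atan(steepness * (m - shift)))
            for m in range(max_moves + 1)]

# One entry per legal-move count (0..27 for a queen):
queen_mg = gen_mobility(scale=45, shift=7, steepness=0.25, max_moves=27)
```

Because arctan is monotone and saturating, the generated table always penalizes low mobility, rewards high mobility, and flattens at the extremes, even for move counts that never appear in the training positions.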
Do you have passed pawn evaluation and king safety?
-
- Posts: 1754
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: Troubles with Texel Tuning
Yes and yes, I tuned them as well. Every parameter in the evaluation is in the tuner.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Troubles with Texel Tuning
I tried the first-order steepest-descent algorithm now, which is not available in scipy. With a very high learning rate (alpha) it converges to good piece values quickly; however, it still cannot handle the sampling scheme I had. One reason could be that SGD requires a much lower learning rate, so I have to decrease that. But I am more inclined to believe that my sampling scheme is flawed. To compute the gradient I need f(x) and f(x+delta), but those two are computed on two different samples. The idea of SGD is to compute the gradient from a mini-batch (both f(x) and f(x+delta) from the same mini-batch) and use that for the update as if it were computed from the whole dataset.

AlvaroBegue wrote: I think of conjugate gradient as a second-order method. I am talking about simpler algorithms than that. Adam is very popular these days (perhaps because it is the default learning algorithm in TensorFlow), but even plain gradient descent works well.

Daniel Shawul wrote: Using a first-order method such as conjugate gradient seems to suffer from the same problem. I think there is something in the way mini-batches are processed in SGD that i am not doing here...
Code: Select all
Engine(1506885786) <<< mse 0.01 0 0.00575646273249
Engine(1506885786) >>> 0.0658148119181500
Engine(1506885786) <<<
QUEEN_MG 921.995596819
ROOK_MG 526.493363712
BISHOP_MG 358.684044774
KNIGHT_MG 334.052506437
PAWN_MG 113.445753437
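The mini-batch point above can be made concrete with a small sketch. This is not the poster's actual tuner: texel_error, sgd_step, the K constant, and the linear feature model are all illustrative assumptions. The key detail is that one mini-batch is sampled per step and then both f(x) and f(x+delta) are evaluated on that same batch, so the finite-difference gradient measures the effect of the parameter change rather than sampling noise.

```python
# Illustrative sketch of mini-batch SGD with a finite-difference
# gradient, where f(x) and f(x + delta) share ONE mini-batch.
# All names (texel_error, sgd_step, the k constant) are hypothetical.
import math
import random

def sigmoid(score, k=1.13):
    """Texel-style win-probability mapping of a centipawn score."""
    return 1.0 / (1.0 + 10.0 ** (-k * score / 400.0))

def texel_error(params, batch):
    """Mean squared error between sigmoid(eval) and the game result.
    Each batch item is (features, result) with a linear eval model."""
    total = 0.0
    for features, result in batch:
        score = sum(p * f for p, f in zip(params, features))
        total += (result - sigmoid(score)) ** 2
    return total / len(batch)

def sgd_step(params, data, batch_size, alpha, delta=1.0):
    batch = random.sample(data, batch_size)      # one batch per step
    base = texel_error(params, batch)            # f(x) on this batch
    grad = []
    for i in range(len(params)):
        params[i] += delta
        grad.append((texel_error(params, batch) - base) / delta)
        params[i] -= delta                       # f(x+delta), SAME batch
    return [p - alpha * g for p, g in zip(params, grad)]
```

If the two evaluations were drawn from different batches, the difference f(x+delta) - f(x) would be dominated by which positions happened to be sampled, which matches the symptom described above.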
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: Troubles with Texel Tuning
Why are you not computing the gradient directly, instead of doing this f(x) and f(x+delta) business?

Daniel Shawul wrote: To compute the gradient i need f(x) and f(x+delta) but those two are computed on two different samples. The idea of the SGD is to compute the gradient from a mini-batch (both f(x) and f(x+delta) from same mini-batch) and use that for update as if it was computed from the whole dataset.
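For an evaluation that is linear in its weights, the direct gradient being suggested has a closed form, so no probing with f(x+delta) is needed at all. The sketch below is an assumption about what that looks like, not anyone's actual tuner: the K value and all names are illustrative. With σ(s) = 1/(1+10^(-Ks/400)), the derivative is σ'(s) = (K·ln10/400)·σ(1-σ), and the chain rule gives the gradient of the MSE per weight.

```python
# Hedged sketch of the analytic gradient of the Texel MSE for a
# linear eval (score = w . x). K and all names are assumptions.
import math

K = 1.13  # sigmoid scaling constant (assumed, normally fit to data)

def sigmoid(s):
    return 1.0 / (1.0 + 10.0 ** (-K * s / 400.0))

def mse_and_gradient(w, data):
    """data: list of (features, result); returns (mse, grad)."""
    n = len(data)
    mse = 0.0
    grad = [0.0] * len(w)
    c = K * math.log(10) / 400.0          # d(sigmoid)/d(score) factor
    for x, r in data:
        s = sum(wi * xi for wi, xi in zip(w, x))
        p = sigmoid(s)
        e = r - p
        mse += e * e / n
        # d/dw_j of (r - sigmoid(w.x))^2 = -2 e * sigmoid' * x_j
        coef = -2.0 * e * c * p * (1.0 - p) / n
        for j, xj in enumerate(x):
            grad[j] += coef * xj
    return mse, grad
```

One pass over the positions yields both the error and the exact gradient, and since each position's contribution is independent, the same formula applies unchanged to a mini-batch.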
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Troubles with Texel Tuning
It needs more work, but I guess it would be comparable to modifying my eval to work with double precision. In fact, I actually started incorporating the three header files and modifying my eval to accept a template argument, but it got pretty messy quickly. However, it could be a good playground for testing individual evaluation terms.

AlvaroBegue wrote: Why are you not computing the gradient directly, instead of doing this f(x) and f(x+delta) business?

Daniel Shawul wrote: To compute the gradient i need f(x) and f(x+delta) but those two are computed on two different samples. The idea of the SGD is to compute the gradient from a mini-batch (both f(x) and f(x+delta) from same mini-batch) and use that for update as if it was computed from the whole dataset.

The stochastic issue is solved: I provided a finite-difference gradient functor to the optimizer, in which I use the same random-number seeds for f(x) and f(x+delta). This worked for the first-order method (gradient descent), but it did not work for the second-order methods (CG and BFGS), as expected.
With SGD:
Code: Select all
Engine(1506891302) <<< mse 0.01 21788 0.00575646273249
Engine(1506891303) >>> 0.0670008220841582
Engine(1506891303) <<<
QUEEN_MG 903.193388403
ROOK_MG 523.035474633
BISHOP_MG 354.517504783
KNIGHT_MG 334.146614028
PAWN_MG 113.134921652
Daniel
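The same-seed trick described in this post is the classic "common random numbers" device, and it can be sketched generically. Everything here is hypothetical (make_fd_gradient, the f(params, seed) calling convention): the point is simply that reseeding identically before both evaluations makes the sampling noise cancel in the difference.

```python
# Sketch of a finite-difference gradient for a STOCHASTIC objective,
# using the same seed for f(x) and f(x + delta) so both evaluations
# see the same sample. Names and the f(params, seed) convention are
# illustrative assumptions.
import random

def make_fd_gradient(f, delta=1.0):
    """Wrap a stochastic objective f(params, seed) in a forward
    finite-difference gradient that holds the sample fixed."""
    def gradient(params):
        seed = random.randrange(2 ** 31)    # one seed per gradient call
        base = f(params, seed)
        grad = []
        for i in range(len(params)):
            probe = list(params)
            probe[i] += delta
            grad.append((f(probe, seed) - base) / delta)  # same seed
        return grad
    return gradient
```

Because the noise term is identical in both evaluations, it cancels exactly in the subtraction; a first-order method only needs the resulting descent direction, while line-search-based second-order methods like CG and BFGS compare objective values across iterations (and thus across seeds), which is consistent with them still failing here.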
-
- Posts: 570
- Joined: Mon Jul 20, 2015 5:06 pm
Re: Troubles with Texel Tuning
I got zero Elo change out of tuning. The tuning exercise did expose some gross errors in the evaluator that would have continued to go undetected without going over each of the variables. For example, I used to have a value for Kaufman's redundant rooks; Texel tuning did not like this and returned unstable results ranging from -500 to 500. Perhaps the formula was incorrect, but it was deleted anyway. The king safety calculation was completely rewritten. In many other cases, Texel tuning verified the optimal value the first time!

AndrewGrant wrote: Tuning seems futile at this point
Every program already finds its own balance of evaluation with intuitive hand tuning. Overall, the additional tuning exercise changed the style of play but not the strength. It seems more predictable now. My program is less likely to beat stronger engines with wild attacks. On the other hand, the score is more stable against weaker programs.