Texel Tuning - Success!

AndrewGrant · Post by **AndrewGrant** » Tue Oct 24, 2017 7:50 pm

Original Thread http://www.talkchess.com/forum/viewtopic.php?t=65290

I want this thread to serve as another example of how to get Texel Tuning going. I'll talk about what I tried (and what worked) in hopes that the guy after me can make use of it.

For tuning I used the quiet positions found here:
http://www.talkchess.com/forum/viewtopic.php?t=61427
http://www.talkchess.com/forum/viewtopi ... 594#733594

I had a great deal of difficulty getting Texel Tuning to work for my engine. After reading the responses to the above thread, I implemented a simple SGD method for the tuner. This required the ability to reconstruct the eval as a linear function of its terms, so I added a way of "tracing" each evaluation term added during the evaluation calls.

Once I cleaned up the bugs with this, I was still having no luck. I then wrote a method to scale the learning rates of each parameter, relative to how often they occur (by game phases) in the test positions. This extremely simple method allowed much quicker convergence, and I believe a better result.

Code:

void calculateLearningRates(TexelEntry * tes, double rates[NT][PHASE_NB]){

    int i, j;
    double avgByPhase[PHASE_NB] = {0};
    double occurances[NT][PHASE_NB] = {{0}, {0}};

    // NP = Number Of Test Positions
    // NT = Number Of Evaluation Terms
    // tes[i].coeffs is the coefficient for each evaluation. IE, -2 if black up two pawns, 1 if white up one pawn...
    // tes[i].factors is just the midgame phase factor, and endgame phase factor, a la Fruit

    for (i = 0; i < NP; i++){
        for (j = 0; j < NT; j++){
            occurances[j][MG] += abs(tes[i].coeffs[j]) * tes[i].factors[MG];
            occurances[j][EG] += abs(tes[i].coeffs[j]) * tes[i].factors[EG];
            avgByPhase[MG]    += abs(tes[i].coeffs[j]) * tes[i].factors[MG];
            avgByPhase[EG]    += abs(tes[i].coeffs[j]) * tes[i].factors[EG];
        }
    }

    avgByPhase[MG] /= NT;
    avgByPhase[EG] /= NT;

    for (i = 0; i < NT; i++){
        if (occurances[i][MG] >= 1.0)
            rates[i][MG] = avgByPhase[MG] / occurances[i][MG];
        if (occurances[i][EG] >= 1.0)
            rates[i][EG] = avgByPhase[EG] / occurances[i][EG];
    }
}

This was still not enough to get a good result from the tuner. I found that certain parameters would overpower others, e.g. the queen value would interfere with the queen mobility and PSQT terms: starting from zero, QueenValue must climb ~900 centipawns, whereas the PSQT values fluctuate around ±50 centipawns. So I run the tuner using my current (hand-tuned) terms as the starting point.

This implementation can be found in texel.c, texel.h, and evaluate.c in my source directory (once I add them, in a few hours, for future readers)

I tuned in two stages (I was able to get good PSQT values from a different tuner) and found Elo gains of +24, and then +48, when tested at 30s+.03s.

brtzsnr · Post by **brtzsnr** » Tue Oct 24, 2017 9:52 pm

Cool. I need to update the README with all successes so far.

Given that you've just started with Texel tuning, can I share another training set with you?

I'm having trouble coming up with another set, and I suspect that my current eval function is overfitted on the training set you are using. Since you didn't have time to change your eval function based on this training set, your engine is a good candidate to test the new training data.

AndrewGrant · Post by **AndrewGrant** » Tue Oct 24, 2017 10:00 pm

I would be happy to try out any other data sets.

brtzsnr · Post by **brtzsnr** » Tue Oct 24, 2017 10:31 pm

Here:

v6 has more quiet positions and all positions are played twice.
v7 has more violent positions and the quality of the results is lower (games played at lower time control).

v7: https://bitbucket.org/zurichess/tuner/d ... .v7.epd.gz
v6: https://bitbucket.org/zurichess/tuner/d ... .v6.epd.gz

Please, if you or anybody else tests these training sets, report your experience and results.

One thing to try is to remove 3-man, 4-man, and 5-man positions because the search is more important for them. FWIW, CCRL adjudicates at 5-man. You can add them later with new eval terms.
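A minimal sketch of such a filter, assuming each line of the training set starts with a standard FEN/EPD board field (the function names countPieces and keepPosition are made up for this example, and how the v6/v7 files are actually laid out beyond that is not checked here). Every letter in the first field is one piece, kings included, so counting letters up to the first space gives the number of men on the board.

```c
#include <ctype.h>

/* Count the pieces (kings included) in the board field of a FEN/EPD line. */
int countPieces(const char *fen) {
    int count = 0;
    /* The board field ends at the first space; digits and '/' are not pieces. */
    for (; *fen != '\0' && *fen != ' '; fen++)
        if (isalpha((unsigned char)*fen))
            count++;
    return count;
}

/* Keep a position only if it has more than maxMen pieces on the board. */
int keepPosition(const char *fen, int maxMen) {
    return countPieces(fen) > maxMen;
}
```

Calling keepPosition(line, 5) on each line while reading the file would drop everything with five men or fewer.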

AndrewGrant · Post by **AndrewGrant** » Tue Oct 24, 2017 10:33 pm

I'll note that I have not yet written the code to work for non-quiet positions. Right now I get my eval coefficients just from evaluateBoard(). If I want to use non-quiet positions, I will need to set my qsearch up to track the expected position, which will take me some thought.

Also, why played twice? Isn't all this going to do is effectively double the learning rate?

Set 7 contains noisy positions, but set 6 does not?

brtzsnr · Post by **brtzsnr** » Tue Oct 24, 2017 10:36 pm

AndrewGrant wrote: I'll note that I have not yet written the code to work for non-quiet positions. Right now I get my eval coefficients just from evaluateBoard(). If I want to use non-quiet positions, I will need to set my qsearch up to track the expected position, which will take me some thought.

Set 7 contains noisy positions, but set 6 does not?

For all positions I resolved the captures.

v6 plays 4 shallow moves and, IIRC, expects all moves to be quiet.
*EDIT* v7 does a simple QS to resolve the captures and outputs the final position after captures. This is similar to the current set you've already used.

v7 should have more positions for which king safety eval is important.

AndrewGrant · Post by **AndrewGrant** » Tue Oct 24, 2017 10:39 pm

I had problems tuning the king safety attack tables, so I disabled that in the tuner for now. If I manage a workaround I'll let you know. About to run on set 6.

Would you like me to run starting at the values I got after the first tuning set, or my values BEFORE all tuning? The first is easier for me ATM, but I will do either.

jdart · Post by **jdart** » Wed Oct 25, 2017 3:33 am

scale the learning rates of each parameter, relative to how often they occur (by game phases) in the test positions

Varying the learning rates per parameter is a good idea, but I don't think basing it on frequency is well-founded. Most common methods use gradient information to set the per-parameter rate (Adagrad, Adam, Nesterov momentum, for example).

--Jon

AndrewGrant · Post by **AndrewGrant** » Wed Oct 25, 2017 9:33 pm

Tuned with set6. Did one tuning session starting from the old params, and one starting from the currently tuned.

Both did quite poorly against my base version of Ethereal. Some params went really extreme. I imagine the test set contains only a few positions where they are present, and in those positions, they are VERY good or VERY bad.

Here are the results of 192,000 iterations (the ones I tuned with successfully were tuned with just 5,000...) https://pastebin.com/zyVHTT1B

Piece values really got dragged down, and spread out. I know I could adjust the PSQT so that the piece values are closer to expected, but the goal with this tuner was to have no human input.

For some reason all the Knight and Bishop outpost values got really cranked up. There is also a great deal of odd scaling in the Passed Pawn values (indexed by canAdvance, safeAdvance, relativeRank).

brtzsnr · Post by **brtzsnr** » Wed Oct 25, 2017 10:25 pm

AndrewGrant wrote: Piece values really got dragged down, and spread out. I know I could adjust the PSQT so that the piece values are closer to expected, but the goal with this tuner was to have no human input.

Thank you for testing.

Quick observation. For tuning I also use L1 regularization which helps with averaging PSQT around 0 and giving proper values to the piece values.
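A minimal sketch of how an L1 penalty can be folded into a gradient-based tuner, assuming it is applied to the PSQT weights (which parameters are actually regularized, and with what strength, is not stated here; the names sign, addL1Penalty and lambda are made up for the example). The penalty lambda * sum(|w_j|) contributes lambda * sign(w_j) to each weight's gradient, which constantly pulls the PSQT entries toward zero and leaves the piece values to absorb the material difference.

```c
/* Sign of x: -1, 0 or +1. */
static double sign(double x) {
    return (double)((x > 0.0) - (x < 0.0));
}

/* Add the subgradient of lambda * sum(|w_j|) to an existing gradient. */
void addL1Penalty(const double *weights, double *grad, int n, double lambda) {
    for (int j = 0; j < n; j++)
        grad[j] += lambda * sign(weights[j]);
}
```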

What are the values after 5000 iterations? How does the new version perform against older versions in a 1000-game match?
