Texel tuning speed

Joost Buijs · Post by **Joost Buijs** » Mon Sep 03, 2018 8:08 am

xr_a_y wrote: ↑Mon Sep 03, 2018 7:04 am What may be too big or too small for changes in piece value ?

For now I am near 1 indeed, that may be too small ?

A change of 1 in piece value should be fine.

I do something like:

Code: Select all


for (i = 0; i < nterms; i++)
{
	term[i] += gradient_step;
	gradient[i] = (new_error() - old_error) / gradient_step;
	term[i] -= gradient_step;
}

for (i = 0; i < nterms; i++)
term[i] -= gradient[i] * learning_rate;

Basically very simple, gradient_step is fixed at 2 in my case, I never change that, and the learning rate varies between 1.0 and 10.0 depending upon the number of iterations and some other criteria.

When I apply very strange terms to start with, and with a high learning_rate the process will sometimes go wacko, with normal parameters it usually converges between 100 and 200 interations.

xr_a_y · Post by **xr_a_y** » Tue Sep 04, 2018 8:58 pm

hmmm that's more or less what I do too, without success. Even starting from stupid piece value, the algorithm is stuck very soon ...

Here's the code

Code: Select all

namespace Texel {

    class TexelInput {
    public:
        TexelInput(Position * p, int r) :p(p), result(r) {} // not taking position ownership
        ~TexelInput() {}
        Position * p;
        int result;
        TexelInput(const TexelInput & t) {
            p = t.p;
            result = t.result;
        }
        TexelInput & operator =(const TexelInput & t) {
            p = t.p;
            result = t.result;
            return *this;
        }
        TexelInput & operator =(TexelInput && t) {
            p = t.p;
            result = t.result;
            return *this;
        }
    };

    template < typename T >
    class TexelParam {
    public:
        TexelParam(const std::string & a, const T& inf, const T& sup) :accessor(a), inf(inf), sup(sup) {}
        std::string accessor;
        T inf;
        T sup;
        T& operator()() {
            return Definitions::GetValue<T>(accessor);
        }
        void Set(const T & value) {
            Definitions::SetValue(accessor, std::to_string(value));
        }
    };

    template<class Iter >
    Iter random_unique(Iter begin, Iter end, size_t num_random) {
        size_t left = std::distance(begin, end);
        while (num_random--) {
            Iter r = begin;
            std::advance(r, rand() % left);
            std::swap(*begin, *r);
            ++begin;
            --left;
        }
        return begin;
    }

    double Sigmoid(Position * p) {
        static Searcher searcher;
        const double s = searcher.NegaQuiesce(-Definitions::scores.infScore,Definitions::scores.infScore,50,0,*p,p->Turn(),true,TimeMan::eTC_off);
        static const double K = 0.75;
        return 1. / (1. + std::pow(10, -K * s / 400.));
    }

    double E(const std::vector<Texel::TexelInput> &data, size_t miniBatchSize) {
        double e = 0;
        for (size_t k = 0; k < miniBatchSize; ++k) {
            e += std::pow(double((data[k].result+1)*0.5 - Sigmoid(data[k].p)),2);
        }
        e /= data.size();
        return e;
    }

    void Randomize(std::vector<Texel::TexelInput> & data, size_t miniBatchSize) {
        random_unique(data.begin(), data.end(), miniBatchSize);
    }

    std::vector<double> ComputeGradient(std::vector<TexelParam<ScoreType> > x0, std::vector<Texel::TexelInput> &data, size_t gradientBatchSize) {
        LOG(logINFO) << "Computing gradient";
        std::vector<double> g;
        const ScoreType delta = 1;
        for (size_t k = 0; k < x0.size(); ++k) {
            LOG(logINFO) << "... " << k;
            double grad = 0;
            ScoreType oldvalue = x0[k]();
            x0[k].Set(oldvalue + delta);
            grad = E(data, gradientBatchSize);
            x0[k].Set(oldvalue - delta);
            grad += E(data, gradientBatchSize);
            x0[k].Set(oldvalue);
            grad -= 2*E(data, gradientBatchSize);
            g.push_back(grad/(2*delta));
            LOG(logINFO) << "Gradient " << k << " " << grad;
        }
        double norm = 0;
        for (size_t k = 0; k < x0.size(); ++k) {
            norm += g[k] * g[k];
        }
        norm = sqrt(norm);
        for (size_t k = 0; k < x0.size(); ++k) {
            g[k] /= norm;
            LOG(logINFO) << "Gradient normalized " << k << " " << g[k];
        }
        return g;
    }

    std::vector<TexelParam<ScoreType> > TexelOptimizeGD(const std::vector<TexelParam<ScoreType> >& initialGuess, std::vector<Texel::TexelInput> &data) {

        Definitions::ttConfig.do_transpositionTableEval = false;
        Definitions::ttConfig.do_transpositionTableEvalPawn = false;
        Definitions::ttConfig.do_transpositionTableQuiesce = false;

        Definitions::SetValue("pawnValue", std::to_string(100));
        Definitions::SetValue("knightValue", std::to_string(1000));
        Definitions::SetValue("bishopValue", std::to_string(1000));
        Definitions::SetValue("rookValue", std::to_string(1000));
        Definitions::SetValue("queenValue", std::to_string(1000));

        size_t errorBatchSize = 1024*64;
        Randomize(data, errorBatchSize);

        std::vector<TexelParam<ScoreType> > bestParam = initialGuess;
        bool improved = true;

        while (improved) {
            improved = false;

            std::vector<double> g = ComputeGradient(bestParam, data, errorBatchSize);

            double delta = 50;

            LOG(logINFO) << "Line search";

            double curE = E(data, errorBatchSize);
            double bestELS = curE;
            while (delta >= 2) { // stupid line search
                LOG(logINFO) << "Applying gradient, delta = " << delta;
                ///@todo check inf/sup
                for (size_t k = 0; k < bestParam.size(); ++k) {
                    ScoreType oldValue = bestParam[k]();
                    bestParam[k].Set(oldValue - ScoreType(delta * g[k]));
                }
                LOG(logINFO) << "Computing new error";
                curE = E(data, errorBatchSize);

                LOG(logINFO) << "LS : " << delta << " " << curE << " best was : " << bestELS;

                if (curE <= bestELS) {
                    improved = true;
                    bestELS = curE;
                    break;
                }
                else {
                    // reseting last iteration
                    for (size_t k = 0; k < bestParam.size(); ++k) {
                        ScoreType oldValue = bestParam[k]();
                        bestParam[k].Set(oldValue + ScoreType(delta * g[k]));
                    }
                }
                delta /= 2;
            }
            
            // randomize for next iteration
            Randomize(data, errorBatchSize);

            LOG(logINFO) << "Current values :";
            for (size_t k = 0; k < bestParam.size(); ++k) {
                LOG(logINFO) << bestParam[k].accessor << " " << bestParam[k]();
            }

        }
        return bestParam;
    }

}

void TexelTuning(const std::string & filename) {
    std::ifstream stream(filename);
    std::string line;

    std::vector<Texel::TexelInput> data;

    LOG(logINFO) << "Running texel tuning with file " << filename;

    int count = 0;
    while (std::getline(stream, line)){
        //std::cout << line << std::endl;
        nlohmann::json o;
        try {
            o = nlohmann::json::parse(line);
        }
        catch (...) {
            LOG(logFATAL) << "Cannot parse json " << line;
        }
        //std::cout << o << std::endl;
        std::string fen = o["fen"];
        Position * p = new Position(fen);
        if (std::abs(o["result"].get<int>()) < 800) {
            data.push_back(Texel::TexelInput(p, o["result"]));
        }
        ++count;
        if (count % 10000 == 0) {
            LOG(logINFO) << count << " position read";
        }
    }

    LOG(logINFO) << "Data size : " << data.size();

    std::vector<Texel::TexelParam<ScoreType> > guess;
    guess.push_back(Texel::TexelParam<ScoreType>("knightValue", 100, 2000));
    guess.push_back(Texel::TexelParam<ScoreType>("bishopValue", 100, 2000));
    guess.push_back(Texel::TexelParam<ScoreType>("rookValue", 100, 2000));
    guess.push_back(Texel::TexelParam<ScoreType>("queenValue", 100, 2000));
    //guess.push_back(Texel::TexelParam<ScoreType>("badBishopMalus", 100, 2000));
    //guess.push_back(Texel::TexelParam<ScoreType>("bishopPairBonus", 100, 2000));
    //guess.push_back(Texel::TexelParam<ScoreType>("knightPairMalus", 100, 2000));
    //guess.push_back(Texel::TexelParam<ScoreType>("rookPairMalus", 100, 2000));
    //guess.push_back(Texel::TexelParam<ScoreType>("isolatedPawnMalus", 100, 2000));
    //guess.push_back(Texel::TexelParam<ScoreType>("doublePawnMalus", 100, 2000));
    //guess.push_back(Texel::TexelParam<ScoreType>("bonusCandidatePassedPawn", 100, 2000));

    LOG(logINFO) << "Initial values :";
    for (size_t k = 0; k < guess.size(); ++k) {
        LOG(logINFO) << guess[k].accessor << " " << guess[k]();
    }

    std::vector<Texel::TexelParam<ScoreType> > optim = Texel::TexelOptimizeGD(guess, data);

    LOG(logINFO) << "Optimized values :";
    for (size_t k = 0; k < optim.size(); ++k) {
        LOG(logINFO) << optim[k].accessor << " " << optim[k]();
    }

    for (size_t k = 0; k < data.size(); ++k) {
        delete data[k].p;
    }
}

There is probably something stupid in that ...

Sven · Post by **Sven** » Tue Sep 04, 2018 10:34 pm

xr_a_y wrote: ↑Tue Sep 04, 2018 8:58 pm hmmm that's more or less what I do too, without success. Even starting from stupid piece value, the algorithm is stuck very soon ...

Here's the code [...]

There is probably something stupid in that ...

As far as I can see your algorithm stops as soon as you have calculated a gradient vector g that does not help to improve E. If that happens very quickly then maybe something is wrong with the gradient calculation?

In function ComputeGradient(), how does changing one parameter value (e.g. by doing "x0[k].Set(oldvalue + delta);") influence the calculation of E(), without the vector x being passed as an argument to E?

xr_a_y · Post by **xr_a_y** » Wed Sep 05, 2018 6:56 am

In Weini all parameters are accessible everywhere, the "x" vector is just some "accessor" to params values that are currently beeing tuned and changing then is done by the "Set" function. So when E call Sigmoid and then qsearch, everybody will use the updated paramaters (that would be an issue when I'll try multi-threaded tuning, I will need to associate parameters set to the thread context ; but that is not the issue for now).

Daniel Shawul · Post by **Daniel Shawul** » Wed Oct 10, 2018 9:58 pm

jdart wrote: ↑Thu Aug 30, 2018 3:42 am I am using batch gradient descent with ADAM. This method tunes all parameters simultaneously.

I actually don't use qsearch but a one-ply search during tuning.

10 million tuning positions x 750 or so parameters takes about 130 iterations to converge.

My tuner is multithreaded. On a 24 core machine tuning takes 1.5-2 hrs.

--Jon

Jon, Do you have that dataset with 10 million positions?
If you have an even bigger database of high quality it would be great to have too.

Daniel

jdart · Post by **jdart** » Thu Oct 11, 2018 3:57 pm

Jon, Do you have that dataset with 10 million positions?

I went to look for it, and it looks like it was lost when I did a recent machine upgrade. Too bad.

Here are a couple of data sets that I do have:

https://www.arasanchess.org/big3.epd.bz2 (about 5 million positions)

https://www.arasanchess.org/lichess-new-labeled.epd.bz2 (2.5 million positions, I think distinct from big3 - but not sure)

These are EPD format, with two comment fields. "c1" is the Arasan castling status, which you can ignore. "c2" is the game result (-1 for 0-1, 0.5 for 1/2-1/2, 1 for 1-0). Note: these are generally non-quiescent positions.

--Jon

Daniel Shawul · Post by **Daniel Shawul** » Thu Oct 11, 2018 7:04 pm

Thanks Jon, I have got them now.

I started filtering the positions for quietness but it seems most are quiet so I am going to use them they are.

Daniel

Daniel Shawul · Post by **Daniel Shawul** » Thu Oct 11, 2018 7:47 pm

xr_a_y wrote: ↑Thu Aug 30, 2018 12:16 am I am trying Texel tuning and have a question about expected speed of the method.

I seems Weini is able to run the qsearch needed for each position and each evaluation of the error in around 0.06 millisecond.

Let's say I have only 100 000 positions and want to optimize 10 parameters.
It will requiere let's say at least 100 000 x 10 x 100 qsearch, so 2h40min of computation.

The same thing for 1M positions would take a least one day.
For 10M positions it would take a least 10 days.

Is that the timing you also get ?

I belive I have the fastest tuining method for liniear evaluatio function -- which most are. It could also handle some non-linearities.
The way it works is you call eval() once for all your training postions and compute a Jacobian matrix.
After this initialization phase you never have to call eval() again during the training process -- making it super-duper fast.
I explained the method here: http://talkchess.com/forum3/viewtopic.p ... n&start=70
and the github source for the implemetation is in my signature.

Daniel

Texel tuning speed

Re: Texel tuning speed

Re: Texel tuning speed

Re: Texel tuning speed

Re: Texel tuning speed

Re: Texel tuning speed

Re: Texel tuning speed

Re: Texel tuning speed

Re: Texel tuning speed