Python script for TTM


mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Python script for TTM

Post by mar »

lucasart wrote:I think speed is key here, because the optimization process must run lots of iterations.
I see two potential problems: IO, FEN parsing/validation.

I wouldn't be surprised if the universal tuner ran an order of magnitude slower than an integrated native one. Obviously people hate to waste time,
so why wait for 1 parameter when you could do 10 in the same amount of time?

What comes to mind is batching: since you pre-split the FENs across threads (I assume), you can send all FENs to the engine only once, then tune many parameters/iterations.
This would reduce the tuner-to-engine IO to zero and would also get rid of the FEN processing.
The drawback is slightly increased complexity on the engine side.
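The tuner side of that batching idea can be sketched as below. The worker command names ("load", "seteval", "qsearch_all") are purely hypothetical, not any real engine protocol; the point is that after the one-time FEN upload, only parameter vectors and partial errors cross the pipe:

```python
# Sketch of batched tuning: upload FENs to each worker engine once,
# then iterate over parameter sets with almost no IO per iteration.
# The worker protocol ("load"/"seteval"/"qsearch_all") is invented
# for illustration only.

def split_fens(fens, n_workers):
    """Pre-split the FEN list so each engine instance receives
    its share exactly once (round-robin)."""
    return [fens[i::n_workers] for i in range(n_workers)]

def tune_loop(workers, param_sets):
    # Phase 1: one-time upload -- the only FEN IO in the whole run.
    for w in workers:
        for fen in w.fens:
            w.send("load " + fen)
    # Phase 2: many iterations, tiny IO per iteration.
    errors = []
    for params in param_sets:
        total = 0.0
        for w in workers:
            w.send("seteval " + " ".join(map(str, params)))
            # Each worker returns its partial error over its FENs.
            total += float(w.send("qsearch_all"))
        errors.append(total)
    return errors
```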

Another thing which might be interesting to try is to use plain eval instead of qs (what Miguel did in Gaviota IIRC), by filtering out non-quiet positions.
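One way to realize the plain-eval idea is to pre-filter the training set down to quiet positions, where static eval and qsearch already agree. Here `eval_fn` and `qsearch_fn` are assumed callbacks into the engine, not a real API:

```python
# Sketch of the "plain eval instead of qsearch" idea: keep only
# positions where the static eval matches the quiescence score,
# so the cheap eval call is an exact substitute during tuning.
# eval_fn / qsearch_fn are hypothetical engine callbacks.

def filter_quiet(positions, eval_fn, qsearch_fn, margin=0):
    """Keep positions whose static eval is within `margin` of the
    quiescence score; on these, plain eval is as good as qsearch."""
    return [p for p in positions
            if abs(eval_fn(p) - qsearch_fn(p)) <= margin]
```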
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Python script for TTM

Post by lucasart »

mar wrote:
lucasart wrote:I think speed is key here, because the optimization process must run lots of iterations.
I see two potential problems: IO, FEN parsing/validation.

I wouldn't be surprised if the universal tuner ran an order of magnitude slower than an integrated native one. Obviously people hate to waste time,
so why wait for 1 parameter when you could do 10 in the same amount of time?

What comes to mind is batching: since you pre-split the FENs across threads (I assume), you can send all FENs to the engine only once, then tune many parameters/iterations.
This would reduce the tuner-to-engine IO to zero and would also get rid of the FEN processing.
The drawback is slightly increased complexity on the engine side.

Another thing which might be interesting to try is to use plain eval instead of qs (what Miguel did in Gaviota IIRC), by filtering out non-quiet positions.
Based on my results, and the ones from Ferdinand, it seems that we have a 2 orders of magnitude speed difference. This is huge! Of course there's IO, task switching, and FEN parsing, but that's not all of the overhead, nor even most of it.

Based on my experience coding a cutechess-cli clone in Python, I'm pretty certain that 1 order of magnitude could be regained by writing the tuner in C++ instead of Python. We're just calling qsearch, which runs at 1.5M calls/s in my case, so *everything* else is overhead, and we need to kill overhead wherever possible.

On a side note, it's not so easy to call the qsearch like that, particularly with SMP. It's normally deeply nested, and many things must be set up before calling the QS. It was an interesting exercise, because it forced me to refactor my code better.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Python script for TTM

Post by lucasart »

I'm now doing a proper fit for lambda (using the Newton-Raphson method to minimize the quadratic error). That means the lambda before/after a patch is slightly different, but always optimal: score = 1 / (1 + exp(-lambda * qsearch)).
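The lambda fit described above can be sketched like this: Newton-Raphson on the quadratic error of the logistic model, with numerical derivatives and a step-halving safeguard so a bad Newton step is never accepted. This is a generic sketch, not lucasart's actual implementation; `data` is a list of (qsearch score, game result) pairs:

```python
import math

# Fit lambda in score = 1 / (1 + exp(-lambda * qsearch)) by
# Newton-Raphson on the quadratic error. Derivatives are numeric;
# a halving backtrack guards against overshooting the minimum.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def quad_error(lam, data):
    return sum((r - sigmoid(lam * q)) ** 2 for q, r in data)

def fit_lambda(data, lam=0.01, iters=30, h=1e-4):
    best = quad_error(lam, data)
    for _ in range(iters):
        e_plus = quad_error(lam + h, data)
        e_minus = quad_error(lam - h, data)
        d1 = (e_plus - e_minus) / (2 * h)             # E'(lambda)
        d2 = (e_plus - 2 * best + e_minus) / (h * h)  # E''(lambda)
        # Fall back to a tiny gradient step if curvature is unusable.
        step = d1 / d2 if d2 > 0 else (h if d1 > 0 else -h)
        # Halve the step until the error no longer increases.
        while abs(step) > 1e-12:
            e_new = quad_error(lam - step, data)
            if e_new <= best:
                lam, best = lam - step, e_new
                break
            step /= 2
        else:
            break  # no improving step found; treat as converged
    return lam
```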

I'm also guarding against over-fitting with a 75/25 validation split: optimize parameters on the 75% sample, then validate on the 25% sample.
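A minimal sketch of that split (the seed and shuffle are my own assumptions, just to make the split reproducible): fit on the first part, and only accept a parameter change if the error also drops on the held-out part.

```python
import random

# 75/25 train/validation split as an over-fitting guard:
# shuffle once with a fixed seed, then cut at the 3/4 mark.

def split_75_25(positions, seed=0):
    rng = random.Random(seed)
    shuffled = positions[:]
    rng.shuffle(shuffled)
    cut = (len(shuffled) * 3) // 4
    return shuffled[:cut], shuffled[cut:]
```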

Still no success. So far, every single tuning patch that improved the logistic fit was an elo loss.

I'll try with games from random starting positions. But I don't expect much from that.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Python script for TTM

Post by Pio »

Hi, I think you will get much better results if you try to minimize the absolute error.

/Pio
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: Python script for TTM

Post by ZirconiumX »

I'm sure I'm going to get yelled at by statisticians everywhere - I am not a mathematician.

But thinking about this: what if, instead of fitting to the single game result, which may or may not contain blunders, you performed Monte-Carlo playouts to get an empirical winning probability for the position, and then fit to that instead of the game result?

Of course, this method would be significantly more compute-intensive, but perhaps it would overfit less?
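The playout-labelling idea boils down to replacing the single game result with the mean score of n fast playouts. In this sketch `play_game` is an assumed callback that plays one game from the position and returns 1, 0.5 or 0:

```python
# Label a training position with an empirical winning probability:
# the mean of n playout results instead of one (possibly blundered)
# game outcome. play_game is a hypothetical playout callback.

def playout_label(position, play_game, n=32):
    """Empirical score in [0, 1]: mean of n playout results."""
    return sum(play_game(position) for _ in range(n)) / n
```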
Some believe in the almighty dollar.

I believe in the almighty printf statement.
User avatar
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: Python script for TTM

Post by Evert »

ZirconiumX wrote:I'm sure I'm going to get yelled at by statisticians everywhere - I am not a mathematician.

But thinking about this: what if, instead of fitting to the single game result, which may or may not contain blunders, you performed Monte-Carlo playouts to get an empirical winning probability for the position, and then fit to that instead of the game result?

Of course, this method would be significantly more compute-intensive, but perhaps it would overfit less?
I don't think it does anything for overfitting, but the idea is interesting in that it can give you a "fuzzy" game result, which could be more useful than a hard "0, 1/2, 1". For that to work though the monte-carlo playout needs to be a reasonable predictor of the result under optimal play, and there's the rub: I don't think monte-carlo playouts have so far proven to be very effective with chess.

In general, picking positions for this type of tuning is hard, and I'm still not entirely sure what the best way to do it is (which is part of why I haven't tried this myself). What you need are positions that a) represent what the engine encounters during search, and b) correlate well with the outcome of the game. In other words, positions from that branch of the tree where white bungled a rook are useless if the line actually played resulted in white winning. Conversely, even a root position is useless if the engine was winning but blundered a piece later (I suppose it needs to be at a level where it doesn't do that for this method to give good results anyway, but weird things happen every so often).
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: Python script for TTM

Post by ZirconiumX »

Evert wrote:I don't think it does anything for overfitting, but the idea is interesting in that it can give you a "fuzzy" game result, which could be more useful than a hard "0, 1/2, 1".
An evaluation function is a heuristic, so perhaps heuristic values would be more useful than game theoretic values for positions the eval may never see in search.
For that to work though the monte-carlo playout needs to be a reasonable predictor of the result under optimal play, and there's the rub: I don't think monte-carlo playouts have so far proven to be very effective with chess.
This is the issue. I'm sure somebody could engineer a way around this, however.
In general, picking positions for this type of tuning is hard, and I'm still not entirely sure what the best way to do it is (which is part of why I haven't tried this myself). What you need are positions that a) represent what the engine encounters during search, and b) correlate well with the outcome of the game. In other words, positions from that branch of the tree where white bungled a rook are useless if the line actually played resulted in white winning. Conversely, even a root position is useless if the engine was winning but blundered a piece later (I suppose it needs to be at a level where it doesn't do that for this method to give good results anyway, but weird things happen every so often).
Yeah, my experience is that your engine already needs to be good to be improved, which is somewhat circular. I suppose it can't make up for knowledge the engine does not have, however.
Some believe in the almighty dollar.

I believe in the almighty printf statement.
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Python script for TTM

Post by lucasart »

lucasart wrote:I'll try with games from random starting positions. But I don't expect much from that.
Also fails miserably. I'm just finding more and more ways to lose elo while reducing the quadratic error of the logistic fit.

I have 2 more things left to try:
- fit with absolute error instead of quadratic. I really don't think this is going to make a difference, but let's give it a chance.
- use depth=2 instead of depth=0 (qsearch).
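The two objectives being compared can be written out side by side. This is a generic sketch of the loss functions, not lucasart's code; the intuition is that the L1 objective is less sensitive to outliers, so a decisive result the eval could never have predicted pulls an L2 fit much harder than an L1 fit:

```python
import math

# Quadratic (L2) vs absolute (L1) error of the logistic fit
# score = 1 / (1 + exp(-lambda * qsearch)), over (q, result) pairs.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def l2_error(lam, data):
    return sum((r - sigmoid(lam * q)) ** 2 for q, r in data)

def l1_error(lam, data):
    return sum(abs(r - sigmoid(lam * q)) for q, r in data)
```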
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Python script for TTM

Post by AlvaroBegue »

Would you like to try with the set of positions I am preparing? It's 1.34M positions taken from calls to the evaluation function of my program RuyDos, with results assigned by playing out from that position using SF6-vs-SF6 at 30 moves/second on a pretty fast computer.

I'll have it ready in a few hours (it's been computing for about a week), and I have some hope that it will be a good dataset to tune evaluation functions.

Are you interested?
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Python script for TTM

Post by lucasart »

- use depth=2 instead of depth=0 (qsearch).
This made no difference, apart from being much slower.
- fit with absolute error instead of quadratic.
We could be on to something here. It is the only objective that refuted the series of elo-losing patches that improved the regression in all the other test cases. And it suggested a patch that seems to be working (though the test is still running, so it's too early to say).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.