Here's the latest after another week of research, studying, and testing.
On calculating the derivative, I am not sure whether I should be taking the derivative of a single position's term, (result - Sigmoid(qs))^2, or of the mean, 1/N * Sum((result - Sigmoid(qs))^2). Sites that calculate derivatives do not always give me the same result for either of those functions. So I decided to stick with the last plan, which is differentiation from first principles: f'(x) = lim(h->0) (f(x + h) - f(x)) / h, where h is the delta for the parameter.
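One reassuring point: differentiating the mean just scales the summed per-position derivatives by 1/N, so the two candidate error functions only disagree on magnitude, not direction. A minimal sketch of the first-principles derivative, assuming a hypothetical `qscore(pos, params)` that returns the quiescence score for a position under a given parameter set, and a standard sigmoid with an illustrative K:

```python
import math

def sigmoid(score, k=1.0 / 400.0):
    # Maps a centipawn score to an expected game result in [0, 1].
    return 1.0 / (1.0 + math.exp(-k * score))

def mean_squared_error(positions, params, qscore):
    # E = (1/N) * Sum((result - Sigmoid(qs))^2) over all positions.
    # positions is a list of (pos, result) pairs with result in {0, 0.5, 1}.
    total = 0.0
    for pos, result in positions:
        total += (result - sigmoid(qscore(pos, params))) ** 2
    return total / len(positions)

def dE_dP(positions, params, qscore, p_index, h=1.0):
    # Finite difference from first principles:
    # f'(x) ~= (f(x + h) - f(x)) / h, with h the delta for parameter p_index.
    bumped = list(params)
    bumped[p_index] += h
    e0 = mean_squared_error(positions, params, qscore)
    e1 = mean_squared_error(positions, bumped, qscore)
    return (e1 - e0) / h
```

With a toy `qscore` this behaves as expected: for a won position whose score is driven by one parameter, the derivative of E with respect to that parameter is negative, i.e. raising the parameter lowers the error.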
With this, I am able to calculate the derivative of E w.r.t. a parameter P (dEdP), but since this comes out as a tiny fraction, turning it into a usable rate of change for P is not straightforward. After enough analysis, I tested scaling dEdP by 10K and 100K, applied a learning rate, and capped the change per step for control.
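The scale-then-clamp update is just a few lines; here is a minimal sketch, where the scale, learning rate, and cap are placeholder numbers, not my exact values:

```python
def update_param(value, grad, scale=100_000.0, lr=0.1, max_step=5.0):
    # Scale the tiny raw derivative up to a usable magnitude, apply a
    # learning rate, then clamp the step so one update cannot run away.
    step = grad * scale * lr
    step = max(-max_step, min(max_step, step))
    # Step against the gradient to reduce the error.
    return value - step
```

For example, a raw derivative of 0.001 would want a step of 10 here, but the clamp limits it to 5.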
The whole process seems to work, and I was intrigued reviewing the output, as this tuning method appeared to learn values even for a specific square in a PSQ. Eventually, I had a set of tuned parameters to use. Unfortunately, the base version still wins by about 58% or more, and I am not sure what's going on. I have tested:
* Tuning just pieces
* Tuning pieces and PSQs (mg and eg, and they end up not symmetrical)
* My PSQs are all created by hand, so I built a set of calculated PSQs, exposing only a few parameters to the tuner, and tested that
* I noted which direction a parameter wanted to move, manually adjusted my values by a small amount, and tested
All tuned versions lose the same.
Someone asked in an earlier post how I test. For hand tuning, I normally test base vs. new: 4000 games with 4-move openings and 4000 games with 10-move openings at a set depth of 6 (following Ed Schroder's ideas, the depth increases as pieces come off the board). I then test against other engines at a time control (like 0:02+1). This has been successful for me.
I have run this same testing with the tuned vs. base engines. I also tried removing the depth constraints and using a time control instead; still no positive winning rate for the tuned engine.
I have also tested the tuning itself, for example by starting it with piece values 20 points lower than my base and 20 points higher than the tuned values - the tuner keeps coming back to the same tuned values. That seems like a positive sign.
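That perturb-and-retune check can be sketched like this, with `tune` standing in for the actual tuning loop (hypothetical; it takes starting parameters and returns the converged ones):

```python
def retune_check(tune, tuned_params, offset=20.0, tol=2.0):
    # Restart the tuner from values offset below and above the tuned set
    # and verify both runs come back to (roughly) the same parameters.
    low = tune([p - offset for p in tuned_params])
    high = tune([p + offset for p in tuned_params])
    return all(abs(a - b) <= tol for a, b in zip(low, high))
```

If both restarts agree within tolerance, the tuner has found a genuine minimum of E rather than an artifact of the starting point.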
I also thought about the average error. Being an average, if one position's error decreases enough, the average E improves (decreases) even when other positions get worse. If a single position's error improves by 10 but 8 others each worsen by 1, the average E still decreases... but that is 8 more bad positions. Is this bad? Counting the positions affected positively versus negatively, the positive side only outnumbers the negative by about 500K. That does not seem like a big enough margin to produce a real improvement.
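The positive/negative bookkeeping is just per-position error deltas between two parameter sets, something like:

```python
def compare_errors(errors_before, errors_after):
    # Count positions whose squared error decreased (improved) vs.
    # increased (worsened) after a tuning step; unchanged ones are ignored.
    improved = sum(1 for b, a in zip(errors_before, errors_after) if a < b)
    worsened = sum(1 for b, a in zip(errors_before, errors_after) if a > b)
    return improved, worsened
```

This makes the 10-vs-8 case concrete: the mean can drop even when `worsened` is close to (or exceeds) `improved`, as long as the improvements are individually large.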
That's the latest. I am running more tests as I write. I am not sure where to go next; I expect either my parameter values are already good or the math is off somewhere.