I will certainly try this out as well to see if it helps, but for some reason I am just not getting positive results. I think I am missing something. I reread all the posts in this thread. Would someone mind shedding some light on my scenario? I am sorry if some of this is repetitive.
From what I have read, some people use "end of PV" nodes, some use quiet nodes, some say not to run qsearch and just use the static eval, and some say to compare my results against those of a stronger engine. All of this confuses me and makes me think my process is wrong.
Based on what I have read about the testing process, this is what I know and have done:
First, my qsearch is a plain fail-soft quiescence search - no delta pruning, no TT, etc.
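In case it matters, this is roughly what I mean by that (Position, Move and the helper functions are just stand-ins for my engine's actual code):

```
#include <vector>

// Stand-ins for whatever the engine actually provides.
struct Position;
struct Move;
int  Evaluate(Position& pos);
std::vector<Move> GenerateCaptures(Position& pos);
void MakeMove(Position& pos, const Move& m);
void UnmakeMove(Position& pos, const Move& m);

// Plain fail-soft quiescence search: stand pat, then captures only.
// No delta pruning, no TT probing.
int QSearch(Position& pos, int alpha, int beta) {
    int best = Evaluate(pos);              // stand-pat score
    if (best >= beta) return best;         // fail soft: return the score, not beta
    if (best > alpha) alpha = best;

    for (const Move& m : GenerateCaptures(pos)) {
        MakeMove(pos, m);
        int score = -QSearch(pos, -beta, -alpha);
        UnmakeMove(pos, m);

        if (score > best) {
            best = score;
            if (score >= beta) return best;
            if (score > alpha) alpha = score;
        }
    }
    return best;
}
```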
I have engine version 1 and version 2. They played 64k games. Version 2 is a bit better than version 1 as it has the pawn parameters I wish to test and tune.
From these games I extracted 8+ million positions, excluding opening-book positions and positions where a mate was found. Each FEN was saved with the game result (0 if black won that game, 0.5 for a draw, 1 if white won).
Next, I had the engine compute a K value. All the engine did was load each FEN (and its game result), run qsearch, store the returned score and the result in a vector, and, once all FENs were searched, calculate the K with minimal E.
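Concretely, that step looks roughly like this in code. Sample, ComputeE and FindBestK are just illustrative names, and I am assuming the usual Texel-tuning formulas: Sigmoid(s) = 1 / (1 + 10^(-K*s/400)) and E = mean squared error between the game results and Sigmoid(qscore).

```
#include <cmath>
#include <vector>

struct Sample {
    double qscore;   // qsearch score in centipawns, from white's point of view
                     // (negate the side-to-move score when black is to move,
                     //  so it lines up with the white-relative result)
    double result;   // 0.0, 0.5 or 1.0 for white
};

// Texel-style sigmoid mapping a score to an expected result.
double Sigmoid(double score, double K) {
    return 1.0 / (1.0 + std::pow(10.0, -K * score / 400.0));
}

// Mean squared error between game results and predicted results.
double ComputeE(const std::vector<Sample>& samples, double K) {
    double sum = 0.0;
    for (const Sample& s : samples)
        sum += std::pow(s.result - Sigmoid(s.qscore, K), 2);
    return sum / samples.size();
}

// Brute-force scan for the K with minimal E; slow but only done once.
double FindBestK(const std::vector<Sample>& samples) {
    double bestK = 0.0, bestE = 1e9;
    for (double K = 0.1; K <= 3.0; K += 0.01) {
        double e = ComputeE(samples, K);
        if (e < bestE) { bestE = e; bestK = K; }
    }
    return bestK;
}
```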
So far, at no point did I compare anything against the actual scores from the games that were played, or against scores from other engines.
Next, I used the gradient search (local search seems to be another name for it) to tune a few values.
First, with K fixed (as found above), I'd qsearch all the positions again with all the eval parameters at their "base" settings. The E determined here becomes the best E.
Second, I'd increase eval-param1 by 1, qsearch all the positions, and, using the new returned scores and the known K, compute a new E and check whether it beats the best E.
Third, depending on whether a new best E was found, I'd either try decreasing eval-param1 instead, or save the new best E and move on to the next eval-param.
This repeats until a full round over all the parameters yields no improvement in E. I then set the final tuned parameters into the eval function as the new base/default values and play the engine against the previous version, but I do not get a winning engine. A sketch of the loop is below.
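In pseudocode, the loop looks something like this (ApplyParams and RecomputeE are stand-ins for my actual routines; RecomputeE re-runs qsearch over all the stored positions with the current eval parameters and returns E using the fixed K):

```
#include <cstddef>
#include <vector>

// Stand-ins for my actual routines.
void   ApplyParams(const std::vector<int>& params);
double RecomputeE(double K);

std::vector<int> LocalSearch(std::vector<int> params, double K) {
    ApplyParams(params);
    double bestE = RecomputeE(K);

    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t i = 0; i < params.size(); ++i) {
            params[i] += 1;                    // try raising the parameter by 1
            ApplyParams(params);
            double e = RecomputeE(K);
            if (e < bestE) {
                bestE = e;
                improved = true;
            } else {
                params[i] -= 2;                // otherwise try lowering it by 1
                ApplyParams(params);
                e = RecomputeE(K);
                if (e < bestE) {
                    bestE = e;
                    improved = true;
                } else {
                    params[i] += 1;            // no improvement: restore the value
                }
            }
        }
    }
    return params;
}
```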
That's the process. Am I missing something? Should I compute K from the scores of the real games first? Should I only tune parameters when my engine loses to the older version? Etc.?
Thank you