The centipawn is by no means a well-defined standard. When talking about the value of a pawn, is it a pawn on the second rank? On which file? What’s the value of an additional pawn? Etc. The standard doesn’t impose any requirements on this “centipawn” score, other than that the order of values is preserved (a better move according to the engine should get a higher score). As such I think it is the fault of the standard and not of the GUI or engine. Given that no standardization is to be expected on the protocol side, I think the GUI could step in instead, as an additional feature provided to the users.

hgm wrote:
This can be easily done, of course, although I wonder how useful it is. (I can divide by 2 mentally quite easily.) It can potentially cause a lot of confusion when comparing scores of Stockfish with scores posted by other people using other calibration.

Edmund wrote:
Something I have not read before would be to allow for per-engine score and score-variance calibration.
The most straightforward one would be to allow for an engine like SF to set: f(score) = score / 2

hgm wrote:
It seems that you want to cure a defect in the engine here by patching the GUI. I am not too keen on that. There is a well-defined standard for reporting engine scores (centi-Pawns), and if engines deviate from it, it should be considered an engine bug, and fixed in the engine. Trying to normalize it in the GUI sort of sanctions everyone to just do as he pleases, making an arbitrary mess of it, dumping the problem in the lap of the user. Not a good thing, and not worth facilitating, IMO.
A lot of this sounds like science fiction to me. Can such a multivariate regression work at all? How many games would have to be played with the same engine to learn this information? How can a GUI ever derive a renormalization of engine scores when it doesn't know what the scores would have to be? Or draw conclusions from the win rate if it doesn't know the strength of the opponent? I cannot imagine anything practical that would do this. But perhaps you have thought this out more than I have.

Edmund wrote:
Maybe one could make it more elaborate and allow for a multivariate regression equation (where the GUI exposes a series of easily accessible parameters):
f(score, #blocked_pawns, material_white, material_black, etc.) = ...
Next, one could differentiate and add separate functions for win and draw probability. Furthermore, one could add a function that also takes depth, nodes searched, and statistics about previous PVs to predict the score variability.
Eventually engine users would independently look for well-predicting parameters by running regressions over their game logs. These parameters can be exchanged online for the whole community to benefit.
I see the following gains:
a) we get more information out than just some arbitrary centipawn score
b) one can better compare the output of different engines
c) the GUI can correct for engine differences and engine-specific score information, e.g. the point of view of the scores, depth to conversion, different types of draw, etc.
The system would be evolutionary with gradually improving predictors as more people get involved.
I would have seen this as a two-step approach. First, the form of the equation is specified and its coefficients are estimated. Second, the GUI simply takes, as an option for each engine, an equation that tells it how to transform the score, and exposes certain parameters for that equation to use.
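
To make the second step a bit more concrete, here is a minimal sketch of what the GUI side might look like. Everything in it is an assumption for illustration: the names CALIBRATION and calibrated_score, the engine name "SomeEngine", and the coefficient 5.0 are made up, and no existing GUI is claimed to work this way. Only the score / 2 case comes from this thread.

```python
from typing import Callable, Dict

# Hypothetical per-position parameters a GUI could expose to the equation,
# e.g. {"blocked_pawns": 2, "material_white": 3100, "material_black": 3000}.
Params = Dict[str, float]

# Hypothetical per-engine option: engine name -> f(score, params).
CALIBRATION: Dict[str, Callable[[float, Params], float]] = {
    # The simple case discussed above: halve the reported score.
    "Stockfish": lambda score, p: score / 2,
    # Made-up coefficient, only to show an equation that uses a parameter.
    "SomeEngine": lambda score, p: score - 5.0 * p.get("blocked_pawns", 0),
}


def calibrated_score(engine: str, score: float, params: Params) -> float:
    """Apply the engine's equation; engines without one pass through unchanged."""
    f = CALIBRATION.get(engine)
    return score if f is None else f(score, params)


print(calibrated_score("Stockfish", 120, {}))
print(calibrated_score("SomeEngine", 100, {"blocked_pawns": 2}))
```

The point of keeping it a plain lookup is that the equations themselves could be exchanged as data, without the GUI needing to know where the coefficients came from.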
The first step could happen independently of the GUI. One takes a large set of games with engine evaluations, uses a tool to collect per-position data, and finally runs a regression on game outcome (or, better, on the expected outcome given the Elo difference).
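
A rough sketch of that first step, under several assumptions: I use a logistic regression fitted by plain gradient descent (one possible choice, not the only one), the function names fit_logistic and expected_score are hypothetical, and the handful of data rows are invented purely to make the example run. They are not real game data. The Elo-difference correction is left out to keep it short.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(features: Sequence[Sequence[float]],
                 outcomes: Sequence[float],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit weights (last entry is the bias) by batch gradient descent.

    Outcomes are game results in [0, 1]: 1 = win, 0.5 = draw, 0 = loss.
    """
    n = len(features[0])
    w = [0.0] * (n + 1)
    m = len(features)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for x, y in zip(features, outcomes):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[n])
            err = p - y  # gradient of cross-entropy w.r.t. the linear term
            for i in range(n):
                grad[i] += err * x[i]
            grad[n] += err
        w = [wi - lr * g / m for wi, g in zip(w, grad)]
    return w


def expected_score(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted expected game result for one position."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


# Invented toy rows: [engine score in pawns, number of blocked pawns].
positions = [[2.0, 0], [1.0, 1], [0.5, 3], [0.0, 2],
             [-0.5, 0], [-1.0, 1], [-2.0, 0], [1.5, 4]]
# Invented game results from the same point of view as the scores.
results = [1.0, 1.0, 0.5, 0.5, 0.5, 0.0, 0.0, 0.5]

weights = fit_logistic(positions, results)
print("weights:", weights)
print("expected result at +1.00 with 2 blocked pawns:",
      expected_score(weights, [1.0, 2]))
```

The fitted weights are exactly the kind of per-engine coefficients that could then be handed to the GUI as an option in the second step.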