I wanted to map the eval of the engines to the expected result of the game (equal opponents). I had to trim the PGNs, because it became apparent that the mapping for the opening/middlegame is different from that for the endgame. So, the games (and their moves) are trimmed to move 40. Endgames may be treated separately later, as Ferdinand has again upgraded his PGN tool.
I had an empirical model for the mapping:
Expected Score = (1+tanh[eval/a])/2 for equal opponents.
Here a is an empirical parameter, different for each engine. With that model I was now unable to fit the data well, so the fitting model gets one more parameter:
Expected Score = (1+tanh[eval^b/a])/2 for equal opponents.
Now there are two empirical parameters, a and b, and they fit the experimental data points VERY well for both Komodo 8 and SF 6.
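For anyone who wants to try this kind of fit themselves, here is a minimal sketch using scipy.optimize.curve_fit. The (eval, score) arrays below are made-up placeholders, not the actual data points behind the plots, and the model as written assumes eval >= 0.

```python
import numpy as np
from scipy.optimize import curve_fit

def expected_score(ev, a, b):
    # Expected Score = (1 + tanh(eval^b / a)) / 2 for equal opponents.
    # Assumes ev >= 0; negative evals would need sign(ev) * abs(ev)**b.
    return (1.0 + np.tanh(ev**b / a)) / 2.0

# Placeholder data points (eval bins and observed scores) -- NOT the real data.
evals  = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0])
scores = np.array([0.50, 0.63, 0.80, 0.92, 0.97, 0.998])

(a_fit, b_fit), _ = curve_fit(expected_score, evals, scores, p0=(1.5, 1.5))
print(f"a = {a_fit:.4f}, b = {b_fit:.4f}")
```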
1/ Komodo 8

The blue line and dots are experimental data. The red line is the fitted model.
The fit is: Expected Score = (1+tanh[eval^1.757/1.313])/2
2/ SF 6

The blue line and dots are experimental data. The red line is the fitted model.
The fit is: Expected Score = (1+tanh[eval^1.4707/1.7645])/2
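As a quick consistency check (just a sketch, reusing the fitted parameters quoted above), the two formulas can be tabulated side by side:

```python
import numpy as np

def expected_score(ev, a, b):
    # Expected Score = (1 + tanh(eval^b / a)) / 2 for equal opponents.
    return (1.0 + np.tanh(ev**b / a)) / 2.0

for ev in (0.5, 1.0, 1.5, 2.0, 3.0):
    k8  = expected_score(ev, a=1.313,  b=1.757)   # Komodo 8 fit
    sf6 = expected_score(ev, a=1.7645, b=1.4707)  # SF 6 fit
    print(f"eval = {ev:.1f}   Komodo 8: {k8:.3f}   SF 6: {sf6:.3f}")
```

At eval = 1.5 this reproduces, within rounding, the 96% and 88% figures used in the comparison below.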
3/ Comparison of expected scores for Komodo 8 and SF 6

The largest difference seems to be between eval = 1.0 and eval = 2.0. At eval = 1.5 Komodo 8 has an expected score of 96% and SF 6 a score of 88%, so roughly three times the probability that something goes wrong with SF 6 compared to Komodo 8 at eval = 1.5.
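The factor of three is just the ratio of the complement probabilities, as this quick check shows:

```python
# "Something goes wrong" = the complement of the expected score at eval = 1.5,
# taken from the fitted curves above.
p_wrong_k8  = 1 - 0.96   # Komodo 8
p_wrong_sf6 = 1 - 0.88   # SF 6
print(p_wrong_sf6 / p_wrong_k8)   # ~3, i.e. roughly three times more likely with SF 6
```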



