AndrewGrant wrote: ↑Tue Nov 03, 2020 5:53 am
Karlo Bala wrote: ↑Tue Nov 03, 2020 5:39 am
Don't get me wrong, I didn't mean to be ungrateful; it was just funny to me that the first position was already wrong.
If I had infinite computational resources, we could settle this overnight.
So some of those games are 1s, 2s, and 4s. What time control do you need to use in order to "ensure" that the replayed game has a "more accurate" endgame result?
Perhaps it is possible to use a mixed approach. For example, instead of playing full games, just evaluate each position to some depth. If the evaluation differs too much from the game result, consider that position's result suspicious. Later, play games only for the suspicious positions, but with a longer time control, or simply exclude them.
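To make the idea concrete, here is a rough sketch of what I mean, using python-chess with a UCI engine; the engine path, search depth, threshold, and the sigmoid constant K are all placeholders to tune:

import chess
import chess.engine

K = 1.13  # sigmoid scale, a tunable assumption (Texel-style)

def win_probability(cp):
    # map a centipawn score to an expected result in [0, 1]
    return 1.0 / (1.0 + 10.0 ** (-K * cp / 400.0))

def is_suspicious(engine, fen, result, depth=10, threshold=0.35):
    # result: 1.0 = white win, 0.5 = draw, 0.0 = black win
    board = chess.Board(fen)
    info = engine.analyse(board, chess.engine.Limit(depth=depth))
    cp = info["score"].white().score(mate_score=10000)
    return abs(win_probability(cp) - result) > threshold

dataset = [(chess.STARTING_FEN, 0.5)]  # toy example; really (fen, result) pairs
engine = chess.engine.SimpleEngine.popen_uci("./stockfish")  # path is an assumption
suspicious = [(f, r) for f, r in dataset if is_suspicious(engine, f, r)]
engine.quit()

Positions that land in "suspicious" would then be replayed at a longer time control, or simply dropped.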
I didn't completely understand from the paper what you did, but it seems that you already have a search result for every position.
Of course, there are at least two main problems:
1. "Wild" positions - it is not easy to filter them out. Does the end of the PV guarantee a quiet position? (A cheap heuristic is sketched after this list.)
2. A data set with different numbers of games in the W/L/D classes. I didn't count the W/L/D numbers, but if they differ by much, the tuning will be biased. (A class-weighting sketch follows as well.)
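On point 1, the end of the PV does not guarantee quietness by itself. A cheap heuristic (no guarantee either, just a filter) is to reject positions where the side to move is in check or has an obvious material-winning capture; a sketch with python-chess:

import chess

VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
          chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def is_quiet(board):
    # crude quietness filter: reject checks and captures that grab a
    # piece at least as valuable as the capturing piece
    if board.is_check():
        return False
    for move in board.legal_moves:
        if not board.is_capture(move):
            continue
        victim = board.piece_at(move.to_square)
        if victim is None:  # en passant
            return False
        attacker = board.piece_at(move.from_square)
        if VALUES[victim.piece_type] >= VALUES[attacker.piece_type]:
            return False
    return True

A real quiescence search (compare the static eval to the qsearch score) would be a stronger filter, but it needs hooks into the engine.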
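On point 2, one mitigation that avoids discarding games is to weight each class inversely to its frequency; a sketch, assuming the data is a list of (fen, result) pairs with results encoded as 1.0/0.5/0.0:

from collections import Counter

dataset = [("fen1", 1.0), ("fen2", 0.5), ("fen3", 0.5), ("fen4", 0.0)]  # toy example

counts = Counter(result for _, result in dataset)
total = sum(counts.values())
# inverse-frequency weights so W, D, and L contribute equally to the loss
weights = {r: total / (len(counts) * n) for r, n in counts.items()}
weighted = [(fen, r, weights[r]) for fen, r in dataset]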
I have a few more ideas, and perhaps I'm going to filter the Zurichess set. Up to now, I have found the Zurichess set to be the most appropriate one for eval tuning. I played with dozens of different CNNs and got about a 75% success rate on the validation set. I found (by eye test) that some of the positions the CNN failed to learn were simply labeled wrong, or were heavily tactical positions.
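For reference, roughly the kind of setup I mean (a toy PyTorch sketch with placeholder layer sizes, not one of the actual networks I tried): positions encoded as 12 binary piece planes on an 8x8 board, classified into win/draw/loss:

import torch
import torch.nn as nn

class EvalCNN(nn.Module):
    # classifies a position (12 piece planes, 8x8) into W/D/L
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(12, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, 3),  # logits for win / draw / loss
        )

    def forward(self, x):
        return self.head(self.features(x))

model = EvalCNN()
logits = model(torch.zeros(1, 12, 8, 8))  # batch of one empty-board tensor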
Do you plan to publish your work in a scientific journal, or have you already published it?