AndrewGrant wrote: ↑Tue Nov 03, 2020 5:53 am
Karlo Bala wrote: ↑Tue Nov 03, 2020 5:39 am
Don't get me wrong, I didn't mean to be ungrateful; it was just funny to me that the first position was already wrong.
If I had infinite computational resources, we could settle this overnight.
So some of those games are 1s, 2s, and 4s. What time control do you need to use in order to "ensure" that the replayed game has a "more accurate" endgame result?
Perhaps it is possible to use a mixed approach. For example, instead of playing full games, just evaluate each position to some depth. If the evaluation differs too much from the game result, consider that position's result suspicious. Later, play games only for the suspicious positions, but with a longer time control, or simply exclude them.
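To make the idea concrete, here is a rough sketch of what I mean, using python-chess with a UCI engine; the engine path, search depth, threshold, and the sigmoid constant K are all placeholders to tune:

import chess
import chess.engine

K = 1.13  # sigmoid scale, a tunable assumption (Texel-style)

def win_probability(cp):
    # map a centipawn score to an expected result in [0, 1]
    return 1.0 / (1.0 + 10.0 ** (-K * cp / 400.0))

def is_suspicious(engine, fen, result, depth=10, threshold=0.35):
    # result: 1.0 = white win, 0.5 = draw, 0.0 = black win
    board = chess.Board(fen)
    info = engine.analyse(board, chess.engine.Limit(depth=depth))
    cp = info["score"].white().score(mate_score=10000)
    return abs(win_probability(cp) - result) > threshold

dataset = [(chess.STARTING_FEN, 0.5)]  # toy example; really (fen, result) pairs
engine = chess.engine.SimpleEngine.popen_uci("./stockfish")  # path is an assumption
suspicious = [(f, r) for f, r in dataset if is_suspicious(engine, f, r)]
engine.quit()

Positions that land in "suspicious" would then be replayed at a longer time control, or simply dropped.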
I didn't completely understand from the paper what you did, but it seems that you already have a search result for every position.
Of course, there are at least two main problems:
1. "Wild" positions - it is not easy to filter them out. Does the end of the PV guarantee a quiet position? (A cheap heuristic is sketched after this list.)
2. A data set with different numbers of games in the W/L/D classes. I didn't count the W/L/D numbers, but if they differ by much, the tuning will be biased. (A class-weighting sketch follows as well.)
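On point 1, the end of the PV does not guarantee quietness by itself. A cheap heuristic (no guarantee either, just a filter) is to reject positions where the side to move is in check or has an obvious material-winning capture; a sketch with python-chess:

import chess

VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
          chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def is_quiet(board):
    # crude quietness filter: reject checks and captures that grab a
    # piece at least as valuable as the capturing piece
    if board.is_check():
        return False
    for move in board.legal_moves:
        if not board.is_capture(move):
            continue
        victim = board.piece_at(move.to_square)
        if victim is None:  # en passant
            return False
        attacker = board.piece_at(move.from_square)
        if VALUES[victim.piece_type] >= VALUES[attacker.piece_type]:
            return False
    return True

A real quiescence search (compare the static eval to the qsearch score) would be a stronger filter, but it needs hooks into the engine.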
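On point 2, one mitigation that avoids discarding games is to weight each class inversely to its frequency; a sketch, assuming the data is a list of (fen, result) pairs with results encoded as 1.0/0.5/0.0:

from collections import Counter

dataset = [("fen1", 1.0), ("fen2", 0.5), ("fen3", 0.5), ("fen4", 0.0)]  # toy example

counts = Counter(result for _, result in dataset)
total = sum(counts.values())
# inverse-frequency weights so W, D, and L contribute equally to the loss
weights = {r: total / (len(counts) * n) for r, n in counts.items()}
weighted = [(fen, r, weights[r]) for fen, r in dataset]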
I have a few more ideas, and perhaps I'm going to filter the Zurichess set. Up to now, I have found the Zurichess set to be the most appropriate one for eval tuning. I played with dozens of different CNNs and got about a 75% success rate on the validation set. I found (by eye test) that some of the positions the CNN failed to learn were simply labeled wrong, or were heavily tactical positions.
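For reference, roughly the kind of setup I mean (a toy PyTorch sketch with placeholder layer sizes, not one of the actual networks I tried): positions encoded as 12 binary piece planes on an 8x8 board, classified into win/draw/loss:

import torch
import torch.nn as nn

class EvalCNN(nn.Module):
    # classifies a position (12 piece planes, 8x8) into W/D/L
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(12, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, 3),  # logits for win / draw / loss
        )

    def forward(self, x):
        return self.head(self.features(x))

model = EvalCNN()
logits = model(torch.zeros(1, 12, 8, 8))  # batch of one empty-board tensor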
Do you plan to publish your work in a scientific journal, or have you already published it?