lithander wrote: ↑Mon Jan 24, 2022 5:33 pm
Why only high quality games? The positions encountered and moves played by master players will only be a small fraction of what your engine is likely required to evaluate during search.
Huh, good point. Not sure why the need for lower-quality games slipped my mind; that's likely causing a lot of the issues. I'm only "teaching" Blunder examples of good patterns, with no examples of bad ones, so it isn't learning to penalize bad squares for pieces, bad pawn structure, bad king safety, etc. At least not to a high enough degree to improve the evaluation's knowledge.
Thanks, I'm going to make sure to account for this in my next experiment this weekend.
lithander wrote: ↑Mon Jan 24, 2022 5:33 pm
I've never done it myself (also used Zurichess so far, as you know^^) but I'd try not to exclude non-quiet positions but instead just to make them quiet myself in exactly the way my engine does it: Call qsearch, get a PV, play the moves and use the resulting quiet position to train on. Because if my engine would encounter the original position that would be exactly what it'd do before calling eval.
Right, that's on my list of methods to try. In these experiments I wanted to start with the most basic approach, which in my mind was simply excluding non-quiet positions. For the next couple of sessions, I want to take each FEN string, do a 2-3 ply search, play out the PV, and save the resulting position.
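The idea lithander describes (and the PV playout above) boils down to one small loop: run quiescence search from the noisy position, then make each PV move so that the position you save is the quiet one the engine would actually call its static eval on. Here's a toy Go sketch of just that loop; `Position`, `qsearchPV`, and `makeMove` are hypothetical stand-ins (a "position" is modeled as a material balance plus a list of hanging captures), not Blunder's or anyone's real types:

```go
package main

import "fmt"

// Toy stand-ins for a real engine's types: a "position" is just its
// material balance plus the captures still hanging (the "noise").
type Position struct {
	Material int   // toy static eval of the position
	Captures []int // values of captures yet to be resolved
}

// qsearchPV returns the capture sequence the side to move would resolve
// before the position is quiet. In this toy, every hanging capture is
// played out; a real qsearch would search and prune.
func qsearchPV(p Position) []int {
	return p.Captures
}

// makeMove applies one capture: the material changes and one pending
// capture is consumed.
func makeMove(p Position, capture int) Position {
	rest := make([]int, len(p.Captures)-1)
	copy(rest, p.Captures[1:])
	return Position{Material: p.Material + capture, Captures: rest}
}

// quieten plays out the quiescence PV so that the returned position is
// the quiet one the engine would actually run its static eval on.
func quieten(p Position) Position {
	for _, c := range qsearchPV(p) {
		p = makeMove(p, c)
	}
	return p
}

func main() {
	noisy := Position{Material: 0, Captures: []int{100, -300}}
	quiet := quieten(noisy)
	fmt.Println(quiet.Material, len(quiet.Captures)) // no captures left: quiet
}
```

The useful invariant is that after `quieten`, the static eval of the saved position agrees with what qsearch would have returned from the original one, which is exactly why training on the played-out position matches what the engine does at eval time.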
lithander wrote: ↑Mon Jan 24, 2022 5:33 pm
...and that's why I never did that with MinimalChess and am waiting for (hopefully faster) Leorik instead. Yes, like you I want to get rid of the "flaw" of having tuned on external data but I don't want to waste excessive amounts of computation and time on it.
Yep. Although I think what you said about not necessarily needing millions of positions to tune is also true, as long as the dataset is high-quality. For a long time I got away with using only 400K positions from Zurichess's dataset, and I only gained about 25 Elo when I expanded that to 800K. So for my next session I'll stick with 800K.
lithander wrote: ↑Mon Jan 24, 2022 5:33 pm
I would probably, personally be okay to pay a few ELO's for the "100% by my own making" tag on my engine. But certainly not 153, haha.
Yep, me neither. In fact, I'd like an Elo gain, but perhaps that's too ambitious right now.
lithander wrote: ↑Mon Jan 24, 2022 5:33 pm
I totally get your motivation to exclude Stockfish. I would even want to exclude any external source for the FENs e.g. where you used a set of PGN's from rebel I plan to play my own games. But that's like an endgoal. If you suspect that the accuracy of W/L/D is at fault then use stockfish, confirm or falsify your theory. Then act accordingly. Only the final result has to be Stockfish-free (of course it hasn't but I respect your goal) but you can always use it as a tool to setup a working process.
True. I'm approaching all of this with a very experimental mindset, and I'm open to trying many different approaches to find one that balances originality and strength. So using Stockfish for WDL results will happen at some point down my line of experiments.
At the end of all of this, I'd like to write up a little "paper" documenting the whole process, so it might be helpful to future developers.