- Every change I make is recorded in version control, and all testing results can be associated with a git commit-id that allows me to see exactly what the code looked like when I ran the tests.
- I test my changes using several thousand games against multiple opponents at a 1s+0.1s time control. Cutechess-cli makes this very easy, and its concurrency feature, which runs several games in parallel, simplifies things a lot.
- Bayeselo makes it very easy to quantify the effect of my changes, complete with likelihood-of-superiority numbers.
- PGN-Extract lets me look at the kinds of games that Daydreamer tends to win or lose and lets me home in on the areas that need work. For instance, it's easy to isolate just the games that feature pawn endgames, or just the games that end in 20 moves or fewer.
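
To give a feel for the kind of filtering described in the last bullet, here is a minimal sketch in plain Python that keeps only the games ending in 20 moves or fewer. It is not how PGN-Extract works internally, and the function names (`split_games`, `full_move_count`, `short_games`) are hypothetical. It assumes every game starts with an `[Event ...]` tag and that engine comments appear in braces.

```python
import re


def split_games(pgn_text):
    """Split PGN text into games, assuming each starts with an [Event ...] tag."""
    chunks = re.split(r"(?m)^(?=\[Event )", pgn_text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def full_move_count(game):
    """Return the highest move number found in a game's movetext."""
    # Drop the tag-pair lines so only the movetext is left.
    movetext = re.sub(r"(?m)^\[.*$", "", game)
    # Drop brace comments such as {+0.31/12 0.15s}; they contain digits
    # followed by dots that would otherwise look like move numbers.
    movetext = re.sub(r"\{[^}]*\}", "", movetext)
    numbers = re.findall(r"(\d+)\.", movetext)
    return max((int(n) for n in numbers), default=0)


def short_games(pgn_text, max_moves=20):
    """Keep only the games that end in max_moves full moves or fewer."""
    return [g for g in split_games(pgn_text) if full_move_count(g) <= max_moves]
```

Reading a tournament file and passing its contents to `short_games` returns the matching games as strings, which can then be written back out and inspected. A real PGN parser handles variations, escapes, and odd tag layouts that this sketch ignores.
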
It's certainly possible to create an engine that's weak at fast time controls and strong at longer ones, but I've never seen an engine that's extremely strong at fast time controls without being strong in other scenarios as well. When you combine the large number of games from high-speed testing with a reliable way to sort through those games and extract useful information about engine strengths and weaknesses, you have an incredibly powerful tool for improving your engine, and it's completely practical to run on modest desktop hardware.
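
To illustrate why the large number of games matters, here is a rough sketch of two common back-of-the-envelope calculations: the Elo difference implied by a match score, and a normal-approximation likelihood of superiority. This is not Bayeselo's algorithm, which fits a more careful model; the function names are hypothetical and the formulas are the simple textbook approximations.

```python
import math


def elo_diff(wins, losses, draws):
    """Elo difference implied by a match score under the logistic model.

    Assumes the score is strictly between 0 and 1; a clean sweep
    would make the logarithm undefined.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)


def likelihood_of_superiority(wins, losses):
    """Approximate probability that the first engine is the stronger one.

    Uses the normal approximation to the win/loss difference. Draws
    carry no information about which side is stronger, so they are ignored.
    Assumes at least one decisive game.
    """
    decisive = wins + losses
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))
```

With the same 52% share of decisive games, `likelihood_of_superiority(52, 48)` comes out at roughly 0.66, while `likelihood_of_superiority(520, 480)` is roughly 0.90. A small improvement only becomes visible once the game count is large, which is exactly what fast time controls make affordable.
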