Thanks, Erik, for taking the time to run tests of your own!
emadsen wrote (Tue Mar 02, 2021 11:49 pm):
I disagree with #2. In my experience, fixed depth games cannot accurately measure engine strength, either in a gauntlet against other engines or in a self-testing match.
My simple engine just does iterative deepening until it runs out of time. If I limit depth instead of time, the only thing that changes is that I pretend it ran out of time after completing a specific iteration of the search. With no further code changes, it will then always produce the same result on a slow or a fast computer. It will also produce the same results regardless of how performant your implementation is.
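Here is a minimal Python sketch of that loop. It assumes a hypothetical `search_fn(depth)` that runs one complete fixed-depth search and returns `(best_move, score)`. It only checks the clock between iterations, while real engines usually also abort inside an iteration. The depth limit is just another stopping condition, which is why fixed-depth results do not depend on hardware speed.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical interface: search_fn(depth) runs one full fixed-depth search
# and returns (best_move, score). It stands in for a real alpha-beta search.
SearchFn = Callable[[int], Tuple[str, int]]


def iterative_deepening(search_fn: SearchFn,
                        max_depth: Optional[int] = None,
                        time_limit: Optional[float] = None) -> Tuple[Optional[str], int]:
    """Search depth 1, 2, 3, ... and keep the result of the last finished iteration.

    A depth limit is treated exactly like running out of time: we stop after a
    fixed iteration, so the chosen move no longer depends on hardware speed or
    on how fast the implementation is.
    """
    if max_depth is None and time_limit is None:
        raise ValueError("need a depth limit, a time limit, or both")
    start = time.monotonic()
    best: Tuple[Optional[str], int] = (None, 0)
    depth = 1
    while True:
        if max_depth is not None and depth > max_depth:
            break  # fixed-depth mode: "pretend" the clock ran out here
        if time_limit is not None and time.monotonic() - start >= time_limit:
            break  # timed mode: how deep we get depends on the machine
        best = search_fn(depth)
        depth += 1
    return best
```

With `max_depth=5`, the same five iterations run on any machine. With only `time_limit`, a faster machine or a faster implementation completes more iterations.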
Under these testing conditions, adding a very inefficient but correct quiescence implementation will give a strength improvement that is an upper bound on the gain you can expect under real conditions. Testing my QSearch that way, I saw an improvement of well over 100 Elo. I can't say exactly how much because, as you say, the exact number would have meant nothing, so I didn't write it down.
But it was high enough to show that there is a benefit. Under real conditions that benefit is simply negated because my quiescence implementation slows the search down so much that I didn't see it.
emadsen wrote (Tue Mar 02, 2021 11:49 pm):
#3 makes no sense to me. An engine that evaluates positions solely by material is strategically stupid (not knowing how to develop pieces) and ignorant of the danger posed by passed pawns and attacks on squares near the king. However, if it has a correct quiescence search implementation, it can find discovered checks, forks, attacks on overloaded pieces, etc... all of which win material.
I took MadChess 3.0 Beta (2570 Elo), removed all eval terms except material, and ran a gauntlet tournament against other engines of varying strength, from TSCP 1.81 (1751 Elo) to Sungorus 1.4 (2311 Elo) at 2m+1s time control. I got this result:
MadChess 3.0 Beta, Material Only, No Quiescence = 1746 Elo
MadChess 3.0 Beta, Material Only, With Quiescence = 1915 Elo
This suggests a correct implementation of quiescence search is worth 169 Elo. I only had time for a quick test, so the error bars are roughly +/- 50 Elo.
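The error bars shrink with the number of games. The Python sketch below uses the standard logistic Elo model and a normal approximation of the per-game score to turn a head-to-head win/draw/loss record into an Elo difference with an approximate 95% interval. It is a simplified illustration. It is not necessarily how the gauntlet ratings above were computed, since a multi-engine gauntlet is normally rated with a dedicated rating tool.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by an average score in (0, 1) under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high): point estimate and an approximate 95% interval.

    Raises ValueError if the interval reaches a score of 0 or 1 (too few games).
    """
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * (0.0 - s) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return elo_diff(s), elo_diff(s - z * se), elo_diff(s + z * se)
```

For example, a 1000-game match ending 400/200/400 gives 0 Elo with an interval of roughly +/- 19 Elo. A quick test of a few hundred games leaves a much wider margin.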
MadChess 3.0 Beta has a fairly fast bitboard implementation, doesn't allocate memory during search, has a hash table for score cutoffs and best moves, orders quiet moves by history heuristic score, implements null move and LMR (reducing between 3 and 6 ply), etc...
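As an illustration of one feature in that list, here is a minimal Python sketch of history-heuristic ordering for quiet moves. The move encoding `(piece, to_square)` and the `depth * depth` bonus are assumptions made for this sketch. The bonus is a common choice, and the post does not say how MadChess stores or updates its history table.

```python
from typing import Dict, List, Tuple

# Hypothetical move encoding for this sketch: (piece, to_square), e.g. ("N", "f3").
Move = Tuple[str, str]


class HistoryTable:
    """History heuristic: quiet moves that caused beta cutoffs earn a bonus."""

    def __init__(self) -> None:
        self.scores: Dict[Move, int] = {}

    def record_cutoff(self, move: Move, depth: int) -> None:
        # depth * depth is one common bonus; deeper cutoffs count for more.
        self.scores[move] = self.scores.get(move, 0) + depth * depth

    def order(self, quiet_moves: List[Move]) -> List[Move]:
        # Highest history score first; sorted() is stable (even with
        # reverse=True), so ties keep their generation order.
        return sorted(quiet_moves, key=lambda m: self.scores.get(m, 0), reverse=True)
```

A quiet move that caused a cutoff at depth 4 (bonus 16) is tried before one that caused a cutoff at depth 2 (bonus 4). Moves with no history keep their generation order.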
So perhaps my experience differs from yours because we came at the problem from opposite directions: I've stripped down a mature engine, while you're building up a new one. Perhaps the difference between my finding quiescence search worth 169 Elo and you and Marcel finding it worth little (both with material-only eval) can be explained away by the non-linear way in which chess engine features combine. But I don't think so. An engine with a correct quiescence search implementation should be stronger than one without it, even if it evaluates only material, because it's less vulnerable to hanging pieces due to the horizon effect.
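To make the horizon-effect argument concrete, here is a minimal fail-hard quiescence search in negamax form, written in Python. It runs over a hypothetical toy tree (`Node`) in place of a real board. Each node carries a material-only evaluation from the side to move's point of view, plus the positions reachable by captures. Real implementations also order captures (for example MVV-LVA) and handle checks, and this sketch omits both.

```python
from dataclasses import dataclass, field
from typing import List

INF = 10**9


@dataclass
class Node:
    """Toy position (hypothetical, stands in for a real board).

    static_eval: material balance from the side to move's point of view.
    captures:    positions reachable by a capture, seen from the opponent's side.
    """
    static_eval: int
    captures: List["Node"] = field(default_factory=list)


def quiesce(node: Node, alpha: int, beta: int) -> int:
    """Fail-hard quiescence search: resolve captures before trusting the eval."""
    stand_pat = node.static_eval  # the side to move may decline all captures
    if stand_pat >= beta:
        return beta
    if stand_pat > alpha:
        alpha = stand_pat
    for child in node.captures:
        score = -quiesce(child, -beta, -alpha)  # negamax: flip perspective
        if score >= beta:
            return beta
        if score > alpha:
            alpha = score
    return alpha


# At the horizon, White's queen has just taken a knight: -3 for Black (to move).
# Black can recapture the queen, leaving White (to move) down 9 - 3 = 6.
hanging = Node(static_eval=-3, captures=[Node(static_eval=-6)])
```

A plain static evaluation scores `hanging` as +3 for White. `quiesce(hanging, -INF, INF)` instead returns +6 for Black, because it sees the recapture. A capture that only loses material is ignored, because the side to move can always "stand pat".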
Of course if you pit both versions of the engine against competitors that are > 300 Elo points stronger, it's likely to lose so often that quiescence search doesn't matter. But when pitted against nearly equal competitors, the version with quiescence search should be measurably stronger.
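The standard logistic Elo formula shows how lopsided such a pairing is. Here is a short Python sketch of the expected score for a given rating difference:

```python
def expected_score(rating_diff: float) -> float:
    """Expected score (0..1) for a player rated rating_diff Elo above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


print(f"{expected_score(-300):.3f}")  # 0.151
```

Against an opponent 300 Elo stronger, the expected score is only about 15%.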
Marcel also did the same test as you and stripped down his engine:
mvanthoor wrote (Thu Feb 25, 2021 6:36 pm):
Rustic Alpha 1: 1680 Elo
Rustic PASTA: 1200 Elo (without PST, -480 Elo)
Rustic YOLO: 1095 Elo (without PST, without QSearch, -105 Elo)
And later he added: "Not having QSearch loses +/- 400 Elo."
This means adding PSTs alone is worth only ~200 Elo (roughly 1680 - 400 = 1280 vs. 1095), and adding QSearch alone is worth only ~100 Elo (1200 vs. 1095).
But if you use them together, you get more than the sum of its parts: a huge compound benefit of ~600 Elo (1680 vs. 1095).
That's a significant synergy!
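Here is the same arithmetic as a small Python script. The PST-only rating is not measured directly. It is approximated as 1680 - 400, based on Marcel's later remark that dropping QSearch costs about 400 Elo.

```python
# Ratings quoted from Marcel's posts.
full = 1680              # Rustic Alpha 1: PST + QSearch
no_pst = 1200            # Rustic PASTA: QSearch only
neither = 1095           # Rustic YOLO: no PST, no QSearch
no_qsearch = full - 400  # PST only (approximation from "loses +/- 400 Elo")

qsearch_alone = no_pst - neither    # gain from QSearch without PSTs
pst_alone = no_qsearch - neither    # gain from PSTs without QSearch
combined = full - neither           # gain from both together
synergy = combined - (qsearch_alone + pst_alone)

print(qsearch_alone, pst_alone, combined, synergy)
```

This prints `105 185 585 295`. About half of the combined gain comes from the interaction between the two features, not from either one alone.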
You get more out of QSearch than Marcel did, which may be because your stripped-down engine still plays at a higher strength than ours do with all features enabled!
Who knows what other synergies are still at play?
And me? After adding PSTs, I saw a huge jump in playing strength as well. Depending on which PSTs I use, I'm now between 80 and 180 Elo behind Rustic (tested with 1000 games each, so the numbers are somewhat accurate). I'm currently looking for ways to optimize the QSearch and move ordering to close that gap a little more.
I'll provide my measurements as soon as I'm done with refactoring!