I tried to implement aspiration windows in Drofa twice; both attempts were unsuccessful.
On the first try I still had a TT bug in the engine that made the aspiration search very bad (it basically flooded the TT with worthless entries).
On the second try the engine searched far fewer nodes, but it was still about -7 Elo.
I suppose the devil is in the details here: you have to get everything right for it to work.
I'll try one or two more times to implement this.
With practically every top engine using it, I am more or less sure it is a working technique, but a tricky one.
Are Aspiration Windows Worthless?
-
- Posts: 105
- Joined: Thu Jun 18, 2020 3:21 pm
- Location: Moscow
- Full name: Alexander Litov
-
- Posts: 2488
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: Are Aspiration Windows Worthless?
With pruning and reductions (especially LMR), that has long since gone out of the window anyway.
Rasmus Althoff
https://www.ct800.net
-
- Posts: 434
- Joined: Thu Apr 26, 2012 1:51 am
- Location: Oak Park, IL, USA
- Full name: Erik Madsen
Re: Are Aspiration Windows Worthless?
Is that a bad pun? Window, ha ha. Well, there never was correctness anyway, because that assumes a perfect static eval, which is only true for draw-by-rule, stalemate, and checkmate. I see your point though. I just don’t see any advantage in aspiration windows over what I already get from PVS, whereas I see a massive advantage in LMR.
My C# chess engine: https://www.madchess.net
-
- Posts: 433
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: Are Aspiration Windows Worthless?
I just ran some experiments that strongly disagree with your findings.
The aspiration windows algorithm is definitely worthwhile in my engines. I got the following results, with self-play using an SPRT stop condition:
- Dumb : +54.2 +/- 7.3 Elo (834 games)
- Amoeba : +131.6 +/- 12.2 Elo (260 games)
Because of self-play and SPRT, the Elo differences are probably exaggerated, but obviously significant.
Are you sure your implementation is correct and optimal?
Richard Delorme
-
- Posts: 1221
- Joined: Wed Mar 08, 2006 8:28 pm
- Location: Florida, USA
Re: Are Aspiration Windows Worthless?
I've never had any luck with aspiration windows for Maverick. I've always assumed it was down to a poorly tuned evaluation function and a generally weak-ish engine.
I've always thought that when searching the first move, the hash table would guide the search down the previous PV and the window would quickly close.
One point: do you have a fixed width for your window after each re-search, or are you gradually opening the window? For example (see the sketch after this post)...
First search: alpha = pv_score - 25; beta = pv_score + 25
Second search on fail high: alpha = pv_score - 25; beta = pv_score + 125
Third search on fail high: alpha = pv_score - 25; beta = pv_score + 300
Fourth search on fail high: alpha = pv_score - 25; beta = +inf
Best regards,
Steve
http://www.chessprogramming.net - Maverick Chess Engine
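A minimal sketch of the gradual-widening scheme Steve describes, under some assumptions: centipawn scores, an INF mate-bound sentinel, an existing root search(depth, alpha, beta), and the example schedule from the post. This is illustrative, not Maverick's actual code.

Code: Select all
#include <algorithm>

constexpr int INF = 32000;                       // assumed mate-bound sentinel
int search(int depth, int alpha, int beta);      // the engine's root search (assumed)

int aspiration(int depth, int pv_score) {
    const int widen[4] = {25, 125, 300, INF};    // example widening schedule
    int lo = 0, hi = 0;                          // widening stage per side
    for (;;) {
        const int alpha = std::max(pv_score - widen[lo], -INF);
        const int beta  = std::min(pv_score + widen[hi], +INF);
        const int score = search(depth, alpha, beta);
        if (score <= alpha && lo < 3)
            ++lo;                                // fail low: open alpha further
        else if (score >= beta && hi < 3)
            ++hi;                                // fail high: open beta further
        else
            return score;                        // score fell inside the window
    }
}

A different schedule, such as the +/- 100, 500, infinite mentioned later in the thread, is the same loop with a different widen table.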
-
- Posts: 10297
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Are Aspiration Windows Worthless?
abulmo2 wrote: ↑Fri Dec 25, 2020 1:43 am
I just ran some experiments that strongly disagree with your findings.
The aspiration windows algorithm is definitely worthwhile in my engines. I got the following results, with self-play using an SPRT stop condition:
- Dumb : +54.2 +/- 7.3 Elo (834 games)
- Amoeba : +131.6 +/- 12.2 Elo (260 games)
Because of self-play and SPRT, the Elo differences are probably exaggerated, but obviously significant.
Are you sure your implementation is correct and optimal?

If you want to test a rating difference then SPRT is not the right test; you need to use a fixed number of games.
-
- Posts: 434
- Joined: Thu Apr 26, 2012 1:51 am
- Location: Oak Park, IL, USA
- Full name: Erik Madsen
Re: Are Aspiration Windows Worthless?
Interesting. Thank you, Richard, for running these tests. I am always willing to admit the possibility that I screwed up something in my code.

Steve Maughan wrote: ↑Fri Dec 25, 2020 11:15 am
I've never had any luck with aspiration windows for Maverick. I've always assumed it was down to a poorly tuned evaluation function and a generally weak-ish engine... Do you have a fixed width for your window after each re-search, or are you gradually opening the window?

I tried gradually opening the window by +/- 25, 50, 100, 200, 500, etc., but eventually settled on +/- 100, 500, infinite.
My C# chess engine: https://www.madchess.net
-
- Posts: 433
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: Are Aspiration Windows Worthless?
Uri Blass wrote: ↑Fri Dec 25, 2020 6:28 pm
If you want to test a rating difference then SPRT is not the right test; you need to use a fixed number of games.

I ran a gauntlet test with Dumb (aspiration on/off) and found +50.9 Elo (+/- 12, 100 games × 19 opponents) in favour of the engine with the aspiration windows, so the result is on par with the SPRT.
Code: Select all
# PLAYER : RATING ERROR POINTS PLAYED (%)
1 DiscoCheck 3.7.1 : 2596.8 44.8 163.0 200 81.5%
2 Glaurung 2.2 : 2548.6 39.5 154.0 200 77.0%
3 arasan-15.6 : 2472.6 36.6 137.0 200 68.5%
4 Mini Rodent 1.0 : 2456.6 35.1 133.0 200 66.5%
5 Zappa 1.1 : 2437.2 35.5 128.0 200 64.0%
6 Cheese 1.7 64 bits : 2412.8 34.8 121.5 200 60.8%
7 Cyrano 0.6b17 : 2407.2 33.8 120.0 200 60.0%
8 Fruit 2.1 : 2389.1 33.8 115.0 200 57.5%
9 dumb-1.6 : 2361.3 12.2 1088.5 1900 57.3%
10 EXchess v6.50b : 2360.5 33.7 107.0 200 53.5%
11 Sloppy-0.2.2 : 2325.2 34.1 97.0 200 48.5%
12 dumb-1.6 no-AW : 2310.4 12.1 976.5 1900 51.4%
13 Fridolin 2.00 : 2253.3 34.7 77.0 200 38.5%
14 Pepito_v1.59 : 2211.1 36.8 66.0 200 33.0%
15 Yace Paderborn : 2188.7 37.3 60.5 200 30.2%
16 OliThink 5.3.2 : 2184.5 36.9 59.5 200 29.8%
17 amundsen : 2144.7 39.1 50.5 200 25.2%
18 Fruit 1.0 : 2135.3 39.4 48.5 200 24.2%
19 Jazz 501 : 2094.5 42.8 40.5 200 20.2%
20 phalanx : 2091.8 41.9 40.0 200 20.0%
21 beowulf : 1917.9 61.7 17.0 200 8.5%
White advantage = 0.00
Draw rate (equal opponents) = 50.00 %
Richard Delorme
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Are Aspiration Windows Worthless?
I tested Deuterium's aspwin; I had not tested it in a long time. The version with aspwin won.
TC 15s+100ms
Code: Select all
Score of Deuterium_aw vs Deuterium: 89 - 60 - 155 [0.548] 304
... Deuterium_aw playing White: 49 - 23 - 80 [0.586] 152
... Deuterium_aw playing Black: 40 - 37 - 75 [0.510] 152
... White vs Black: 86 - 63 - 155 [0.538] 304
Elo difference: 33.2 +/- 27.4, LOS: 99.1 %, DrawRatio: 51.0 %
The algo is simple: if there is an early sign of score instability, reset the bounds to their original values as early as possible.
It starts with alpha = -inf, beta = +inf.
* If it fails low, set alpha to score - 100 and beta to score; but if the score is already losing or winning, reset alpha/beta to -inf/+inf. Then re-search (meaning: repeat the previous iteration depth).
* If it fails high, set beta to score + 100 and alpha to score; but if the score is already losing or winning, reset alpha/beta to -inf/+inf. Then re-search.
* Otherwise, set alpha = score - 30 and beta = score + 30; no re-search, just continue with the next iteration depth.
Next:
* If it fails low and the previous result was also a fail (low or high), reset alpha/beta to -inf/+inf and re-search. The same applies if the score is already losing or winning.
* If it fails high and the previous result was also a fail (low or high), reset alpha/beta to -inf/+inf and re-search. The same applies if the score is already losing or winning.
So in summary: on successive fails (low/low or high/high) or alternating fails (high/low or low/high), reset alpha/beta to -inf/+inf. A code sketch of this follows below.
I call 100 the BadWindow and 30 the GoodWindow.
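A sketch of the reset scheme just described, with some assumptions: MATE_BOUND stands in for the "already losing or winning" threshold, search() for the engine's root search, and the driver loop is illustrative rather than Deuterium's actual code.

Code: Select all
#include <cstdlib>

constexpr int INF        = 32000;
constexpr int MATE_BOUND = 30000;            // assumed "losing or winning" threshold
constexpr int BadWindow  = 100;              // window used after a single fail
constexpr int GoodWindow = 30;               // window used after an in-window score

int search(int depth, int alpha, int beta);  // the engine's root search (assumed)

void iterate(int max_depth) {
    int alpha = -INF, beta = +INF;
    bool prev_fail = false;                  // did the previous search fail low/high?
    for (int depth = 1; depth <= max_depth; ) {
        const int score = search(depth, alpha, beta);
        const bool fail = (score <= alpha || score >= beta);
        if (fail && (prev_fail || std::abs(score) >= MATE_BOUND)) {
            alpha = -INF; beta = +INF;                // instability: full reset, re-search
        } else if (score <= alpha) {
            alpha = score - BadWindow; beta = score;  // fail low: re-search same depth
        } else if (score >= beta) {
            alpha = score; beta = score + BadWindow;  // fail high: re-search same depth
        } else {
            alpha = score - GoodWindow;               // in window: narrow the bounds
            beta  = score + GoodWindow;               // and continue with the next depth
            ++depth;
        }
        prev_fail = fail;
    }
}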
I am trying to tune these two params with the optuna optimizer, at 100 games per trial for 100 trials at TC 15s+50ms, to see if the optimizer can improve it.
Code: Select all
python -u tuner.py --study-name deu_aspwindow_opt --sampler name=tpe --engine ./engines/deuterium/deuterium_17.exe --initial-best-value 0.55 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --opening-format epd --input-param "{'AspWindowGood': {'default':30, 'min':5, 'max':100, 'step':1}, 'AspWindowBad': {'default':100, 'min':5, 'max':200, 'step':1}}" --games-per-trial 100 --trials 100 --base-time-sec 15 --inc-time-sec 0.05 --pgn-output deu_aspwindow_opt.pgn --threshold-pruner result=0.25 --plot
The best param it could find so far after 6 trials, with a 53% score from a 100-game match against the default (init) param, is:

Code: Select all
2020-12-27 14:58:05,825 | INFO | init param: {'AspWindowBad': 100, 'AspWindowGood': 30}
2020-12-27 15:05:29,205 | INFO | study best param: {'AspWindowBad': 171, 'AspWindowGood': 42}
2020-12-27 15:05:29,206 | INFO | study best value: 0.53
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Are Aspiration Windows Worthless?
Ferdy wrote: ↑Sun Dec 27, 2020 8:20 am
Tested Deuterium's aspwin; have not tested it in a long time. The version with aspwin won. [...]

I stopped the optimization after 40 trials (it can be resumed). It came up with the best param below, found at the 17th trial.
Code: Select all
2020-12-27 19:11:54,617 | INFO | study best param: {'AspWindowBad': 32, 'AspWindowGood': 29}
2020-12-27 19:11:54,625 | INFO | study best value: 0.550625
2020-12-27 19:11:54,632 | INFO | study best trial number: 17
I ran a verification match of 1000 games at TC 15s+100ms. The optimized version, using {'AspWindowBad': 32, 'AspWindowGood': 29}, won by +7 games vs the default {'AspWindowBad': 100, 'AspWindowGood': 30}:
Code: Select all
Score of Deuterium_17_opt vs Deuterium_17: 219 - 212 - 569 [0.503] 1000
... Deuterium_17_opt playing White: 125 - 96 - 280 [0.529] 501
... Deuterium_17_opt playing Black: 94 - 116 - 289 [0.478] 499
... White vs Black: 241 - 190 - 569 [0.525] 1000
Elo difference: 2.4 +/- 14.1, LOS: 63.2 %, DrawRatio: 56.9 %