How much ELO should I expect to gain from killer moves?

algerbrex · Post by **algerbrex** » Fri Jul 16, 2021 3:14 am

I'm currently in the process of rebuilding Blunder from the ground up, as I wasn't happy with the code structure and it was riddled with too many arcane bugs.

After a week or so of work, from my testing against MinimalChess 2.0 and Shallow Blue 1.1.0, my engine is now around ~1500, with alpha-beta pruning, quiescence search, and piece-square tables. And my plan going forward is to add one feature at a time, test the hell out if to make sure it works, and then add another feature.

So with this established, I added killer move ordering into Blunder and started testing it against the previous version of itself without it. As a preliminary, I ran 50 games with a 1 min time format, and the newer version actually seemed to perform slightly worse (the outputs from cutechess):

Code: Select all

Score of Blunder 1.1.0 vs Blunder 1.0.0 : 5 - 9 - 36 [0.460]
...      Blunder 1.0.0 playing White: 4 - 3 - 18  [0.520] 25
...      Blunder 1.0.0 playing Black: 1 - 6 - 18  [0.400] 25
...      White vs Black: 10 - 4 - 36  [0.560] 50
Elo difference: -27.9 +/- 51.0, LOS: 14.3 %, DrawRatio: 72.0 %
50 of 50 games finished.

Of course, this is only 50 games, but I do have some questions. What kind of ELO gain should I be expecting with killer moves added? (I know every engine is different, I'm just more so asking generally). And depending on the minuteness of the ELO gain, how many games would I need to run to see a result?

What I find odd about the killer moves being added as well is that, from my testing, it speed-up the engine by a very sizeable amount by pruning huge chunks of the game tree, which allowed it to search a bit deeper than my previous version. This seemed great to me and it should increase the ELO, no?

Also, with regards to testing, I obviously need more tests, so I'm going to be using a sub-30-second time control. In this kind of control, all the engines I'm testing with are searching to around a depth of 4-5 each move, occasionally 6-10 very, very late in the endgame. In slower time controls, my engine usually searches to around a depth of 6-8 for the middle game and 10-15 towards the endgame. Would these lower search depths affect my results?

pedrojdm2021 · Post by **pedrojdm2021** » Fri Jul 16, 2021 8:13 am

algerbrex wrote: ↑Fri Jul 16, 2021 3:14 am I'm currently in the process of rebuilding Blunder from the ground up, as I wasn't happy with the code structure and it was riddled with too many arcane bugs.

After a week or so of work, from my testing against MinimalChess 2.0 and Shallow Blue 1.1.0, my engine is now around ~1500, with alpha-beta pruning, quiescence search, and piece-square tables. And my plan going forward is to add one feature at a time, test the hell out if to make sure it works, and then add another feature.

So with this established, I added killer move ordering into Blunder and started testing it against the previous version of itself without it. As a preliminary, I ran 50 games with a 1 min time format, and the newer version actually seemed to perform slightly worse (the outputs from cutechess):
Code: Select all
Score of Blunder 1.1.0 vs Blunder 1.0.0 : 5 - 9 - 36 [0.460]
...      Blunder 1.0.0 playing White: 4 - 3 - 18  [0.520] 25
...      Blunder 1.0.0 playing Black: 1 - 6 - 18  [0.400] 25
...      White vs Black: 10 - 4 - 36  [0.560] 50
Elo difference: -27.9 +/- 51.0, LOS: 14.3 %, DrawRatio: 72.0 %
50 of 50 games finished.
Of course, this is only 50 games, but I do have some questions. What kind of ELO gain should I be expecting with killer moves added? (I know every engine is different, I'm just more so asking generally). And depending on the minuteness of the ELO gain, how many games would I need to run to see a result?

What I find odd about the killer moves being added as well is that, from my testing, it speed-up the engine by a very sizeable amount by pruning huge chunks of the game tree, which allowed it to search a bit deeper than my previous version. This seemed great to me and it should increase the ELO, no?

Also, with regards to testing, I obviously need more tests, so I'm going to be using a sub-30-second time control. In this kind of control, all the engines I'm testing with are searching to around a depth of 4-5 each move, occasionally 6-10 very, very late in the endgame. In slower time controls, my engine usually searches to around a depth of 6-8 for the middle game and 10-15 towards the endgame. Would these lower search depths affect my results?

As far as i know Killer heuristic/history heuristic is mosly used to provide a GOOD move ordering that will help you prune more nodes as early as possible.

in the chessprogramming wiki site it mentions that it can gain some elo points, but don't expect too much, as i said before, think it as a feature to improve perfomance mainly.

But in the end if you have less nodes in higher depth's it will allow you to search deeper in the tree, and that will be a significant ELO gain.

Rhe feature that will guarantee you to gain a hugue elo boost is the "Tapered Evaluation" even only that feature alone implemented in your engine's evaluation you can have a ~2000 elo points engine.

algerbrex · Post by **algerbrex** » Fri Jul 16, 2021 9:42 am

pedrojdm2021 wrote: ↑Fri Jul 16, 2021 8:13 am
algerbrex wrote: ↑Fri Jul 16, 2021 3:14 am I'm currently in the process of rebuilding Blunder from the ground up, as I wasn't happy with the code structure and it was riddled with too many arcane bugs.

After a week or so of work, from my testing against MinimalChess 2.0 and Shallow Blue 1.1.0, my engine is now around ~1500, with alpha-beta pruning, quiescence search, and piece-square tables. And my plan going forward is to add one feature at a time, test the hell out if to make sure it works, and then add another feature.

So with this established, I added killer move ordering into Blunder and started testing it against the previous version of itself without it. As a preliminary, I ran 50 games with a 1 min time format, and the newer version actually seemed to perform slightly worse (the outputs from cutechess):
Code: Select all
Score of Blunder 1.1.0 vs Blunder 1.0.0 : 5 - 9 - 36 [0.460]
...      Blunder 1.0.0 playing White: 4 - 3 - 18  [0.520] 25
...      Blunder 1.0.0 playing Black: 1 - 6 - 18  [0.400] 25
...      White vs Black: 10 - 4 - 36  [0.560] 50
Elo difference: -27.9 +/- 51.0, LOS: 14.3 %, DrawRatio: 72.0 %
50 of 50 games finished.
Of course, this is only 50 games, but I do have some questions. What kind of ELO gain should I be expecting with killer moves added? (I know every engine is different, I'm just more so asking generally). And depending on the minuteness of the ELO gain, how many games would I need to run to see a result?

What I find odd about the killer moves being added as well is that, from my testing, it speed-up the engine by a very sizeable amount by pruning huge chunks of the game tree, which allowed it to search a bit deeper than my previous version. This seemed great to me and it should increase the ELO, no?

Also, with regards to testing, I obviously need more tests, so I'm going to be using a sub-30-second time control. In this kind of control, all the engines I'm testing with are searching to around a depth of 4-5 each move, occasionally 6-10 very, very late in the endgame. In slower time controls, my engine usually searches to around a depth of 6-8 for the middle game and 10-15 towards the endgame. Would these lower search depths affect my results?
As far as i know Killer heuristic/history heuristic is mosly used to provide a GOOD move ordering that will help you prune more nodes as early as possible.

in the chessprogramming wiki site it mentions that it can gain some elo points, but don't expect too much, as i said before, think it as a feature to improve perfomance mainly.

But in the end if you have less nodes in higher depth's it will allow you to search deeper in the tree, and that will be a significant ELO gain.

Rhe feature that will guarantee you to gain a hugue elo boost is the "Tapered Evaluation" even only that feature alone implemented in your engine's evaluation you can have a ~2000 elo points engine.

Ah, I see. I didn't expect a huge ELO boost, but I was wondering if it'd be significant enough to be fairly noticeable. I suppose not.

Well, I think I'll be adding them back in, as from running some tests earlier, both engine versions stayed even for the matches I tested. I'll run some more tests, but if everything stays equal, I'll be adding them back.

And thanks for the heads up about tapered evaluation. I've heard good things about it, and I'm definitely going to be looking at adding it as a feature shortly. My plan right now is just to get some more basic ELO/efficiency gains out of the way first (killer moves, history heuristic, etc.)

hgm · Post by **hgm** » Fri Jul 16, 2021 10:36 am

I never measured this precisely, but killer alwas helped my engines a lot. (In contrast to history.)

mvanthoor · Post by **mvanthoor** » Fri Jul 16, 2021 12:46 pm

algerbrex wrote: ↑Fri Jul 16, 2021 3:14 am After a week or so of work, from my testing against MinimalChess 2.0 and Shallow Blue 1.1.0, my engine is now around ~1500, with alpha-beta pruning, quiescence search, and piece-square tables. And my plan going forward is to add one feature at a time, test the hell out if to make sure it works, and then add another feature.

1500 is OK, but a bit on the low side. Good that you're testing one feature at a time.

Of course, this is only 50 games, but I do have some questions. What kind of ELO gain should I be expecting with killer moves added? (I know every engine is different, I'm just more so asking generally). And depending on the minuteness of the ELO gain, how many games would I need to run to see a result?

Rustic gained about 56 Elo from the killer move feature, in self-play:

https://rustic-chess.org/progress/sprt_results.html

Expect about 60% or so to stick in actual gauntlets. Roughly ~35 Elo.

1000 games gives you a +/- 20 Elo margin. 2000 games gives you a +/- 12 Elo margin.

What I find odd about the killer moves being added as well is that, from my testing, it speed-up the engine by a very sizeable amount by pruning huge chunks of the game tree, which allowed it to search a bit deeper than my previous version. This seemed great to me and it should increase the ELO, no?

Yes. But your engine will search 1-2 ply deeper, sometimes, than the old version. That's where the Elo increase comes from. Don't expect a 200 point gain by only sometimes searching 1-2 ply deeper.

It also depends on your PST's. If they're somewhat weak, your engine will only be able to find dumb moves faster at a greater depth

My advice would be to use the PST's from Rustic Alpha 1 and see what your rating is (without killers) and then add killer moves.

Also make sure that both killers are always unique.

Also, with regards to testing, I obviously need more tests, so I'm going to be using a sub-30-second time control. In this kind of control, all the engines I'm testing with are searching to around a depth of 4-5 each move, occasionally 6-10 very, very late in the endgame. In slower time controls, my engine usually searches to around a depth of 6-8 for the middle game and 10-15 towards the endgame. Would these lower search depths affect my results?

Your search depths look OK for an engine without a TT, searching for 30 seconds. No, they don't affect your results, if the other engines have similar search depths. It's all about the relative Elo difference. So, run 1000 games per engine against 2-3 other engines, without killer moves, and then run another gauntlet of 1000 games per engine, against the same field and see if you score better.

algerbrex · Post by **algerbrex** » Fri Jul 16, 2021 12:52 pm

hgm wrote: ↑Fri Jul 16, 2021 10:36 am I never measured this precisely, but killer alwas helped my engines a lot. (In contrast to history.)

As I'm doing more testing, they seem to be helping my engine quite a bit as well actually. I think I just needed to test more games. After running a (yet unfinished) tournament at a 5s + 0.1 time control, it appears that the version of Blunder with Killer Moves is performing quite a bit better:

Code: Select all

Score of BlunderOld vs Blunder 1.0.0: 22 - 35 - 113 [0.462]
...      BlunderOld playing White: 14 - 16 - 55  [0.488] 85
...      BlunderOld playing Black: 8 - 19 - 58  [0.435] 85
...      White vs Black: 33 - 24 - 113  [0.526] 170
Elo difference: -26.6 +/- 30.2, LOS: 4.3 %, DrawRatio: 66.5 %
170 of 200 games finished.

mvanthoor · Post by **mvanthoor** » Fri Jul 16, 2021 12:59 pm

pedrojdm2021 wrote: ↑Fri Jul 16, 2021 8:13 am Rhe feature that will guarantee you to gain a hugue elo boost is the "Tapered Evaluation" even only that feature alone implemented in your engine's evaluation you can have a ~2000 elo points engine.

Make that more like 1850 - 1900 if you don't have a transposition table. (See MinimalChess 0.3 -> 0.4.1.)
A transposition table will gain somewhere around 150 Elo (See Rustic Alpha 1 -> Alpha 2)
Killer moves + Principle Variation Search together will gain +50 points. (See Rustic Alpha 2 -> Alpha 3)
Adding tapered evaluation gains ~270-300 points (MinimalChess 0.3 -> 0.4.1, Rustic Alpha 3 -> Current Dev Version.)

After adding the tapered evaluation to Alpha 3 (CCRL 1865), the test version is CCRL 2160 +/-30 points, depending on the engines it is paired to. Against some engines it performs at around ~2120 (BikJump 2.01, Prophet 3), and against others it performs around 2200 (MinimalChess 0.5.0).

algerbrex · Post by **algerbrex** » Fri Jul 16, 2021 1:02 pm

mvanthoor wrote: ↑Fri Jul 16, 2021 12:46 pm
algerbrex wrote: ↑Fri Jul 16, 2021 3:14 am After a week or so of work, from my testing against MinimalChess 2.0 and Shallow Blue 1.1.0, my engine is now around ~1500, with alpha-beta pruning, quiescence search, and piece-square tables. And my plan going forward is to add one feature at a time, test the hell out if to make sure it works, and then add another feature.
1500 is OK, but a bit on the low side. Good that you're testing one feature at a time.

Of course, this is only 50 games, but I do have some questions. What kind of ELO gain should I be expecting with killer moves added? (I know every engine is different, I'm just more so asking generally). And depending on the minuteness of the ELO gain, how many games would I need to run to see a result?
Rustic gained about 56 Elo from the killer move feature, in self-play:

https://rustic-chess.org/progress/sprt_results.html

Expect about 60% or so to stick in actual gauntlets. Roughly ~35 Elo.

1000 games gives you a +/- 20 Elo margin. 2000 games gives you a +/- 12 Elo margin.

What I find odd about the killer moves being added as well is that, from my testing, it speed-up the engine by a very sizeable amount by pruning huge chunks of the game tree, which allowed it to search a bit deeper than my previous version. This seemed great to me and it should increase the ELO, no?
Yes. But your engine will search 1-2 ply deeper, sometimes, than the old version. That's where the Elo increase comes from. Don't expect a 200 point gain by only sometimes searching 1-2 ply deeper.

It also depends on your PST's. If they're somewhat weak, your engine will only be able to find dumb moves faster at a greater depth My advice would be to use the PST's from Rustic Alpha 1 and see what your rating is (without killers) and then add killer moves.

Also make sure that both killers are always unique.

Also, with regards to testing, I obviously need more tests, so I'm going to be using a sub-30-second time control. In this kind of control, all the engines I'm testing with are searching to around a depth of 4-5 each move, occasionally 6-10 very, very late in the endgame. In slower time controls, my engine usually searches to around a depth of 6-8 for the middle game and 10-15 towards the endgame. Would these lower search depths affect my results?
Your search depths look OK for an engine without a TT, searching for 30 seconds. No, they don't affect your results, if the other engines have similar search depths. It's all about the relative Elo difference. So, run 1000 games per engine against 2-3 other engines, without killer moves, and then run another gauntlet of 1000 games per engine, against the same field and see if you score better.

Hey Marcel, thanks for the great feedback.

I wasn't expecting anything like a 100+ point Elo increase just from Killer Moves

I'm hoping that'll come with later features like a TT and LMR. But good to know that some ELO should be gained.

As I showed in my other reply, after running some more extensive tests, it does appear that adding killer moves gives the newer version of Blunder an advantage over the old one (the results with the testing now finished):

Code: Select all

Score of BlunderOld vs Blunder 1.0.0: 28 - 41 - 131 [0.468]
...      BlunderOld playing White: 17 - 19 - 64  [0.490] 100
...      BlunderOld playing Black: 11 - 22 - 67  [0.445] 100
...      White vs Black: 39 - 30 - 131  [0.522] 200
Elo difference: -22.6 +/- 28.3, LOS: 5.9 %, DrawRatio: 65.5 %
200 of 200 games finished.

The next step now is to, as you said, beginning running my tests against MinimalChess and Shallow Blue again.

And as far as PSQT, I was actually using the ones from Sunfish, Thomas Hale's Python engine. This may or may not explain why I'm stuck ~1500. And while from what I've seen they're quite strong and like the playing style they give my engine, I'm going to be running some tests comparing his PSQT's to yours and seeing what helps Blunder the most, since, at the end of the day, what matters most is the concrete ELO gain.

mvanthoor · Post by **mvanthoor** » Fri Jul 16, 2021 1:17 pm

algerbrex wrote: ↑Fri Jul 16, 2021 1:02 pm The next step now is to, as you said, beginning running my tests against MinimalChess and Shallow Blue again.

Those engines are counterparts (just like Rustic Alpha 1 and ShallowBlue 2.0.0). Sadly, ShallowBlue is not an engine to use as example. You can play against it, but just keep in mind that it is hugely underpowered for the feature set it has. Especially version 2 has a massive feature list, while it only has a 1700 rating.

1700 was my target rating with Alpha 1, and I (initially) missed out on it by a hair (1695), until Rustic got paired with the nemesis of every new engine: TSCP. That lost Rustic Alpha 1 about 20 rating points in the CCRL list. As expected, because TSCP is very hard to defeat without implementing measures against its pawn tricks.

The best way to defeat TSCP is to implement a TT, so your engine will 'automatically' see most of TSCP's pawn shenanigans without doing anything special.

And as far as PSQT, I was actually using the ones from Sunfish, Thomas Hale's Python engine. This may or may not explain why I'm stuck ~1500. And while from what I've seen they're quite strong and like the playing style they give my engine, I'm going to be running some tests comparing his PSQT's to yours and seeing what helps Blunder the most, since, at the end of the day, what matters most is the concrete ELO gain.

In the end, the PST's are only somewhat important for the very first version of your engine. As long as you credit the original author of the PST's (if you use them in a released version), I'd go with the best you can find. After your first version, I'd implement a TT and a tapered evaluation (again using someone else's tables) and a tuner, to tune your own.

I wrote my own PST's because I'm a fairly decent chess player. MinimalChess used Rustic's PST's for a time. Then MinimalChess switched to PeSTO's tables for the tuned evaluation, and Lithander wrote his own tuner. Now I switched to MinimalChess' tuned PST's for my tapered evaluation, and then I'll write my own tuner. (I hope to be able to do this in the coming weeks.)

This way, engines can build on top of one another, without actually "stealing" features. I call it "bootstrapping": using some-one else's implementation of a feature to see what is approximately possible, and then re-implement / refactor the feature to work in the best way possible with your engine.

algerbrex · Post by **algerbrex** » Fri Jul 16, 2021 1:34 pm

mvanthoor wrote: ↑Fri Jul 16, 2021 1:17 pm
Those engines are counterparts (just like Rustic Alpha 1 and ShallowBlue 2.0.0). Sadly, ShallowBlue is not an engine to use as example. You can play against it, but just keep in mind that it is hugely underpowered for the feature set it has. Especially version 2 has a massive feature list, while it only has a 1700 rating.

1700 was my target rating with Alpha 1, and I (initially) missed out on it by a hair (1695), until Rustic got paired with the nemesis of every new engine: TSCP. That lost Rustic Alpha 1 about 20 rating points in the CCRL list. As expected, because TSCP is very hard to defeat without implementing measures against its pawn tricks.

The best way to defeat TSCP is to implement a TT, so your engine will 'automatically' see most of TSCP's pawn shenanigans without doing anything special.

Interesting. I planned on using TSCP in future testing in the coming weeks as I added more features, and I wasn't aware of trickiness. I'll look more into that.

And I actually remember coming across one of the threads you started on here about shallow blue being so underpowered for its feature set. So I suppose I'll start using Rustic Alpha 1 and trying to test against it and MinimalChess 2.0.

mvanthoor wrote: ↑Fri Jul 16, 2021 1:17 pm In the end, the PST's are only somewhat important for the very first version of your engine. As long as you credit the original author of the PST's (if you use them in a released version), I'd go with the best you can find. After your first version, I'd implement a TT and a tapered evaluation (again using someone else's tables) and a tuner, to tune your own.

I wrote my own PST's because I'm a fairly decent chess player. MinimalChess used Rustic's PST's for a time. Then MinimalChess switched to PeSTO's tables for the tuned evaluation, and Lithander wrote his own tuner. Now I switched to MinimalChess' tuned PST's for my tapered evaluation, and then I'll write my own tuner. (I hope to be able to do this in the coming weeks.)

This way, engines can build on top of one another, without actually "stealing" features. I call it "bootstrapping": using some-one else's implementation of a feature to see what is approximately possible and then re-implement / refactor the feature to work in the best way possible with your engine.

Right about the credit, of course.

I originally started off by writing my own, but I'm even less of a decent player than you are, and even though I did some research into chess theory and piece placement, I never got very good results compared to using other engines PSQT.

But I want this engine to be original, so after it gets off the ground, I'll definitely be looking into tuning my own tables. I have to admit though, I find the whole tuning process pretty daunting, and after reading the article about it on the Chess Programming Wiki, I still don't have a good idea about how to go about implementing it concretely. Any resources you'd recommend?

How much ELO should I expect to gain from killer moves?

How much ELO should I expect to gain from killer moves?

Re: How much ELO should I expect to gain from killer moves?

Re: How much ELO should I expect to gain from killer moves?

Re: How much ELO should I expect to gain from killer moves?

Re: How much ELO should I expect to gain from killer moves?

Re: How much ELO should I expect to gain from killer moves?

Re: How much ELO should I expect to gain from killer moves?

Re: How much ELO should I expect to gain from killer moves?

Re: How much ELO should I expect to gain from killer moves?

Re: How much ELO should I expect to gain from killer moves?