A practical way to test a patch is to run a session of one-minute blitz games between engines.
I have observed that sometimes the players reach the last few seconds with one engine at a clear advantage; then suddenly, due to extreme time pressure, a blunder occurs, the advantage disappears, and the game ends in a draw or, more rarely, in a loss.
In the long term these blunders are evenly distributed between the two engines. But "in the long term" is the key point.
These blunders are noise and can be attributed to the blitz time limit; in longer matches this kind of extreme time-pressure blunder is rare.
I am thinking of adding a last seconds noise (LSN) filter that works as follows:
When an engine is losing by at least an LSN threshold value and the remaining time is less than the LSN limit, the engine gives up without further fighting.
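To make the rule concrete, here is a minimal sketch of the check in Python. The names `LsnConfig`, `threshold_cp`, `time_limit_ms` and `lsn_should_resign` are hypothetical, the default values are arbitrary placeholders, and I am assuming the score is the engine's own evaluation in centipawns and the clock is the engine's own remaining time; none of that is fixed by the description above.

```python
from dataclasses import dataclass


@dataclass
class LsnConfig:
    # Both values are illustrative assumptions, not tuned numbers.
    threshold_cp: int = 300    # "LSN threshold": how far behind, in centipawns
    time_limit_ms: int = 3000  # "LSN limit": remaining time, in milliseconds


def lsn_should_resign(score_cp: int, remaining_ms: int, cfg: LsnConfig) -> bool:
    """Return True when the LSN rule says the engine should give up.

    score_cp is the evaluation from the engine's own point of view
    (negative means the engine is losing).
    """
    losing_badly = score_cp <= -cfg.threshold_cp      # losing by at least the threshold
    short_on_time = remaining_ms < cfg.time_limit_ms  # less than the limit left
    return losing_badly and short_on_time
```

The same test could just as well live in the match runner instead of the engine, so that the engine under test stays unmodified; the condition itself would not change.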
I expect that the testing time (read: the number of matches) needed to validate or reject a patch will be shortened using this technique.
BTW, LSN can also prevent another, more subtle issue: lucky draws.
Sometimes engine A has a clear advantage over B a few seconds from the end, and then, even without blunders, engine B manages to draw, for example by perpetual check.
So we can have 10 draws where engine A is held to a draw by B's perpetuals while clearly ahead a few seconds from the time limit. There is no blunder and it is perfectly legal; technically speaking, the two engines A and B should have the same Elo. But from the point of view of a developer who has to choose whether to accept or reject a patch, there is little doubt that engine A should be preferred to engine B.
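As a small illustration of why the ratings come out equal, here is a sketch that converts a match result into an Elo difference using the usual logistic Elo model. The function name `elo_diff` is hypothetical; the formula is the standard one, difference = 400 * log10(s / (1 - s)) for a score fraction s.

```python
import math


def elo_diff(wins: int, draws: int, losses: int) -> float:
    """Elo difference implied by a match result, from the first engine's side."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games  # draws count as half a point
    if score <= 0.0 or score >= 1.0:
        # A 0% or 100% score has no finite Elo difference.
        return math.copysign(math.inf, score - 0.5)
    return 400.0 * math.log10(score / (1.0 - score))
```

Ten draws give a 50% score, so `elo_diff(0, 10, 0)` is zero no matter how lopsided the final positions were. That is exactly the information the plain result throws away.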
LSN filtering could also help in these cases.
Does anyone have experience with this?
Thanks
Marco
Increasing SNR with last seconds noise filtering
- Full name: H G Muller
Re: Increasing SNR with last seconds noise filtering
Sudden-death games should be considered a severe test of time-management code, in the sense that lousy time management will cause an enormous Elo reduction, while with a normal time control you can get away with it (and thus will have a higher Elo).
If you don't want to test the time management of an engine, you should not use sudden-death games. If you do want to test the time management above all other aspects, you should.
That's all there is to it.
Re: Increasing SNR with last seconds noise filtering
Another worry. I have been running millions of games over the past 2-3 weeks, and I have clearly seen cases where changing the time per move makes a huge difference in how programs compare. I have been doing some tuning based on pretty quick games (10 seconds on the clock, 10 ms increment), but I frequently have to go back and re-test with longer games, as I have had cases where a change works better in the quick games and is worse in the longer games.
This is not a simple task...
BTW much of the noise disappears if you use an increment, so that one never runs out of time unless there is a bug. But that only heightens the other issue further...