I'm facing a self-test issue... I have a +40 in self-testing that seems to be in fact a -20 against others...
I'm quite aware that self-tests are often over-optimistic, but here
This concerns eval features that take "next move" possibilities into account (hanging pieces, forks, pawn pushes, ...).
Any advice?
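One way to check whether a +40 self-play result and a -20 result against other engines really contradict each other is to put error bars on both. Below is a minimal sketch, assuming the usual logistic Elo model and a normal approximation of the mean score; the function names and the win/draw/loss counts are hypothetical, not taken from the thread.

```python
import math

def score_to_elo(score: float) -> float:
    """Elo difference implied by an expected score, under the logistic Elo model."""
    # Clamp away from 0 and 1, where the Elo difference would be infinite.
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_error(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high): point estimate and an approximate 95% interval."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    stderr = math.sqrt(var / n)  # standard error of the mean score
    return (score_to_elo(score),
            score_to_elo(score - z * stderr),
            score_to_elo(score + z * stderr))

# Hypothetical counts for illustration only.
elo, low, high = elo_with_error(300, 450, 250)
print(f"{elo:+.1f} Elo  (95% CI {low:+.1f} .. {high:+.1f})")
```

For these made-up counts the estimate is about +17 Elo with an interval from roughly +1 to +33, so even 1000 games leave a wide margin; if the two intervals overlap, the discrepancy may be partly noise.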
This is also a general concern; in a sense every test against other alpha-beta engines is a self-test, as these engines are so much alike ('incestuous testing'). So what we think is a huge improvement, in the end resulting in Elos rising from 1800 to 3300 or more, might be -500 Elo when tested against entities that think in an entirely different way.
xr_a_y wrote (Wed Sep 18, 2019 7:04 pm):
Any advice?
My answer is pretty simple: I never self-test. Whenever I have a new version to test, I always run a gauntlet against 12 different engines, most of which are between -20 and +50 Elo relative to Myrddin.
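Scoring 50% against a field whose members range from -20 to +50 is not the same as scoring 50% against equal opponents, so it helps to turn gauntlet results into a single performance rating. Below is a minimal sketch, assuming logistic Elo and opponent ratings expressed relative to the engine under test (Myrddin = 0); `performance_rating` and the sample gauntlet are hypothetical, not Myrddin's actual tooling.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Logistic Elo expectation for `rating` playing `opponent`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def performance_rating(results, lo=-1000.0, hi=1000.0, iterations=100):
    """Rating whose expected points over the gauntlet equal the points scored.

    `results` is a list of (opponent_rating, points_scored, games_played),
    with ratings relative to the engine under test. Expected points grow
    monotonically with the rating, so plain bisection finds the solution.
    """
    scored = sum(points for _, points, _ in results)

    def surplus(rating):
        # Expected points at this rating minus points actually scored.
        return sum(games * expected_score(rating, opp)
                   for opp, _, games in results) - scored

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if surplus(mid) > 0:
            hi = mid  # rating too high: it predicts more points than were scored
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical gauntlet: (opponent rating, points, games).
gauntlet = [(-20, 58.0, 100), (0, 51.5, 100), (25, 46.0, 100), (50, 41.0, 100)]
print(f"performance: {performance_rating(gauntlet):+.1f}")
```

Running the same function on the old and the new version's gauntlet results gives two numbers on the same scale, which can then be compared directly.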
It is useful to include the old version (without the current patch) in the multi-engine gauntlet and self-test against it there. That way, one can make sure there is a definite Elo increase.
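When both versions play the same gauntlet, their results against the other engines can be turned into an Elo gain with an approximate interval. Below is a minimal sketch, assuming both (wins, draws, losses) tuples are pooled over the same opponents with the same number of games each, and treating the two runs as independent; `elo_and_se` and `compare_versions` are hypothetical helpers. The direct new-vs-old games give a separate head-to-head estimate on top of this.

```python
import math

def elo_and_se(wins: int, draws: int, losses: int):
    """Elo versus the field plus its standard error (delta method)."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    if not 0.0 < s < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    se_score = math.sqrt(var / n)
    elo = -400.0 * math.log10(1.0 / s - 1.0)
    # Slope of the logistic Elo curve: d(elo)/ds = 400 / (ln 10 * s * (1 - s)).
    se_elo = se_score * 400.0 / (math.log(10) * s * (1.0 - s))
    return elo, se_elo

def compare_versions(new_wdl, old_wdl, z: float = 1.96):
    """Elo gain of the new version over the old one, with an approximate interval."""
    e_new, se_new = elo_and_se(*new_wdl)
    e_old, se_old = elo_and_se(*old_wdl)
    diff = e_new - e_old
    se = math.hypot(se_new, se_old)  # independent runs: variances add
    return diff, diff - z * se, diff + z * se

# Hypothetical pooled results against the shared opponents.
diff, low, high = compare_versions((350, 400, 250), (300, 400, 300))
print(f"gain {diff:+.1f} Elo  (95% CI {low:+.1f} .. {high:+.1f})")
```

A "definite" increase in this sense would mean the lower end of the interval stays above zero.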
hgm wrote (Wed Sep 18, 2019 7:14 pm):
This is also a general concern; in a sense every test against other alpha-beta engines is a self-test, as these engines are so much alike ('incestuous testing'). So what we think is a huge improvement, in the end resulting in Elos rising from 1800 to 3300 or more, might be -500 Elo when tested against entities that think in an entirely different way.
I think, for example, that it was indeed harder for Minic to match Winter than other engines rated around 2900.
Very weird behaviour; I haven't experienced that against other engines (then again, I don't test that extensively against other engines). Usually the self-play gain is about exactly what I get against other engines as well.
Your TC seems good to me. I usually test self-play at 10+0.1 and, if a patch passes, confirm it at 60+0.6. I would say 40/20 is roughly somewhere between 20+0.2 and 30+0.3, which should be sufficient.
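The post does not say what "passes" means at 10+0.1; a common choice in engine testing is a sequential probability ratio test (SPRT) between two Elo hypotheses. Below is a minimal sketch, assuming logistic Elo, a normal approximation of the win/draw/loss outcome, and hypothetical settings elo0=0, elo1=5, alpha=beta=0.05. It is not necessarily the method used in this thread, and dedicated testing tools implement more refined variants.

```python
import math

def expected_score(elo: float) -> float:
    """Expected score for an Elo advantage, under the logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Approximate log-likelihood ratio of H1 (elo1) against H0 (elo0)."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    if var == 0.0:
        return 0.0  # zero sample variance: the normal approximation breaks down, keep testing
    s0, s1 = expected_score(elo0), expected_score(elo1)
    # Normal-likelihood ratio for mean s1 versus mean s0, with the sample variance.
    return n * (s1 - s0) * (2.0 * s - s0 - s1) / (2.0 * var)

def sprt_decision(llr: float, alpha: float = 0.05, beta: float = 0.05) -> str:
    """Compare the LLR with the standard Wald bounds."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr >= upper:
        return "H1"  # patch passes
    if llr <= lower:
        return "H0"  # patch fails
    return "continue"

# Hypothetical running totals from a 10+0.1 self-play test.
llr = sprt_llr(4000, 2000, 2000, elo0=0.0, elo1=5.0)
print(f"LLR = {llr:.2f} -> {sprt_decision(llr)}")
```

The same test can then be rerun at 60+0.6 as the confirmation step described above.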