self test

xr_a_y
Posts: 734
Joined: Sat Nov 25, 2017 1:28 pm
Location: France

self test

Post by xr_a_y » Wed Sep 18, 2019 5:04 pm

I'm facing a self-test issue... I have a +40 in self-testing that in fact seems to be a -20 against others...
I'm well aware that self-tests are often over-optimistic, but here :shock: :shock:

This concerns eval features that take into account "next move" possibilities (hanging pieces, forks, pawn pushes, ...)

Any advice?

hgm
Posts: 23630
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: self test

Post by hgm » Wed Sep 18, 2019 5:14 pm

This is also a general concern; in a sense every test against other alpha-beta engines is a self-test, as these engines are so much alike ('incestuous testing'). So what we think is a huge improvement, in the end resulting in Elos rising from 1800 to 3300 or more, might be -500 Elo when tested against entities that think in an entirely different way.

JVMerlino
Posts: 1003
Joined: Wed Mar 08, 2006 9:15 pm
Location: San Francisco, California

Re: self test

Post by JVMerlino » Wed Sep 18, 2019 5:27 pm

xr_a_y wrote:
Wed Sep 18, 2019 5:04 pm
I'm facing a self-test issue... I have a +40 in self-testing that in fact seems to be a -20 against others...
I'm well aware that self-tests are often over-optimistic, but here :shock: :shock:

This concerns eval features that take into account "next move" possibilities (hanging pieces, forks, pawn pushes, ...)

Any advice?
My answer is pretty simple - I never self-test. :) I always run a gauntlet of 12 different engines that are mostly -20/+50 ELO compared to Myrddin whenever I have a new version to test.

fabianVDW
Posts: 72
Joined: Fri Mar 15, 2019 7:46 pm
Location: Germany
Full name: Fabian von der Warth

Re: self test

Post by fabianVDW » Wed Sep 18, 2019 6:12 pm

Just because I am curious.

How many games were run, against whom (in self-play and vs. others), at which TC, and with what results?
Author of FabChess: https://github.com/fabianvdW/FabChess
A UCI compliant chess engine written in Rust.
FabChessWiki: https://github.com/fabianvdW/FabChess/wiki
fabianvonderwarth@gmail.com

D Sceviour
Posts: 449
Joined: Mon Jul 20, 2015 3:06 pm

Re: self test

Post by D Sceviour » Wed Sep 18, 2019 6:31 pm

JVMerlino wrote:
Wed Sep 18, 2019 5:27 pm
xr_a_y wrote:
Wed Sep 18, 2019 5:04 pm
I'm facing a self-test issue... I have a +40 in self-testing that in fact seems to be a -20 against others...
I'm well aware that self-tests are often over-optimistic, but here :shock: :shock:

This concerns eval features that take into account "next move" possibilities (hanging pieces, forks, pawn pushes, ...)

Any advice?
My answer is pretty simple - I never self-test. :) I always run a gauntlet of 12 different engines that are mostly -20/+50 ELO compared to Myrddin whenever I have a new version to test.
It is useful to include and self test the old version (without the current patch) inside the multi-engine gauntlet. That way, one can make sure there is a definite elo increase.

xr_a_y
Posts: 734
Joined: Sat Nov 25, 2017 1:28 pm
Location: France

Re: self test

Post by xr_a_y » Wed Sep 18, 2019 6:36 pm

fabianVDW wrote:
Wed Sep 18, 2019 6:12 pm
Just because I am curious.

How many games were run, against whom (in self-play and vs. others), at which TC, and with what results?
1000 games in self-test, resulting in a +/-20 Elo margin.

5000 games in a 10-engine tourney, resulting in a +/-19 Elo margin.

But this was repeated twice with the same results.

TC is 40 moves in 20 sec, with a 1024 MB TT.
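
For readers who want to check margins like the ones quoted above, here is a minimal sketch of how an Elo difference and its approximate 95% interval can be derived from a win/draw/loss record. It assumes the usual logistic Elo model and a normal approximation of the score; the function names and the example record (400 wins, 300 draws, 300 losses) are hypothetical and are not taken from the tests discussed in this thread.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert a score fraction in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, lower, upper) for a W/D/L record.

    Uses a normal approximation of the mean score; z = 1.96 gives
    an interval of roughly 95%. Only valid while score +/- z * stderr
    stays strictly between 0 and 1.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    lower = elo_from_score(score - z * stderr)
    upper = elo_from_score(score + z * stderr)
    return elo_from_score(score), lower, upper


if __name__ == "__main__":
    # Hypothetical 1000-game record, for illustration only.
    elo, lower, upper = elo_with_margin(400, 300, 300)
    print(f"Elo {elo:+.1f}  (interval {lower:+.1f} .. {upper:+.1f})")
```

If a self-test interval and a gauntlet interval computed this way do not overlap, the gap is unlikely to be noise alone, which is the situation described in the first post.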

xr_a_y
Posts: 734
Joined: Sat Nov 25, 2017 1:28 pm
Location: France

Re: self test

Post by xr_a_y » Wed Sep 18, 2019 6:37 pm

D Sceviour wrote:
Wed Sep 18, 2019 6:31 pm
JVMerlino wrote:
Wed Sep 18, 2019 5:27 pm
xr_a_y wrote:
Wed Sep 18, 2019 5:04 pm
I'm facing a self-test issue... I have a +40 in self-testing that in fact seems to be a -20 against others...
I'm well aware that self-tests are often over-optimistic, but here :shock: :shock:

This concerns eval features that take into account "next move" possibilities (hanging pieces, forks, pawn pushes, ...)

Any advice?
My answer is pretty simple - I never self-test. :) I always run a gauntlet of 12 different engines that are mostly -20/+50 ELO compared to Myrddin whenever I have a new version to test.
It is useful to include and self test the old version (without the current patch) inside the multi-engine gauntlet. That way, one can make sure there is a definite elo increase.
I did! Two previous versions were included.

xr_a_y
Posts: 734
Joined: Sat Nov 25, 2017 1:28 pm
Location: France

Re: self test

Post by xr_a_y » Wed Sep 18, 2019 6:44 pm

hgm wrote:
Wed Sep 18, 2019 5:14 pm
This is also a general concern; in a sense every test against other alpha-beta engines is a self-test, as these engines are so much alike ('incestuous testing'). So what we think is a huge improvement, in the end resulting in Elos rising from 1800 to 3300 or more, might be -500 Elo when tested against entities that think in an entirely different way.
I think, for example, that it was indeed harder for Minic to match Winter than other ~2900-rated engines.

fabianVDW
Posts: 72
Joined: Fri Mar 15, 2019 7:46 pm
Location: Germany
Full name: Fabian von der Warth

Re: self test

Post by fabianVDW » Wed Sep 18, 2019 6:59 pm

Very weird behaviour, I haven't experienced that against other engines (then again, I don't test that extensively against other engines). Usually the self-play gain is almost exactly what I get against other engines as well.

Your TC seems good to me. I usually test self-play at 10 +0.1 and, if a patch passes, then confirm at 60 +0.6. I would say 40/20 is roughly somewhere between 20 +0.2 and 30 +0.3, which should be sufficient.

mar
Posts: 1992
Joined: Fri Nov 26, 2010 1:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: self test

Post by mar » Fri Sep 20, 2019 3:07 pm

I always did only self-testing, and typically got about one half of the expected gain in CCRL.

It certainly worked for me and works for others (if you play enough games).

I did something non-standard perhaps, namely always playing against the last released version (= any fixed, stable previous version);

this way I didn't fall into the trap of accumulating "improvements" when you are actually accumulating error (i.e. not chasing your own tail)

(this is especially true for small improvements).

Always 10k games for small improvements, but the error bars are still 9 Elo.
Martin Sedlak
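
As a rough illustration of why even large game counts leave noticeable error bars, the sketch below estimates the 95% margin as a function of the number of games, and the number of games needed for a target margin. It assumes two engines of nearly equal strength, a normal approximation, and a draw rate of 30%; that draw rate and the function names are assumptions made for illustration, so the output is not expected to reproduce the exact figures quoted in this thread.

```python
import math

# Slope of the logistic Elo curve at a 50% score: d(Elo)/d(score) = 1600 / ln(10).
ELO_PER_SCORE_UNIT = 1600.0 / math.log(10.0)


def margin_elo(games: int, draw_rate: float = 0.3, z: float = 1.96) -> float:
    """Approximate +/- margin in Elo after `games` games between near-equal engines."""
    # Variance of a single game result when wins and losses are equally likely.
    var = (1.0 - draw_rate) / 4.0
    return z * math.sqrt(var / games) * ELO_PER_SCORE_UNIT


def games_for_margin(target_elo: float, draw_rate: float = 0.3, z: float = 1.96) -> int:
    """Approximate number of games needed to reach a +/- `target_elo` margin."""
    var = (1.0 - draw_rate) / 4.0
    return math.ceil(var * (z * ELO_PER_SCORE_UNIT / target_elo) ** 2)


if __name__ == "__main__":
    for n in (1000, 5000, 10000):
        print(f"{n:>6} games -> about +/-{margin_elo(n):.1f} Elo")
    print("games for +/-5 Elo:", games_for_margin(5.0))
```

The margin shrinks only with the square root of the game count, so halving the error bar takes four times as many games.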