Some thoughts on QS

diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Some thoughts on QS

Post by diep »

Houdini wrote:
diep wrote:So there are 2 effects, one with statistical insight and one with a theoretical background, that explain why incest testing is a bad idea.
There is theory, and there is practice.

Your style of argumentation is very much like Bob Hyatt's, claiming things don't work based on some past experience from a very long time ago, citing "a dozen chess programmers who have hard proof" etc. Very often this is to support a claim that something "does not work" when in fact, in my and other people's experience, it works very well.

From various comments on this forum it appears that for all current top engine authors auto-testing is an important part of their development cycle. Your comments suggest that you're slightly out of touch.

Robert
Who are you?
diep

Re: Some thoughts on QS

Post by diep »

Uri Blass wrote:The theoretical insight is not so simple when we talk about search changes.

If A2 is weaker than A1 when they play against an unrelated program B, then it means that A2 makes mistakes against B that A1 does not make against B, and if you start from the relevant positions where A2 makes mistakes, it is going to lose against A1 as well.

I have not read a convincing explanation of why the positions in which A2 is weaker happen only against B.

Note that I think that when we talk only about evaluation changes, it may be logical to get something non-transitive when you change a symmetric evaluation to a non-symmetric evaluation.

Imagine we have A1, A2 and B (A2 is a modified A1).

Both A1 and A2 have a common weakness: they have no king safety evaluation.

A2 at least knows that it does not know about king safety, so it has a bonus for trading into the endgame even at the price of an inferior position.

Of course A1 is going to beat A2 in a match, because A1 is going to be happy to trade into a better endgame.

When both of them play against B (a program with king safety evaluation but no endgame knowledge), things are different.

A1 is going to get an inferior endgame against B, but it is going to beat B because it is stronger in the endgame.

A2 is not going to get into an endgame against B, and B is going to beat A2 often by king attacks.
Let's just check it point by point, Uri, and go further from there. Let's start with the first question.

a) Do you agree that testing against 1 specific engine, ignoring the rest of the planet, is a bad idea?

Suppose we decide to test only against Crafty for the rest of our lives. Would you consider that a good idea?

Or do you agree that this is just a subset of reality?

Or, in logic, for the example of Crafty we write this as something like:

{Crafty} ⊂ Engines

Where Engines is the set of engines that includes Crafty and all others, and ⊂ is the mathematical symbol for Crafty being part of that set.

Right?
Uri Blass
Posts: 10268
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Some thoughts on QS

Post by Uri Blass »

diep wrote:Who are you?
Robert Houdart is the programmer of Houdini.
Uri Blass

Re: Some thoughts on QS

Post by Uri Blass »

I think that you need to test against programs of similar strength.

I think that if you score 40% against Crafty based on 300 games, then testing only against Crafty is not such a bad idea.

If you improve your program and it scores 60% against Crafty based on 300 games, then you certainly made a big improvement and it is time to look for stronger opponents.

From my experience of testing Movei in the past, a significant improvement against one program is also verified against other programs. For small differences of 5 or 10 Elo I never played enough games to verify them, but I imagine that they are usually also improvements against other programs.
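
To put rough numbers on the 300-game and 5-10 Elo remarks, here is a minimal sketch. It assumes the usual logistic Elo model and independent games; the function names, the `draw_ratio` parameter and the 1.96 (about 95%) threshold are my own choices for illustration, not anything taken from an actual testing tool.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_from_elo(elo: float) -> float:
    """Expected score fraction for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def score_std_error(score: float, games: int, draw_ratio: float = 0.0) -> float:
    """Standard error of the measured score after `games` independent games.

    draw_ratio is an assumed fraction of drawn games; 0.0 gives the
    largest (most pessimistic) error for a given score.
    """
    wins = score - draw_ratio / 2.0
    losses = 1.0 - score - draw_ratio / 2.0
    variance = (wins * (1.0 - score) ** 2
                + draw_ratio * (0.5 - score) ** 2
                + losses * score ** 2)
    return math.sqrt(variance / games)


def games_needed(elo: float, z: float = 1.96, draw_ratio: float = 0.0) -> int:
    """Games needed before an `elo` advantage exceeds z standard errors."""
    score = score_from_elo(elo)
    per_game_sd = score_std_error(score, 1, draw_ratio)
    return math.ceil((z * per_game_sd / (score - 0.5)) ** 2)


if __name__ == "__main__":
    for s in (0.40, 0.60):
        print(f"score {s:.2f}: {elo_from_score(s):+.1f} Elo, "
              f"std error over 300 games {score_std_error(s, 300):.4f}")
    for e in (5, 10):
        print(f"{e} Elo needs about {games_needed(e)} games")
```

Under these assumptions, going from 40% to 60% is a jump of well over 100 Elo, which 300 games can show clearly, while a 5 or 10 Elo difference needs thousands of games. A higher draw ratio lowers the variance and therefore the number of games needed.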
diep

Re: Some thoughts on QS

Post by diep »

Uri,

Simply answer the question with: "yes" or "no".

Don't go talk about something else.
Uri Blass

Re: Some thoughts on QS

Post by Uri Blass »

I believe that testing against one opponent can be a good idea, but I do not recommend Crafty; I recommend Houdini. In the beginning I recommend testing only at unequal time control, with a time handicap large enough that your program scores 40-50% against Houdini.

This can save testing time, because Houdini uses less time in your testing.
After your program scores 60%, you reduce the time handicap by giving Houdini more time, so that your program again scores 40-50%, and you repeat the same process.

If, after enough improvements, your program scores more than 50% against Houdini at equal time control, then I fully expect it to also beat other programs that are significantly weaker than Houdini in direct matches at the same time control.
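
The handicap procedure above can be sketched as a small update rule. This is only an illustration: `next_time_odds`, the 60% promotion threshold as a parameter, and especially the halving step are assumptions of mine (the post does not say by how much to reduce the handicap), and the scores in the demo loop are made up.

```python
def next_time_odds(score: float, time_odds: float,
                   promote_at: float = 0.60, step: float = 2.0) -> float:
    """Return the time odds to use for the next test run.

    time_odds is our thinking time divided by the opponent's, so 8.0
    means the opponent gets one eighth of our time and 1.0 means equal
    time control. `step` (how fast the handicap shrinks) is an assumed
    value, not something the procedure prescribes.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a fraction between 0 and 1")
    if time_odds < 1.0:
        raise ValueError("time_odds below 1.0 would handicap our own engine")
    if score >= promote_at:
        # We score well enough: give the opponent more time,
        # but never go past equal time control.
        return max(1.0, time_odds / step)
    # Otherwise keep the handicap and continue improving the engine.
    return time_odds


if __name__ == "__main__":
    odds = 8.0
    # Hypothetical match scores measured after successive improvements.
    for measured in (0.45, 0.61, 0.48, 0.63, 0.62):
        odds = next_time_odds(measured, odds)
        print(f"scored {measured:.2f} -> next run at time odds {odds:g}:1")
```

Once the odds reach 1.0 the ladder is finished, and a score above 50% at that point is the equal-time result the last paragraph talks about.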