Testing methodology


Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Testing methodology

Post by Edsel Apostol »

Hi everyone,

Because my testing resources are limited, I've lately been trying to find a better way to test my engine's development. I decided to run depth 1 matches against several opponents from a fixed set of positions. The aim is to compare the accuracy of my evaluation and quiescence search between versions, before I even start refining the search algorithm.

I noticed that programs that prune losing captures in the quiescence search, like Fruit 2.1, lose badly against versions of my engine in which I disabled the pruning of those losing captures. Versions of my engine that do this pruning are also very weak in these matches. I also noticed that Rybka 1.0 beta is very strong in this setup, which might suggest that it has a very accurate evaluation, or maybe a very accurate quiescence search. What do you think of this?

I have not tested much, but it seems to me to be a good way to test evaluation changes faster.

Does anyone have experience with this setup? Is it productive, or are there much better testing methodologies?

Edsel Apostol (Twisted Logic)
Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Re: Testing methodology

Post by Karlo Bala »

Rybka 1.0 is strong in this setup because at depth 1 Rybka actually searches to depth 3 plus quiescence search (I don't know exactly what Rybka searches in QS).
The point is that with a simple QS (no checking moves, etc.) a program searches deeper and compensates for the QS weakness.

I think it is a good way of testing. Once you resolve the horizon problem, you can put your energy into implementing the search.
Best Regards,
Karlo Balla Jr.
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: Testing methodology

Post by Aleks Peshkov »

Fixed-depth search is not chess. Tuning chess evaluation using non-chess game results is pointless.

IMHO a small set of test positions with limited solution time is better chess than fixed-depth chess.
adams161
Posts: 626
Joined: Sun May 13, 2007 9:55 pm
Location: Bay Area, CA USA
Full name: Mike Adams

Re: Testing methodology

Post by adams161 »

I think he said he is testing the eval and wants a low depth like depth 1 because he can see (apparently) what the position would look like in the eval.

What I do to test the eval is run it on a FEN position with no search. This gives the best info, I think.
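As a rough illustration of the no-search approach described above, here is a minimal sketch that scores FEN positions with two versions of a static evaluation and reports the difference. The piece values, the 30 cp bishop-pair bonus, and the function names are all assumptions made for the example; none of them come from an engine mentioned in this thread.

```python
# Hypothetical piece values in centipawns (assumed for illustration only).
PIECE_VALUES = {"p": 100, "n": 320, "b": 330, "r": 500, "q": 900, "k": 0}


def material_eval(fen: str) -> int:
    """Static material score from White's point of view, with no search."""
    board = fen.split()[0]
    score = 0
    for ch in board:
        if ch.isalpha():
            value = PIECE_VALUES[ch.lower()]
            score += value if ch.isupper() else -value
    return score


def bishop_pair_eval(fen: str) -> int:
    """The 'new' version: material plus an assumed 30 cp bishop-pair bonus."""
    board = fen.split()[0]
    score = material_eval(fen)
    if board.count("B") >= 2:
        score += 30
    if board.count("b") >= 2:
        score -= 30
    return score


def eval_snapshot(fens, old_eval, new_eval):
    """Return (fen, old score, new score, delta) for every position."""
    return [(f, old_eval(f), new_eval(f), new_eval(f) - old_eval(f)) for f in fens]


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
NO_C8_BISHOP = "rn1qkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

for fen, old, new, delta in eval_snapshot([START, NO_C8_BISHOP],
                                          material_eval, bishop_pair_eval):
    print(f"{old:5d} {new:5d} {delta:+4d}  {fen}")
```

A snapshot like this only shows where and by how much a change moves the score; whether the new number is better still has to be judged by hand or by games.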

Re: Testing methodology

Post by Edsel Apostol »

Hi Aleks,

My initial purpose was to test only the changes I made to the evaluation, so I decided to play depth 1 matches, where the search contributes little or nothing to playing strength.

I think that if I played the matches with a time control, it would take me longer to determine whether the changes to my evaluation are good or bad, because the search could affect the result.

If you think that this is not a good idea, please enlighten me.
Aleks Peshkov wrote:Tuning chess evaluation using non chess game results is pointless.
I am wondering what you mean by this. When I play a depth 1 game between two engines, they are playing chess, so how is it a non-chess game result?
Last edited by Edsel Apostol on Thu Nov 29, 2007 3:48 am, edited 1 time in total.

Re: Testing methodology

Post by Edsel Apostol »

Hi Karlo,

I knew that Rybka shows a misleading depth, but I overlooked it when I used it in my test matches with this setup. Thanks for reminding me; I will not use Rybka again for this specific test.

One thing I noticed is that pruning losing captures is bad in this test, but in real games it helps, so I don't really know whether this affects my assessment of whether my evaluation is good or bad.
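To make the trade-off concrete, here is a minimal sketch of a quiescence search with an optional "skip captures whose static exchange score is negative" rule. It runs on a hand-built toy tree rather than a real board, so the `Node` class, the SEE numbers, and the position are all hypothetical. The example shows one way the pruning can hurt at depth 1: a capture that SEE rates as losing (say, because it does not see a pin) actually wins material, and nothing above the quiescence search is deep enough to find it.

```python
from dataclasses import dataclass, field

INF = 10**6


@dataclass
class Node:
    """Toy position: static eval from the side to move's point of view (cp)."""
    static_eval: int
    # Each capture is (assumed SEE score of the capture, resulting position).
    captures: list = field(default_factory=list)


def qsearch(node: Node, alpha: int, beta: int, prune_losing: bool) -> int:
    """Fail-soft quiescence search over captures only."""
    stand_pat = node.static_eval
    if stand_pat >= beta:
        return stand_pat
    alpha = max(alpha, stand_pat)
    for see_score, child in node.captures:
        if prune_losing and see_score < 0:
            continue  # the pruning under discussion: skip "losing" captures
        score = -qsearch(child, -beta, -alpha, prune_losing)
        if score >= beta:
            return score
        alpha = max(alpha, score)
    return alpha


# Hypothetical case: SEE rates the capture at -180, but the recapture it
# counted on is not actually available, so the capture simply wins a knight.
after_capture = Node(static_eval=-320)  # opponent to move, a knight down
root = Node(static_eval=0, captures=[(-180, after_capture)])

print(qsearch(root, -INF, INF, prune_losing=False))  # 320
print(qsearch(root, -INF, INF, prune_losing=True))   # 0
```

In a real game the main search above the quiescence search would normally try such a capture anyway, which may be why the same pruning helps at normal depths and hurts in depth 1 matches.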

Re: Testing methodology

Post by Edsel Apostol »

Hi Mike,

Your name sounds like an English grandmaster.

By the way, you said that you run your tests from FEN positions. How can you then determine whether the evaluation is good or bad? I mean, are you analyzing the eval of each FEN position manually?
Tony

Re: Testing methodology

Post by Tony »

Edsel Apostol wrote:Hi Karlo,

I knew that Rybka shows a misleading depth, but I overlooked it when I used it in my test matches with this setup. Thanks for reminding me; I will not use Rybka again for this specific test.

One thing I noticed is that pruning losing captures is bad in this test, but in real games it helps, so I don't really know whether this affects my assessment of whether my evaluation is good or bad.
Of course, you give them away for free.

In your test setup, searching all non-capture moves in the first 5 plies of quiescence will give good results.

Tony

Re: Testing methodology

Post by Aleks Peshkov »

Edsel Apostol wrote:My initial purpose was to test only the changes I made to the evaluation, so I decided to play depth 1 matches, where the search contributes little or nothing to playing strength.
IMO it is exactly the opposite. You diminish the importance of the evaluation: changes in evaluation will be hidden by "novice"-level tactical vision.
Edsel Apostol wrote:When I play a depth 1 game between two engines, they are playing chess, so how is it a non-chess game result?
Computers generally play master-level chess because of their tactical strength. We need a master-level evaluation function to make them play at grandmaster level.

If you artificially destroy the computer's primary strength, the engine will not be able to make use of advanced evaluation ideas. Long-term plans, like a mating attack on the king or creating a passed pawn, will be counter-productive in the evaluation because of the short-term novice winning tactic: "do nothing, avoid complications, wait for a tactical blunder by the opponent". (Even Zappa defeated Rybka by "doing nothing" in one well-known match game.)

Re: Testing methodology

Post by adams161 »

This is just one way I test. It's a snapshot: if I change the eval, what is the change in a no-search FEN eval? Also, if your eval should be symmetric, it will show whether it really is.
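The symmetry check mentioned above can be sketched roughly as follows: mirror each FEN (flip the board vertically, swap the colours and the side to move) and confirm that the score changes sign. This assumes the eval reports scores from White's point of view; for a side-to-move eval the two scores should instead be equal. The helper names, piece values, and the deliberately buggy eval are made up for the example.

```python
# Hypothetical piece values in centipawns (assumed for illustration only).
PIECE_VALUES = {"p": 100, "n": 320, "b": 330, "r": 500, "q": 900, "k": 0}


def mirror_fen(fen: str) -> str:
    """Flip the board vertically and swap colours, side to move and rights."""
    board, side, castling, ep, *rest = fen.split()
    board = "/".join(rank.swapcase() for rank in reversed(board.split("/")))
    side = "b" if side == "w" else "w"
    if castling != "-":
        castling = "".join(sorted(castling.swapcase()))  # keeps KQkq order
    if ep != "-":
        ep = ep[0] + str(9 - int(ep[1]))  # e3 becomes e6, and so on
    return " ".join([board, side, castling, ep, *rest])


def material_eval(fen: str) -> int:
    """Symmetric toy eval: material from White's point of view."""
    score = 0
    for ch in fen.split()[0]:
        if ch.isalpha():
            value = PIECE_VALUES[ch.lower()]
            score += value if ch.isupper() else -value
    return score


def buggy_eval(fen: str) -> int:
    """Deliberately asymmetric: a knight bonus is applied to White only."""
    return material_eval(fen) + 10 * fen.split()[0].count("N")


def find_asymmetries(fens, eval_fn):
    """Return the positions whose mirrored score is not the exact negation."""
    return [f for f in fens if eval_fn(f) != -eval_fn(mirror_fen(f))]


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
NO_C8_BISHOP = "rn1qkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

print(find_asymmetries([START, NO_C8_BISHOP], material_eval))  # []
print(len(find_asymmetries([START, NO_C8_BISHOP], buggy_eval)))  # 2
```

Any position that shows up in the list points at a term that is scored differently for the two colours, which is usually a sign or indexing bug rather than intended chess knowledge.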