Hi everyone,
I've lately been trying to find a better way to test my engine's development, given the limited testing resources that I have. I decided to run depth-1 matches against a set of opponents from a fixed set of positions, to measure the accuracy of my evaluation and my quiescence search between versions before I start refining the search algorithm.
I noticed that programs that prune losing captures in the quiescence search, like Fruit 2.1, lose badly against my engine versions in which I disabled that pruning. My own versions that prune losing captures are also very weak in these matches. I also noticed that Rybka 1.0 Beta is very strong in this setup, which might suggest that it has a very accurate evaluation or a very accurate quiescence search. What do you think?
I have not tested much, but it seems to be a good way to test evaluation changes faster.
Does anyone have experience with this setup? Is it productive, or are there much better testing methodologies?
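[For what it's worth, the setup described above can be sketched as a small driver that replays a fixed position set with colors reversed and tallies game points. This is a minimal sketch in Python; `run_fixed_position_match` and the `play_game` callback are hypothetical names standing in for whatever actually runs a fixed-depth game between two engines.]

```python
def run_fixed_position_match(positions, engine_a, engine_b, play_game):
    """Tally engine_a's score over a fixed set of start positions.

    Each position is played twice with colors reversed, so neither side
    benefits from unbalanced openings.  `play_game(white, black, pos)` is
    assumed to return +1 (White wins), 0 (draw) or -1 (Black wins).
    """
    score = 0.0
    for pos in positions:
        result = play_game(engine_a, engine_b, pos)   # engine_a plays White
        score += (result + 1) / 2                     # +1 -> 1.0, 0 -> 0.5, -1 -> 0.0
        result = play_game(engine_b, engine_a, pos)   # colors reversed
        score += (1 - result) / 2                     # engine_a's points as Black
    return score                                      # out of 2 * len(positions)
```

[A quick sanity check with a stub `play_game` where White always wins should give each engine exactly half the available points, confirming that the color reversal keeps the match fair.]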
Edsel Apostol (Twisted Logic)
Testing methodology
Moderators: hgm, Rebel, chrisw
- Posts: 803
- Joined: Mon Jul 17, 2006 5:53 am
- Full name: Edsel Apostol

- Posts: 373
- Joined: Wed Mar 22, 2006 10:17 am
- Location: Novi Sad, Serbia
- Full name: Karlo Balla
Re: Testing methodology
Rybka 1.0 is strong in this setup because at depth 1 Rybka actually searches to depth 3 plus a quiescence search (I don't know exactly what Rybka searches in QS).
The point is that with a simple QS (not including checking moves, etc.) a program searches deeper and compensates for the QS weakness.
I think it is a good way of testing. Once you resolve the horizon problem you can put your energy into implementing the search.
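[To make the "simple QS" concrete: a capture-only quiescence search with a stand-pat cutoff can be sketched as below. This is a generic negamax sketch over a toy game-tree interface (`evaluate` and `captures` are assumed callbacks), not anyone's actual engine code; real engines add checks, delta pruning, SEE pruning, and so on.]

```python
def quiesce(node, alpha, beta, evaluate, captures):
    """Capture-only quiescence search (negamax, fail-hard).

    `evaluate(node)` returns a static score from the side to move's view;
    `captures(node)` yields the child nodes reachable by captures.
    """
    stand_pat = evaluate(node)
    if stand_pat >= beta:
        return beta                      # static eval is already good enough
    alpha = max(alpha, stand_pat)        # the side to move may "stand pat"
    for child in captures(node):
        score = -quiesce(child, -beta, -alpha, evaluate, captures)
        if score >= beta:
            return beta                  # refutation found
        alpha = max(alpha, score)
    return alpha
```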
Best Regards,
Karlo Balla Jr.
- Posts: 892
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
Re: Testing methodology
Fixed-depth search is not chess. Tuning a chess evaluation using non-chess game results is pointless.
IMHO a small set of test positions with a limited solution time is closer to real chess than fixed-depth chess.
- Posts: 626
- Joined: Sun May 13, 2007 9:55 pm
- Location: Bay Area, CA USA
- Full name: Mike Adams
Re: Testing methodology
I think he said he is testing the eval and wants a low depth like depth 1 because he can see (apparently) what the position looks like to the eval.
What I do to test the eval is run a FEN position through the eval with no search. I think this gives the best information.
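[A no-search eval run like this can be as simple as feeding FENs straight into the evaluation function. As an illustration only, here is a toy material-only "evaluation" computed directly from the board field of a FEN; the piece values are the usual centipawn conventions, not anyone's tuned numbers.]

```python
PIECE_VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900, "K": 0}

def material_eval(fen):
    """Material balance in centipawns from White's point of view, no search.

    Only the first (board) field of the FEN is inspected; digits are empty
    squares, '/' separates ranks, uppercase is White, lowercase is Black.
    """
    board = fen.split()[0]
    score = 0
    for ch in board:
        if ch.isalpha():
            value = PIECE_VALUES.get(ch.upper(), 0)
            score += value if ch.isupper() else -value
    return score
```

[Running a file of FENs through a function like this before and after an eval change gives exactly the kind of no-search snapshot described above.]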
Re: Testing methodology
Hi Aleks,
My initial purpose was to test only the changes I made to the evaluation, so I decided to play depth-1 matches, where the search contributes little or nothing to playing strength.
I think that if I played the matches at a time control, it would take longer to determine whether the changes to my evaluation are good or bad, as the search could affect the result.
If you think that this is not a good idea, please enlighten me.
Aleks wrote: "Tuning chess evaluation using non chess game results is pointless."
I am wondering what you mean by this. When I play a depth-1 game between two engines, they are playing chess, so how come it is a non-chess game result?
Last edited by Edsel Apostol on Thu Nov 29, 2007 3:48 am, edited 1 time in total.
Edsel Apostol
https://github.com/ed-apostol/InvictusChess
Re: Testing methodology
Hi Karlo,
I knew about Rybka reporting misleading depth, but I overlooked it when I used it in my test matches with this setup. Thanks for reminding me; I will not use Rybka again for this specific test.
One thing I noticed is that pruning losing captures is bad in this test, but in real games it helps, so I don't really know whether this affects my assessment of the evaluation.
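[On pruning losing captures: the usual tool for classifying a capture as losing is a static exchange evaluation (SEE). Below is a simplified sketch assuming each side always recaptures with its cheapest remaining attacker on a single exchange square, and may stop when continuing loses material; a real SEE must also handle x-ray attacks, pins, and promotions. The function names and list-based interface are illustrative, not from any particular engine.]

```python
def exchange(target, attackers, defenders):
    """Best material the side to move can win on the square, or 0 if it
    declines to capture.  `target` is the value of the piece currently on
    the square; `attackers`/`defenders` are the piece values each side can
    still bring to bear.
    """
    if not attackers:
        return 0
    remaining = sorted(attackers)
    cheapest = remaining.pop(0)          # always recapture with the cheapest piece
    # After we take, the opponent faces the same decision against `cheapest`.
    return max(0, target - exchange(cheapest, sorted(defenders), remaining))

def see_capture(victim, attacker, defenders, other_attackers=()):
    """Expected material outcome of capturing `victim` with `attacker`.

    A negative result marks a losing capture, i.e. the kind of move that
    losing-capture pruning would skip in the quiescence search.
    """
    return victim - exchange(attacker, list(defenders), list(other_attackers))
```

[With this, `see_capture(...) < 0` is the pruning condition being discussed: at depth 1 such pruning can hide real defensive resources, while at longer time controls the speed-up usually pays for itself.]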
Edsel Apostol
https://github.com/ed-apostol/InvictusChess
Re: Testing methodology
Hi Mike,
Your name sounds like an English grandmaster's.
By the way, you said that you run your tests from FEN positions. How do you determine from this whether the evaluation is good or bad? I mean, are you analyzing the eval of each FEN position manually?
Edsel Apostol
https://github.com/ed-apostol/InvictusChess
Re: Testing methodology
Edsel Apostol wrote: "One thing I noticed is that pruning losing captures is bad at this test but in real games, it helps."
Of course: you give them away for free.
In your test setup, searching all non-capture moves for the first 5 plies of quiescence will give good results.
Tony
Re: Testing methodology
Edsel Apostol wrote: "My initial purpose was to test only the changes I made with the evaluation so I decided to do depth 1 matches where the search has little or no contribution to playing strength at all."
IMO it is exactly the opposite: you reduce the importance of the evaluation -- changes in it will be hidden by "novice"-level tactical vision.
Edsel Apostol wrote: "When I play a depth 1 game between two engines, they are playing chess, so how come it is a non chess game result?"
Computers generally play master-level chess because of their tactical strength; we need a master-level evaluation function to make them play like grandmasters. If you artificially destroy the computer's primary strength, the engine will not be able to exploit advanced evaluation ideas. Long-term plans, like a mating attack on the king or creating a passed pawn, become counter-productive in the evaluation because of the short-term "do nothing, avoid complications, wait for a tactical blunder by the opponent" tactics by which a novice player wins. (Even Zappa defeated Rybka by "doing nothing" in one well-known match game.)
Re: Testing methodology
This is just one way I test; it's a snapshot: if I change the eval, what is the change in a no-search FEN eval? Also, if your eval should be symmetric, this will find out whether it really is.
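[The symmetry check mentioned here can be automated: color-flip each test position and assert that the evaluation is the exact negation. A sketch, assuming a white-relative static eval that only reads the board field of a FEN; side to move, castling, and en passant are ignored, so this only covers the board-dependent terms. The helper names are illustrative.]

```python
def color_flip(board):
    """Color-flip the board field of a FEN: mirror the ranks vertically and
    swap the case (i.e. the color) of every piece letter."""
    return "/".join(rank.swapcase() for rank in reversed(board.split("/")))

def is_symmetric(evaluate, board):
    """True if the eval of the color-flipped board is the exact negation."""
    return evaluate(board) == -evaluate(color_flip(board))
```

[Run over the whole test-position file, any position where `is_symmetric` is false pinpoints an evaluation term that treats White and Black differently; a pure material counter passes trivially, while a white-only bonus is caught immediately.]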