Hi all,
The question comes from an experiment I did recently. I replaced the evaluation function in the Fruit 2.1 engine with the evaluation function of Sungorus 1.4. Let's call this hybrid engine "Fruit-SungorusEval". Here "replace" means I implemented an evaluation function for Fruit 2.1 that always returns exactly the same value as Sungorus, given the same position. Then I launched a tournament at a 40 moves/4 minutes time control between Sungorus 1.4 and Fruit-SungorusEval, with 4000 rounds (with repeat) and 16-ply random openings. The result from BayesElo shows that Fruit-SungorusEval is 70 Elo weaker than Sungorus.
Rank Name Elo + - games score oppo. draws
1 sungorus 35 6 6 4000 60% -35 24%
2 fruit-sungeval -35 6 6 4000 40% 35 24%
This result is a little surprising to me, because I think Fruit has a more sophisticated search implementation. Nevertheless, Fruit-SungorusEval lost the tournament by 70 Elo, so I came to ask these questions:
Firstly, do you think this test setup supports the conclusion that Fruit-SungorusEval is 70 Elo weaker at the 40/4 time control?
Secondly, if the setup is OK, what is the main reason that Fruit-SungorusEval became weaker? I know that an evaluation function should be matched with the search algorithm in an engine, but I never expected a simplified evaluation to cause such a huge Elo drop. Has anyone seen similar results before?
Thanks for all suggestions!
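On the first question, a rough sanity check on the error bars is possible. The sketch below uses the plain logistic Elo model with independent games, which is not BayesElo's exact model, so the numbers are only approximate:

```python
import math

# With 4000 games, a 60% score and 24% draws, estimate the standard
# error of the measured Elo difference.
games, score, draws = 4000, 0.60, 0.24
wins = score - draws / 2                                  # 0.48
var_per_game = (wins * 1.0 + draws * 0.25) - score ** 2   # 0.18
se_score = math.sqrt(var_per_game / games)

# Chain rule through elo(s) = -400 * log10(1/s - 1):
delo_ds = 400.0 / (math.log(10) * score * (1.0 - score))
se_elo = se_score * delo_ds
print(round(se_elo, 1))  # about 4.9, the same ballpark as BayesElo's +/-6
```

So a 70 Elo gap measured over 4000 games is far outside the noise.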
What’s the key factor to win in the 40/4 matches?
-
- Posts: 74
- Joined: Sat Dec 18, 2010 5:19 pm
- Location: Tianjin, China
- Full name: Chao M.
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: What’s the key factor to win in the 40/4 matches?
Sounds like there is nothing wrong with your test.
Search and eval tend to interact (for example, futility margins can depend on the range of scoring values), so it is not surprising that replacing one and keeping the other is less performant.
--Jon
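Jon's point about scale can be made concrete with a toy sketch (illustrative only; this is not Fruit's actual pruning code, and the numbers are invented). Two evals that agree the side to move stands somewhat worse, but report it on different scales, can drive a fixed futility margin to opposite pruning decisions:

```python
def futility_prune(static_eval, alpha, margin=100):
    # Frontier-node futility test: skip searching a quiet move when even
    # an optimistic bound (static eval + margin) cannot reach alpha.
    # The margin here is tuned for an eval measured in centipawns.
    return static_eval + margin <= alpha

alpha = 50
eval_wide_scale = -60    # hypothetical eval A: wide positional score range
eval_narrow_scale = -20  # hypothetical eval B: same judgement, compressed range

print(futility_prune(eval_wide_scale, alpha))    # True: node gets pruned
print(futility_prune(eval_narrow_scale, alpha))  # False: node gets searched
```

So even with a bug-free translation, every margin tuned for Fruit's own score distribution now cuts the tree in a different place.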
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: What’s the key factor to win in the 40/4 matches?
To know the overall effect of using the Sungorus eval in Fruit, perhaps you could have run the test Fruit_sungorus_eval vs Fruit_fruit_eval.
Another option is to run a new match between Sungorus and Fruit, then compare it with your existing result. That way we can also measure the total effect of the Sungorus eval when applied to Fruit.
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: What’s the key factor to win in the 40/4 matches?
There are lots of possibilities here.
The first thing I would do (and you have probably already done this) would be to run eval in fruit and in sung-fruit on 1000 different positions from different game phases, side to move, etc. and verify that the eval function returns exactly the same value.
That would be proof that there is no bug in the translated evaluation.
One possible explanation is that a more complicated evaluation will reduce the speed of the search. It is also true that eval functions tend to be tuned to the entire framework that they are embedded in.
But I would say that your finding is both interesting and surprising.
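A sketch of this check follows. The engines are C++, so in practice this would be a small C++ driver; `eval_a` and `eval_b` below are hypothetical stand-ins for the two builds' evaluation calls:

```python
def compare_evals(fens, eval_a, eval_b):
    # Run both evaluations over the same positions and collect every
    # position where the returned scores differ.
    mismatches = []
    for fen in fens:
        a, b = eval_a(fen), eval_b(fen)
        if a != b:
            mismatches.append((fen, a, b))
    return mismatches

# Stub evaluators stand in for the real engines here:
positions = ["pos-opening", "pos-middlegame", "pos-endgame"]
original = {"pos-opening": 10, "pos-middlegame": 120, "pos-endgame": -35}
translated = {"pos-opening": 10, "pos-middlegame": 115, "pos-endgame": -35}

diffs = compare_evals(positions, original.get, translated.get)
print(diffs)  # [('pos-middlegame', 120, 115)] -> the translation disagrees there
```

Any non-empty mismatch list points at a bug in the translated evaluation; an empty list over a varied position set is good evidence the port is faithful.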
-
- Posts: 74
- Joined: Sat Dec 18, 2010 5:19 pm
- Location: Tianjin, China
- Full name: Chao M.
Re: What’s the key factor to win in the 40/4 matches?
Hi Jon,
jdart wrote: Search and eval tend to interact (for example, futility margins can depend on the range of scoring values), so it is not surprising that replacing one and keeping the other is less performant.
Thanks for the reminder! I guess this is probably an important reason: since the score scales of Sungorus and Fruit are different, the interpretation of a given score difference also changes. I will try to look deeper into this issue.
-
- Posts: 74
- Joined: Sat Dec 18, 2010 5:19 pm
- Location: Tianjin, China
- Full name: Chao M.
Re: What’s the key factor to win in the 40/4 matches?
Hi Ferdinand,
Ferdy wrote: To know the overall effect of using Sungorus eval in Fruit, perhaps you could have run the test Fruit_sungorus_eval vs Fruit_fruit_eval. Another option is to run a new match between Sungorus and Fruit, then compare it with your existing result.
Thanks for the comment! The "Fruit_sungorus_eval vs Fruit_fruit_eval" match is exactly what I am running now. Matches between standard Sungorus 1.4 and Fruit 2.1 can be found on the CCRL 40/4 list, where the two engines show a difference of around 400 Elo, with Fruit the stronger. That's why I was surprised by the test result I got.
-
- Posts: 74
- Joined: Sat Dec 18, 2010 5:19 pm
- Location: Tianjin, China
- Full name: Chao M.
Re: What’s the key factor to win in the 40/4 matches?
Hi Dann,
Dann Corbit wrote: The first thing I would do (and you have probably already done this) would be to run eval in fruit and in sung-fruit on 1000 different positions from different game phases, side to move, etc., and verify that the eval function returns exactly the same value.
Thanks for your reply! Actually I did not run the Fruit vs Sung-FruitEval matches, because converting the evaluation of Fruit is complicated work (which could also introduce a lot of potential bugs). And, as you mentioned, the converted evaluation might not be as efficient as the original version; it would slow down the search and hurt performance, which would make it hard to pinpoint the source of any problem.
In the experiment in this post, I put the evaluation of Sungorus into the Fruit searcher because the Sungorus evaluation is very simple, so it has almost no "slow-down" issue. But the performance still drops seriously (by more than 400 Elo relative to standard Fruit). That's why I was confused.
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: What’s the key factor to win in the 40/4 matches?
nkg114mc wrote: Thanks for the comment! The "Fruit_sungorus_eval vs Fruit_fruit_eval" match is exactly what I am running now.
Looking forward to the result of that test.
nkg114mc wrote: Matches between standard Sungorus 1.4 and Fruit 2.1 can be found on the CCRL 40/4 list,
I checked the pgn and there were no games between the two engines.
nkg114mc wrote: where the two engines show a difference of around 400 Elo, with Fruit the stronger. That's why I was surprised by the test result I got.
Let's say we use the data we have at this time, based on play against other engines:
Fruit 2.1 = 2685
Sungorus 1.4 = 2311
Diff = 2685-2311 = 374
So the effect of that eval change is 374 + 70 = 444 rating points. This is indeed surprising.
But it would be better to compare against the result of a direct match between the two.
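The 70 Elo figure itself is consistent with the 60% score under the standard logistic model (BayesElo's exact numbers also depend on its draw model, so this is only a cross-check):

```python
import math

def elo_from_score(score):
    # Logistic Elo model: an expected score s corresponds to a rating
    # gap of -400 * log10(1/s - 1).
    return -400.0 * math.log10(1.0 / score - 1.0)

gap = elo_from_score(0.60)  # Sungorus scored 60% in the match
print(round(gap))  # 70, matching the +35/-35 split in the BayesElo table
```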
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: What’s the key factor to win in the 40/4 matches?
What I meant was this:
Write a function that reads a position and then just runs eval on the position (no search). It should be instantaneous.
Do the same thing for both engines and compare the outputs.
Some engines have an eval command that does just that, but I don't think Fruit or Sungorus has one.
-
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: What’s the key factor to win in the 40/4 matches?
Hi!
Do you know how to obtain the source of Sungorus? I'm not able to find it.
Just curiosity.
Thanks.
Daniel José - http://www.andscacs.com