What’s the key factor to win in the 40/4 matches?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

nkg114mc
Posts: 74
Joined: Sat Dec 18, 2010 5:19 pm
Location: Tianjin, China
Full name: Chao M.

What’s the key factor to win in the 40/4 matches?

Post by nkg114mc »

Hi all,

The question comes from an experiment that I ran recently. I replaced the evaluation function of the Fruit 2.1 engine with the evaluation function of Sungorus 1.4. Let's call this hybrid engine "Fruit-SungorusEval". Here "replaced" means that I implemented an evaluation function for Fruit 2.1 that always returns exactly the same value as Sungorus does for the same position. Then I launched a 4000-game tournament (with repeat) between Sungorus 1.4 and Fruit-SungorusEval at a 40 moves/4 minutes time control, using random 16-ply openings. The result from BayesElo shows that Fruit-SungorusEval is 70 Elo weaker than Sungorus.

Rank Name            Elo   +   -   games  score  oppo.  draws
1    sungorus         35   6   6   4000   60%    -35    24%
2    fruit-sungeval  -35   6   6   4000   40%     35    24%

This result was a little surprising to me, because I think Fruit has a more sophisticated search implementation. Nevertheless, Fruit-SungorusEval lost the tournament by 70 Elo, so I came here to ask two questions:

Firstly, do you think this test setup supports the conclusion that Fruit-SungorusEval is 70 Elo weaker at the 40/4 time control?
Secondly, if the setup is sound, what is the main reason that Fruit-SungorusEval became weaker? I know that an evaluation function should be matched to the search algorithm of an engine, but I never expected a simplified evaluation to cause such a huge Elo drop. Has anyone seen similar results before?

Thanks for all suggestions!
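To make the setup concrete, the "replacement" described above amounts to glue of roughly the following shape (a minimal sketch only; the type and function names are illustrative placeholders, not the real identifiers from either code base):

Code: Select all

// Inside Fruit's evaluation module: instead of computing Fruit's own
// terms, translate the position and ask the Sungorus evaluator for a
// score. fruit_board_t, sungorus_pos_t, translate_board() and
// sungorus_evaluate() are placeholder names for this sketch.
int eval(const fruit_board_t *board) {
    sungorus_pos_t pos;
    translate_board(board, &pos);         // map Fruit's board to Sungorus's layout
    int score = sungorus_evaluate(&pos);  // score from the side to move's point of view
    return score;                         // handed back to Fruit's search unchanged
}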
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: What’s the key factor to win in the 40/4 matches?

Post by jdart »

Sounds like there is nothing wrong with your test.

Search and eval tend to interact (for example, futility margins can depend on the range of evaluation scores), so it is not surprising that replacing one while keeping the other performs worse.
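As a toy illustration (this is not Fruit's actual code), a margin tuned for one score scale prunes differently once the eval's scale changes:

Code: Select all

// Toy demo: the same "half a pawn below alpha" situation is kept or
// pruned depending only on the scale the evaluation reports scores in.
#include <cstdio>

const int FUTILITY_MARGIN = 100;  // margin tuned assuming 1 pawn == 100

bool can_prune(int static_eval, int alpha) {
    // prune a quiet move if even an optimistic bonus cannot reach alpha
    return static_eval + FUTILITY_MARGIN <= alpha;
}

int main() {
    // eval A (pawn == 100): half a pawn below alpha -> move is kept
    std::printf("pawn=100 scale: prune=%d\n", can_prune(-50, 0));
    // eval B (pawn == 256): the same half-pawn deficit is 128 units -> pruned
    std::printf("pawn=256 scale: prune=%d\n", can_prune(-128, 0));
    return 0;
}

Same position, same margin, opposite pruning decision.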

--Jon
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: What’s the key factor to win in the 40/4 matches?

Post by Ferdy »

nkg114mc wrote:Hi all,

The question comes from an experiment that I ran recently. I replaced the evaluation function of the Fruit 2.1 engine with the evaluation function of Sungorus 1.4. Let's call this hybrid engine "Fruit-SungorusEval". Here "replaced" means that I implemented an evaluation function for Fruit 2.1 that always returns exactly the same value as Sungorus does for the same position. Then I launched a 4000-game tournament (with repeat) between Sungorus 1.4 and Fruit-SungorusEval at a 40 moves/4 minutes time control, using random 16-ply openings. The result from BayesElo shows that Fruit-SungorusEval is 70 Elo weaker than Sungorus.

Rank Name            Elo   +   -   games  score  oppo.  draws
1    sungorus         35   6   6   4000   60%    -35    24%
2    fruit-sungeval  -35   6   6   4000   40%     35    24%

This result was a little surprising to me, because I think Fruit has a more sophisticated search implementation. Nevertheless, Fruit-SungorusEval lost the tournament by 70 Elo, so I came here to ask two questions:

Firstly, do you think this test setup supports the conclusion that Fruit-SungorusEval is 70 Elo weaker at the 40/4 time control?
Secondly, if the setup is sound, what is the main reason that Fruit-SungorusEval became weaker? I know that an evaluation function should be matched to the search algorithm of an engine, but I never expected a simplified evaluation to cause such a huge Elo drop. Has anyone seen similar results before?

Thanks for all suggestions!
To know the overall effect of using the Sungorus eval in Fruit, perhaps you could run the test Fruit_sungorus_eval vs Fruit_fruit_eval.

Another option is to run a new match between Sungorus and Fruit, then compare it with your existing result. This way we can also measure the total effect of the Sungorus eval when applied to Fruit.
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: What’s the key factor to win in the 40/4 matches?

Post by Dann Corbit »

There are lots of possibilities here.
The first thing I would do (and you have probably already done this) would be to run the eval in Fruit and in sung-fruit on 1000 different positions covering different game phases, sides to move, etc., and verify that the eval function returns exactly the same value.

That would be proof that there is no bug in the translated evaluation.
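A minimal harness for that check could look like the sketch below; eval_fruit_sungorus() and eval_sungorus() are hypothetical wrappers you would have to expose from the two code bases (the stub bodies here only make the sketch compile):

Code: Select all

#include <cstdio>
#include <string>
#include <vector>

// Hypothetical wrappers around each engine's "set position, run the
// static eval" path; replace the stub bodies with real calls.
int eval_fruit_sungorus(const std::string &fen) { (void)fen; return 0; }
int eval_sungorus(const std::string &fen)       { (void)fen; return 0; }

int main() {
    // ~1000 FENs covering all game phases, both sides to move, etc.
    std::vector<std::string> fens = {
        "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
        // ... more positions ...
    };
    int mismatches = 0;
    for (const std::string &fen : fens) {
        int a = eval_fruit_sungorus(fen);
        int b = eval_sungorus(fen);
        if (a != b) {
            ++mismatches;
            std::printf("%d vs %d at %s\n", a, b, fen.c_str());
        }
    }
    std::printf("%d mismatches out of %zu positions\n", mismatches, fens.size());
    return mismatches != 0;
}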

One possible explanation is that a more complicated evaluation will reduce the speed of the search. It is also true that eval functions tend to be tuned to the entire framework that they are embedded in.

But I would say that your finding is both interesting and surprising.
nkg114mc
Posts: 74
Joined: Sat Dec 18, 2010 5:19 pm
Location: Tianjin, China
Full name: Chao M.

Re: What’s the key factor to win in the 40/4 matches?

Post by nkg114mc »

jdart wrote:Sounds like there is nothing wrong with your test.

Search and eval tend to interact (for example, futility margins can depend on the range of evaluation scores), so it is not surprising that replacing one while keeping the other performs worse.

--Jon
Hi Jon, thanks for the reminder! I guess this is probably an important reason: since the scales of Sungorus and Fruit are different, the interpretation of a given score difference also changes. I will try to look deeper into this issue.
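One cheap experiment would be to rescale the imported score before handing it back to Fruit's search, for example (just a sketch; I don't know the exact pawn values the two code bases use, so both constants below are placeholders):

Code: Select all

// Rescale a Sungorus score so that one pawn maps to whatever value
// Fruit's margins and windows were tuned around. Both constants are
// assumptions to be replaced by the engines' real pawn values.
int rescale_for_fruit(int sungorus_score) {
    const int SUNGORUS_PAWN = 100;  // assumed scale of Sungorus scores
    const int FRUIT_PAWN    = 100;  // assumed scale Fruit's search expects
    return sungorus_score * FRUIT_PAWN / SUNGORUS_PAWN;
}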
nkg114mc
Posts: 74
Joined: Sat Dec 18, 2010 5:19 pm
Location: Tianjin, China
Full name: Chao M.

Re: What’s the key factor to win in the 40/4 matches?

Post by nkg114mc »

Ferdy wrote:
nkg114mc wrote:Hi all,

The question comes from an experiment that I ran recently. I replaced the evaluation function of the Fruit 2.1 engine with the evaluation function of Sungorus 1.4. Let's call this hybrid engine "Fruit-SungorusEval". Here "replaced" means that I implemented an evaluation function for Fruit 2.1 that always returns exactly the same value as Sungorus does for the same position. Then I launched a 4000-game tournament (with repeat) between Sungorus 1.4 and Fruit-SungorusEval at a 40 moves/4 minutes time control, using random 16-ply openings. The result from BayesElo shows that Fruit-SungorusEval is 70 Elo weaker than Sungorus.

Rank Name            Elo   +   -   games  score  oppo.  draws
1    sungorus         35   6   6   4000   60%    -35    24%
2    fruit-sungeval  -35   6   6   4000   40%     35    24%

This result was a little surprising to me, because I think Fruit has a more sophisticated search implementation. Nevertheless, Fruit-SungorusEval lost the tournament by 70 Elo, so I came here to ask two questions:

Firstly, do you think this test setup supports the conclusion that Fruit-SungorusEval is 70 Elo weaker at the 40/4 time control?
Secondly, if the setup is sound, what is the main reason that Fruit-SungorusEval became weaker? I know that an evaluation function should be matched to the search algorithm of an engine, but I never expected a simplified evaluation to cause such a huge Elo drop. Has anyone seen similar results before?

Thanks for all suggestions!
To know the overall effect of using the Sungorus eval in Fruit, perhaps you could run the test Fruit_sungorus_eval vs Fruit_fruit_eval.

Another option is to run a new match between Sungorus and Fruit, then compare it with your existing result. This way we can also measure the total effect of the Sungorus eval when applied to Fruit.
Hi Ferdinand,

Thanks for the comment! "Fruit_sungorus_eval vs Fruit_fruit_eval" is exactly what I am running now. As for standard Sungorus 1.4 vs Fruit 2.1, their ratings can be found on the CCRL 40/4 list, where the two engines differ by around 400 Elo, with Fruit the stronger. That's why I am surprised by the test result I got.
nkg114mc
Posts: 74
Joined: Sat Dec 18, 2010 5:19 pm
Location: Tianjin, China
Full name: Chao M.

Re: What’s the key factor to win in the 40/4 matches?

Post by nkg114mc »

Dann Corbit wrote:There are lots of possibilities here.
The first thing I would do (and you have probably already done this) would be to run the eval in Fruit and in sung-fruit on 1000 different positions covering different game phases, sides to move, etc., and verify that the eval function returns exactly the same value.

That would be proof that there is no bug in the translated evaluation.

One possible explanation is that a more complicated evaluation will reduce the speed of the search. It is also true that eval functions tend to be tuned to the entire framework that they are embedded in.

But I would say that your finding is both interesting and surprising.
Hi Dann,

Thanks for your reply! Actually, I did not run the Fruit vs Sung-FruitEval matches, because converting the evaluation of Fruit is complicated work (which may also introduce a lot of potential bugs). And, as you mentioned, the converted evaluation might not be as efficient as the original version, so it would slow down the search and hurt the performance, which would eventually make it hard for me to pinpoint the source of the problem.

In the experiment in this post, I put the evaluation of Sungorus on the Fruit searcher; because the Sungorus evaluation is very simple, there is almost no "slow-down" issue. But the performance still drops seriously (by more than 400 Elo relative to the original Fruit). That's why I was confused. :?
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: What’s the key factor to win in the 40/4 matches?

Post by Ferdy »

nkg114mc wrote:
Ferdy wrote:
nkg114mc wrote:Hi all,

The question comes from an experiment that I ran recently. I replaced the evaluation function of the Fruit 2.1 engine with the evaluation function of Sungorus 1.4. Let's call this hybrid engine "Fruit-SungorusEval". Here "replaced" means that I implemented an evaluation function for Fruit 2.1 that always returns exactly the same value as Sungorus does for the same position. Then I launched a 4000-game tournament (with repeat) between Sungorus 1.4 and Fruit-SungorusEval at a 40 moves/4 minutes time control, using random 16-ply openings. The result from BayesElo shows that Fruit-SungorusEval is 70 Elo weaker than Sungorus.

Rank Name            Elo   +   -   games  score  oppo.  draws
1    sungorus         35   6   6   4000   60%    -35    24%
2    fruit-sungeval  -35   6   6   4000   40%     35    24%

This result was a little surprising to me, because I think Fruit has a more sophisticated search implementation. Nevertheless, Fruit-SungorusEval lost the tournament by 70 Elo, so I came here to ask two questions:

Firstly, do you think this test setup supports the conclusion that Fruit-SungorusEval is 70 Elo weaker at the 40/4 time control?
Secondly, if the setup is sound, what is the main reason that Fruit-SungorusEval became weaker? I know that an evaluation function should be matched to the search algorithm of an engine, but I never expected a simplified evaluation to cause such a huge Elo drop. Has anyone seen similar results before?

Thanks for all suggestions!
To know the overall effect of using the Sungorus eval in Fruit, perhaps you could run the test Fruit_sungorus_eval vs Fruit_fruit_eval.

Another option is to run a new match between Sungorus and Fruit, then compare it with your existing result. This way we can also measure the total effect of the Sungorus eval when applied to Fruit.
Hi Ferdinand,

Thanks for the comment! "Fruit_sungorus_eval vs Fruit_fruit_eval" is exactly what I am running now.
Looking forward to the result of that test.
Ma wrote: As for standard Sungorus 1.4 vs Fruit 2.1, their ratings can be found on the CCRL 40/4 list,
I checked the PGN and there were no games between the two engines.
Mao wrote: where the two engines differ by around 400 Elo, with Fruit the stronger. That's why I am surprised by the test result I got.
Let's say we use the data we have at this time, based on games against other engines:
Fruit 2.1 = 2685
Sungorus 1.4 = 2311
Diff = 2685 - 2311 = 374

So the effect of that eval change is 374 + 70 = 444 rating points. This is indeed surprising.

But it would be better to compare against the result of a direct match between the two engines.
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: What’s the key factor to win in the 40/4 matches?

Post by Dann Corbit »

nkg114mc wrote:
Dann Corbit wrote:There are lots of possibilities here.
The first thing I would do (and you have probably already done this) would be to run the eval in Fruit and in sung-fruit on 1000 different positions covering different game phases, sides to move, etc., and verify that the eval function returns exactly the same value.

That would be proof that there is no bug in the translated evaluation.

One possible explanation is that a more complicated evaluation will reduce the speed of the search. It is also true that eval functions tend to be tuned to the entire framework that they are embedded in.

But I would say that your finding is both interesting and surprising.
Hi Dann,

Thanks for your reply! Actually, I did not run the Fruit vs Sung-FruitEval matches, because converting the evaluation of Fruit is complicated work (which may also introduce a lot of potential bugs). And, as you mentioned, the converted evaluation might not be as efficient as the original version, so it would slow down the search and hurt the performance, which would eventually make it hard for me to pinpoint the source of the problem.

In the experiment in this post, I put the evaluation of Sungorus on the Fruit searcher; because the Sungorus evaluation is very simple, there is almost no "slow-down" issue. But the performance still drops seriously (by more than 400 Elo relative to the original Fruit). That's why I was confused. :?
What I meant was this:
Write a function that reads a position and then just runs the eval on that position (no search). It should be instantaneous.
Do the same thing for both engines and compare the outputs.
Some engines have an eval command that does just that, but I don't think Fruit or Sungorus has one.
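Adding one is usually just a few lines in the engine's command loop. A rough sketch (position_t, parse_fen() and evaluate() stand in for whatever the engine actually provides; the stubs only make it compile):

Code: Select all

#include <cstdio>
#include <cstring>

// Stand-ins for the engine's own position struct, FEN parser and
// static evaluation; replace these with the real ones.
struct position_t { /* board state */ };
void parse_fen(const char *fen, position_t *pos) { (void)fen; (void)pos; }
int evaluate(const position_t *pos) { (void)pos; return 0; }

// "eval <FEN>": set up the position, run the static eval once, print
// the score, and do no search at all.
void handle_command(const char *line) {
    if (std::strncmp(line, "eval ", 5) == 0) {
        position_t pos;
        parse_fen(line + 5, &pos);
        std::printf("score %d cp\n", evaluate(&pos));
    }
}

int main() {
    handle_command("eval rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1");
    return 0;
}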
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: What’s the key factor to win in the 40/4 matches?

Post by cdani »

Hi!
Do you know where to obtain the source code of Sungorus? I'm not able to find it.
Just curiosity.
Thanks.