What’s the key factor to win in the 40/4 matches?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

PK
Posts: 893
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: What’s the key factor to win in the 40/4 matches?

Post by PK »

https://sites.google.com/site/sungorus/

Giving it the Fruit eval would require rewriting the piece/square code - at least that is what I did in the early versions of Rodent. Sungorus uses only one set of piece/square tables, which are vertically symmetrical (!). It updates this score component while making and unmaking moves, using a single variable (no split for mg/eg and no split for colors). To "frutify" the eval, one would need to split this variable and probably add an incrementally updated game phase variable. At least this is what I did at the beginning of Rodent :)
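A minimal sketch of the split described above (names and values are illustrative only, not Sungorus or Rodent code): the single incrementally updated score variable becomes four accumulators (mg/eg for each color) plus a phase counter, and the final eval tapers between the middlegame and endgame sums.

```python
# Hypothetical sketch of splitting one incrementally updated PST score
# into mg/eg accumulators per color plus a game-phase counter.
# All names and constants are illustrative, not from Sungorus or Rodent.

WHITE, BLACK = 0, 1
MAX_PHASE = 24  # full material; shrinks toward 0 as pieces come off

class EvalState:
    def __init__(self):
        self.mg = [0, 0]   # middlegame PST sums, per color
        self.eg = [0, 0]   # endgame PST sums, per color
        self.phase = 0     # grows with material on the board

    def add_piece(self, color, mg_val, eg_val, phase_val):
        self.mg[color] += mg_val
        self.eg[color] += eg_val
        self.phase += phase_val

    def remove_piece(self, color, mg_val, eg_val, phase_val):
        # exact inverse of add_piece, called from unmake_move / captures
        self.mg[color] -= mg_val
        self.eg[color] -= eg_val
        self.phase -= phase_val

    def tapered_eval(self, side):
        # blend mg and eg by phase: with full material the mg term dominates,
        # with no material left only the eg term remains
        p = min(self.phase, MAX_PHASE)
        mg = self.mg[side] - self.mg[side ^ 1]
        eg = self.eg[side] - self.eg[side ^ 1]
        return (mg * p + eg * (MAX_PHASE - p)) // MAX_PHASE
```

On a quiet move, make/unmake calls `remove_piece` for the from-square value and `add_piece` for the to-square value, so the split keeps the same O(1) update cost as the single variable.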
Henk
Posts: 7218
Joined: Mon May 27, 2013 10:31 am

Re: What’s the key factor to win in the 40/4 matches?

Post by Henk »

PK wrote:https://sites.google.com/site/sungorus/

Giving it the Fruit eval would require rewriting the piece/square code - at least that is what I did in the early versions of Rodent. Sungorus uses only one set of piece/square tables, which are vertically symmetrical (!). It updates this score component while making and unmaking moves, using a single variable (no split for mg/eg and no split for colors). To "frutify" the eval, one would need to split this variable and probably add an incrementally updated game phase variable. At least this is what I did at the beginning of Rodent :)
Downloading from that link does not work on my computer. Why isn't there a website where we could view the code? I have had enough of downloading software with possible viruses or unwanted side effects.
PK
Posts: 893
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: What’s the key factor to win in the 40/4 matches?

Post by PK »

http://www.pkoziol.cal24.pl/rodent/sungorus_legacy.zip

(if Pablo Vazquez objects, I will take this down immediately)
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: What’s the key factor to win in the 40/4 matches?

Post by Uri Blass »

Ferdy wrote:
nkg114mc wrote:
Ferdy wrote:
nkg114mc wrote:Hi all,

The question came from an experiment that I did recently. I replaced the evaluation function in the Fruit 2.1 engine with the evaluation function of Sungorus 1.4. Let's call this hybrid engine "Fruit-SungorusEval". Here "replace" means I implemented an evaluation function for Fruit 2.1 that always returns exactly the same value as Sungorus, given the same position. Then I launched a tournament between Sungorus 1.4 and Fruit-SungorusEval at a 40 moves / 4 minutes time control, with 4000 rounds (with repeat) and 16-ply random openings. The result from BayesianElo shows that Fruit-SungorusEval is 70 Elo weaker than Sungorus.

Rank Name            Elo    +    - games score oppo. draws
   1 sungorus          35    6    6  4000   60%   -35   24%
   2 fruit-sungeval   -35    6    6  4000   40%    35   24%

This result is a little surprising to me, because I think Fruit has a more complicated search implementation. However, Fruit-SungorusEval lost the tournament by 70 Elo, so I came to ask these questions:

Firstly, do you think this test setup supports the conclusion that Fruit-SungorusEval is 70 Elo weaker at the 40/4 time control?
Secondly, if the setup is OK, what is the main reason that Fruit-SungorusEval became weaker? I know that an evaluation function should be matched with the search algorithm in an engine, but I never expected a simplified evaluation to cause such a huge Elo drop. Has someone seen similar results before?

Thanks for all suggestions!
To know the overall effect of using the Sungorus eval in Fruit, perhaps you could run the test Fruit_sungorus_eval vs Fruit_fruit_eval.

Another option is to run a new match: Sungorus vs Fruit. Then compare it with your existing result. This way we can also measure the total effect of the Sungorus eval when applied to Fruit.
Hi Ferdinand,

Thanks for the comment! The "Fruit_sungorus_eval vs Fruit_fruit_eval" match is exactly what I am running now.
Looking forward to the result of that test.
Ma wrote:The matches between standard Sungorus 1.4 and Fruit 2.1 can be found in CCRL 40/4,
I checked the pgn and there were no games between the two engines.
Mao wrote:where these two engines show around a 400 Elo difference, and Fruit is stronger. That's why I was surprised by the test result I got.
Let's say we use the data we have at this time, based on playing other engines:
Fruit 2.1 = 2685
Sungorus 1.4 = 2311
Diff = 2685-2311 = 374

So the effect of that eval change is 374+70 = 444 rating points. This indeed is surprising.

But it is better to compare to the result of the match between the two only.
I do not understand how you get 374+70.

I understood the following
Fruit 2.1 = 2685
Sungorus 1.4 = 2311
Fruit2.1(with bad simple evaluation of Sungorus)=2685-70=2615
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: What’s the key factor to win in the 40/4 matches?

Post by Ferdy »

Uri Blass wrote:
Ferdy wrote:
nkg114mc wrote:
Ferdy wrote:
nkg114mc wrote:Hi all,

The question came from an experiment that I did recently. I replaced the evaluation function in the Fruit 2.1 engine with the evaluation function of Sungorus 1.4. Let's call this hybrid engine "Fruit-SungorusEval". Here "replace" means I implemented an evaluation function for Fruit 2.1 that always returns exactly the same value as Sungorus, given the same position. Then I launched a tournament between Sungorus 1.4 and Fruit-SungorusEval at a 40 moves / 4 minutes time control, with 4000 rounds (with repeat) and 16-ply random openings. The result from BayesianElo shows that Fruit-SungorusEval is 70 Elo weaker than Sungorus.

Rank Name            Elo    +    - games score oppo. draws
   1 sungorus          35    6    6  4000   60%   -35   24%
   2 fruit-sungeval   -35    6    6  4000   40%    35   24%

This result is a little surprising to me, because I think Fruit has a more complicated search implementation. However, Fruit-SungorusEval lost the tournament by 70 Elo, so I came to ask these questions:

Firstly, do you think this test setup supports the conclusion that Fruit-SungorusEval is 70 Elo weaker at the 40/4 time control?
Secondly, if the setup is OK, what is the main reason that Fruit-SungorusEval became weaker? I know that an evaluation function should be matched with the search algorithm in an engine, but I never expected a simplified evaluation to cause such a huge Elo drop. Has someone seen similar results before?

Thanks for all suggestions!
To know the overall effect of using the Sungorus eval in Fruit, perhaps you could run the test Fruit_sungorus_eval vs Fruit_fruit_eval.

Another option is to run a new match: Sungorus vs Fruit. Then compare it with your existing result. This way we can also measure the total effect of the Sungorus eval when applied to Fruit.
Hi Ferdinand,

Thanks for the comment! The "Fruit_sungorus_eval vs Fruit_fruit_eval" match is exactly what I am running now.
Looking forward to the result of that test.
Ma wrote:The matches between standard Sungorus 1.4 and Fruit 2.1 can be found in CCRL 40/4,
I checked the pgn and there were no games between the two engines.
Mao wrote:where these two engines show around a 400 Elo difference, and Fruit is stronger. That's why I was surprised by the test result I got.
Let's say we use the data we have at this time, based on playing other engines:
Fruit 2.1 = 2685
Sungorus 1.4 = 2311
Diff = 2685-2311 = 374

So the effect of that eval change is 374+70 = 444 rating points. This indeed is surprising.

But it is better to compare to the result of the match between the two only.
I do not understand how you get 374+70.
The +70 is from the original post.

Code:

Rank Name            Elo    +    - games score oppo. draws
   1 sungorus          35    6    6  4000   60%   -35   24%
   2 fruit-sungeval   -35    6    6  4000   40%    35   24%
at CCRL:
fruit = 2685
sungorus = 2311
+374 for fruit

new test:
+70 for sungorus over fruit_sung, fixing sungorus at 2311 we get fruit_sung at,
2311-70 = 2241
fruit_sung = 2241
sungorus = 2311
Overall effect between fruit and fruit_sung = 2685-2241 = 444.
I understood the following
Fruit 2.1 = 2685
Sungorus 1.4 = 2311
Fruit2.1(with bad simple evaluation of Sungorus)=2685-70=2615
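Ferdy's arithmetic, and the consistency of the 70 Elo gap with the 60%/40% score split in the table, can be cross-checked with the standard logistic Elo model (the ratings below are the CCRL numbers quoted in the thread):

```python
# Cross-check of the rating arithmetic in the thread using the standard
# logistic Elo model. The ratings are the CCRL numbers quoted above.

def expected_score(elo_diff):
    """Expected score for the side that is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

fruit, sungorus = 2685, 2311          # CCRL 40/4 list ratings from the thread

# Match result: Sungorus beat Fruit-SungorusEval by 70 Elo (+35 vs -35).
# Anchoring Sungorus at its list rating gives Fruit-SungorusEval:
fruit_sung = sungorus - 70            # 2241
effect_of_eval_swap = fruit - fruit_sung  # 444 Elo total effect, as Ferdy computed

# Sanity check: a 70 Elo edge corresponds to roughly a 60% expected score,
# matching the 60%/40% split in the BayesianElo table.
print(fruit_sung, effect_of_eval_swap, round(expected_score(70), 3))
```

This makes the disagreement in the thread concrete: Uri's 2685-70=2615 treats the 70 Elo as a deficit against Fruit, while the match was actually played against Sungorus, so the deficit is measured from 2311.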
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: What’s the key factor to win in the 40/4 matches?

Post by cdani »

Hello.
I have done the same test with Andscacs.

Andscacs with the evaluation of Sungorus easily outsearches Sungorus, obtaining 80% of the points.

Here is a file with the two executables, two pgn files, and the cutechess bat file I used to run the tests:

www.andscacs.com/sungorus/sungorus.rar

So I suppose that something was wrong with Fruit-Sungorus, or the evaluation function is very incompatible. Or maybe it's just that Andscacs is a lot stronger than the version of Fruit used.

I have only run much shorter tests. Maybe someone wants to do better ones.
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: What’s the key factor to win in the 40/4 matches?

Post by Uri Blass »

Ferdy wrote:
Uri Blass wrote:
Ferdy wrote:
nkg114mc wrote:
Ferdy wrote:
nkg114mc wrote:Hi all,

The question came from an experiment that I did recently. I replaced the evaluation function in the Fruit 2.1 engine with the evaluation function of Sungorus 1.4. Let's call this hybrid engine "Fruit-SungorusEval". Here "replace" means I implemented an evaluation function for Fruit 2.1 that always returns exactly the same value as Sungorus, given the same position. Then I launched a tournament between Sungorus 1.4 and Fruit-SungorusEval at a 40 moves / 4 minutes time control, with 4000 rounds (with repeat) and 16-ply random openings. The result from BayesianElo shows that Fruit-SungorusEval is 70 Elo weaker than Sungorus.

Rank Name            Elo    +    - games score oppo. draws
   1 sungorus          35    6    6  4000   60%   -35   24%
   2 fruit-sungeval   -35    6    6  4000   40%    35   24%

This result is a little surprising to me, because I think Fruit has a more complicated search implementation. However, Fruit-SungorusEval lost the tournament by 70 Elo, so I came to ask these questions:

Firstly, do you think this test setup supports the conclusion that Fruit-SungorusEval is 70 Elo weaker at the 40/4 time control?
Secondly, if the setup is OK, what is the main reason that Fruit-SungorusEval became weaker? I know that an evaluation function should be matched with the search algorithm in an engine, but I never expected a simplified evaluation to cause such a huge Elo drop. Has someone seen similar results before?

Thanks for all suggestions!
To know the overall effect of using the Sungorus eval in Fruit, perhaps you could run the test Fruit_sungorus_eval vs Fruit_fruit_eval.

Another option is to run a new match: Sungorus vs Fruit. Then compare it with your existing result. This way we can also measure the total effect of the Sungorus eval when applied to Fruit.
Hi Ferdinand,

Thanks for the comment! The "Fruit_sungorus_eval vs Fruit_fruit_eval" match is exactly what I am running now.
Looking forward to the result of that test.
Ma wrote:The matches between standard Sungorus 1.4 and Fruit 2.1 can be found in CCRL 40/4,
I checked the pgn and there were no games between the two engines.
Mao wrote:where these two engines show around a 400 Elo difference, and Fruit is stronger. That's why I was surprised by the test result I got.
Let's say we use the data we have at this time, based on playing other engines:
Fruit 2.1 = 2685
Sungorus 1.4 = 2311
Diff = 2685-2311 = 374

So the effect of that eval change is 374+70 = 444 rating points. This indeed is surprising.

But it is better to compare to the result of the match between the two only.
I do not understand how you get 374+70.
The +70 is from the original post.

Code:

Rank Name            Elo    +    - games score oppo. draws
   1 sungorus          35    6    6  4000   60%   -35   24%
   2 fruit-sungeval   -35    6    6  4000   40%    35   24%
at CCRL:
fruit = 2685
sungorus = 2311
+374 for fruit

new test:
+70 for sungorus over fruit_sung, fixing sungorus at 2311 we get fruit_sung at,
2311-70 = 2241
fruit_sung = 2241
sungorus = 2311
Overall effect between fruit and fruit_sung = 2685-2241 = 444.
I understood the following
Fruit 2.1 = 2685
Sungorus 1.4 = 2311
Fruit2.1(with bad simple evaluation of Sungorus)=2685-70=2615
I understand now.
I thought that the test was Fruit-Sungorus against Fruit because of the following sentence:
"the Fruit-SungorusEval lost the tournament by 70 Elo"

This sentence seemed to suggest that the eval is 70 Elo weaker, but when I read the first post again I saw that the test was not against Fruit but against Sungorus, so it was in practice a test of different searches, not of different evaluations, and I understand your point.
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: What’s the key factor to win in the 40/4 matches?

Post by Uri Blass »

cdani wrote:Hello.
I have done the same test with Andscacs.

Andscacs with the evaluation of Sungorus easily outsearches Sungorus, obtaining 80% of the points.

Here is a file with the two executables, two pgn files, and the cutechess bat file I used to run the tests:

www.andscacs.com/sungorus/sungorus.rar

So I suppose that something was wrong with Fruit-Sungorus, or the evaluation function is very incompatible. Or maybe it's just that Andscacs is a lot stronger than the version of Fruit used.

I have only run much shorter tests. Maybe someone wants to do better ones.
Andscacs is 270 Elo better than Fruit 2.1.
If the advantage is mainly or only thanks to search, then that clearly explains it.
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: What’s the key factor to win in the 40/4 matches?

Post by cdani »

Uri Blass wrote: Andscacs is 270 Elo better than Fruit 2.1.
If the advantage is mainly or only thanks to search, then that clearly explains it.
Yes, the improvements to Andscacs in recent months are mainly in the search.
nkg114mc
Posts: 74
Joined: Sat Dec 18, 2010 5:19 pm
Location: Tianjin, China
Full name: Chao M.

Re: What’s the key factor to win in the 40/4 matches?

Post by nkg114mc »

Thanks for the suggestion, Dann! I had done this before starting the matches. I ran a test on a FEN file with around 30,000 positions. It might not cover all possible positions, but since the Sungorus evaluation is relatively simple, I think it is reasonable to believe that my implementation is working correctly.

I will try to share my Fruit-SungEval implementation and the cutechess commands that I used, so that others who are interested can try to replicate this result. Yesterday I checked my script and saw that there is a resign option for cutechess-cli: "-resign movecount=3 score=2000". I am not sure whether it was an issue or not, but I have removed it in all of my later tests.
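For reference, the resign option quoted above is a standard cutechess-cli switch. A hedged sketch of what a 40/4 match invocation might look like (engine names, paths, and the opening file are placeholders, not the actual commands used in the test):

```shell
# Hypothetical cutechess-cli invocation for a 40 moves / 4 minutes match.
# Engine paths and the opening file are placeholders; "-resign movecount=3
# score=2000" is the switch quoted in the post, which adjudicates games
# early and was removed in the later tests.
cutechess-cli \
  -engine name=sungorus cmd=./sungorus \
  -engine name=fruit-sungeval cmd=./fruit_sungeval \
  -each proto=uci tc=40/240 \
  -openings file=openings.pgn order=random \
  -rounds 4000 -repeat \
  -resign movecount=3 score=2000 \
  -pgnout results.pgn
```

In cutechess-cli, `tc=40/240` means 40 moves in 240 seconds, matching the 40/4 time control discussed in this thread.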
Dann Corbit wrote:
nkg114mc wrote:
Dann Corbit wrote:There are lots of possibilities here.
The first thing I would do (and you have probably already done this) would be to run eval in fruit and in sung-fruit on 1000 different positions from different game phases, side to move, etc. and verify that the eval function returns exactly the same value.

That would be proof that there is no bug in the translated evaluation.

One possible explanation is that a more complicated evaluation will reduce the speed of the search. It is also true that eval functions tend to be tuned to the entire framework that they are embedded in.

But I would say that your finding is both interesting and surprising.
Hi Dann,

Thanks for your reply! Actually I did not run the Fruit vs Sung-FruitEval matches, because converting the evaluation of Fruit is complicated work (which may also introduce a lot of potential bugs). And as you mentioned, the converted evaluation might not be as efficient as the original version, so it would slow down the search and hurt the performance, which would eventually make it hard for me to pinpoint the source of the problem.

In the experiment in this post, I put the evaluation of Sungorus into the Fruit searcher, because the Sungorus evaluation is very simple, so it has almost no "slow-down" issue. But the performance drops seriously (by more than 400 Elo) as well. That's why I was confused :?
What I meant was this:
Write a function that reads a position and then just runs eval on the position (no search). It should be instantaneous.
Do the same thing for both engines and compare the outputs.
Some engines have an eval command that does just that, but I don't think Fruit or Sungorus has one.
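Dann's check can be sketched as a small harness that runs two eval functions (no search) over a list of positions and reports the first disagreement. The eval functions here are hypothetical stand-ins for the Sungorus eval and its Fruit-side reimplementation; as noted above, neither engine exposes such an entry point out of the box, so one would need to be added to each.

```python
# Hypothetical harness for the check described above: evaluate the same
# positions with two eval functions and report the first mismatch.
# material_eval is a toy stand-in, not the Sungorus or Fruit eval.

def compare_evals(eval_a, eval_b, fens):
    """Return (index, fen, a, b) for the first mismatch, or None if all agree."""
    for i, fen in enumerate(fens):
        a, b = eval_a(fen), eval_b(fen)
        if a != b:
            return (i, fen, a, b)
    return None

def material_eval(fen):
    """Toy eval: material balance read straight from the FEN board field."""
    values = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}
    score = 0
    for ch in fen.split()[0]:
        if ch.upper() in values:
            score += values[ch.upper()] if ch.isupper() else -values[ch.upper()]
    return score

fens = ["rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
        "4k3/8/8/8/8/8/8/4KQ2 w - - 0 1"]
print(compare_evals(material_eval, material_eval, fens))  # → None
```

With the real engines, the two eval functions would wrap whatever position-setup and evaluate entry points get added to each codebase; the harness itself stays the same.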