Questions for the Stockfish team

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

rjgibert
Posts: 317
Joined: Mon Jun 26, 2006 9:44 am

Re: Questions for the Stockfish team

Post by rjgibert »

Then a math lesson. If A is 200 points weaker, it should win 1 of every 4 games played. If it is 400 points weaker, it should win 1 of every 16 games played. If it is 600 points weaker, it should only win 1 of every 64 games played. If it is 800 points weaker, 1 of every 256. See a pattern? So the random version could be only 800 weaker and not win a single game out of 67, and that would be perfectly normal... And even 0 out of 67 would not be that unusual for a 600 point weaker opponent.

You've "crisscrossed" a multiplicative property of odds with a multiplicative property of probabilities. For 200 Elo weaker, it is 1 in 4.16, or a 3.16 to 1 dog. For 400 Elo weaker, it is 1 in 11, or a 10 to 1 dog. 10 = 3.16*3.16 illustrates the multiplicative property of odds. This is probably what you were getting at. To confirm, you can plug numbers into the rating win expectation formula:

We = 1/(1 + 10^(dR/400))

Where We = win expectation and dR = rating difference (the opponent's rating minus your own).
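
For anyone who wants to verify the numbers, here is a minimal sketch in plain C (the names and layout are mine, not taken from any engine) that plugs rating deficits into the formula above and prints the win expectation and the corresponding odds:

Code:

#include <stdio.h>
#include <math.h>

/* Win expectation for the weaker player; dR = rating deficit in Elo points. */
static double win_expectation(double dR)
{
    return 1.0 / (1.0 + pow(10.0, dR / 400.0));
}

int main(void)
{
    const double deficits[] = {200.0, 400.0, 600.0, 800.0};

    for (int i = 0; i < 4; i++) {
        double we   = win_expectation(deficits[i]);
        double odds = (1.0 - we) / we;          /* "X to 1 dog" */
        printf("dR = %3.0f  We = 1 in %.2f  (%.2f to 1 dog)\n",
               deficits[i], 1.0 / we, odds);
    }
    return 0;
}

For 600 and 800 point deficits this gives roughly 1 in 33 (a 31.6 to 1 dog) and 1 in 101 (a 100 to 1 dog), not the 1 in 64 and 1 in 256 quoted above, which is the correction being made here.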
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Questions for the Stockfish team

Post by michiguel »

bob wrote:
michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:
Joost Buijs wrote:I do understand that with an infinite depth you don't need eval at all. With a perfect evaluation function a 1 ply search will be sufficient as well. This is just theoretical.

It is my feeling that everything depends on the quality of the evaluation. When I look at my own engine, it has an evaluation function comparable to a 1600 player, but it plays at 2850 level just because it is very good at tactics. I'm pretty sure that when I'm able to improve the evaluation function to a higher level, its Elo will go up.
OK, some background. It turns out that if you replace Crafty's evaluation with a pure random number, it plays well above 2,000 Elo. If you disable all the search extensions, reductions, null-move and such, you still can't get it below 1800. There has been a long discussion about this, something I call "The Beal Effect" since Don Beal first reported on this particular phenomenon many years ago. So a basic search + random eval gives an 1800 player. Full search + full eval adds 1,000 to that. How much from each? Unknown. But I have watched many, many Stockfish vs Crafty games and the deciding issue does not seem to be evaluation. We seem to get hurt by endgame search depth more than anything...
And that is where most (all?) engines had the biggest holes in evaluation... endgame!

Miguel
I have never heard _anyone_ say that Crafty's endgame evaluation is poor. In fact, several GM players have said exactly the opposite. Most ignore candidate passed pawns and such. We don't.
Sorry, the endgame analysis of any engine has huge holes, including Crafty.

Miguel
The issue would be: do you believe Stockfish's eval (endgame) is far superior to Crafty's?
No, both suck :-)
The fact that SF searches deeper means that SF can find the holes faster. A superior search won't beat you if you do not leave holes in the first place.

To play endgames well, what is needed is a combination of pattern recognition + retrograde analysis (i.e. planning) + some search once the plan is established. Alpha-beta is not even the right algorithm to approach many endgames.

Miguel

That was the implication I addressed. I do _not_ believe this, and have, in fact, noticed that we are getting out-searched for whatever reasons. As I had mentioned...

I can watch two programs play, and display analysis, and figure out who is searching deeper...

My comment was based on _watching_ games, where we get out-searched and then end up losing something tactically. Not positionally.
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Questions for the Stockfish team

Post by jwes »

Milos wrote:
jwes wrote:I wrote evaluate, not analyze. There is a difference.
Engines don't evaluate, engines search, GM's evaluate. Eventually ppl will understand this.
So what word would you use to describe what engines do when they execute their evaluation function?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:
Joost Buijs wrote:I do understand that with an infinite depth you don't need eval at all. With a perfect evaluation function a 1 ply search will be sufficient as well. This is just theoretical.

It is my feeling that everything depends on the quality of the evaluation. When I look at my own engine, it has an evaluation function comparable to a 1600 player, but it plays at 2850 level just because it is very good at tactics. I'm pretty sure that when I'm able to improve the evaluation function to a higher level, its Elo will go up.
OK, some background. It turns out that if you replace Crafty's evaluation with a pure random number, it plays well above 2,000 Elo. If you disable all the search extensions, reductions, null-move and such, you still can't get it below 1800. There has been a long discussion about this, something I call "The Beal Effect" since Don Beal first reported on this particular phenomenon many years ago. So a basic search + random eval gives an 1800 player. Full search + full eval adds 1,000 to that. How much from each? Unknown. But I have watched many, many Stockfish vs Crafty games and the deciding issue does not seem to be evaluation. We seem to get hurt by endgame search depth more than anything...
And that is where most (all?) engines had the biggest holes in evaluation... endgame!

Miguel
I have never heard _anyone_ say that Crafty's endgame evaluation is poor. In fact, several GM players have said exactly the opposite. Most ignore candidate passed pawns and such. We don't.
Sorry, the endgame analysis of any engine has huge holes, including Crafty.

Miguel
The issue would be: do you believe Stockfish's eval (endgame) is far superior to Crafty's?
No, both suck :-)
The fact that SF searches deeper means that SF can find the holes faster. A superior search won't beat you if you do not leave holes in the first place.
I don't see where this is supposed to be going. If my eval had no "holes" I would not need search in the first place...



To play endgames well, what is needed is a combination of pattern recognition + retrograde analysis (i.e. planning) + some search once the plan is established. Alpha-beta is not even the right algorithm to approach many endgames.

Miguel

That was the implication I addressed. I do _not_ believe this, and have, in fact, noticed that we are getting out-searched for whatever reasons. As I had mentioned...

I can watch two programs play, and display analysis, and figure out who is searching deeper...

My comment was based on _watching_ games, where we get out-searched and then end up losing something tactically. Not positionally.
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: Questions for the Stockfish team

Post by Chan Rasjid »

bob wrote: OK, let's talk about this "breaking minimax." This idea is based on the fact that at any node in the tree, the more moves I have, the better the score I will get, because the scores are completely random.
...
You have retrogressed as a teacher. Your earlier explanation of the Beal Effect to me was succinct :-
and at any point in the tree, the more moves you have, the greater the probability you will get a good random score to back up, and vice-versa...
in other words :-
at any node that has N moves, the probability that one of the N random scores becomes the new PV increases with N. Equivalently, the probability that one of the N random numbers falls in the range (alpha, beta) increases with N.
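
A quick illustration of that statement (a toy calculation of my own, assuming for simplicity that scores are uniform on [0,1) and that alpha is a fixed bound): the chance that at least one of N draws exceeds alpha is 1 - alpha^N, which grows with N.

Code:

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double alpha = 0.9;                  /* arbitrary illustrative bound */
    const int branching[] = {2, 5, 10, 20, 40};

    /* P(at least one of N uniform(0,1) scores exceeds alpha) = 1 - alpha^N */
    for (int i = 0; i < 5; i++) {
        int n = branching[i];
        printf("N = %2d moves: P(some score beats alpha) = %.3f\n",
               n, 1.0 - pow(alpha, (double)n));
    }
    return 0;
}

With alpha = 0.9 this goes from about 0.19 at N = 2 to about 0.99 at N = 40, which is exactly the mobility bias the Beal Effect relies on: nodes with more legal moves are more likely to back up a score that beats the current bound.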

to Daniel:
There could be a subtle difference if your random range is all positive, 0 to 1000. The Beal Effect that I could grasp is "rigorously random", with randomness equally distributed about zero. I have not examined whether this "subtle" difference can be what is giving all the grotesquely different test results.

Firstly, this Beal mobility is a rigorous assumption - that a random engine will have a slight tendency towards picking PV lines with better mobility, and this is its intelligence. But we humans cannot make assumptions about what Elo it could give - only testing should be accepted. My take is this:
1) differences in engines, any search tricks etc. should not affect the result - everything is just a very curious Beal Effect.
2) improved hardware and search depth might magnify this Beal intelligence, just as it is only now that powerful hardware enables chess engines to surpass human players.

Rasjid
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Result 10000 - 0

Post by Daniel Shawul »

10800 games completed, all losses for scorpio_random!
Test conditions:
TC = 40 moves in 30 seconds repeating.
start positions = neutral.pgn of 600 positions from Dann.
Engines = those I use for evaluating Scorpio releases.

Pretty much the setup is the same as what I use before Scorpio releases.
Scorpio reaches a depth of 10 in the midgame and deeper in endgames. There
is really no endgame, as material builds up quickly for the non-random engine,
resulting in quick wins.

Code:

Rank Name              Elo    +    - games score oppo. draws
   1 Hermann 2.5       122  486  183  1200  100% -1094    0%
   2 Scorpio_2.6.3     122  486  183  1200  100% -1094    0%
   3 Scorpio_2.4.1     122  486  183  1200  100% -1094    0%
   4 Scorpio_2.4       122  486  183  1200  100% -1094    0%
   5 Fruit 2.1         122  486  183  1200  100% -1094    0%
   6 Glaurung 2.2      122  486  183  1200  100% -1094    0%
   7 Spike 1.2 Turin   122  486  183  1200  100% -1094    0%
   8 Doch64 1.3.4 JA   122  486  183  1200  100% -1094    0%
   9 Toga II 1.3.1     122  486  183  1200  100% -1094    0%
  10 Scorpio_random  -1094   78  106 10800    0%   122    0%
Bayeselo can't properly evaluate it, and elostat even crashes...
The weakest engine is at least a thousand Elo stronger.

To see some wins (probably due to seeing mates at the tips) I should pit it against
short-searching engines, such that they will miss easy mates. That is, if the random engine has not
given up all its material by that time. This is not Beal's effect, btw.
Tomorrow morning I will test against TSCP-like engines, where I expect it to win some by out-searching to a
mate value. My head is hurting from waiting for this test to finish; I have to sleep now.

Games here https://sites.google.com/site/dshawul/t ... ects=0&d=1
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: Questions for the Stockfish team

Post by Chan Rasjid »

jwes wrote:
Milos wrote:
jwes wrote:I wrote evaluate, not analyze. There is a difference.
Engines don't evaluate, engines search, GM's evaluate. Eventually ppl will understand this.
So what word would you use to describe what engines do when they execute their evaluation function?
Milos is "more" correct, strictly speaking - an engine returns a score by calling search(), and within search() it does call eval(), but the score that is directly returned is a value of the search, NOT an evaluation of a position. An engine "searches" for a score.
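
As a rough sketch of that distinction (a toy of my own, not any particular engine's code): eval() is only ever consulted at the leaves, and what comes back to the caller is a searched score, not an evaluation of the root position.

Code:

#include <stdio.h>
#include <stdlib.h>

#define BRANCHING 4

/* Toy "evaluation": a static guess from the side to move's point of view,
   here just a random centipawn-ish score in [-1000, 1000]. */
static int eval(void)
{
    return (rand() % 2001) - 1000;
}

/* Toy negamax: the value handed back to the caller is a *searched* score;
   eval() is only called at the leaves. Real move making/unmaking is omitted. */
static int search(int depth)
{
    if (depth == 0)
        return eval();

    int best = -100000;
    for (int move = 0; move < BRANCHING; move++) {
        int score = -search(depth - 1);
        if (score > best)
            best = score;
    }
    return best;
}

int main(void)
{
    srand(12345);
    printf("root score = %d (a product of search(), not of eval() at the root)\n",
           search(4));
    return 0;
}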

Because engines win against humans almost 99% of the time, the endgames reached would favour the engines as much as they do the humans.

Rasjid
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Questions for the Stockfish team

Post by Milos »

jwes wrote:
Milos wrote:
jwes wrote:I wrote evaluate, not analyze. There is a difference.
Engines don't evaluate, engines search, GM's evaluate. Eventually ppl will understand this.
So what word would you use to describe what engines do when they execute their evaluation function?
Guessing ;).
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Questions for the Stockfish team

Post by michiguel »

bob wrote:
michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:
Joost Buijs wrote:I do understand that with an infinite depth you don't need eval at all. With a perfect evaluation function a 1 ply search will be sufficient as well. This is just theoretical.

It is my feeling that everything depends on the quality of the evaluation. When I look at my own engine, it has an evaluation function comparable to a 1600 player, but it plays at 2850 level just because it is very good at tactics. I'm pretty sure that when I'm able to improve the evaluation function to a higher level, its Elo will go up.
OK, some background. It turns out that if you replace Crafty's evaluation with a pure random number, it plays well above 2,000 Elo. If you disable all the search extensions, reductions, null-move and such, you still can't get it below 1800. There has been a long discussion about this, something I call "The Beal Effect" since Don Beal first reported on this particular phenomenon many years ago. So a basic search + random eval gives an 1800 player. Full search + full eval adds 1,000 to that. How much from each? Unknown. But I have watched many, many Stockfish vs Crafty games and the deciding issue does not seem to be evaluation. We seem to get hurt by endgame search depth more than anything...
And that is where most (all?) engines had the biggest holes in evaluation... endgame!

Miguel
I have never heard _anyone_ say that Crafty's endgame evaluation is poor. In fact, several GM players have said exactly the opposite. Most ignore candidate passed pawns and such. We don't.
Sorry, the endgame analysis of any engine has huge holes, including Crafty.

Miguel
The issue would be: do you believe Stockfish's eval (endgame) is far superior to Crafty's?
No, both suck :-)
The fact that SF searches deeper means that SF can find the holes faster. A superior search won't beat you if you do not leave holes in the first place.
I don't see where this is supposed to be going. If my eval had no "holes" I would not need search in the first place...
You do not need to search if your eval is perfect, which is not the same as not-having-huge-HOLES. In many endgame positions, the solution is not found by (alpha-beta) search. That is why humans are superior on those.

Miguel

To play endgames well, what is needed is a combination of pattern recognition + retrograde analysis (i.e. planning) + some search once the plan is established. Alpha-beta is not even the right algorithm to approach many endgames.

Miguel

That was the implication I addressed. I do _not_ believe this, and have, in fact, noticed that we are getting out-searched for whatever reasons. As I had mentioned...

I can watch two programs play, and display analysis, and figure out who is searching deeper...

My comment was based on _watching_ games, where we get out-searched and then end up losing something tactically. Not positionally.
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: Result 10000 - 0

Post by Chan Rasjid »

Daniel,

Your test might be invalid. If you use random numbers between 0 and 1000, there could be no Beal Effect!

The Beal Effect is equivalent to this:
For N random numbers and any random number alpha, the probability that one of the N numbers will be greater than alpha increases with N.


In your test, at every node with N moves and bounds (alpha, beta), there is a probability > 0.5 that beta < 0. So your alpha and beta are NOT RANDOM NUMBERS. This might be the reason your test result shows a true random engine without any Beal intelligence.

Rasjid