Questions for the Stockfish team

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

rjgibert wrote:
Then a math lesson. If A is 200 points weaker, it should win 1 of every 4 games played. If it is 400 points weaker, it should win 1 of every 16 games played. If it is 600 points weaker, it should only win 1 of every 64 games played. If it is 800 points weaker, 1 of every 256. See a pattern? So the random version could be only 800 points weaker and not win a single game out of 67, and that would be perfectly normal... And even 0 out of 67 would not be that unusual for an opponent 600 points weaker.

You've "crisscrossed" a multiplicative property of odds with a multiplicative property of probabilities. For 200 Elo weaker, it is 1 in 4.16, or a 3.16 to 1 dog. For 400 Elo weaker, it is 1 in 11, or a 10 to 1 dog. 10 = 3.16*3.16 illustrates the multiplicative property of odds. This is probably what you were getting at. To confirm, you can plug numbers into the rating win expectation formula:

We = 1/(1 + 10^(dR/400))

Where We = win expectation (expected score per game) and dR = rating difference (the opponent's rating minus the player's rating).
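The formula can be checked numerically with a minimal C++ sketch. It assumes the logistic model exactly as written above, with dR measured as opponent minus player; the function names are illustrative only. `quarter_rule` encodes the "1 of 4 per 200 points" rule of thumb from the quoted post so that the two can be compared side by side.

```cpp
#include <cassert>
#include <cmath>

// Logistic Elo model, dR = opponent's rating minus the player's rating
// (dR > 0 means the player is the weaker side).

// Odds against the player: 10^(dR/400).
double odds_against(double dR) {
    assert(std::isfinite(dR));
    return std::pow(10.0, dR / 400.0);
}

// Expected score per game (wins plus half the draws):
// We = 1 / (1 + 10^(dR/400)).
double expected_score(double dR) {
    return 1.0 / (1.0 + odds_against(dR));
}

// Rule of thumb from the quoted post: each 200 points multiplies the
// weaker side's share by 1/4, i.e. (1/4)^(dR/200).
double quarter_rule(double dR) {
    return std::pow(0.25, dR / 200.0);
}
```

For example, `odds_against(400)` equals `odds_against(200)` squared (10 = 3.16 * 3.16), but `expected_score(400)` is 1/11, not `expected_score(200)` squared (about 1/17.3), while `quarter_rule(400)` gives 1/16. Odds compose by multiplication and probabilities do not, which is the distinction being drawn here.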
I simply used the rule of thumb that +200 Elo means the stronger player will win 3 of 4, or the weaker player will win 1 of 4. For +400, the weaker player will win 1 of 16 (I simply squared 1/4 for simplicity). For 1000 Elo, you get (1/4)^5. The point was to show that if you drop the Elo by 1000, playing 67 games is pointless: you'd expect to lose every one of 'em and then some.
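A quick check of the "0 out of 67" claim, as a minimal C++ sketch. It assumes every game is decided (draws ignored) and that games are independent, so the chance of zero wins in n games is (1 - p)^n; `win_prob` uses the logistic formula quoted above. Both function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Per-game win probability of a player dR points weaker (logistic Elo model).
// Simplification: treats the expected score as a pure win probability (no draws).
double win_prob(double dR) {
    return 1.0 / (1.0 + std::pow(10.0, dR / 400.0));
}

// Probability of zero wins in n independent games with win probability p.
double prob_no_wins(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 0);
    return std::pow(1.0 - p, n);
}
```

With the logistic formula, `prob_no_wins(win_prob(800), 67)` comes out near 0.51 and `prob_no_wins(win_prob(600), 67)` near 0.12. With the 1/4-per-200-points rule of thumb (p = 1/256 and 1/64) the figures are higher still, about 0.77 and 0.35. Either way, a 0/67 result is unremarkable for an opponent that far below.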
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions for the Stockfish team

Post by bob »

Mangar wrote:Hi,

Sorry if some of these questions are answered in Beal's paper (I haven't found it yet):

1. Shouldn't you have a large range of random numbers? If you only have 1000 different numbers, wouldn't a tree with 10,000 nodes and a tree with 100,000 nodes perhaps produce the same result?
From testing, no. What you need is a range large enough that, when you reach a position with N possible moves, the N random numbers you draw can actually differ, so that the chance of getting a big number is spread over the N moves. Since you have N moves, you have N chances to get a big number. If you reduce the range to just 0/1 (2 numbers) and you reach a position with 10 moves, you have a high probability of getting the biggest number (1), about the same as the probability of getting a 1 when you have 100 moves.

Once you have a range large enough that the number of moves you have influences the random number you end up with, you are "good enough". While trying to get skill 1 below the 1800+ rating that caused the original complaint, I experimented with reducing this range, and until I got down to 20 or so, I didn't notice any significant reduction in skill. When you get down to 2, 3 or 4 values it is horrible, because now the play really is random: with only 0, 1, 2, 3, 4 to choose from, almost every position will get a score of 4.

You don't need different numbers for each possible position, just different numbers for each possible move from any single position. And really it seems that we don't need quite that many since when I tried 32, I found no skill reduction...
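The point about range size can be made concrete with a one-ply simplification (an assumption: a real search backs scores up through minimax, so this only illustrates the trend). If a node has n moves and each gets an independent uniform random score from 0..k-1, the expected best score is the sum over m of P(max >= m). The C++ sketch below computes it exactly; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Expected maximum of n independent draws, each uniform on 0..k-1.
// One-ply stand-in for "best random score among n moves".
// E[max] = sum_{m=1}^{k-1} P(max >= m) = sum_{m=1}^{k-1} (1 - (m/k)^n).
double expected_max(int k, int n) {
    assert(k >= 1 && n >= 1);
    double sum = 0.0;
    for (int m = 1; m < k; ++m)
        sum += 1.0 - std::pow(static_cast<double>(m) / k, n);
    return sum;
}

// Same quantity scaled to [0, 1] so that different ranges can be compared.
double normalized_expected_max(int k, int n) {
    return k > 1 ? expected_max(k, n) / (k - 1) : 0.0;
}
```

For k = 2, `normalized_expected_max` is already above 0.99 at n = 10 and can barely grow by n = 100, so the number of moves hardly matters. For k = 100 it is roughly 0.91 at n = 10 and still climbing toward 1 at n = 100, so positions with more moves keep getting noticeably better scores.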


2. What about pruning methods? Maybe lots of them are counterproductive with a random eval.
LMR switched off?
Null move switched off (as the eval is random, null move is just a huge decrease of search depth)?
Any value-based pruning methods switched off (razoring, futility, ...)?
I am turning null-move, LMR, and extensions off, but only because I wanted to reduce the search depth so that it would not find deep mates. Futility pruning doesn't work with this eval, since the scores I return are never more than 1 pawn, and that is not enough to trigger the aggressive pruning stuff...
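A hypothetical frontier futility test shows why a random eval of at most 1 pawn never triggers it. The function names, the 100-per-pawn scale, and the margins are assumptions for illustration, not Crafty's actual code.

```cpp
#include <cassert>

// Hypothetical frontier futility test (not Crafty's code): at the last ply,
// skip a quiet move when even static_eval + margin cannot exceed alpha.
bool futile(int static_eval, int margin, int alpha) {
    return static_eval + margin <= alpha;
}

// Checks whether any (eval, alpha) pair could be pruned when every score seen
// at a node, static eval and alpha alike, comes from the same band of `range`
// consecutive random values (0..99 for White in the quoted eval, -99..0 for
// Black). Shifting the band to start at 0 does not change eval - alpha, so the
// loops just scan 0..range-1. Mate scores outside the band are ignored.
bool ever_futile(int range, int margin) {
    assert(range > 0 && margin >= 0);
    for (int eval = 0; eval < range; ++eval)
        for (int alpha = 0; alpha < range; ++alpha)
            if (futile(eval, margin, alpha))
                return true;
    return false;
}
```

With a range of 100 and a one-pawn margin, `ever_futile(100, 100)` is false: eval + margin is at least 100 while alpha stays below 100. Shrink the margin to 50 and the test starts pruning.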




3. What about the opponents? Maybe it would be a lot smarter to test against 1800 Elo engines. WBEC should give a hint about which engines to select.

Greetings Volker
This is where the problem was first found. I don't remember who complained, but there is a long thread in the general forum about the test. They had lots of low-rated programs, and Crafty skill=1 was rated somewhere around 1750 in a group that was primarily 1600-1800. I'll try to find the post and include an excerpt here...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Result 10000 - 0

Post by bob »

AlvaroBegue wrote:A question for Bob Hyatt: If I understand correctly, your scores in this mode always indicate white is ahead. If a draw is 0, wouldn't this mean that white will always avoid draws and black will always seek them? In that sense, this is not equivalent to using random numbers centered around 0.
I posted the code from Crafty a couple of times. Here is a very simplified version of what it does:

int Evaluate(int wtm) {

int score = Random(); // returns a value 0 <= v < 100
return ((wtm) ? score : -score);
}

I am not certain that the negation at the bottom is required.

And doing that is producing a program that is playing around 1800 on the rating lists that measure programs on this end of the rating range...
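Alvaro's point can be checked with a runnable version of the simplified Evaluate() above. Random() here is a stand-in (an assumption: a fixed-seed std::mt19937 with a uniform 0..99 distribution), not Crafty's own generator.

```cpp
#include <cassert>
#include <random>

// Stand-in for Crafty's Random() (assumption, not Crafty's generator):
// uniform integer in [0, 100) from a fixed-seed Mersenne Twister.
int Random() {
    static std::mt19937 gen(12345);
    static std::uniform_int_distribution<int> dist(0, 99);
    return dist(gen);
}

// Simplified random evaluation as posted: `score` is a White-relative value in
// [0, 100); negating it when Black is to move gives the side-to-move score
// that a negamax search expects.
int Evaluate(int wtm) {
    int score = Random();
    return wtm ? score : -score;
}
```

From White's side every static score falls in 0..99, and from Black's side in -99..0. Relative to a draw score of 0, every non-terminal line therefore looks at least as good as a draw for White and at least as bad as one for Black, which is the asymmetry Alvaro describes.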
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: Result 10000 - 0

Post by Chan Rasjid »

bob wrote:
AlvaroBegue wrote:A question for Bob Hyatt: If I understand correctly, your scores in this mode always indicate white is ahead. If a draw is 0, wouldn't this mean that white will always avoid draws and black will always seek them? In that sense, this is not equivalent to using random numbers centered around 0.
I posted the code from Crafty a couple of times. Here is a very simplified version of what it does:

int Evaluate(int wtm) {

int score = Random(); // returns a value 0 <= v < 100
return ((wtm) ? score : -score);
}

I am not certain that the negation at the bottom is required.

And doing that is producing a program that is playing around 1800 on the rating lists that measure programs on this end of the rating range...
I think 1800 might be possible, as your engine still has mates + draws + the Beal mobility effect.

But I still don't like small one-sided numbers (might they be erroneous?) for White and Black; it would be better to comment out Evaluate() and replace it with something like:
-7000 + (hash % 14001), as long as these values don't invalidate mate scores.
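Chan's suggestion can be sketched as a deterministic, centered random evaluation keyed on the position hash. The 64-bit Zobrist-style key and the mate threshold of 31000 are assumptions for illustration; RandomEvalFromHash is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mate convention: any |score| >= kMateScoreFloor means mate.
constexpr int kMateScoreFloor = 31000;

// Centered random eval from the position's 64-bit hash key (assumed to be a
// Zobrist-style key): -7000 + (hash % 14001) lies in [-7000, 7000], and the
// same key always maps to the same score.
int RandomEvalFromHash(std::uint64_t hash) {
    int score = -7000 + static_cast<int>(hash % 14001);
    assert(score > -kMateScoreFloor && score < kMateScoreFloor);
    return score;
}
```

Because the score depends only on the key, a position reached by transposition gets the same value every time, which a plain rand() call does not guarantee.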

Rasjid
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Result 10000 - 0

Post by Daniel Shawul »

Did you ever think that perhaps _you_ screwed up the test. Others of us are not having that problem. For me, when others report X, and I run a test and conclude ~X, I go back and look at the test carefully to make sure it is doing what it is supposed to do, rather than shouting that everyone _else_ is wrong.
NO. The change you told me to make is HARD to screw up. Think about it: a one-line code change, what could possibly go wrong?
I just added this at the beginning of eval and touched _nothing_ else.

int SEARCHER::eval(int lazy) {
	int score = int((100.0 * rand()) / (RAND_MAX + 1.0));
	//print("eval %d\n",score);
	//if(player == black) return -score;
	return score;
     ....
You comment that out and you get the regular Scorpio. Everything is here: http://sites.google.com/site/dshawul/ev ... ects=0&d=1
(the source code of what I am using, and two test games against TSCP). With a negated score or the 0 - 100.0 range, it hardly makes a difference: the same garbage engine...
Anyone can repeat the experiment and confirm.

Stop blowing hot air...
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Result 10000 - 0

Post by Daniel Shawul »

I just did the exact same thing as you did: negated the score, used 100 as a factor... all futile attempts on your part to avoid the inevitable. 300 games against TSCP with the same disastrous results.
I will add more games if this is not convincing enough.

http://sites.google.com/site/dshawul/ev ... ects=0&d=1

negated score with 0 - 100

Num. Name            games   score 
   0 Scorpio_random    300       4 
   1 XboardEngine      300     296 
Rank Name             Elo    +    - games score oppo. draws 
   1 XboardEngine     359   81   54   300   99%  -359    0% 
   2 Scorpio_random  -359   54   81   300    1%   359    0% 
0 - 100 without negation

Num. Name            games   score 
   0 Scorpio_random    300     3.5 
   1 XboardEngine      300   296.5 
Rank Name             Elo    +    - games score oppo. draws 
   1 XboardEngine     359   81   54   300   99%  -359    0% 
   2 Scorpio_random  -359   54   81   300    1%   359    0% 
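For a rough cross-check of these tables, a score fraction s can be turned into an Elo difference by inverting the logistic formula: dR = -400 * log10(1/s - 1). This is a naive estimate, not BayesElo's method; BayesElo fits its own model with a prior, so its figures for the same games differ. elo_from_score is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// Naive logistic estimate of the Elo difference implied by `points` scored
// in `games` games (wins plus half the draws). Positive means the scoring
// side is stronger. Not BayesElo's model: no prior, no draw model.
double elo_from_score(double points, int games) {
    assert(games > 0 && points > 0.0 && points < games);
    double s = points / games;
    return -400.0 * std::log10(1.0 / s - 1.0);
}
```

For 4 points out of 300 this gives roughly -748, a larger gap than the 718 points (359 minus -359) in the BayesElo output, presumably because of BayesElo's prior. With only a handful of points scored, either estimate is also very sensitive to a single extra win or draw.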
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Result 10000 - 0

Post by Daniel Shawul »

Tested. Anything else?
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Result 10000 - 0

Post by AlvaroBegue »

Daniel Shawul wrote:Tested, anything else ?
You didn't run the test I suggested: Match `return 0;' against `return -10000+(rand()%20001);'.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Result 10000 - 0

Post by Daniel Shawul »

Maybe I will run it later, but whenever I provide the proof, people should not be in _Denial_.
People can come up with billions of random ideas, but I won't test them :) (in this particular case I will, but not now)

Denial http://en.wikipedia.org/wiki/Denial
Denial is a defense mechanism postulated by Sigmund Freud, in which a person is faced with a fact that is too uncomfortable to accept and rejects it instead, insisting that it is not true despite what may be overwhelming evidence. [1] The subject may use:

* simple denial - deny the reality of the unpleasant fact altogether
* minimisation - admit the fact but deny its seriousness (a combination of denial and rationalisation), or
* projection - admit both the fact and seriousness but deny responsibility.
Denial of responsibility

This form of denial involves avoiding personal responsibility by:

* blaming - a direct statement shifting culpability and may overlap with denial of fact
* minimizing - an attempt to make the effects or results of an action appear to be less harmful than they may actually be, or
* justifying - when someone takes a choice and attempts to make that choice look okay due to their perception of what is "right" in a situation.
rjgibert
Posts: 317
Joined: Mon Jun 26, 2006 9:44 am

Re: Questions for the Stockfish team

Post by rjgibert »

My point:
Elo 200: it's not 1/4, it's 1/4.16, or 3.16 to 1 against.
Elo 400: it's not 1/16, it's 1/11, or 10 to 1.
Elo 600: it's not 1/64, it's 1/32.62, or 31.62 to 1.
Elo 800: it's not 1/256, it's 1/101, or 100 to 1.
Elo 1000: it's not 1/1024, it's 1/317.23, or 316.23 to 1.

The odds are powers of 3.16227766, e.g. 3.16227766^5 = 316.23 approximately. You are computing powers of the probabilities, which is not valid in the context of your post.

And this time, please actually read what I write and actually think about what I write before replying.