rjgibert wrote:
"Then a math lesson. If A is 200 points weaker, it should win 1 of every 4 games played. If it is 400 points weaker, it should win 1 of every 16 games played. If it is 600 points weaker, it should only win 1 of every 64 games played. If it is 800 points weaker, 1 of every 256. See a pattern? So the random version could be only 800 weaker and not win a single game out of 67, and that would be perfectly normal... And even 0 out of 67 would not be that unusual in a 600 point weaker opponent."
You've "crisscrossed" a multiplicative property of odds with a multiplicative property of probabilities. For 200 Elo weaker, it is 1 in 4.16, or a 3.16 to 1 dog. For 400 Elo weaker, it is 1 in 11, or a 10 to 1 dog. 10 = 3.16*3.16 illustrates the multiplicative property of odds. This is probably what you were getting at. To confirm, you can plug numbers into the rating win expectation formula:
We = 1/(1 + 10^(dR/400))
where We = win expectation and dR = rating difference (the opponent's rating minus the player's, so dR is positive for the weaker player).

I simply used the simple rule that +200 Elo means the stronger player will win 3 of 4, i.e. the weaker player will win 1 of 4. For +400, the weaker player will win 1 of 16 (I simply squared 1/4 for simplicity). For 1000 Elo, you get (1/4)^5. The point was to show that if you drop the Elo by 1000, playing 67 games is pointless; you'd expect to lose every one of 'em and then some.
Questions for the Stockfish team
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions for the Stockfish team
Mangar wrote:
Hi,
sorry if some of my questions are already answered in Beal's paper (I haven't found it yet):
1. Shouldn't you have a large range of random numbers? If you use only 1000 different random numbers, a tree with 10,000 nodes and a tree with 100,000 nodes would perhaps produce the same result?

From testing, no. What you need is a set large enough that, when you reach a position with N possible moves, you sample from at least N different numbers, so that each of the N moves gives you another chance to get a big number. If you reduce this to just 0/1 (two numbers) and you reach a position where you have 10 moves, you have a high probability of getting the biggest number (1), and it is about the same as the probability of getting a 1 when you have 100 moves.
Once you have a range that makes it probable that the number of moves you have influences the random number you get, you are "good enough". While trying to get skill 1 below the +1800 that caused the original complaint, I experimented with reducing this range. Until I got down to a range of 20 or so, I didn't notice any significant reduction in skill. When you get down to 2, 3, or 4 it is horrible, because now the play really is random: almost every position will get a score of 4, since the only choices are 0, 1, 2, 3, 4.
You don't need different numbers for each possible position, just different numbers for each possible move from any single position. And really it seems that we don't need quite that many, since when I tried 32, I found no skill reduction...
Mangar wrote:
2. What about pruning methods? Maybe lots of them are counterproductive with a random eval.
LMR switched off?
Null move switched off (as the eval is random, null move is just a huge decrease of search depth)?
Any value-based pruning methods switched off (razoring, futility, ...)?

I am turning null-move, LMR, and extensions off, but only because I wanted to drop the depth so that it would not find deep mates. Futility doesn't work with this eval, since the scores I return are never more than 1 pawn, and that is not enough to trigger the aggressive pruning stuff...
Mangar wrote:
3. What about the opponents? Maybe it would be a lot smarter to test against 1800 Elo engines? WBEC should give a hint about which engines to select.

This is where the problem was first found. I don't remember who complained, but there is a long thread in the general forum about the test. They had lots of low-rated programs, and Crafty skill=1 was rated at something in the 1750 range in a group that was primarily 1600-1800. I'll try to find the post and include an excerpt here...
Greetings Volker
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Result 10000 - 0
AlvaroBegue wrote:
A question for Bob Hyatt: If I understand correctly, your scores in this mode always indicate white is ahead. If a draw is 0, wouldn't this mean that white will always avoid draws and black will always seek them? In that sense, this is not equivalent to using random numbers centered around 0.

I posted the code from Crafty a couple of times. Here is a very simplified version of what it does:
int Evaluate(int wtm) {
int score = Random(); // returns a value 0 <= v < 100
return ((wtm) ? score : -score);
}
I am not certain that the negation at the bottom is required.
And doing that is producing a program that is playing around 1800 on the rating lists that measure programs on this end of the rating range...
-
- Posts: 588
- Joined: Thu Mar 09, 2006 4:47 pm
- Location: Singapore
Re: Result 10000 - 0
bob wrote:
AlvaroBegue wrote:
A question for Bob Hyatt: If I understand correctly, your scores in this mode always indicate white is ahead. If a draw is 0, wouldn't this mean that white will always avoid draws and black will always seek them? In that sense, this is not equivalent to using random numbers centered around 0.
I posted the code from Crafty a couple of times. Here is a very simplified version of what it does:
int Evaluate(int wtm) {
int score = Random(); // returns a value 0 <= v < 100
return ((wtm) ? score : -score);
}
I am not certain that the negation at the bottom is required.
And doing that is producing a program that is playing around 1800 on the rating lists that measure programs on this end of the rating range...

I think 1800 might be possible as your engine has mates+draws+Beal's mobility.
But I still don't like small one-sided numbers (might be erroneous?) for white and black; better to comment out the body of Evaluate() and replace it with something like:
-7000 + (hash % 14001)
as long as the values don't invalidate mate scores.
Rasjid
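The hash-based suggestion above could look roughly like this. It is only a sketch: `hashed_eval` and `EVAL_LIMIT` are hypothetical names, the position hash is assumed to be a 64-bit key supplied by the engine, and whether 7000 stays clear of an engine's mate scores depends on that engine's score encoding.

```c
#include <stdint.h>

/* Hypothetical bound on ordinary evaluation scores; it must stay well
   inside whatever range the engine reserves for mate scores. */
enum { EVAL_LIMIT = 7000 };

/* Pseudo-random score in [-EVAL_LIMIT, EVAL_LIMIT], centered on 0.
   The same position hash always yields the same score. */
int hashed_eval(uint64_t hash) {
    return -EVAL_LIMIT + (int)(hash % (2 * EVAL_LIMIT + 1));
}
```

Unlike a call to rand(), this is deterministic per position, so a position reached by transposition gets the same value each time it is evaluated.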
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Result 10000 - 0
"Did you ever think that perhaps _you_ screwed up the test? Others of us are not having that problem. For me, when others report X, and I run a test and conclude ~X, I go back and look at the test carefully to make sure it is doing what it is supposed to do, rather than shouting that everyone _else_ is wrong."

NO. The change you told me to make is HARD to screw up. Think about it: a one-line code change, what could possibly go wrong?
I just added this at the beginning of eval and touched _nothing_ else.
Code: Select all
int SEARCHER::eval(int lazy) {
int score = int((100.0 * rand()) / (RAND_MAX + 1.0));
//print("eval %d\n",score);
//if(player == black) return -score;
return score;
....
Source code of what I am using; two test games against TSCP. With the score negated, or with 0 - 100.0, it hardly makes a difference: same garbage engine...
Anyone can repeat the experiment and confirm.
Stop blowing hot air...
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Result 10000 - 0
I just did the exact same thing as you did, negate score, use 100 as a factor ... all futile attempts from you to avoid the inevitable. 300 games against TSCP with same disastrous results..
I will add more games if it is not convincing enough.
http://sites.google.com/site/dshawul/ev ... ects=0&d=1
negated score with 0 - 100
Code: Select all
Num. Name            games  score
   0 Scorpio_random    300      4
   1 XboardEngine      300    296
Rank Name             Elo   +   -  games  score  oppo.  draws
   1 XboardEngine     359  81  54    300    99%   -359     0%
   2 Scorpio_random  -359  54  81    300     1%    359     0%
0 - 100 without negation
Code: Select all
Num. Name            games  score
   0 Scorpio_random    300    3.5
   1 XboardEngine      300  296.5
Rank Name             Elo   +   -  games  score  oppo.  draws
   1 XboardEngine     359  81  54    300    99%   -359     0%
   2 Scorpio_random  -359  54  81    300     1%    359     0%
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Result 10000 - 0
Tested, anything else ?
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: Result 10000 - 0
Daniel Shawul wrote:
Tested, anything else ?

You didn't run the test I suggested: Match `return 0;' against `return -10000+(rand()%20001);'.
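For anyone wanting to set up that match, the two evaluation functions might be sketched as below. The function names are hypothetical. Note one deliberate substitution: instead of `rand()%20001`, the random version scales `rand()` the same way the Scorpio snippet earlier in the thread does, because the modulo form is slightly non-uniform whenever RAND_MAX + 1 is not a multiple of 20001 (for example when RAND_MAX is 32767).

```c
#include <stdlib.h>

/* Engine A: every position evaluates to a draw score. */
int constant_eval(void) {
    return 0;
}

/* Engine B: uniform random score in [-10000, 10000], centered on 0.
   Scaling rand() into 20001 buckets replaces the suggested modulo. */
int centered_random_eval(void) {
    return (int)((20001.0 * rand()) / (RAND_MAX + 1.0)) - 10000;
}
```

As with the hash-based idea, the range only makes sense if the engine's mate scores lie outside [-10000, 10000].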
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Result 10000 - 0
Maybe I will run it later, but whenever I provide the proof people should not be in _Denial_.
People can come up with billions of random ideas but i won't test them (in this particular case I will but not now )
Denial http://en.wikipedia.org/wiki/Denial
Denial is a defense mechanism postulated by Sigmund Freud, in which a person is faced with a fact that is too uncomfortable to accept and rejects it instead, insisting that it is not true despite what may be overwhelming evidence. [1] The subject may use:
* simple denial - deny the reality of the unpleasant fact altogether
* minimisation - admit the fact but deny its seriousness (a combination of denial and rationalisation), or
* projection - admit both the fact and seriousness but deny responsibility.
Denial of responsibility
This form of denial involves avoiding personal responsibility by:
* blaming - a direct statement shifting culpability and may overlap with denial of fact
* minimizing - an attempt to make the effects or results of an action appear to be less harmful than they may actually be, or
* justifying - when someone takes a choice and attempts to make that choice look okay due to their perception of what is "right" in a situation.
-
- Posts: 317
- Joined: Mon Jun 26, 2006 9:44 am
Re: Questions for the Stockfish team
My point:
Elo 200: it's not 1/4, it's 1/4.16, or 3.16 to 1 against.
Elo 400: it's not 1/16, it's 1/11, or 10 to 1.
Elo 600: it's not 1/64, it's 1/32.62, or 31.62 to 1.
Elo 800: it's not 1/256, it's 1/101, or 100 to 1.
Elo 1000: it's not 1/1024, it's 1/317.23, or 316.23 to 1.
The odds are powers of 3.16227766, e.g. 3.16227766^5 = 316.23 approximately. You are computing powers of the probabilities, which is not valid in the context of your post.
And this time, please actually read what I write and actually think about what I write before replying.