mjlef wrote: I went through the same thing with Zillions and earlier chess programs. What I settled on is limiting search depth (using autoplay to determine a rating for a 1-ply search, a 2-ply search, etc.).
That's similar to what Aleks suggested. It should work, but I'd like something more continuous. The difference in strength between a 1-ply search and a 2-ply search is probably huge. Another disadvantage is that the ratings would have to be recalibrated for each new time control. A 1-ply search at blitz will obviously do much better against humans than a 1-ply search at a tournament time control.
For even worse play, randomly toss out moves. Do not score them based on how likely you think a human would be to overlook them; just toss x% of the moves in a 1-ply search. You can then use autoplay to rate that as well. People overlook moves all the time; even strong players sometimes miss a mate in 1.
Glaurung never misses a mate in 1, even at the lowest level. Perhaps that alone is worth a considerable number of Elo points?
Tord
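mjlef's suggestion of tossing out x% of the moves before a 1-ply search could be sketched roughly as follows. This is a minimal illustration, not code from Zillions or Glaurung; the function name, the `drop_fraction` parameter, and passing the 1-ply score as a callback are assumptions made for the example.

```python
import random
from collections.abc import Callable, Sequence
from typing import TypeVar

Move = TypeVar("Move")


def pick_with_random_pruning(
    moves: Sequence[Move],
    score_1ply: Callable[[Move], float],
    drop_fraction: float,
    rng: random.Random,
) -> Move:
    """Return the best 1-ply move after randomly discarding some moves.

    drop_fraction plays the role of the "x%" from the post (0.25 = 25%).
    Moves are dropped blindly, not by how likely a human is to miss them.
    """
    if not moves:
        raise ValueError("no legal moves")
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    # Each move is "overlooked" independently with probability drop_fraction.
    kept = [m for m in moves if rng.random() >= drop_fraction]
    if not kept:
        # Everything was tossed; fall back to a random legal move.
        kept = [rng.choice(moves)]
    # 1-ply search over the survivors: take the highest static score.
    return max(kept, key=score_1ply)


# Hypothetical scores for three moves; the mate in 1 scores highest.
scores = {"e4": 0.3, "Qxf7#": 1000.0, "a3": -0.1}
print(pick_with_random_pruning(list(scores), scores.__getitem__, 0.0, random.Random(1)))
# prints Qxf7# (nothing is dropped when drop_fraction is 0.0)
```

With a nonzero `drop_fraction` this version can miss a mate in 1, which is exactly the behavior Tord contrasts with Glaurung; checking for a mating move before pruning would avoid that if it is not wanted.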
Many years ago my program RexChess had a feature where you could set the Elo rating and it would try to play at that strength level.
The old programs played close to 2000 Elo with about 5-6 ply searches in the middlegame, depending on the particular program. The rating curve is also well understood, so it's simply a matter of setting the level appropriately. One really good way to do it is to set the number of nodes searched, and that's how Rex did it. It's dirt simple, your program already has this feature, and it's smooth, so you can calibrate it easily.
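Setting the level by node count and calibrating it could look roughly like this. It is a sketch under assumptions, not how Rex did it: the calibration points would come from your own autoplay or games against rated opponents (the numbers in the usage lines are placeholders, not measurements), and interpolating linearly in log(nodes) assumes each doubling of nodes is worth a roughly constant number of Elo points.

```python
import bisect
import math


def nodes_for_elo(target_elo: float, calibration: list[tuple[int, float]]) -> int:
    """Interpolate a node budget for a target Elo from calibration points.

    calibration holds (nodes_per_move, measured_elo) pairs, with Elo
    increasing as nodes increase. Interpolation is linear in log(nodes).
    """
    pts = sorted(calibration)
    elos = [e for _, e in pts]
    if any(b <= a for a, b in zip(elos, elos[1:])):
        raise ValueError("Elo must increase with node count")
    # Clamp to the calibrated range rather than extrapolating.
    if target_elo <= elos[0]:
        return pts[0][0]
    if target_elo >= elos[-1]:
        return pts[-1][0]
    i = bisect.bisect_left(elos, target_elo)  # first point at or above target
    (n0, e0), (n1, e1) = pts[i - 1], pts[i]
    t = (target_elo - e0) / (e1 - e0)
    return round(math.exp(math.log(n0) + t * (math.log(n1) - math.log(n0))))


# Placeholder calibration points (nodes per move, measured Elo); not real data.
calibration = [(1_000, 1200.0), (4_000, 1400.0), (16_000, 1600.0)]
print(nodes_for_elo(1300, calibration))  # 2000
```

Clamping instead of extrapolating is a deliberate choice: outside the measured range the rating curve has not been checked.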
Here is a problem with this that most people do not understand and that you need to be aware of if you are not already: scalability with humans is BETTER than scalability with computers. If you double the thinking time, a computer may play 60 or 70 Elo stronger, but humans gain even more.
This effect was easier to notice 10 or 20 years ago: humans lost at speed chess, but did a little better at game in 10 minutes, better still at game in 30, and so on. At tournament time controls it was the humans who were superior, despite the fact that computers play MUCH stronger at 40 moves in 2 hours than they do at game in 5 minutes.
So you cannot really have a fixed level where you can say the program plays at X strength without taking the time control into account. A level you call 1800 might easily win at fast chess and lose badly at serious long tournament games.
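To put rough numbers on the time-control effect: if each doubling of thinking time is worth a constant 60-70 Elo to the computer, the gap between game in 5 minutes and 40 moves in 2 hours spans several doublings. The sketch assumes a game in 5 lasts about 40 moves (about 7.5 s per move) and ignores increments; both are simplifications.

```python
import math


def elo_gain_from_time(t_from: float, t_to: float, elo_per_doubling: float) -> float:
    """Elo gained when time per move grows from t_from to t_to seconds,
    assuming a constant Elo gain per doubling of thinking time."""
    return elo_per_doubling * math.log2(t_to / t_from)


blitz = 5 * 60 / 40        # game in 5 minutes, assumed ~40 moves: 7.5 s/move
classical = 2 * 3600 / 40  # 40 moves in 2 hours: 180 s/move
for per_doubling in (60, 70):
    print(per_doubling, round(elo_gain_from_time(blitz, classical, per_doubling)))
# prints "60 275" and "70 321"
```

By this estimate the engine gains roughly 275-320 Elo between the two controls; if humans gain more than that over the same range, as argued above, a level calibrated at one control will drift at the other.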
Some comments about it:
1) In the past, computers had a worse branching factor, so if you take the best programs of today and make them slower, you are going to find that they gain more from extra time than the programs of 1980-1990 did.
It is not obvious without testing that humans today gain more from time (assuming that you make the program artificially and significantly slower by emulating hardware that is 100 or 1000 times slower than today's).
2) Even if we assume that humans today gain more from time, it still does not mean that you cannot have a fixed level that emulates 1800 strength at all time controls.
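If we denote by s the target time in seconds and by n the number of nodes per move, you can use the formula
n = C*s^2
where C is some constant. Because n grows with the square of s, each doubling of the time quadruples the nodes, so the program's strength rises faster with time and can keep pace with the humans' better scaling. Assuming that s is not very big, there should be no problem for the computer to search the number of nodes that you ask it to search.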
Uri
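Uri's formula needs one calibrated point to fix C; after that, every doubling of s multiplies the node budget by four. A minimal sketch, assuming you have measured (for example by autoplay) the node count that plays at the desired strength at one reference time; the reference numbers below are placeholders, and the exponent 2 is Uri's proposal rather than a tested value.

```python
def calibrate_c(ref_seconds: float, ref_nodes: int) -> float:
    """Solve n = C * s**2 for C from one reference point."""
    if ref_seconds <= 0 or ref_nodes <= 0:
        raise ValueError("reference point must be positive")
    return ref_nodes / ref_seconds**2


def node_budget(seconds: float, c: float) -> int:
    """Nodes per move for a target time s, using n = C * s**2."""
    return max(1, round(c * seconds**2))


# Placeholder reference point (not measured): 20,000 nodes at 10 seconds.
c = calibrate_c(10.0, 20_000)
for s in (5.0, 10.0, 20.0, 40.0):
    print(s, node_budget(s, c))
# prints 5000, 20000, 80000 and 320000 nodes for the four times
```

If testing showed a different human-versus-engine scaling, only the exponent would need to change; the one-point calibration stays the same.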
You make a good point; computers may very well be more scalable than they used to be.
But that's not very relevant to my suggestion for anyone wanting to implement levels based on Elo instead of time control. The main point is that you need to integrate the time control into the equation if you are actually trying to match up a computer against a human, and that this can be done very naturally (in my opinion) by using a fixed node setting. (I actually advocate treating nodes like time ticks so that you can have more natural levels compatible with your normal time control algorithm, but that's a different discussion.)
And if you are right that the relative scalability of humans and computers has changed, then you could be more accurate by taking this into consideration. Of course that has to be studied, and it may be difficult to reach a really firm conclusion.
My opinion is still that humans are more scalable until I'm shown otherwise but I acknowledge the possibility that this has changed.
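The aside about treating nodes like time ticks could be sketched as a virtual clock: the ordinary time-control code decides how many seconds to spend, and a fixed nominal nodes-per-second rate turns that into a node budget and charges the clock accordingly. `NodeClock` and `nominal_nps` are hypothetical names, not taken from any existing program; the point of the design is that the result no longer depends on how fast the real hardware is.

```python
from dataclasses import dataclass


@dataclass
class NodeClock:
    """Virtual chess clock in which searched nodes stand in for elapsed time.

    nominal_nps is a hypothetical calibration constant: how many nodes count
    as one second. Time is derived purely from node counts, so play is
    reproducible regardless of the machine's real speed.
    """

    remaining_seconds: float
    nominal_nps: float
    increment_seconds: float = 0.0

    def node_budget(self, seconds: float) -> int:
        """Turn a time allotment from the normal time-control code into nodes."""
        return max(1, int(seconds * self.nominal_nps))

    def charge(self, nodes: int) -> None:
        """Deduct the virtual time used by a move, then add any increment."""
        self.remaining_seconds -= nodes / self.nominal_nps
        self.remaining_seconds += self.increment_seconds


clock = NodeClock(remaining_seconds=300.0, nominal_nps=1000.0)
allotment = clock.remaining_seconds / 30  # e.g. a simple "30 moves to go" split
budget = clock.node_budget(allotment)     # 10000 nodes
clock.charge(budget)
print(budget, clock.remaining_seconds)    # 10000 290.0
```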
The only thing I don't like about this, which I have pointed out in the past, is that you are crippling one part of the program but not the other. You are greatly weakening the tactical component, but not the positional judgement. It will still try, within its search horizon, to avoid weakening the kingside, its pawn structure, and such. I had worked on this for a friend at a game development company, and just whacking the search never produced anything that felt like a weak human. You end up having to have it make gross tactical blunders to offset pretty decent positional judgement...
My last attempt did both: it crippled the search _and_ the evaluation, so at lower levels it would neither find the deep tactics nor give much consideration to pawn majorities, king safety, or other things that a 1200-level player would not have much idea about...
It just doesn't feel right to play a 1200-level player that knows to maintain a pristine pawn structure and king safety, and understands things like distant passed pawns, which are generally well beyond a 1200 player's knowledge and skill level... When you play something weakened like that, if you are a decent chess player, it just feels "off" somehow...
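Crippling both halves, as described above, might be organized like this. It is a sketch, not the code from that game project: the evaluation terms, the level table, and the weights are hypothetical placeholders. Material is left alone while pawn structure, king safety and passed-pawn knowledge are scaled down together with the search depth.

```python
from dataclasses import dataclass


@dataclass
class EvalTerms:
    """Hypothetical breakdown of a static evaluation, in centipawns."""

    material: int
    pawn_structure: int  # doubled/isolated pawns, pawn majorities, ...
    king_safety: int
    passed_pawns: int    # including bonuses for distant passed pawns


@dataclass
class SkillLevel:
    max_depth: int            # crippled search: read by the (not shown) search
    positional_weight: float  # 0.0 = material only, 1.0 = full evaluation


def weakened_eval(t: EvalTerms, level: SkillLevel) -> int:
    """Keep material intact but scale down the subtler positional terms."""
    subtle = t.pawn_structure + t.king_safety + t.passed_pawns
    return t.material + round(level.positional_weight * subtle)


# Placeholder levels; the numbers would have to be tuned by testing.
LEVELS = {"1200": SkillLevel(max_depth=2, positional_weight=0.25),
          "2000": SkillLevel(max_depth=6, positional_weight=1.0)}
terms = EvalTerms(material=100, pawn_structure=-30, king_safety=-50, passed_pawns=40)
for name, level in LEVELS.items():
    print(name, weakened_eval(terms, level))
# prints "1200 90" and "2000 60"
```

Scaling all the subtle terms by one factor is the simplest choice; a finer version could switch off individual concepts (for example distant passed pawns) at particular levels.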
Don wrote: My opinion is still that humans are more scalable until I'm shown otherwise but I acknowledge the possibility that this has changed.
Even if Glaurung scales at close to the human rate, it was calibrated against TSCP, which surely doesn't, and presumably both were tested at very fast time controls. Add whatever modern hardware has contributed, pit the result against inflated blitz ratings, and it's hard to have any idea of what you're going to get.