Who is stronger at chess? Computers or Humans?

Milos · Post by **Milos** » Mon Feb 27, 2012 12:28 am

[quote="Adam Hair"]There have been some hints from different authors that they were getting more than 70 Elo per doubling at shorter search times. I have seen a citation that Levy and Newborn stated in How Computers Play Chess of 50 to 70 Elo per doubling, but I have not seen that confirmed in recent times. So, I decided to see if it was true, for I needed to know for another test.

The following list is the result of my testing the approximate gain in Elo for each doubling of thinking time, which should roughly equal the same increase due to doubling speed:

Base time control: 6 sec + 0.1 sec
(2) : 2 x (6 sec + 0.1 sec) = 12 sec + 0.2 sec
(4) : 24 sec + 0.4 sec
(8) : 48 sec + 0.8 sec
(16): 96 sec + 1.6 sec

QX6700 @ 3.05 GHz
100 positions per match, each position twice (reversed colors)

The mean Elo increase per doubling in time is 117.94 +/- 54. If the Elo increase for the first doubling (from the base time to 12 sec + 0.2 sec) is discarded, the increase per doubling is 108.16 +/- 38.52.

I will not claim that there are no problems with my testing methodology, but you will have to point out to me exactly what the problems are. I understand that these numbers do not correspond with your expectations, but this data does have a little bit of support in that it corresponds with the results found by the authors of Dirty and Spandrel. 

If you do point out some plausible flaws, I would be willing to redo the study with corrections made to the methodology.[/quote]
Ok, I was away for the weekend, so the answer comes with a bit of delay :).
I have 2 major problems with your testing method.
The first and most important is the time controls. At least for the first 3 time controls (basically for any TC where the average time per move is less than 1 sec), the Elo differences tend to be greatly exaggerated. This is a well-known fact. It's not due to weaknesses in the search/evaluation algorithms, but to TC and interface problems, which make blunders more frequent and therefore effectively lower Elo. You can also see this from the extremely low draw percentage. Btw, these extremely short TCs are used in engine tuning exactly because they exaggerate differences, so any change you make to the engine is more easily recognized as positive or negative.

The second problem is related to the wide range of Elo differences in your list. I have no clue how you are selecting opponents, but when testing one engine at various TCs, all opponents have to be the same, and each opponent must always play at the same TC (so no opponent should appear at more than one TC). Opponents should also be selected so that the weakest is slightly stronger than the tested engine at its shortest TC, while the strongest is slightly weaker than the tested engine at its longest TC, and there should be no more than 3 doublings of TC for the tested engine, since that would already give more than 200 Elo difference, which is quite large.

IGarcia · Post by **IGarcia** » Mon Feb 27, 2012 3:03 am

Uri wrote:I think I will give up on chess. I absolutely suck at this game. Even after almost 20 years of practice, I still play terribly.

In Playchess my rating is only 1400 and that's a bad rating, considering that some on Playchess have ratings that are above 3300!

I often make terrible blunders already in the opening which cause me to lose a minor piece (like a knight or a bishop, sometimes even a rook) and I need to resign and lose the game. Many times I play the opening passively and reach a positional bust already on move 16.

And even if I do manage to reach a winning position, then I lose on time because I don't play fast enough.

In blind chess I suck even more. My friend defeated me 12-0 in blind chess. Blind chess is even much more difficult for me because I can't "see" the whole board in my mind and I can't "see" board positions far enough ahead.

I guess that some people have a natural or inborn talent for chess while others just don't.

I have been there, but I decided not to leave chess, whatever rating I reach.

You can always enjoy chess by watching live GM games, replaying old masterpieces, and using the computer to help analyze them. You can also analyze your own games to find what would have been better (besides the obvious blunders).

You can enjoy watching tennis without playing it at all.

Maybe if you take some advice from a strong player or pay a coach for some time, you can work at it and improve a lot. Leaving chess is a blunder!

Adam Hair · Post by **Adam Hair** » Mon Feb 27, 2012 5:39 am

[quote="Milos"][quote="Adam Hair"]There have been some hints from different authors that they were getting more than 70 Elo per doubling at shorter search times. I have seen a citation that Levy and Newborn stated in How Computers Play Chess of 50 to 70 Elo per doubling, but I have not seen that confirmed in recent times. So, I decided to see if it was true, for I needed to know for another test.

The following list is the result of my testing the approximate gain in Elo for each doubling of thinking time, which should roughly equal the same increase due to doubling speed:

Base time control: 6 sec + 0.1 sec
(2) : 2 x (6 sec + 0.1 sec) = 12 sec + 0.2 sec
(4) : 24 sec + 0.4 sec
(8) : 48 sec + 0.8 sec
(16): 96 sec + 1.6 sec

QX6700 @ 3.05 GHz
100 positions per match, each position twice (reversed colors)

The mean Elo increase per doubling in time is 117.94 +/- 54. If the Elo increase for the first doubling (from the base time to 12 sec + 0.2 sec) is discarded, the increase per doubling is 108.16 +/- 38.52.

I will not claim that there are no problems with my testing methodology, but you will have to point out to me exactly what the problems are. I understand that these numbers do not correspond with your expectations, but this data does have a little bit of support in that it corresponds with the results found by the authors of Dirty and Spandrel. 

If you do point out some plausible flaws, I would be willing to redo the study with corrections made to the methodology.[/quote]
Ok, I was away for the weekend, so the answer comes with a bit of delay :).
I have 2 major problems with your testing method.
The first and most important is the time controls. At least for the first 3 time controls (basically for any TC where the average time per move is less than 1 sec), the Elo differences tend to be greatly exaggerated. This is a well-known fact. It's not due to weaknesses in the search/evaluation algorithms, but to TC and interface problems, which make blunders more frequent and therefore effectively lower Elo. You can also see this from the extremely low draw percentage. Btw, these extremely short TCs are used in engine tuning exactly because they exaggerate differences, so any change you make to the engine is more easily recognized as positive or negative.[/quote]

I used these time controls because I needed to know the change in Elo with thinking times in this range. I do not expect the Elo increase to hold for longer time controls, but this is what I had in mind when I said "shorter thinking times". However, due to search and hardware improvements, I would assume that the depths reached at these time controls are not inferior to those reached in the tests that determined the 50 to 70 Elo increase per doubling, which I assume took place before [i]How Computers Play Chess[/i], where this "fact" was stated, was published.

By the way, I think that extremely fast time controls are used in engine tuning in order to accumulate enough games in a reasonable amount of time to achieve some statistical significance. At least that is the rationale most authors use.

[quote="Milos"]
The second problem is related to the wide range of Elo differences in your list. I have no clue how you are selecting opponents, but when testing one engine at various TCs, all opponents have to be the same, and each opponent must always play at the same TC (so no opponent should appear at more than one TC). Opponents should also be selected so that the weakest is slightly stronger than the tested engine at its shortest TC, while the strongest is slightly weaker than the tested engine at its longest TC, and there should be no more than 3 doublings of TC for the tested engine, since that would already give more than 200 Elo difference, which is quite large.[/quote]

In my tests, each engine played every other engine at the same time control. Based on those results, the opponents for a certain engine at a certain time control were determined. Those opponents were in a +/- 100 Elo interval from the intermediate rating for that engine/TC.

I agree that what you suggest would give more precise results, but I am not certain that this would give more accurate results. However, I am willing to conduct some tests in the manner you suggest. Let's decide upon the time controls (something reasonable that takes less than one month to complete on a quad) and engines. What are your suggestions?

Adam Hair · Post by **Adam Hair** » Mon Feb 27, 2012 5:49 am

kranium wrote:
Adam Hair wrote:
Milos wrote:
Adam Hair wrote:Newer engines seem to get 100 to 120 Elo per doubling, at least with shorter thinking times.
Maybe you should look at your own rating list (pick an engine and check the Elo difference between single and quad core, for example) before making such, hm, hm, funny claims.

P.S. To realize how funny what you've just written is, think of its consequence: it would mean not only that there is no diminishing return, but that there is an additional gain from speeding up the hardware. In physics that would be equivalent to a perpetuum mobile claim...
Perhaps you should think about whether or not I would state something without anything to back up my statement. Also, I made no claim that the increase in Elo is linear as the number of doublings increases.

Take a moment to read what I wrote and realize that I am stating something that I have measured and with no intention to make you look bad.

When I arrive home in a few hours, I will present my data.

Adam-
100 to 120 Elo per doubling?

CCRL:
Houdini 2.0c 64-bit 4CPU 3311
Houdini 2.0c 64-bit 3242

ELO change = + 69

is the CCRL published data (and Milos's posts) inaccurate in some way?

(i realize you may have been caught up in the excitement of volunteering yourself as moderator...
and needed to appear smart!)

but please...!

(Merci bien, AUB!...no one from CCRL got elected!)

I see that you edited this post after I read and responded to it. Well, I am glad that you are happy I am not a moderator. You in no way could be as happy as I am that I lost, but that does not matter. It is just good to know that I was involved in some way with something that made you feel better.

Rebel · Post by **Rebel** » Mon Feb 27, 2012 11:51 am

kranium wrote: Adam- 100 to 120 Elo per doubling?

Why so negative ?

Especially when it's about something many programmers already know and you apparently don't

In the 80's the branch factor was about 4.0 and the estimated Elo gain from doubling speed was already roughly 50-70 Elo, because an engine searched ½ ply deeper. Nowadays, with a branch factor between 1.5 and 2.0, a doubling of speed gives a full ply to 1½ plies deeper search, hence the Elo gain goes up as well.
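
A rough way to check the arithmetic behind this: if the time to finish depth d grows roughly like EBF^d, then doubling the time buys log(2)/log(EBF) extra plies. The sketch below assumes that simple exponential model (real engines only approximate it), and `plies_per_doubling` is just an illustrative name.

```python
import math

def plies_per_doubling(ebf: float) -> float:
    """Extra search depth (in plies) gained by doubling the search time,
    assuming the time to complete depth d grows like ebf ** d."""
    if ebf <= 1.0:
        raise ValueError("effective branching factor must be greater than 1")
    return math.log(2.0) / math.log(ebf)

# Branch factors mentioned in the post above.
for ebf in (4.0, 2.0, 1.5):
    print(f"EBF {ebf:.1f}: about {plies_per_doubling(ebf):.2f} plies per doubling")
```

Under this model a branch factor of 4.0 gives half a ply per doubling and 2.0 gives one full ply, which matches the figures quoted above; 1.5 gives somewhat more than 1½ plies.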

Marfan · Post by **Marfan** » Mon Feb 27, 2012 3:52 pm

MM wrote:
Uri wrote:
Dr.Wael Deeb wrote:Are still much better

Do you actually believe in this

Depends against whom. A leading chess algorithm like Stockfish 2.2.2JA or Houdini 2.0c Pro running on a Xeon E7-2870 PC would probably defeat 98% of chess players in the world.

But there is still this 2% of high-quality chess players left that this chess algorithm would probably lose to.

Hi,

the point is:

the engines have a very limited knowledge of chess. The little they know has been taught to them by their programmers. For the rest, their strength is based on calculation. In fact, the more calculating power you give them, the stronger they become.
They cannot plan. They search for the best lines (or what they think are the best lines; even engines make many mistakes in analysis, as you can see by following a game with an engine's analysis).

humans have a vast and deep knowledge of chess.
They know perfectly well what is important and what is not. They can plan easily, over the short, medium and long range.
They can judge, with a little calculation, whether an apparently difficult endgame is won, drawn or lost.

They immediately recognize a bad piece (for example a bad bishop blocked by its own pawns, typically Black's bishop on b7 in many endgames and sometimes in the middlegame).

They know the main concepts of the position on the board (what they are and why), and calculation, for humans, is only a method to make sure they are on the right road.

Although the positional play of the engines has improved in recent years, I think there is still a huge difference in this field in favour of humans, just as there is (of course) a huge difference in tactical ability in favour of engines.

The main point is: can humans compensate for their relative weakness in tactics with their deep knowledge of the basic principles of chess?

Let's take an example: sometimes a very strong human (Carlsen, some years ago) sacrifices a rook for a minor piece for two reasons: the bad coordination of the opponent's pieces and total control of the light squares.

How many engines would have done the same thing based on such relatively abstract factors?

Carlsen had a huge advantage for the whole game, but his opponent found a good defence and it was a draw.

But it was an example.

I think a 2800 player is not there because of his tactical ability against other humans.
I think he has a very deep knowledge of every corner of a chess game.

If he had a microchip in his head, he would be almost perfect.

In the same way, if modern engines had the neurons of a 2800 GM, they would be almost perfect.

Mikhail Botvinnik, former world champion (many times), was famous for many things: one of them was that he forced himself to avoid any kind of tactical position against Tal in the rematch for the world title. And he won.

Would he have lost against a computer?

I don't know, who knows; he was a strategic and positional player (and tactical of course, like everybody).

Tigran Petrosian, probably the best defender in chess history (Bobby Fischer suffered so many times against him)..

Correspondence players are extremely strong, and not only nowadays. Historically they have always been very strong. The quality of their games has always been at the top.
Why?

It's obvious: because they have a huge amount of time to ponder.

But what does this mean?

It simply means that humans scale perfectly: if you give a human enough time to think, he can explore every corner of the position and will very rarely make a strategic or tactical mistake.

Everybody, of course, is free to have their own opinion, and I don't think that what I wrote will make anyone change their mind.

My hope is just that someone might consider it a starting point for new, deeper thinking.

Thank you for reading

Best Regards

With a calculating power of perhaps 2-3 moves per second, the human reaches 2700+ Elo.

The computer needs perhaps a billion moves per second to achieve the same rating.

It follows that the human is several millions of times better at chess than the strongest computers. Right?

G.M.

Milos · Post by **Milos** » Mon Feb 27, 2012 5:19 pm

Rebel wrote:In the 80's the branch factor was about 4.0 and the estimated Elo gain from doubling speed was already roughly 50-70 Elo, because an engine searched ½ ply deeper. Nowadays, with a branch factor between 1.5 and 2.0, a doubling of speed gives a full ply to 1½ plies deeper search, hence the Elo gain goes up as well.

EBF is 1.5 to 2 only for low depths. Over mid-game depth of 30, EBF goes much higher since LMR is not so effective and null move is almost ineffective.
Hence diminishing returns, hence lower gain per doubling.
Moreover, the stronger the program, the more the returns diminish and the less it gains from doubling.
As a consequence, for today's stronger programs and TCs where depths go reasonably high (certainly over 20 in the mid-game), 50-70 Elo per doubling still holds.
And if you don't believe me, take Houdini (or SF or Komodo) on a strong machine (at least a strong i7) and test it on 20 mid-game positions (single-core test for repeatability). Measure the time it takes to reach a certain depth (but allow at least 20 mins of thinking time per position so you reach a respectable depth; this is on a single core, so you have to give it a reasonably long time). Then double that time for each position and note in how many positions the engine is able to complete a search 1 ply deeper.
After that we can talk about your nonsensical 1½ ply deeper search per doubling...

P.S. It's not nice to mock other ppl when you yourself are also prone to writing a lot of technical nonsense on this forum...
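
The depth test proposed in this post could be tallied with something like the sketch below. It only does the bookkeeping: for each position it takes the time to complete depth d and the time to complete depth d + 1, and counts how often doubling the first is enough to cover the second. The function names are hypothetical, and the timings in the usage example are made-up placeholders, not measurements from any engine.

```python
def local_ebf(t_depth: float, t_next: float) -> float:
    """Ratio of time-to-depth for two consecutive depths (a crude EBF estimate)."""
    return t_next / t_depth

def count_gaining_a_ply(timings) -> int:
    """Count positions where doubling the time needed for depth d
    is enough to also complete depth d + 1.

    timings: list of (seconds_to_depth_d, seconds_to_depth_d_plus_1) pairs.
    """
    return sum(1 for t_d, t_next in timings if t_next <= 2.0 * t_d)

# Placeholder numbers for illustration only (not real engine output).
example = [(1200.0, 2100.0), (1300.0, 3000.0), (1250.0, 2500.0)]
print(count_gaining_a_ply(example), "of", len(example), "positions gain a full ply")
for t_d, t_next in example:
    print(f"local EBF ~ {local_ebf(t_d, t_next):.2f}")
```

A local ratio at or below 2 means a doubling of time buys at least one more ply in that position; a ratio above 2 means it does not.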

Milos · Post by **Milos** » Mon Feb 27, 2012 6:40 pm

Adam Hair wrote:I used these time controls because I needed to know the change in Elo with thinking times in this range. I do not expect that the Elo increase holds for longer time controls. But this is what I had in mind when I said "shorter thinking times". However, due to search improvements and hardware improvements, I would assume that the depths reached at these time controls may not be inferior to those reached in the tests that determined the 50 to 70 Elo increase per doubling. Which I assumed occurred before How Computers Play Chess was published, where this "fact" was stated.

When the article was published EBFs were much higher at those depths (which you effectively test with your ultra-short TCs) than today. That's one reason you should not test at those TC's (at longer TC's average EBF is a much better representation of the strength of the program).
The second reason is that at ultra-short TCs (where time per move is much lower than 1 sec), the time the engine really spends searching is much smaller than the time you dedicate to a move. That's because the overhead of the interface, time management, uncertainty of time measurement, SMP implementation, etc. takes up a much higher percentage of the total time the engine has per move. This share shrinks as time per move increases, and once time per move goes over 1 sec it becomes practically negligible.
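
To illustrate the overhead argument, here is a small sketch under a deliberately simple assumption: every move loses a fixed amount of time to overhead before the search starts. The 20 ms overhead and the sample times per move are assumptions chosen for illustration, not measured values, and `effective_ratio` is a hypothetical helper name.

```python
def effective_ratio(time_per_move: float, overhead: float) -> float:
    """How much the real search time grows when the nominal time per move
    is doubled, if a fixed overhead (in seconds) is lost on every move."""
    if time_per_move <= overhead:
        raise ValueError("time per move must exceed the overhead")
    return (2.0 * time_per_move - overhead) / (time_per_move - overhead)

OVERHEAD = 0.02  # assumed fixed overhead per move, in seconds

for t in (0.25, 1.0, 10.0):
    ratio = effective_ratio(t, OVERHEAD)
    print(f"{t:5.2f} s per move: search time grows by a factor of {ratio:.3f}")
```

Under this model a nominal doubling is more than a doubling of real search time when the time per move is very short, and the effect fades as the time per move grows.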

By the way, I think that extremely fast time controls are used in engine tuning in order to accumulate enough games in a reasonable amount of time to achieve some statistical significance. At least that is the rationale most authors use.

It's both. In order to see if the change is positive or not (to be included in the release), you need the absolute value of the change to be higher than the variance of its measurement. You reduce the variance with a higher number of games (which is faster with ultra-short TCs), and you increase (exaggerate) the absolute value with ultra-short TCs.
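
How the number of games affects the measurement can be sketched as follows. This uses the usual logistic Elo formula and a normal approximation for the error margin, which is an assumption that works reasonably for large samples and scores away from 0% or 100%. The win/draw/loss counts are invented for illustration, not results from any test in this thread.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction, using the logistic Elo model."""
    score = min(max(score, 1e-9), 1.0 - 1e-9)  # avoid division by zero at 0% or 100%
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, lower, upper); z = 1.96 gives roughly a 95% interval."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    std_err = math.sqrt(variance / n)
    return (elo_from_score(score),
            elo_from_score(score - z * std_err),
            elo_from_score(score + z * std_err))

# Same 55% score, ten times as many games (illustrative counts only).
for w, d, l in ((90, 40, 70), (900, 400, 700)):
    elo, low, high = elo_with_margin(w, d, l)
    print(f"{w + d + l:5d} games: {elo:+.1f} Elo, interval [{low:+.1f}, {high:+.1f}]")
```

With the same score, the interval narrows as the number of games grows, which is why a small change can only be confirmed with many games.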

In my tests, each engine played every other engine at the same time control. Based on those results, the opponents for a certain engine at a certain time control were determined. Those opponents were in a +/- 100 Elo interval from the intermediate rating for that engine/TC.

That is the wrong setup.
An engine that you test at different TCs has to have the same opponents all the time, and its opponents need to have the same thinking time in all the matches, i.e. the thinking time of the opponent has to be independent of the thinking time of the tested engine.
Therefore I suggest you select a pool of 5 test engines (with a strength range from the fastest TC of the tested engine to its slowest TC) and test them against a single engine that plays at 3 TCs, for example at 1'+1'', 2'+2'' and 4'+4''.
P.S. All the tests should be done on a single core per engine (no SMP involved).

Rebel · Post by **Rebel** » Mon Feb 27, 2012 8:21 pm

Milos wrote:
Rebel wrote:In the 80's the branch factor was about 4.0 and the estimated Elo gain from doubling speed was already roughly 50-70 Elo, because an engine searched ½ ply deeper. Nowadays, with a branch factor between 1.5 and 2.0, a doubling of speed gives a full ply to 1½ plies deeper search, hence the Elo gain goes up as well.
EBF is 1.5 to 2 only for low depths. Over mid-game depth of 30, EBF goes much higher since LMR is not so effective and null move is almost ineffective. Hence diminishing returns, hence lower gain per doubling.

First of all we are not talking about a TC of 20 minutes per move but normal TC's (as Adam used) and the main reason for diminishing returns at such high TC's of 20 min is not LMR or null-move but a full hash table.

Furthermore I don't understand your comment about "null move is almost ineffective", the deeper the tree, the more recursive null-moves you get.

After that we can talk about your nonsensical 1½ ply deeper search per doubling...

P.S. It's not nice to mock other ppl when you yourself are also prone to writing a lot of technical nonsense on this forum...

There seems to be a contradiction in your last 2 sentences

Werewolf · Post by **Werewolf** » Mon Feb 27, 2012 11:26 pm

Rebel wrote:
kranium wrote: Adam- 100 to 120 Elo per doubling?
Why so negative ?

Especially when it's about something many programmers already know and you apparently don't

In the 80's the branch factor was about 4.0 and the estimated Elo gain from doubling speed was already roughly 50-70 Elo, because an engine searched ½ ply deeper. Nowadays, with a branch factor between 1.5 and 2.0, a doubling of speed gives a full ply to 1½ plies deeper search, hence the Elo gain goes up as well.

Can you suggest somewhere where I can read about this? I tried wikipedia, but no joy.

Branching Factors have always confused me because they seem so incredibly low (I would have imagined a factor of 10, a factor of 4 seems barely plausible but 1.5 seems ridiculous or witchcraft)

I don't know much about this - what articles are good, please?
