Houdini 1.03a The New Nr 1 !!!

tomgdrums · Post by **tomgdrums** » Thu Jul 29, 2010 9:42 pm

Martin Thoresen wrote:
Dann Corbit wrote: Now, running 30 games is a great idea. But then don't imagine that you have decided who is strongest yet.
Dann, this is precisely what I am trying to explain to Mr. Taylor.

My matches at tournament time control are not in any way intended to give a final answer to whether engine A is stronger than engine B.

To reach such conclusions at the very slow time control I use, a large number of computers would have to be used in order to get the game numbers high enough within a humane time-frame.

Even with 1500 games the error margins are quite high.

My matches and upcoming tournaments are intended to provide the viewers with some quality chess entertainment, nothing else.

Best Regards,
Martin

That is why I like your matches!! They are fun to follow and feel more like a sporting event than a cold "testing" of computing power.

Thanks for putting these matches on.
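For anyone curious how big the error margins Martin mentions are, here is a rough sketch. It assumes two evenly matched engines, a made-up draw ratio of 40% (not taken from Martin's matches), a normal approximation, and the usual logistic Elo formula. The names `score_to_elo` and `elo_margin` are only illustrative.

```python
import math


def score_to_elo(score: float) -> float:
    """Elo difference implied by an expected score (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin(games: int, draw_ratio: float = 0.4, z: float = 1.96) -> float:
    """Approximate 95% Elo error margin for two evenly matched engines.

    draw_ratio is a hypothetical value chosen for illustration only.
    """
    # Each game scores 0, 0.5 or 1. With equal win and loss chances the mean
    # is 0.5 and the per-game variance is (1 - draw_ratio) / 4.
    std_error = math.sqrt((1.0 - draw_ratio) / 4.0 / games)
    upper = 0.5 + z * std_error
    if upper >= 1.0:
        # Too few games for the normal approximation to say anything useful.
        return math.inf
    return score_to_elo(upper)


for n in (30, 1500, 10000):
    print(f"{n:>6} games: about +/- {elo_margin(n):.0f} Elo")
```

The margin shrinks only with the square root of the number of games, which is why slow time controls make tight ratings so expensive to obtain.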

Osipov Jury · Post by **Osipov Jury** » Thu Jul 29, 2010 9:42 pm

Milos wrote: There is no obfuscation code in Rybka. That's a provable fact. There are a bunch of proofs in the disassembled code. Stop making things up.

Rybka 3 has some unused code, which slows it down, especially in the 32-bit version. Vas simply did not have time to optimize the code and remove the unused parts.

Milos · Post by **Milos** » Thu Jul 29, 2010 9:52 pm

Osipov Jury wrote: Rybka 3 has some unused code, which slows it down, especially in the 32-bit version. Vas simply did not have time to optimize the code and remove the unused parts.

Sure it has; that's also a known thing. However, this is not called obfuscation, nor does the slowdown show up only at fast TC; it shows up everywhere.

S.Taylor · Post by **S.Taylor** » Thu Jul 29, 2010 11:50 pm

Martin Thoresen wrote:
Dann Corbit wrote: Now, running 30 games is a great idea. But then don't imagine that you have decided who is strongest yet.
Dann, this is precisely what I am trying to explain to Mr. Taylor.

My matches at tournament time control are not in any way intended to give a final answer to whether engine A is stronger than engine B.

To reach such conclusions at the very slow time control I use, a large number of computers would have to be used in order to get the game numbers high enough within a humane time-frame.

Even with 1500 games the error margins are quite high.

My matches and upcoming tournaments are intended to provide the viewers with some quality chess entertainment, nothing else.

Best Regards,
Martin

I don't even agree that all games are absolutely useless and say absolutely nothing about how an engine or a human plays, before a few thousand games have been played.
If this is the case, there can hardly be much quality chess entertainment.
When a machine moves, you are also interested in how much you think it ought to know why it is moving there, especially if you don't understand it.
It's hardly entertaining when you keep being sure the machine has a mistake, and you break your head trying to see what was behind it, only to realize that OF COURSE it was a stupid move.
You can't have much entertainment when it is nothing but disappointment over and over again. One wants to see things being proven over the board. It's not enough fun just to imagine and imagine without end, mental masturbation, etc. Then in the end you don't have any conclusions about the positions and why this or that happened, as it is all just like random nonsense.
You want to see and appreciate quality, and to know if and when there was or was not quality.
And you don't only want to know who won and how many, but you want to know what it looked like.
When you have all that, you only need to watch a few games to get a better idea how strong the programs are, than simply hearing the results of a few hundred games.
If you have no other interest or feeling or understanding, then of course, just play 10,000 games. But 20 games with quality (and long TC) show more, in almost every way.
Didn't Kasparov say that if a human beats a computer in even one game, it shows that the human is superior in understanding?
It's not too much different if you test a small number, if it's done well, and you can see how the games went.
At no point did Houdini or Stockfish get a plus score (I don't think); it was only Rybka inching up and up, steadily.
There is so much to argue about this.
The only thing the statistics do is give an exact rating. But the likelihood of this being wildly different from the result after 20 quality games is very low.
OK?

mwyoung · Post by **mwyoung** » Thu Jul 29, 2010 11:55 pm

Because Houdini was not clearly weaker in his match. If you think that it was, you don't understand statistics or the rating system.

I have played more games, and I will only speak for myself. Houdini is the strongest program I have ever tested. And I have been testing chess programs since the early 1980s.

It is the program I now use for all of my chess analysis.

If some don't want to give credit to Houdini and don't want to use the program, I have no problem with that.

S.Taylor · Post by **S.Taylor** » Fri Jul 30, 2010 12:20 am

mwyoung wrote: Because Houdini was not clearly weaker in his match. If you think that it was, you don't understand statistics or the rating system.

I have played more games, and I will only speak for myself. Houdini is the strongest program I have ever tested. And I have been testing chess programs since the early 1980s.

It is the program I now use for all of my chess analysis.

If some don't want to give credit to Houdini and don't want to use the program, I have no problem with that.

When he tested Rybka vs Stockfish, there were hardly any moments when Rybka did not get the better position, and its wins were convincing throughout most of the games. SF got a few draws too, and a very few rare wins, but they usually appeared very narrowly achieved.
Houdini 1.02 looked much more up to par. But 1.03 has been struggling against SF like an equal, so it looks like there was very little improvement in its general play.

I respect you highly, as you have been deeply involved in this for many years. But I thought I was right that "but there were not enough games" is said much too coldly and easily. What happened to expert chess players and analysts, who would know BEFORE what would happen after many, many games?

And another of many questions: why isn't it required of humans to play thousands of games before any rating is even suggested for them? A GM title could also be given to a player who is only 2150 Elo (after a few thousand games), based on 3 norms or whatever. (If it's an error of 200, then 2150 could be seen as even 2550, I think.)

mwyoung · Post by **mwyoung** » Fri Jul 30, 2010 12:52 am

Even if you did get 3 GM norms while rated only 2150 (and BTW this would never happen), you would not become a Grandmaster until you have a rating of 2500+. This is also required.

mwyoung · Post by **mwyoung** » Fri Jul 30, 2010 2:04 am

S.Taylor wrote:
Martin Thoresen wrote:
Dann Corbit wrote: Now, running 30 games is a great idea. But then don't imagine that you have decided who is strongest yet.
Dann, this is precisely what I am trying to explain to Mr. Taylor.

My matches at tournament time control are not in any way intended to give a final answer to whether engine A is stronger than engine B.

To reach such conclusions at the very slow time control I use, a large number of computers would have to be used in order to get the game numbers high enough within a humane time-frame.

Even with 1500 games the error margins are quite high.

My matches and upcoming tournaments are intended to provide the viewers with some quality chess entertainment, nothing else.

Best Regards,
Martin
I don't even agree that all games are absolutely useless and say absolutely nothing about how an engine or a human plays, before a few thousand games have been played.
If this is the case, there can hardly be much quality chess entertainment.
When a machine moves, you are also interested in how much you think it ought to know why it is moving there, especially if you don't understand it.
It's hardly entertaining when you keep being sure the machine has a mistake, and you break your head trying to see what was behind it, only to realize that OF COURSE it was a stupid move.
You can't have much entertainment when it is nothing but disappointment over and over again. One wants to see things being proven over the board. It's not enough fun just to imagine and imagine without end, mental masturbation, etc. Then in the end you don't have any conclusions about the positions and why this or that happened, as it is all just like random nonsense.
You want to see and appreciate quality, and to know if and when there was or was not quality.
And you don't only want to know who won and how many, but you want to know what it looked like.
When you have all that, you only need to watch a few games to get a better idea how strong the programs are, than simply hearing the results of a few hundred games.
If you have no other interest or feeling or understanding, then of course, just play 10,000 games. But 20 games with quality (and long TC) show more, in almost every way.
Didn't Kasparov say that if a human beats a computer in even one game, it shows that the human is superior in understanding?
It's not too much different if you test a small number, if it's done well, and you can see how the games went.
At no point did Houdini or Stockfish get a plus score (I don't think); it was only Rybka inching up and up, steadily.
There is so much to argue about this.
The only thing the statistics do is give an exact rating. But the likelihood of this being wildly different from the result after 20 quality games is very low.
OK?

With this logic you can make almost any program or anyone better than anyone else. This is a subjective fallacy.

"Didn't Kasparov say that if a human beats a computer even one game, it shows that the human is superior in understanding."

Since Kasparov's logic cannot be wrong, I will assume Chess Genius from 1994 was "Superior in chess Understanding" to GM Kasparov. They played 2 games and Chess Genius won 1 1/2 to 1/2.

Because: if (A) beats (B) in even one game, it shows (A) is superior in understanding to (B).

[Event "Intel Chess Grand Prix (active)"]
[Site "London (England)"]
[Date "1994.??.??"]
[EventDate "?"]
[Round "1"]
[Result "0-1"]
[White "Garry Kasparov"]
[Black "Genius (Computer)"]
[ECO "D11"]
[WhiteElo "?"]
[BlackElo "?"]
[PlyCount "120"]

1.c4 c6 2.d4 d5 3.Nf3 Nf6 4.Qc2 dxc4 5.Qxc4 Bf5 6.Nc3 Nbd7
7.g3 e6 8.Bg2 Be7 9.O-O O-O 10.e3 Ne4 11.Qe2 Qb6 12.Rd1 Rad8
13.Ne1 Ndf6 14.Nxe4 Nxe4 15.f3 Nd6 16.a4 Qb3 17.e4 Bg6 18.Rd3
Qb4 19.b3 Nc8 20.Nc2 Qb6 21.Bf4 c5 22.Be3 cxd4 23.Nxd4 Bc5
24.Rad1 e5 25.Nc2 Rxd3 26.Qxd3 Ne7 27.b4 Bxe3+ 28.Qxe3 Rd8
29.Rxd8+ Qxd8 30.Bf1 b6 31.Qc3 f6 32.Bc4+ Bf7 33.Ne3 Qd4
34.Bxf7+ Kxf7 35.Qb3+ Kf8 36.Kg2 Qd2+ 37.Kh3 Qe2 38.Ng2 h5
39.Qe3 Qc4 40.Qd2 Qe6+ 41.g4 hxg4 42.fxg4 Qc4 43.Qe1 Qb3+
44.Ne3 Qd3 45.Kg3 Qxe4 46.Qd2 Qf4+ 47.Kg2 Qd4 48.Qxd4 exd4
49.Nc4 Nc6 50.b5 Ne5 51.Nd6 d3 52.Kf2 Nxg4+ 53.Ke1 Nxh2 54.Kd2
Nf3+ 55.Kxd3 Ke7 56.Nf5+ Kf7 57.Ke4 Nd2+ 58.Kd5 g5 59.Nd6+ Kg6
60.Kd4 Nb3+ 0-1

IGarcia · Post by **IGarcia** » Fri Jul 30, 2010 3:39 am

S.Taylor is right in several ways.

With a low number of games it is not possible to determine the
rating, Elo or any other measure exactly. But if some games are played under
equal conditions, like Martin is doing, we can allow for a moment that the
result be taken as a representative measure (with a big +/- margin) of a
more precise value.

Results like 16-16, 18-14 or even 20-12 speak of engines with
similar strength. And the losing engine can be a great rival, with
excellent moves, even better than the winner's, but losing endgames
because it has no endgame tablebases, or the difference can be an opening book.

So, for the pure rating, go play 10,000 games, but the really good engine
may not be the one winning those thousands of blitz games.
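To put rough numbers on scores like 16-16, 18-14 and 20-12, here is a small sketch that turns a match score into an Elo difference with an approximate 95% interval. The draw counts are not given above, so it assumes the no-draws upper bound for the variance (real intervals would be somewhat narrower), plus the logistic Elo formula and a normal approximation. The function names are only illustrative.

```python
import math


def score_to_elo(score: float, eps: float = 1e-6) -> float:
    """Elo difference implied by a score fraction (logistic Elo model)."""
    # Clamp so the logarithm stays defined for 0% or 100% scores.
    score = min(max(score, eps), 1.0 - eps)
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_elo_interval(points_a: float, points_b: float, z: float = 1.96):
    """Return (elo, low, high) for engine A from a match score.

    Assumption: per-game variance is p * (1 - p), the no-draws upper bound,
    because the number of draws is unknown here.
    """
    games = points_a + points_b  # every game hands out exactly one point
    p = points_a / games
    half_width = z * math.sqrt(p * (1.0 - p) / games)
    return (
        score_to_elo(p),
        score_to_elo(p - half_width),
        score_to_elo(p + half_width),
    )


for a, b in ((16, 16), (18, 14), (20, 12)):
    elo, low, high = match_elo_interval(a, b)
    print(f"{a}-{b}: {elo:+.0f} Elo (roughly {low:+.0f} to {high:+.0f})")
```

Under these assumptions even the 20-12 interval still includes zero, which fits the reading that such scores are compatible with engines of similar strength.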

James Constance · Post by **James Constance** » Fri Jul 30, 2010 4:14 am

michiguel wrote:
James Constance wrote: Sorry I've missed these posts, as I sometimes spend time away from the forum - do you have a link? Is there a scientific way of measuring how similar one engine is to another and has this been done in the case of Rybka and Ippolit?

Yes, it has been done, in terms of "move selection".

Just an example of a long discussion
http://www.talkchess.com/forum/viewtopi ... 02&t=32112

Miguel

Thanks - an interesting discussion.
