A new way to compare chess programs

MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: A new way to compare chess programs

Post by MM »

Don wrote:
MM wrote:
Mike S. wrote:
MM wrote: P.S. Don't forget the tactics, it is the weak part of your engine.
The following test result indicates the opposite:

http://rybkaforum.net/cgi-bin/rybkaforu ... pid=414852

Most results are from four CPU cores, but Komodo 4's result on one core only (64/100) is not far from, e.g., Zappa Mexico II (65) or Rybka 4.1 (70), which used four cores each. Komodo 3 scored even better: 71/100.

Spark 1.0, an engine known for both tactical style and tactical strength (though it is clearly weaker than Komodo overall, in games), solved 79 on four cores.

The test is not public but I think it is quite difficult. The best result so far was 90/100.
Thank you for the link.
In my opinion this list is pretty unsound, mainly for 3 reasons:

1. It mixes 1-core and 4-core engines.
2. You can't be sure that an engine which solves a certain number of tests on one core will be able to solve many more on 4 cores.
3. Not all tactical tests are identical. One engine may be able to solve a certain kind of tactical test and not another; it could depend on the "theme" of the test.

Anyway, I'm pretty confident about what I wrote because I watched with my own eyes a hundred games of Komodo 4 (which I bought) and saw its tactical weakness, at least against Houdini.

I simply compared Komodo's strength in tactics with its strength in positional play, and I see that Komodo's problem is mainly in tactics.
Best Regards
When you see such a position, put it in a file with a description and FEN, and please send it to us once you have a few. We are only interested if the move is a real blunder, not a move that loses in a position that was losing anyway.

We have had too many people send us examples that did not hold up by this measure: they would show us positions where Komodo was already losing and then played some move that was met by a spectacular response, making the move look like a terrible blunder (but it was already in a dead lost position). So please make sure you have a legitimate blunder and not just a move that you don't like. There were 2 or 3 shown on this forum, and I refuted them all by showing that ALL moves lose.

The converse happens too: someone showed us a position where Komodo "missed" the winning move, but there was nothing wrong with Komodo's move; it was just not as spectacular as the more obvious winning move.

But we are always interested in legitimate examples so please feel free to bundle up some examples and send them to us.
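Don does not say which file format he wants. As a minimal sketch, assuming one EPD line per position is acceptable, a FEN plus a description could be packaged like this. The function name `to_epd` and the description text are hypothetical; `c0` is the standard EPD comment opcode, and the position used is just a placeholder taken from later in this thread.

```python
def to_epd(fen: str, description: str) -> str:
    """Turn a FEN string and a free-text description into one EPD record.

    EPD keeps only the first four FEN fields (placement, side to move,
    castling rights, en passant square) followed by operations.
    """
    fields = fen.split()
    if len(fields) < 4:
        raise ValueError(f"not enough FEN fields: {fen!r}")
    position = " ".join(fields[:4])
    # Double quotes delimit the EPD string operand, so replace any inside it.
    comment = description.replace('"', "'")
    return f'{position} c0 "{comment}";'


# Placeholder position and description, for illustration only.
record = to_epd(
    "r3r1k1/p1pn1ppp/4q3/1p1pPPB1/2nP4/P1PQ2N1/6PP/R3R1K1 b - - 0 19",
    "example position, Black to move",
)
print(record)
```

Collecting one such line per position in a text file gives something that most GUIs and engines can load directly.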
Hi Don,

Generally speaking I understand your statement, but I partially disagree with your reasoning.

When Komodo is in a bad position (let's say -0.60), doesn't see anything, and moves, and its opponent replies with a move that makes the score -2.10 for Komodo, I call that tactical weakness, unless Komodo had no other move to prevent the opponent's "killer" move.

When I talk about tactical weakness, I don't mean missing a spectacular move when many other moves win just the same. I mean NOT seeing a winning move when there is one and playing something else; if this "something else" is not clearly winning, then I call it a tactical mistake.

It's not a question of "choice" of move: Komodo, or any other engine, either sees it or doesn't. It's not particularly relevant whether it chooses the 2nd or the 3rd winning move, but if it misses the win, it's a tactical mistake.

Moreover, I think the concept of a "lost position" is not clear. A position can "appear" lost (let's say -1.50), Komodo plays, and its opponent's reply makes the score -3.90. In this case perhaps the position was already lost, but perhaps it wasn't. What really makes the difference is whether Komodo sees the killer move or not.
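The two criteria being argued over can be put side by side in a rough sketch. Everything here is an assumption for illustration: the names, the -3.0 "already lost" threshold and the 1.0 pawn margin are hypothetical, and the score after the best alternative move would have to come from a deeper reference analysis.

```python
from dataclasses import dataclass


@dataclass
class MoveEvals:
    """Scores in pawns, all from the point of view of the side that moved."""
    after_played: float  # score once the opponent has answered the move played
    after_best: float    # score after the best alternative found by deeper analysis


def classify(m: MoveEvals, lost_threshold: float = -3.0, margin: float = 1.0) -> str:
    # Don's test: if even the best alternative is lost, ALL moves lose.
    if m.after_best <= lost_threshold:
        return "already lost"
    # MM's test: some other move would have avoided the drop.
    if m.after_best - m.after_played >= margin:
        return "tactical mistake"
    # Nothing available was meaningfully better than the move played.
    return "no better move"


# -0.60 before, -2.10 after, and some other move would have held -0.60:
print(classify(MoveEvals(after_played=-2.10, after_best=-0.60)))
# Same drop, but no move prevented the "killer" reply:
print(classify(MoveEvals(after_played=-2.10, after_best=-2.00)))
# -1.50 before, -3.90 after, and every alternative also ends near -3.80:
print(classify(MoveEvals(after_played=-3.90, after_best=-3.80)))
```

The disagreement is then mostly about where `lost_threshold` sits and how trustworthy `after_best` is, not about the comparison itself.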

Moreover, tactical ability is also to be able to drive the position to have a huge power on a certain side of the board, if it is positionally correct, even much before any dramatic tactical conclusion (like Alekhine used to do). It is not only about seeing or not seeing the final tactical shot.

When I have some time, I will run a match against Houdini and send you all the interesting situations I find. But I have Komodo 4, so I really don't know how helpful that would be, because you already have a much stronger version in the works.

Thank you

Best Regards
MM
Uri Blass
Posts: 10316
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: A new way to compare chess programs

Post by Uri Blass »

Maurizio, I disagree with you about the definition of tactical ability.
I disagree about the following:
" tactical ability is also to be able to drive the position to have a huge power on a certain side of the board, if it is positionally correct, even much before any dramatic tactical conclusion"

I consider it to be positional ability and not tactical ability and probably the solution to this problem is to improve the evaluation function and not to modify the search.

I understand that Don does not care about moves that do not change the practical result.

If you can show him cases where Komodo lost a game against Houdini from a position that Houdini could have drawn against itself, then it is more interesting for him.

It is the same for me. In the last tournament game that I lost, I got the following position:

r3r1k1/p1pn1ppp/4q3/1p1pPPB1/2nP4/P1PQ2N1/6PP/R3R1K1 b - - 0 19


I lost after 19...Qb6 20.Kh1 Qc6 21.Nh5 f6 22.exf6 Nxf6 23.Bxf6

21...f6 and 22...Nxf6 are tactical mistakes according to computer analysis, but I think I lost the game because of positional weakness and not because of tactics, because the moves that the computer suggests give Black no hope.

Note that many chess programs do not evaluate the diagram position correctly: they give an evaluation close to equality, but Black is clearly losing.
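As a quick check on the position itself, here is a small sketch that parses the FEN and adds up material. The 1/3/3/5/9 piece values are the conventional textbook ones and are an assumption here, not any engine's actual values; the function name `material` is made up for this example.

```python
# Conventional piece values (an assumption; engines use their own tuned values).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

FEN = "r3r1k1/p1pn1ppp/4q3/1p1pPPB1/2nP4/P1PQ2N1/6PP/R3R1K1 b - - 0 19"


def material(fen: str) -> dict:
    """Sum piece values per side from the piece-placement field of a FEN."""
    ranks = fen.split()[0].split("/")
    if len(ranks) != 8:
        raise ValueError("FEN must describe 8 ranks")
    totals = {"white": 0, "black": 0}
    for rank in ranks:
        squares = 0
        for ch in rank:
            if ch.isdigit():
                squares += int(ch)  # a digit is a run of empty squares
            else:
                squares += 1
                side = "white" if ch.isupper() else "black"
                totals[side] += PIECE_VALUES[ch.lower()]
        if squares != 8:
            raise ValueError(f"rank {rank!r} does not cover 8 squares")
    return totals


print(material(FEN))  # {'white': 32, 'black': 32}
```

Material is level (a bishop against a knight, seven pawns each), so any large score for White has to come from the positional terms of the evaluation, which fits the point about evaluation rather than search.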

Komodo 3 (I do not have 4) is relatively better than Houdini here and gives at least something near 0.5 pawns for White, or better, at depth 10 or higher.

With Houdini I can get only scores near 0.25 pawns for White if I do not search to a big depth, and the problem here is evaluation.

Stockfish seems to have a better evaluation here and reaches a score of more than +1 pawn within a few seconds (I know that pawns of different programs are not equivalent, but +1 from Stockfish is more than +0.5 from Komodo, which is more than +0.25 from Houdini).
Master Om
Posts: 450
Joined: Wed Nov 24, 2010 10:57 am
Location: INDIA

Re: A new way to compare chess programs

Post by Master Om »

First of all, playing tactics in a positional setup and playing positionally in a tactical setup are two different things. Tactics and positional judgement are closely related, but they differ in implementation. Houdini is good in both aspects, with a balance: it has equal ability in both, but it is not the best in either. Komodo is weak in both because it's a positional program. It looks for positional moves even when there is a tactic available. It does get there, but a bit late. Houdini and Komodo 4 both have poor king safety eval.
Stockfish is different in this respect.


New game
r3r1k1/p1pn1ppp/4q3/1p1pPPB1/2nP4/P1PQ2N1/6PP/R3R1K1 b - - 0 1

Analysis by SFish 0412f4a x64 SSE4.2:

19...Qb6 20.Kh1 f6 21.e6 Qd6 22.exd7 Qxd7 23.Bf4 c6 24.Reb1 a5 25.a4 bxa4 26.Rxa4 c5 27.Qc2 cxd4 28.cxd4 Rac8 29.Qb3 Qa7 30.Qc3 Qd7 31.Qa1 Re7 32.Kg1 Ra8 33.h4 Rae8 34.Ra2 h6 35.Rf1
+- (2.50) Depth: 32/56 00:08:13 3606mN
(Prakash, Bhubaneswar 23.06.2012)


New game
r3r1k1/p1pn1ppp/4q3/1p1pPPB1/2nP4/P1PQ2N1/6PP/R3R1K1 b - - 0 1

Analysis by Houdini 2.0c Pro x64 Z:

19...Qb6 20.Kh1 h6 21.Bh4 g5 22.fxg6 Qxg6 23.Qf3 c6 24.Nf5 Nf8 25.Ne7+ Rxe7 26.Bxe7 Ne6 27.Qf2 a5 28.Rf1 Ra7 29.Bh4 Nf8 30.Bf6 Ne6 31.Qf5 Ra8 32.Qxg6+ fxg6 33.Be7 Re8 34.Bd6
+- (1.67) Depth: 21/39 00:00:00 1554kN
(Prakash, Bhubaneswar 23.06.2012)

Within 10 secs....
New game
r3r1k1/p1pn1ppp/4q3/1p1pPPB1/2nP4/P1PQ2N1/6PP/R3R1K1 b - - 0 1

Analysis by Houdini 2.0c Pro x64 GTB:

19...Qb6 20.Kh1 h6 21.Bh4 Nf8 22.Nh5 Nh7 23.f6 g5 24.Bg3 Kh8 25.Qf3 Nf8 26.Qxd5 Ne6 27.Qe4 Qa5 28.Qd3 c6 29.a4 Nb2 30.Qc2 Nxa4
+/- (0.94) Depth: 18/52 00:00:11 120mN
(Prakash, Bhubaneswar 23.06.2012)


New game
r3r1k1/p1pn1ppp/4q3/1p1pPPB1/2nP4/P1PQ2N1/6PP/R3R1K1 b - - 0 1

Analysis by Sting SF 120610:

19...Qb6 20.Kh1 h6 21.Bf4 a5 22.Nh5 a4 23.Qf3 Qc6 24.Qg3 g5 25.e6 fxe6 26.Bxg5 hxg5 27.Qxg5+
+- (1.81 --) Depth: 21 00:00:19 92307kN
(Prakash, Bhubaneswar 23.06.2012)


My Zappa personality thinks weird!!
New game
r3r1k1/p1pn1ppp/4q3/1p1pPPB1/2nP4/P1PQ2N1/6PP/R3R1K1 b - - 0 1

Analysis by Zappa Mexico II x64 EKS-500:

19...Ncxe5 20.dxe5 Nxe5 21.fxe6 Nxd3 22.Red1 Ne5 23.Rxd5 Rxe6 24.Rxb5 a6 25.Rbb1 f6 26.Bf4 g5 27.Bxe5 Rxe5
+- (1.99) Depth: 16/51 00:00:30 130mN
(Prakash, Bhubaneswar 23.06.2012)



New game
r3r1k1/p1pn1ppp/4q3/1p1pPPB1/2nP4/P1PQ2N1/6PP/R3R1K1 b - - 0 1

Analysis by Spark-1.0 KA:

19...Qb6 20.Nh5 Rxe5 21.Rxe5 Ndxe5 22.Qg3 Ng6 23.Rf1 Qd6 24.Qh3 Nf8 25.Qg4 Ng6 26.Nxg7 Kxg7 27.fxg6 Qxg6 28.Bf6+ Kg8 29.Qf4 Qe4 30.Qg5+ Qg6 31.Qxd5 Nb6
+- (4.01 --) Depth: 20/42 00:00:23 352mN
(Prakash, Bhubaneswar 23.06.2012)
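To compare the outputs above side by side, the summary lines can be parsed with a short script. This is only a sketch: the line layout is inferred from the pasted output, and reading `32/56` as depth/selective depth and `kN`/`mN` as thousands/millions of nodes is an assumption about the GUI's format.

```python
import re

# Assumed layout: "<symbol> (<score>[ --]) Depth: <d>[/<sd>] hh:mm:ss <nodes>[k|m]N"
SUMMARY = re.compile(
    r"\((?P<score>[-+]?\d+\.\d+)[^)]*\)\s+"
    r"Depth:\s+(?P<depth>\d+)(?:/(?P<seldepth>\d+))?\s+"
    r"(?P<h>\d+):(?P<m>\d+):(?P<s>\d+)\s+"
    r"(?P<nodes>\d+)(?P<unit>[km])N"
)
UNIT = {"k": 1_000, "m": 1_000_000}  # assumption: kN = thousand, mN = million


def parse_summary(line: str) -> dict:
    """Extract score, depth, time and node count from one summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"unrecognised summary line: {line!r}")
    g = match.groupdict()
    return {
        "score": float(g["score"]),
        "depth": int(g["depth"]),
        "seldepth": int(g["seldepth"]) if g["seldepth"] else None,
        "seconds": int(g["h"]) * 3600 + int(g["m"]) * 60 + int(g["s"]),
        "nodes": int(g["nodes"]) * UNIT[g["unit"]],
    }


# Summary lines copied from the analysis output in this post.
RESULTS = {
    "SFish 0412f4a x64 SSE4.2": "+- (2.50) Depth: 32/56 00:08:13 3606mN",
    "Houdini 2.0c Pro x64 Z": "+- (1.67) Depth: 21/39 00:00:00 1554kN",
    "Houdini 2.0c Pro x64 GTB": "+/- (0.94) Depth: 18/52 00:00:11 120mN",
    "Sting SF 120610": "+- (1.81 --) Depth: 21 00:00:19 92307kN",
    "Zappa Mexico II x64 EKS-500": "+- (1.99) Depth: 16/51 00:00:30 130mN",
    "Spark-1.0 KA": "+- (4.01 --) Depth: 20/42 00:00:23 352mN",
}

for engine, line in RESULTS.items():
    r = parse_summary(line)
    print(f"{engine:28} {r['score']:+.2f}  depth {r['depth']:2}  {r['seconds']:4}s")
```

Note that the search times differ a lot (from under a second to over eight minutes), so the scores are not directly comparable even before allowing for each engine's own pawn scale.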
Always Expect the Unexpected
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: A new way to compare chess programs

Post by MM »

Uri Blass wrote: Maurizio, I disagree with you about the definition of tactical ability.
I disagree about the following:
" tactical ability is also to be able to drive the position to have a huge power on a certain side of the board, if it is positionally correct, even much before any dramatic tactical conclusion"

I consider it to be positional ability and not tactical ability and probably the solution to this problem is to improve the evaluation function and not to modify the search.
Hi Uri, I probably expressed myself badly. I meant that an engine (Houdini, for example) is predisposed to create pressure on one side in order to use its tactical weapons, following its style.

Sometimes engines underestimate the "weight" of many pieces concentrating around the opponent's king, and many times the attacker finds a tactical shot even when the defender's evaluation looks good.

I think positional ability can lead to positions in which sooner or later there is a tactical shot, but it can also lead to winning positions where no tactical shot is needed.

Best Regards
MM
overtond
Posts: 22
Joined: Fri Oct 08, 2010 7:10 pm

Re: A new way to compare chess programs

Post by overtond »

Larry, you mention K5 as being released at some point. Do you have any information on K5's strengths and weaknesses from a chess perspective? I realise you have a vested interest here, but I welcome your views (privately if you prefer) on where you think Komodo stands as a chess analysis partner. Also, any thoughts on when it will be available, and at what price, would be very useful.

Cheers,
David