ChessGUI 0.245f is available

Ryan Benitez · Post by **Ryan Benitez** » Tue Dec 17, 2013 5:55 pm

mcostalba wrote:
hgm wrote: There is a clear standard for how scores should be reported, however (centiPawn), both in WB protocol and UCI. If engines report scores that should be multiplied by a fixed factor to be meaningfully compared with other engine scores, these engines are non-compliant, and should be fixed.
You have the obnoxious attitude to talk about things you ignore, throwing in your idea and erroneously thinking this must be the reality.

Of course you also have the arrogance to never reconsider what you have said previously after people explain it to you, but I already know very well that these 2 fellows (hand-waving and presumption) always go hand in hand.

P.S.: Your argumentation about scores as probability of winning is completely off and confirms you don't know how engines are tuned/developed.

I think there must be some misunderstanding. As I understand it, HGM is just trying to push to keep things standardized. I don't see any long-term conflict of interest in that.

casaschi · Post by **casaschi** » Tue Dec 17, 2013 6:33 pm

Graham Banks wrote:The other feature that would be useful would be the evaluation given in brackets after each move in the displayed pgn.

It would be much better to have an agreed standard to display engine info into the PGN.

A proposed extension to the PGN standard was suggested years ago: http://www.enpassant.dk/chess/palview/enhancedpgn.htm

According to this proposal, custom "comment tags" could be defined using this pattern within comments:

Code:

[%tag value]

So for the engine evaluation you could have:

Code:

1. e4 { [%eval +0.2] } e5 { [%eval -0.3] }

This has three advantages compared to other custom formats:
1) PGN readers/viewers will have automatic routines to read comment tags and deal with new ones relatively easily
2) everybody will read your PGN and have an easier life understanding the meaning of the values
3) if you have to collate several PGN from different sources you would not have to guess different formats

Just compare how easily you'd process these:

Code:

 1. e4 { +0.2 12 90:00 0.01 e4 c5 c4 e5 }

Code:

 1. e4 { [%eval +0.2] [%depth 12] [%clk 90:00] [%emt 0.01] [%pv e4 c5 c4 e5]}
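
For what it's worth, reading the tagged form can be sketched in a few lines of Python. This is only an illustration under my own assumptions: tag names are plain word characters, values never contain a closing bracket, and `parse_comment_tags` is a made-up helper name, not something from an existing PGN library.

```python
import re

# Matches one "[%tag value]" command inside a PGN comment.
# Assumption: tag names are word characters and values contain no "]".
TAG_RE = re.compile(r"\[%(\w+)\s+([^\]]*)\]")


def parse_comment_tags(comment):
    """Return a {tag: value} dict for every [%tag value] found in a comment."""
    return {name: value.strip() for name, value in TAG_RE.findall(comment)}


comment = "[%eval +0.2] [%depth 12] [%clk 90:00] [%emt 0.01] [%pv e4 c5 c4 e5]"
tags = parse_comment_tags(comment)

# Each value is looked up by name, so the order of the tags does not matter.
print(tags["eval"], tags["depth"], tags["pv"].split())
```

An unknown tag simply ends up as one more key in the dict, so a reader can skip what it does not understand.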

Matthias Gemuh · Post by **Matthias Gemuh** » Tue Dec 17, 2013 6:43 pm

hgm wrote:... You encourage non-compliancy amongst engine builders, by catering to their bugs. This will eventually lead to chaos.

I think that is a very bad thing. If Stockfish reports wrong scores, it should be fixed in Stockfish.

GUI developers should take responsibility to guard the standards.

A standard cannot exist in scoring because bonuses are awarded on highly subjective bases. This means many users will always bump into engines whose scores look like they need fixing, even if the authors tried to keep standards.
This means the user should be given the opportunity to scale things.
If most GUIs that can analyse had this score scaling feature, SF would be more widely used for analysis (judging from user comments as in TCEC chat).

Edmund · Post by **Edmund** » Tue Dec 17, 2013 6:50 pm

Matthias Gemuh wrote:A standard cannot exist in scoring because bonuses are awarded on highly subjective bases. This means many users will always bump into engines whose scores look like they need fixing, even if the authors tried to keep standards.
This means the user should be given the opportunity to scale things.
If most GUIs that can analyse had this score scaling feature, SF would be more widely used for analysis (judging from user comments as in TCEC chat).

Either give the user the opportunity to scale, or change the standard to force engines to report already-scaled scores.

Matthias Gemuh · Post by **Matthias Gemuh** » Tue Dec 17, 2013 6:52 pm

casaschi wrote:
Graham Banks wrote:The other feature that would be useful would be the evaluation given in brackets after each move in the displayed pgn.
It would be much better to have an agreed standard to display engine info into the PGN.

...

What Graham is talking about neither involves nor affects PGN files.
It is stuff that is generated and spit out only on screen.

hgm · Post by **hgm** » Tue Dec 17, 2013 7:05 pm

casaschi wrote:It would be much better to have an agreed standard to display engine info into the PGN.

A proposed extension to the PGN standard was suggested years ago: http://www.enpassant.dk/chess/palview/enhancedpgn.htm

According to this proposal, custom "comment tags" could be defined using this pattern within comments:
Code:
[%tag value]
So for the engine evaluation you could have:
Code:
1. e4 { [%eval +0.2] } e5 { [%eval -0.3] }
This has three advantages compared to other custom formats:
1) PGN readers/viewers will have automatic routines to read comment tags and deal with new ones relatively easily
2) everybody will read your PGN and have an easier life understanding the meaning of the values
3) if you have to collate several PGN from different sources you would not have to guess different formats

Just compare how easily you'd process these:
Code:
 1. e4 { +0.2 12 90:00 0.01 e4 c5 c4 e5 }
Code:
 1. e4 { [%eval +0.2] [%depth 12] [%clk 90:00] [%emt 0.01] [%pv e4 c5 c4 e5]}

The problem is that I consider this absolutely awful. I would rather die than use that, and would not hesitate to vote down the proposal any time. And I don't think I am the only one.

So yes, I agree that it would be preferable to have a standard. But I don't think it should just be any standard, but rather a standard everyone could agree on.

The first format you give is actually extremely easy to parse, provided there is an agreed-upon ordering of the numbers. WinBoard (and I think many other interfaces) use a slightly different format, with a slash, like {+0.20/12 7 ...} where score/depth is (optionally) followed by the time thought about the move in seconds. This has always served me very well, and is not too messy if you want to read through the PGN 'by eye'. IMO the excessive use of brackets and a totally redundant % sign make the second format totally unreadable.
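
To illustrate the "easy to parse" claim, here is a minimal Python sketch for the score/depth form. It assumes the comment starts with score/depth and that any time field is a plain number of seconds, as described above; `parse_winboard_comment` is a hypothetical helper name and not WinBoard code.

```python
import re

# score "/" depth, optionally followed by the thinking time in seconds.
# Assumption: the time, when present, is a plain number (not m:ss).
WB_RE = re.compile(r"^\s*([+-]?\d+(?:\.\d+)?)/(\d+)(?:\s+(\d+(?:\.\d+)?))?")


def parse_winboard_comment(comment):
    """Return (score, depth, seconds or None), or None if the comment does not match."""
    match = WB_RE.match(comment)
    if match is None:
        return None
    score, depth, seconds = match.groups()
    return float(score), int(depth), (float(seconds) if seconds is not None else None)


print(parse_winboard_comment("+0.20/12 7"))
print(parse_winboard_comment("-400.00/15"))
print(parse_winboard_comment("just a remark"))
```

The positional form needs no tag names, but it only works as long as everyone agrees on the order of the fields.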

I also think it is a bad idea to hide the engine PV in a comment; much better to store it as a recursive variation, so that standard PGN variation-tree walking can recognize it and play through the PV like it would play through any variation. If some identification is needed to label it as a PV I would do it in a PGN-compatible way like

({PV} e4 e5 Nf3 Ng6 ...)

but that already looks like 'overdoing' things.

hgm · Post by **hgm** » Tue Dec 17, 2013 7:18 pm

Matthias Gemuh wrote:A standard cannot exist in scoring because bonuses are awarded on highly subjective bases. This means many users will always bump into engines whose scores look like they need fixing, even if the authors tried to keep standards.

Well, if one engine values a passer at 0.1 Pawn, and another at 0.5 Pawn, there is obviously little you can do. Multiplying one score by 5 to erase that difference would make that engine report +5.00 when it is a Pawn ahead in the absence of passers, which most certainly would not be what you want. People will have to accept that different engines will value different positional traits in a different way. Otherwise they would not really be different engines. And this cannot be solved by an overall multiplication. This would only help if the engine had a reasonably consistent over- or under-valuation of all possible evaluation terms.
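
The arithmetic can be spelled out in a few lines of Python. The weights below are invented for the illustration (they are not taken from any real engine), and `scaled` is just a helper for this sketch.

```python
# Hypothetical evaluation weights in Pawn units; not taken from any real engine.
engine_a = {"pawn": 1.0, "passer": 0.1}
engine_b = {"pawn": 1.0, "passer": 0.5}
# A third hypothetical engine that over-values every term by the same factor 2.
engine_c = {"pawn": 2.0, "passer": 0.2}


def scaled(weights, factor):
    """Multiply every evaluation term by one overall factor."""
    return {term: round(value * factor, 2) for term, value in weights.items()}


# Case 1: only one term differs. Matching the passer value breaks the Pawn value.
factor = engine_b["passer"] / engine_a["passer"]
a_scaled = scaled(engine_a, factor)
print(a_scaled)  # passer now agrees with engine_b, but a Pawn counts as 5.0

# Case 2: every term is off by the same factor. One overall division fixes it.
c_scaled = scaled(engine_c, 0.5)
print(c_scaled == engine_a)
```

Only the second case is the kind of difference a single multiplier can remove.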

If most GUIs that can analyse had this score scaling feature, SF would be more widely used for analysis (judging from user comments as in TCEC chat).

Seems to me that what you are saying here is equivalent to saying that people consider Stockfish so broken that they even refuse to use it (for analysis), despite the fact that it tops the rating lists. And for no other reason than that it doesn't divide its reported scores by 1.9 itself.

If that is all true, what would be the compelling reason for the Stockfish authors NOT to do that division?

hgm · Post by **hgm** » Tue Dec 17, 2013 7:24 pm

Matthias Gemuh wrote:What Graham is talking about neither involves nor affects PGN files.
It is stuff that is generated and spit out only on screen.

Indeed, I think this is what he means. But isn't that what most GUIs already (optionally) do? E.g. in the WinBoard Move History window (so not the PGN!) it looks like this:

Code:

1. Kd2 {+0.20/15} Sd8 2. a4 {+0.08/17} b6 3. Sc2 {+0.16/16} b5 4. Sf2 {+0.28/17} Gg8 5. h4 {+0.24/16} g6 6. h5 {+0.28/15} Sf8 7. h6 {+0.44/16} hxh6 8. Rxh6 {+0.40/15} Kf9 9. Rh2 {+0.80/13} P@h7 10. Gde2 {+0.88/13} e6 11. P@h6 {+1.52/14} hxh6 12. Rxh6 {+1.24/13} P@h7 13. Rh4 {+1.04/14} Sde7 14. Rg4 {+1.00/13} Ge8 15. a5 {+0.68/15} Sd6 16. i4 {+0.96/14} c6 17. Ni3 {+1.04/13} Sc5 18. Ba3 {+0.84/13} Sg7 19. Re4 {+0.60/13} e5 20. Rxe5 {+1.20/14} b4 21. bxb4 {+0.84/14} Sf6 22. Rh5 {-400.00/15} Sg5 23. Sb3 {+1.36/13} a6 24. axa6 {+1.72/14} Lxa6 25. P@a5 {+0.32/14} Lxa5 26. P@a4 {-0.36/14} P@b2 27. Bxb2 {-400.00/15} Sxb4 28. P@b7 {-2.60/14} Rxb7 29. Sxb4 {-3.32/13} Rxb4 30. S@c1 {-4.40/12} P@a2 31. Lxa2 {-7.20/11} Rb3+ 32. axa5 {-7.84/11} +Rxa2 33. L@e4 {-9.84/11} P@e5 34. Lxe5 {-15.16/11} Bxe5 35. P@b3 {-22.68/12} +Rxb1 36. e4 {-18.24/10} N@c4 37. cxc4 {-21.32/12} Bxb2+ 38. Sxb2 {-21.80/11} +Rxb2 39. Ke3 {-400.00/11} B@a7 40. B@d4 {-21.88/9} S@c3 41. f4 {-22.96/9} Sxd4+ 42. dxd4 {-400.00/10} B@c1 43. Kf3 {-23.24/9} Bxf4+ 44. Kg2 {+0.00/1} P@e3 45. S@f3 {-25.52/9} exe2+ 46. Gxe2 {-28.28/9} Bxd4 47. P@d2 {-27.80/9} Bxf2+ 48. Gxf2 {-22.76/9} +Bxf3 49. Gxf3 {-29.92/9} +Rxd2 50. B@f2 {-400.00/10} S@h3 51. Rxh3 {-399.68/9} S@f1 52. Kxf1 {-399.72/12} S@e2 53. Kg1 {-399.76/12} G@g2 54. Kxg2 {+0.00/1} Sxf3+ 55. Kh1 {-399.84/8} G@g1 56. Ki2 {-399.88/6} +Rxf2 57. B@g2 {-399.92/4} +Rxg2 58. S@h2 {-399.96/2} +Rh1#

The stuff in braces can be displayed in a lighter font.

michiguel · Post by **michiguel** » Tue Dec 17, 2013 7:42 pm

hgm wrote:
Matthias Gemuh wrote:A standard cannot exist in scoring because bonuses are awarded on highly subjective bases. This means many users will always bump into engines whose scores look like they need fixing, even if the authors tried to keep standards.
Well, if one engine values a passer at 0.1 Pawn, and another at 0.5 Pawn, there is obviously little you can do. Multiplying one score by 5 to erase that difference would make that engine report +5.00 when it is a Pawn ahead in the absence of passers, which most certainly would not be what you want. People will have to accept that different engines will value different positional traits in a different way. Otherwise they would not really be different engines. And this cannot be solved by an overall multiplication. This would only help if the engine had a reasonably consistent over- or under-valuation of all possible evaluation terms.

That is what happens. When many engines hover around values of 0.5, Houdini is around 0.25, and SF is around 1.0ish.

No engine is right or wrong, but they do differ quite consistently among each other.
Miguel

If most GUIs that can analyse had this score scaling feature, SF would be more widely used for analysis (judging from user comments as in TCEC chat).
Seems to me that what you are saying here is equivalent to saying that people consider Stockfish so broken that they even refuse to use it (for analysis), despite the fact that it tops the rating lists. And for no other reason than that it doesn't divide its reported scores by 1.9 itself.

If that is all true, what would be the compelling reason for the Stockfish authors NOT to do that division?

hgm · Post by **hgm** » Tue Dec 17, 2013 7:54 pm

michiguel wrote:No engine is right or wrong, but they do differ quite consistently among each other.

Well, I would say that when an engine does not hover around +1.00 when presented with a position where it is a Pawn ahead without any compensation for the opponent (such as classical Pawn odds, deleting f2 or f7 from the opening position), but says +0.50 or +2.00 instead, it is clearly wrong. If engines systematically differ by a factor in their scores, it should not be very difficult to find out which one is miscalibrated, using such a test.
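
A rough Python sketch of such a calibration test follows. The reported scores are made-up numbers, not measurements of any engine, and `calibration_factor` and `normalize` are hypothetical helper names. It assumes every score is given from the point of view of the side that is a clean Pawn ahead.

```python
from statistics import median


def calibration_factor(pawn_odds_scores):
    """Estimate how many reported units the engine assigns to one clean Pawn.

    Each score is what the engine reports, in Pawn units, for a position where
    one side is exactly a Pawn ahead with no compensation. A well-calibrated
    engine should give a factor close to 1.0.
    """
    return median(abs(score) for score in pawn_odds_scores)


def normalize(score, factor):
    """Divide a reported score by the calibration factor."""
    return score / factor


# Made-up scores for a few Pawn-odds positions (e.g. f2 or f7 removed).
reported = [1.85, 1.95, 1.90]
factor = calibration_factor(reported)

print(factor)
print(normalize(3.8, factor))
```

The median is used here only so that a single odd position does not dominate the estimate; an average over many positions would serve the same purpose.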

I don't see what the advantage is of having some engines systematically report about double or half of what others do. And it is apparently believed by Matthias that users also do not appreciate that very much, to the point where they do not even want to use the engine at all. So it seems that this is a serious problem that warrants fixing, and people don't just shrug it off.

I just don't understand why it would be considered a good solution to have a dozen GUIs 'fix' this problem (actually meaning they just offer the user the chance to fix the problem, requiring him to determine the needed correction factor himself), while it could be fixed by just changing one engine.
