Discussion of anything and everything relating to chess playing software and machines.
Moderators: hgm, Dann Corbit, Harvey Williamson
-
Rebel
- Posts: 5881
- Joined: Thu Aug 18, 2011 10:04 am
Post
by Rebel » Mon Feb 10, 2020 11:10 am
MRI (Match Result Inspector) is a tool to extract valuable information from the PGN of an engine-engine match, provided the PGN output is created by ChessBase, Cutechess or Arena.
It's currently in alpha stage (not downloadable yet) and I would like some feedback on improvements and more ideas before a beta release.
Functions:
1. Suspect opening lines overview
2. Crazy games (incompatible scores)
3. Games that should have been won
4. Drop in score (horizon effects mostly)
5. Double opening detector
6. Lost games analysis
Examples:
1. Suspect opening lines overview
Time to remove opening lines like these from your test set?
2. Crazy games (incompatible scores)
When 2 engines both show a very good positive score then usually one is wrong.
Obviously ProDeo overestimated its passed pawns evaluation.
Ethereal too optimistic.
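A minimal sketch of how such a check could work, not MRI's actual code. It assumes Cutechess-style move comments such as `{+0.54/17 7.5s}` (score/depth, then time) and that each score is from the point of view of the engine that just moved. The function names, the 1.0 pawn threshold and the sample movetext are made up for illustration.

```python
import re

# Assumed comment layout: "{+0.54/17 7.5s}" = score/depth time.
COMMENT_RE = re.compile(r"\{([+-]?\d+(?:\.\d+)?)/(\d+)\s+([\d.]+)s\}")

def parse_scores(movetext):
    """Return one score per ply; None for book moves or unparsable comments."""
    scores = []
    for comment in re.findall(r"\{[^}]*\}", movetext):
        m = COMMENT_RE.fullmatch(comment)
        scores.append(float(m.group(1)) if m else None)
    return scores

def incompatible_plies(scores, threshold=1.0):
    """Indices of plies where this mover and the previous mover both
    claim an advantage of at least `threshold` pawns."""
    hits = []
    for i in range(1, len(scores)):
        prev, cur = scores[i - 1], scores[i]
        if prev is not None and cur is not None:
            if prev >= threshold and cur >= threshold:
                hits.append(i)
    return hits

# Invented movetext, only to exercise the functions.
sample = ("1. e4 {book} e5 {book} 2. Nf3 {+1.20/15 3.0s} "
          "Nc6 {+1.50/16 2.9s} 3. Bb5 {+0.30/14 3.1s}")
print(incompatible_plies(parse_scores(sample)))  # [3]
```

Mate scores and other comment formats simply come back as `None` here; a real tool would need to handle them.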
3. Games that should have been won
When an engine shows a score of +9.92 one might expect it to win the game. Not so for Stockfish 8.
Whoops, a bug.
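One way this filter could be expressed, as a sketch under assumptions: each game is a dict with the hypothetical keys `white`, `black`, `result` and `scores` (one mover-relative score per ply, `None` for book moves), and the 5.0 pawn threshold is arbitrary. The engine names `A` and `B` are placeholders.

```python
def missed_wins(games, threshold=5.0):
    """List (engine, peak score, result) for every engine that reached
    at least `threshold` pawns at some point but did not win the game."""
    flagged = []
    for g in games:
        sides = ((0, g["white"], "1-0"), (1, g["black"], "0-1"))
        for offset, name, win in sides:
            # Ply 0 is White's first move, so every second ply is one side.
            own = [s for s in g["scores"][offset::2] if s is not None]
            peak = max(own, default=None)
            if peak is not None and peak >= threshold and g["result"] != win:
                flagged.append((name, peak, g["result"]))
    return flagged

games = [{
    "white": "A", "black": "B", "result": "1/2-1/2",
    "scores": [None, None, 0.5, -0.4, 9.92, -3.0],
}]
print(missed_wins(games))  # [('A', 9.92, '1/2-1/2')]
```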
4. Drop in score (horizon effects mostly)
5. Double opening detector
Code:
1. e4 {book} e6 {book} 2. d4 {book} d5 {book} 3. e5 {book} c5 {book}
4. c3 {book} Nc6 {book} 5. Nf3 {book} Bd7 {book} 6. Be2 {book} f6 {book}
7. O-O {book} Qc7 {book} 8. Re1 {book} O-O-O {book} 9. Bb5 {+0.54/17 7.5s}
and:
Code:
1. e4 {book} e6 {book} 2. d4 {book} d5 {book} 3. e5 {book} c5 {book}
4. c3 {book} Nc6 {book} 5. Nf3 {book} f6 {book} 6. Bd3 {book} Qc7 {book}
7. O-O {book} Bd7 {book} 8. Re1 {book} O-O-O {book} 9. Bb5 {+0.61/17 7.3s}
reach the same position after 9. Bb5.
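A sketch of how duplicates like these could be detected once a FEN is available for each line. It compares only the first four FEN fields (pieces, side to move, castling rights, en passant square), because the halfmove clock differs between the two move orders. The function names are hypothetical, and the two FENs were worked out by hand from the lines above.

```python
from collections import defaultdict

def position_key(fen):
    """Drop the halfmove clock and fullmove number from a FEN."""
    return " ".join(fen.split()[:4])

def double_openings(fens):
    """fens: dict of line id -> FEN. Returns groups of ids sharing a position."""
    groups = defaultdict(list)
    for line_id, fen in fens.items():
        groups[position_key(fen)].append(line_id)
    return [ids for ids in groups.values() if len(ids) > 1]

# Positions after 9. Bb5; only the halfmove clock (5 vs 7) differs.
fens = {
    "line_1": "2kr1bnr/ppqb2pp/2n1pp2/1BppP3/3P4/2P2N2/PP3PPP/RNBQR1K1 b - - 5 9",
    "line_2": "2kr1bnr/ppqb2pp/2n1pp2/1BppP3/3P4/2P2N2/PP3PPP/RNBQR1K1 b - - 7 9",
}
print(double_openings(fens))  # [['line_1', 'line_2']]
```

Comparing raw FEN strings would miss this pair, which is why the counters are stripped first.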
6. Lost games analysis
This option tries to find the moment where an engine starts to lose. It's not perfect, but it is valuable.
90% of coding is debugging, the other 10% is writing bugs.
-
Rebel
- Posts: 5881
- Joined: Thu Aug 18, 2011 10:04 am
Post
by Rebel » Mon Feb 10, 2020 11:17 am
MRI output of a 1000 game match between Stockfish 8 and Komodo 10 at 40/80.
http://rebel13.nl/output.htm
-
Ratosh
- Posts: 77
- Joined: Mon Apr 16, 2018 4:56 pm
Post
by Ratosh » Tue Feb 11, 2020 8:32 pm
Great tool! Some things I would like to see:
Report:
- Window for `Drop` (able to find X score drop in Y plies).
- Show FEN positions in html report (easier to see/copy the FEN).
- Phase overview in functions (Like the phase overview, but for functions - e.g. number of score drops per phase)
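The windowed drop could look roughly like the sketch below. It is only an illustration of the idea: `score_drops`, `drop` and `window` are made-up names, the scores are one engine's own evaluations in pawns, and the window is counted in that engine's moves (so every second ply).

```python
def score_drops(scores, drop=1.5, window=4):
    """Return (start, end, loss) wherever the score falls by at least
    `drop` pawns within `window` of the engine's own moves."""
    found = []
    i = 0
    while i < len(scores):
        hit = None
        last = min(i + window, len(scores) - 1)
        for j in range(i + 1, last + 1):
            loss = scores[i] - scores[j]
            if loss >= drop:
                hit = (i, j, round(loss, 2))
                break
        if hit:
            found.append(hit)
            i = hit[1]  # resume after the drop so it is reported once
        else:
            i += 1
    return found

scores = [0.3, 0.4, 0.2, -0.5, -1.4, -1.5]
print(score_drops(scores, drop=1.5, window=4))  # [(0, 4, 1.7)]
```

With `window=1` this reduces to a plain move-to-move drop check.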
Functions:
- Reverse games with different outcome (Show first diverged move and score).
Really like pgn output files.
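For the reverse-games idea, finding the first diverging move is a simple pairwise walk. A sketch, assuming each game is a list of hypothetical `(san, score)` tuples per ply and that the caller has already paired the two games of an opening and kept only pairs whose results differ:

```python
def first_divergence(game_a, game_b):
    """Return (ply, entry_a, entry_b) for the first ply where the moves
    differ, or None if one game is a prefix of the other."""
    for ply, (a, b) in enumerate(zip(game_a, game_b)):
        if a[0] != b[0]:
            return ply, a, b
    return None

# Invented move lists, only to show the output shape.
game_a = [("e4", None), ("e5", None), ("Nf3", 0.3), ("Nc6", -0.2)]
game_b = [("e4", None), ("e5", None), ("Bc4", 0.1), ("Nf6", 0.0)]
print(first_divergence(game_a, game_b))  # (2, ('Nf3', 0.3), ('Bc4', 0.1))
```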
-
Leo
- Posts: 1000
- Joined: Fri Sep 16, 2016 4:55 pm
- Location: USA/Minnesota
- Full name: Leo Anger
Post
by Leo » Tue Feb 11, 2020 8:49 pm
Looks interesting and useful.
Advanced Micro Devices fan.
-
Dann Corbit
- Posts: 12142
- Joined: Wed Mar 08, 2006 7:57 pm
- Location: Redmond, WA USA
Post
by Dann Corbit » Tue Feb 11, 2020 9:08 pm
Suggestion:
Emit fully decorated EPD (with all the analysis from the logs) for any of the unusual data points found (inverted score, sudden drop, etc.)
It would be useful for building test suites that are tuned to an engine's specific problems.
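A sketch of what one emitted record might look like, using the standard EPD opcodes `ce` (centipawn evaluation), `acd` (analysis count depth) and `id`. The function name and the label format are assumptions, and it presumes the FEN is the position before the flagged move, with the logged score belonging to the side to move.

```python
def to_epd(fen, score_pawns, depth, label):
    """Build one EPD line from a FEN plus the logged score and depth."""
    # EPD keeps only the first four FEN fields.
    position = " ".join(fen.split()[:4])
    ce = round(score_pawns * 100)  # pawns -> centipawns
    return f'{position} ce {ce}; acd {depth}; id "{label}";'

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(to_epd(start, 0.54, 17, "drop-001"))
# rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - ce 54; acd 17; id "drop-001";
```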
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
Rebel
- Posts: 5881
- Joined: Thu Aug 18, 2011 10:04 am
Post
by Rebel » Tue Feb 11, 2020 11:53 pm
Ratosh wrote: ↑Tue Feb 11, 2020 8:32 pm
Great tool! Some things i would like to see:
Report:
- Window for `Drop` (able to find X score drop in Y plies).
Click on view, it's all in the created PGN.
Ratosh wrote: ↑Tue Feb 11, 2020 8:32 pm
- Show FEN positions in html report (easier to see/copy the FEN)
Makes sense.
Ratosh wrote: ↑Tue Feb 11, 2020 8:32 pm
- Phase overview in functions (Like the phase overview, but for functions - e.g. number of score drops per phase)
I rewrote the Phase Overview. Here is an example from a match played while developing Benjamin against Fruit 2.3:
Code:
Phase          Won Games (numbers)         Late Endgame                Match
Overview       MIDG   END1   END2   END3  QUEEN   ROOK  LIGHT   PAWN   Score
Fruit 2.3      1065     56    699    281     24     79     62      4   2724.5 (54.5%)
Benjamin        991     24    458    184     21     48     39      2   2275.5 (45.5%)

Phase          Won games %                 Late Endgame                Match
Overview       MIDG   END1   END2   END3  QUEEN   ROOK  LIGHT   PAWN   Score
Fruit 2.3      51.8   70.0   60.4   60.4   53.3   62.2   61.4   66.7   2724.5 (54.5%)
Benjamin       48.2   30.0   39.6   39.6   46.7   37.8   38.6   33.3   2275.5 (45.5%)

Depths         MIDG   END1   END2   END3   BOOK          TIME
Benjamin       11.5   12.0   13.6   15.3   8.0 (moves)   0:00
Fruit 2.3      11.6   12.1   15.8   18.7   8.0 (moves)   0:00
What immediately springs to mind is Benjamin's weak point, the endgame; looking at the depths, it is likely being outsearched.
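The percentage table follows directly from the win counts: each entry is an engine's share of the games won in that phase. A short sketch that reproduces it from the numbers above (the names `WINS` and `win_shares` are made up, and this is not MRI's code):

```python
PHASES = ["MIDG", "END1", "END2", "END3", "QUEEN", "ROOK", "LIGHT", "PAWN"]
WINS = {
    "Fruit 2.3": [1065, 56, 699, 281, 24, 79, 62, 4],
    "Benjamin": [991, 24, 458, 184, 21, 48, 39, 2],
}

def win_shares(wins):
    """Per phase, each engine's share (in %) of the games won in that phase."""
    totals = [sum(col) for col in zip(*wins.values())]
    return {
        name: [round(100 * w / t, 1) if t else 0.0 for w, t in zip(row, totals)]
        for name, row in wins.items()
    }

for name, row in win_shares(WINS).items():
    print(name, dict(zip(PHASES, row)))
```

For example, MIDG for Fruit 2.3 is 1065 / (1065 + 991), which rounds to 51.8%.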
Ratosh wrote: ↑Tue Feb 11, 2020 8:32 pm
Functions:
- Reverse games with different outcome (Show first diverged move and score).
Nice idea, I am afraid that list will be long.
Ratosh wrote: ↑Tue Feb 11, 2020 8:32 pm
Really like pgn output files.
Thank you.
-
Rebel
- Posts: 5881
- Joined: Thu Aug 18, 2011 10:04 am
Post
by Rebel » Tue Feb 11, 2020 11:56 pm
Dann Corbit wrote: ↑Tue Feb 11, 2020 9:08 pm
Suggestion:
Emit fully decorated EPD (with all the analysis from the logs) for any of the unusual data points found (inverted score, sudden drop, etc.)
It would be useful for building test suites that are tuned to an engine's specific problems.
Will do.
-
Ferdy
- Posts: 4591
- Joined: Sun Aug 10, 2008 1:15 pm
- Location: Philippines
Post
by Ferdy » Wed Feb 12, 2020 8:59 am
Rebel wrote: ↑Mon Feb 10, 2020 11:10 am
6. Lost games analysis
All good, and I like that feature, showing the FEN where it first made a suboptimal move.
Two things:
1. A single blunder that cost the game
2. An initial small mistake that leads to defeat
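These two cases could be told apart from the loser's own score curve. A rough sketch, with made-up thresholds (`safe`, `blunder`) and a made-up function name; it relies on the losing engine's own evaluations, which may notice the mistake several moves late:

```python
def losing_moment(scores, safe=-0.5, blunder=2.0):
    """scores: the loser's own evaluations in pawns, in move order.
    Returns (index, kind), where index is the last move still scored
    >= safe and kind is 'blunder' if the next score falls by at least
    `blunder` pawns, else 'gradual'. Returns None if no such move exists
    or if it is the final move."""
    last_ok = None
    for i, s in enumerate(scores):
        if s >= safe:
            last_ok = i
    if last_ok is None or last_ok == len(scores) - 1:
        return None
    fall = scores[last_ok] - scores[last_ok + 1]
    return last_ok, "blunder" if fall >= blunder else "gradual"

print(losing_moment([0.2, 0.1, 0.0, -3.1, -4.0]))         # (2, 'blunder')
print(losing_moment([0.2, -0.1, -0.4, -0.8, -1.3, -2.5]))  # (2, 'gradual')
```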
-
Guenther
- Posts: 3938
- Joined: Wed Oct 01, 2008 4:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Post
by Guenther » Wed Feb 12, 2020 9:47 am
Rebel wrote: ↑Mon Feb 10, 2020 11:10 am
MRI (Match Result Inspector) is a tool to extract valuable information from the PGN of an engine-engine match, provided the PGN output is created by ChessBase, Cutechess or Arena.
It's currently in alpha stage (not downloadable yet) and I would like some feedback on improvements and more ideas before a beta release.
...
Functions:
...
3. Games that should have been won
...
The title of 3. is a bit misleading: some of these games were never actually won but misevaluated, so there never was a win, just a fata morgana.
The example in your post might be even worse: a problem on the user side, as also indicated by an incomprehensible sudden loss of depth.
(Of course, I won't completely rule out a hash bug in SF8.)
Single outliers with strangely low depths would be an interesting thing to find anyway ;-)
(I have long done this with some stats sheets to sanity-check PGN files found on talkchess from time to time.)
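Such a depth sanity check could be sketched as follows: flag any move whose depth is far below the median depth of its neighbours. The function name, `ratio` and `radius` are arbitrary choices for illustration, and the depth list is invented.

```python
from statistics import median

def depth_outliers(depths, ratio=0.5, radius=3):
    """depths: one engine's search depth per move, in order.
    Flags (index, depth) where depth < ratio * median of up to
    `radius` neighbouring moves on each side."""
    flagged = []
    for i, d in enumerate(depths):
        neighbours = depths[max(0, i - radius):i] + depths[i + 1:i + 1 + radius]
        if neighbours and d < ratio * median(neighbours):
            flagged.append((i, d))
    return flagged

print(depth_outliers([24, 25, 26, 25, 9, 26, 27, 26]))  # [(4, 9)]
```

Using the local median rather than a game-wide average keeps the normal rise in depth towards the endgame from triggering false alarms.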
It is impossible for me to reproduce that incredible score with SF8.
Neither with 5-men Syzygy nor with no TBs at all. The score is always between 0.17 and 0.33 for all depths up to 60.