MRI - Match Result Inspector

Posted: Mon Feb 10, 2020 11:10 am
by Rebel
MRI (Match Result Inspector) is a tool to extract valuable information from a PGN engine-engine match provided the PGN output is created by ChessBase, Cutechess or Arena.

It's currently in alpha stage (not downloadable yet) and I would like some feedback, suggestions for improvement and more ideas before a beta release.

Functions:
1. Suspect opening lines overview
2. Crazy games (incompatible scores)
3. Games that should have been won
4. Drop in score (horizon effects mostly)
5. Double opening detector
6. Lost games analysis


Examples:

1. Suspect opening lines overview

Time to remove opening lines like these from your test set?

2. Crazy games (incompatible scores)
When both engines show a clearly positive score at the same time, usually one of them is wrong.

Obviously ProDeo overestimated its passed pawns evaluation.


Ethereal too optimistic.
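
A minimal sketch of how such a "crazy games" check could work, not MRI's actual code. It assumes Cutechess-style move comments like `{+0.54/17 7.5s}`, that each score is reported from the point of view of the engine that just moved, and that every move carries exactly one comment. The names `ply_scores`, `crazy_plies` and the 1.5 pawn threshold are made up for illustration; mate scores are simply skipped here.

```python
import re

# Score and depth at the start of a comment, e.g. "+0.54/17 7.5s" (assumed format).
SCORE_RE = re.compile(r"([+-]?\d+(?:\.\d+)?)/\d+")

def ply_scores(movetext):
    """Return one entry per ply: the score in pawns, or None for {book} and unparsed comments."""
    scores = []
    for comment in re.findall(r"\{([^}]*)\}", movetext):
        match = SCORE_RE.match(comment.strip())
        scores.append(float(match.group(1)) if match else None)
    return scores

def crazy_plies(scores, threshold=1.5):
    """Return ply indices i where the mover at ply i and the opponent at ply i+1
    both report a score of at least `threshold` pawns for their own side."""
    hits = []
    for i in range(len(scores) - 1):
        a, b = scores[i], scores[i + 1]
        if a is not None and b is not None and a >= threshold and b >= threshold:
            hits.append(i)
    return hits

# Hypothetical movetext, invented for this example.
movetext = ("1. e4 {book} e5 {book} 2. Nf3 {+2.10/15 1.0s} Nc6 {+1.80/14 0.9s} "
            "3. Bb5 {+0.30/16 1.1s} a6 {+1.90/15 1.0s}")
print(crazy_plies(ply_scores(movetext)))
```

Comparing adjacent plies keeps the check local, so a game is only flagged where both engines are optimistic about the same stretch of play.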

3. Games that should have been won
When an engine shows a score of +9.92 one might expect it to win the game, not so for Stockfish 8.


Whoops, a bug.

4. Drop in score (horizon effects mostly)


5. Double opening detector

1. e4 {book} e6 {book} 2. d4 {book} d5 {book} 3. e5 {book} c5 {book}
4. c3 {book} Nc6 {book} 5. Nf3 {book} Bd7 {book} 6. Be2 {book} f6 {book}
7. O-O {book} Qc7 {book} 8. Re1 {book} O-O-O {book} 9. Bb5 {+0.54/17 7.5s}
and:

1. e4 {book} e6 {book} 2. d4 {book} d5 {book} 3. e5 {book} c5 {book}
4. c3 {book} Nc6 {book} 5. Nf3 {book} f6 {book} 6. Bd3 {book} Qc7 {book}
7. O-O {book} Bd7 {book} 8. Re1 {book} O-O-O {book} 9. Bb5 {+0.61/17 7.3s}
Both lines reach the same position at move 9 (after 9. Bb5), so the engines effectively start from the same position twice.

6. Lost games analysis
This option tries to find the moment where an engine starts to lose. It's not perfect, but it is valuable.
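
One possible way to define "the moment where an engine starts to lose", given purely as a sketch: take the losing engine's own scores in move order and look for the first move after which the score never climbs back above a threshold. This is an assumed definition, not necessarily what MRI does; `losing_point` and the -0.5 pawn threshold are hypothetical.

```python
def losing_point(scores, threshold=-0.5):
    """Return the index of the first move after which the loser's score
    stays below `threshold` for the rest of the game, or None if the
    final score is still at or above the threshold."""
    last_ok = -1
    for i, score in enumerate(scores):
        if score >= threshold:
            last_ok = i  # the engine still considered itself fine here
    first_bad = last_ok + 1
    return first_bad if first_bad < len(scores) else None

# Hypothetical score history of the losing engine, in pawns.
history = [0.2, 0.1, -0.3, -0.9, -0.4, -1.2, -2.5, -4.0]
print(losing_point(history))
```

Using the last recovery rather than the first dip avoids flagging a temporary wobble, at the price of sometimes pointing a little late in games where the decline is gradual.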

Re: MRI - Match Result Inspector

Posted: Mon Feb 10, 2020 11:17 am
by Rebel
MRI output of a 1000 game match between Stockfish 8 and Komodo 10 at 40/80.

http://rebel13.nl/output.htm

Re: MRI - Match Result Inspector

Posted: Tue Feb 11, 2020 8:32 pm
by Ratosh
Great tool! Some things I would like to see:
Report:
  • Window for `Drop` (able to find X score drop in Y plies).
  • Show FEN positions in html report (easier to see/copy the FEN).
  • Phase overview in functions (Like the phase overview, but for functions - e.g. number of score drops per phase)
Functions:
  • Reverse games with different outcome (Show first diverged move and score).
Really like pgn output files.
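
The windowed drop search requested above ("X score drop in Y plies") could be sketched roughly as follows. It assumes a list holding one engine's own scores in move order, so the window is counted in that engine's moves rather than in plies; `find_drops` and its default values are invented for illustration.

```python
def find_drops(scores, drop=1.0, window=4):
    """Return (start, end, amount) tuples where the score falls by at least
    `drop` pawns within `window` of the engine's own moves.
    Only the first qualifying end point is reported per start index."""
    hits = []
    for i, start in enumerate(scores):
        for j in range(i + 1, min(i + window + 1, len(scores))):
            if start - scores[j] >= drop:
                hits.append((i, j, round(start - scores[j], 2)))
                break
    return hits

# Hypothetical scores of one engine, in pawns.
print(find_drops([0.5, 0.6, 0.4, -0.8, -0.9], drop=1.0, window=2))
```

Overlapping hits (several start indices pointing at the same collapse) are left in on purpose; a report would probably merge them into one entry.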

Re: MRI - Match Result Inspector

Posted: Tue Feb 11, 2020 8:49 pm
by Leo
Looks interesting and useful.

Re: MRI - Match Result Inspector

Posted: Tue Feb 11, 2020 9:08 pm
by Dann Corbit
Suggestion:
Emit fully decorated EPD (with all the analysis from the logs) for any of the unusual data points found (inverted score, sudden drop, etc.)
It would be useful for building test suites that are tuned to an engine's specific problems.
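
As a rough illustration of what a decorated EPD line might look like: an EPD record consists of the first four FEN fields followed by operations such as `acd` (analysis depth), `ce` (centipawn evaluation) and `id`. Which operations a tool like MRI would actually emit is an open question; `to_epd` and the label format below are hypothetical, and the sketch assumes the score has already been converted to the side to move.

```python
def to_epd(fen, score_pawns, depth, label):
    """Build one EPD line from a FEN string plus a score (in pawns) and a depth."""
    fields = fen.split()
    if len(fields) < 4:
        raise ValueError("FEN needs at least 4 fields")
    position = " ".join(fields[:4])  # EPD drops the halfmove and fullmove counters
    centipawns = int(round(score_pawns * 100))
    return f'{position} acd {depth}; ce {centipawns}; id "{label}";'

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(to_epd(start, 0.54, 17, "drop-001"))
```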

Re: MRI - Match Result Inspector

Posted: Tue Feb 11, 2020 11:53 pm
by Rebel
Ratosh wrote:
Tue Feb 11, 2020 8:32 pm
Great tool! Some things i would like to see:
Report:
  • Window for`Drop` (able to find X score drop in Y plies).
Click on view, it's all in the created PGN.
Ratosh wrote:
Tue Feb 11, 2020 8:32 pm
  • Show FEN positions in html report (easier to see/copy the FEN)
Makes sense.
Ratosh wrote:
Tue Feb 11, 2020 8:32 pm
  • Phase overview in functions (Like the phase overview, but for functions - e.g. number of score drops per phase)
I rewrote the Phase Overview. Here is an example from a match played while developing Benjamin against Fruit 2.3:

Phase          Won Games (numbers)           Late Endgame           Match
Overview       MIDG END1 END2 END3      QUEEN ROOK LIGHT PAWN       Score
Fruit 2.3      1065   56  699  281         24   79    62    4   2724.5 (54.5%)
Benjamin        991   24  458  184         21   48    39    2   2275.5 (45.5%)

Phase              Won games %               Late Endgame           Match
Overview       MIDG END1 END2 END3      QUEEN ROOK LIGHT PAWN       Score
Fruit 2.3      51.8 70.0 60.4 60.4       53.3 62.2  61.4 66.7   2724.5 (54.5%)
Benjamin       48.2 30.0 39.6 39.6       46.7 37.8  38.6 33.3   2275.5 (45.5%)

Depths         MIDG END1 END2 END3     BOOK             TIME
Benjamin       11.5 12.0 13.6 15.3     8.0 (moves)      0:00
Fruit 2.3      11.6 12.1 15.8 18.7     8.0 (moves)      0:00
What immediately springs to mind is Benjamin's weak point, the endgame, and looking at the depths it is likely being outsearched there.
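
The percentage table above appears to be each engine's share of the won games per phase. A small sketch that recomputes Fruit 2.3's row from the counts in the first table (the dictionary layout and the name `win_shares` are just for this example):

```python
# Won-game counts per phase, copied from the first table above.
won = {
    "Fruit 2.3": {"MIDG": 1065, "END1": 56, "END2": 699, "END3": 281,
                  "QUEEN": 24, "ROOK": 79, "LIGHT": 62, "PAWN": 4},
    "Benjamin":  {"MIDG": 991, "END1": 24, "END2": 458, "END3": 184,
                  "QUEEN": 21, "ROOK": 48, "LIGHT": 39, "PAWN": 2},
}

def win_shares(won, engine, opponent):
    """Return the engine's share (in %) of all won games, phase by phase."""
    shares = {}
    for phase, count in won[engine].items():
        total = count + won[opponent][phase]
        shares[phase] = round(100.0 * count / total, 1)
    return shares

print(win_shares(won, "Fruit 2.3", "Benjamin"))
```

For example, MIDG gives 1065 / (1065 + 991) = 51.8%, matching the table, while the END2 share of 60.4% against a 54.5% match score is what points at the endgame.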
Ratosh wrote:
Tue Feb 11, 2020 8:32 pm
Functions:
  • Reverse games with different outcome (Show first diverged move and score).
Nice idea, I am afraid that list will be long.
Ratosh wrote:
Tue Feb 11, 2020 8:32 pm
Really like pgn output files.
Thank you.

Re: MRI - Match Result Inspector

Posted: Tue Feb 11, 2020 11:56 pm
by Rebel
Dann Corbit wrote:
Tue Feb 11, 2020 9:08 pm
Suggestion:
Emit fully decorated EPD (with all the analysis from the logs) for any of the unusual data points found (inverted score, sudden drop, etc.)
It would be useful for building test suites that are tuned to an engine's specific problems.
Will do.

Re: MRI - Match Result Inspector

Posted: Wed Feb 12, 2020 8:59 am
by Ferdy
Rebel wrote:
Mon Feb 10, 2020 11:10 am
6. Lost games analysis
All good, and I like that feature, showing the fen where it first made a suboptimal move.
Two things:
1. A single blunder that cost the game
2. An initial small mistake that leads to defeat

Re: MRI - Match Result Inspector

Posted: Wed Feb 12, 2020 9:27 am
by Guenther
Ratosh wrote:
Tue Feb 11, 2020 8:32 pm
Great tool! Some things i would like to see:
Report:
  • Window for`Drop` (able to find X score drop in Y plies).
  • Show FEN positions in html report (easier to see/copy the FEN).

    ...
The two points above exist already in Toms Game Analyser and it even has a GUI and eval graphs.

viewtopic.php?f=7&t=66554&p=750330&hili ... er#p750330
viewtopic.php?t=62066&highlight=game+analyser

Re: MRI - Match Result Inspector

Posted: Wed Feb 12, 2020 9:47 am
by Guenther
Rebel wrote:
Mon Feb 10, 2020 11:10 am
MRI (Match Result Inspector) is a tool to extract valuable information from a PGN engine-engine match provided the PGN output is created by ChessBase, Cutechess or Arena.

It's currently in alpha stage (not downloadable yet) and I prefer some feedback for improvements, more ideas before a beta release.

...

Functions:
...
3. Games that should have been won
...
The title of 3. is a bit misleading: some of these games were never actually won but misevaluated, so there never was a win, only a fata morgana.
The example in your post might be even worse: a problem on the user side, also indicated by an incomprehensible sudden loss of depth.
(Of course I won't completely rule out a hash bug in SF8.)

Single outliers with strangely low depths would be an interesting thing to find anyway ;-)
(I have been doing this for a long time, using some stats sheets to sanity-check PGN files found on talkchess from time to time.)

115... Nc2 +9.92/29
It is impossible for me to reproduce that incredible score with SF8,
neither with 5-men Syzygy nor without any TBs at all. The score is always between 0.17 and 0.33 for all depths up to 60.