MEA and temere.epd

Rebel · Post by **Rebel** » Tue Apr 07, 2020 7:53 pm

I created another set: 5000 random positions from a 9.7-million-position EPD database. The pattern remains the same.

http://rebel13.nl/mea/dc1-100ms.html
http://rebel13.nl/mea/dc1-250ms.html
http://rebel13.nl/mea/dc1-500ms.html

I will create 2 more sets of 5000, and if the pattern remains the same we end up with 20,000 positions, which *maybe* can serve as a second opinion for engine changes. 5000 positions at 100ms take 10 minutes; 20,000 take (just) 40-45 minutes.
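
As a rough cross-check of those timings, here is a minimal sketch that counts only the pure search time (number of positions times time per position). The function name is made up for illustration, and engine start-up and I/O overhead are ignored, which is presumably why real runs take somewhat longer than this estimate.

```python
def pure_search_minutes(positions: int, ms_per_position: int) -> float:
    """Minutes spent searching, ignoring start-up and I/O overhead."""
    return positions * ms_per_position / 1000 / 60


# The set sizes mentioned in the post, both at 100 ms per position.
for n in (5000, 20000):
    print(f"{n} positions: about {pure_search_minutes(n, 100):.1f} min of search")
```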

Rebel · Post by **Rebel** » Wed Apr 08, 2020 10:24 am

I have now 3 x 5000 sets. First try to research if the system can serve as a second opinion for eng-eng testing.

Ethereal 11.75 vs Ethereal 12 is interesting.

Time control  |  ET11.75 | ET12
CCRL blitz    |   3348   | 3374
CCRL 40/15    |   3266   | 3261

I suppose Andrew is not happy with the 40/15 result.

Three runs with MEA, each with 5000 positions. In all 3 cases Ethereal 12 scored better than 11.75.

    EPD  : epd\temere.epd
    Time : 100ms
                                                      Solving    Max   Total   Time   Hash          
    Engine           Score   Used Time Found   Pos     Time     Score   Rate    ms     Mb  Cpu  CCRL
 1  Ethereal 12      79920  00:10:21.8  2664  4975  00:00:33.3  149250  0.535    100   128    1  3261
 2  Ethereal 11.75   78660  00:10:21.3  2622  4975  00:00:33.0  149250  0.527    100   128    1  3266

    EPD  : epd\dc1.epd
    Time : 100ms
                                                      Solving    Max   Total   Time   Hash          
    Engine           Score   Used Time Found   Pos     Time     Score   Rate    ms     Mb  Cpu  CCRL
 1  Ethereal 12      92850  00:10:25.2  3095  5000  00:00:25.4  150000  0.619    100   128    1  3261
 2  Ethereal 11.75   91500  00:10:25.0  3050  5000  00:00:24.2  150000  0.610    100   128    1  3266

    EPD  : epd\dc-112.epd
    Time : 100ms
                                                      Solving    Max   Total   Time   Hash          
    Engine           Score   Used Time Found   Pos     Time     Score   Rate    ms     Mb  Cpu  CCRL
 1  Ethereal 12      93450  00:10:25.4  3115  5000  00:00:23.4  150000  0.623    100   128    1  3261
 2  Ethereal 11.75   91470  00:10:25.3  3049  5000  00:00:23.6  150000  0.610    100   128    1  3266
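
To get a feeling for how much weight the three runs can carry, here is a rough sketch, not part of MEA. It assumes 30 points per position (which is what Max Score divided by Pos gives in the tables above) and it treats the two engines' results as independent samples. That second assumption is not really true, since both engines solve the same positions; a paired test would need the per-position results, which are not posted here. The function names are mine.

```python
from math import sqrt

POINTS_PER_POSITION = 30  # inferred from the tables: 149250 / 4975 = 30


def rate(score: int, positions: int) -> float:
    """Total Rate column: score divided by the maximum possible score."""
    return score / (positions * POINTS_PER_POSITION)


def solve_rate_gap(found_a: int, found_b: int, positions: int):
    """Gap in solve rate and its standard error (independence assumed)."""
    p_a = found_a / positions
    p_b = found_b / positions
    se = sqrt(p_a * (1 - p_a) / positions + p_b * (1 - p_b) / positions)
    return p_a - p_b, se


# (found by Ethereal 12, found by Ethereal 11.75, positions) per set
runs = {
    "temere": (2664, 2622, 4975),
    "dc1": (3095, 3050, 5000),
    "dc-112": (3115, 3049, 5000),
}
for name, (found_new, found_old, n) in runs.items():
    gap, se = solve_rate_gap(found_new, found_old, n)
    print(f"{name}: gap={gap:.4f}  se={se:.4f}  gap/se={gap / se:.2f}")
```

If I run the numbers right, each single gap stays below two standard errors under these assumptions, so the consistent direction across the three sets is the more interesting signal than any one run.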

Next Vajolet 2.4, 2.5, 2.6, 2.7 and 2.8

Terje · Post by **Terje** » Wed Apr 08, 2020 10:57 am

Rebel wrote: ↑Wed Apr 08, 2020 10:24 am I have now 3 x 5000 sets. First try to research if the system can serve as a second opinion for eng-eng testing.

Ethereal 11.75 vs Ethereal 12 is interesting.
Time control  |  ET11.75 | ET12
CCRL blitz    |   3348   | 3374
CCRL 40/15    |   3266   | 3261
I suppose Andrew is not happy with the 40/15 result.

...

Current CCRL 40/15:
3266 +-15
3261 +-18

These error bars are so big that a comparison of the two scores is pointless; I doubt Andrew thinks much of it. 12.00 has scored ~20 Elo better in most lists.
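
To put a number on "pointless", a small sketch under the assumption that the two error bars are symmetric and independent, so they can be combined in quadrature. That is my simplification for illustration, not a statement of how CCRL computes or intends them.

```python
from math import sqrt


def compare(r1: float, e1: float, r2: float, e2: float):
    """Rating difference, combined error bar, and whether they separate."""
    diff = r1 - r2
    combined = sqrt(e1 ** 2 + e2 ** 2)  # assumes independent errors
    return diff, combined, abs(diff) > combined


# CCRL 40/15 figures quoted above: 3261 +-18 and 3266 +-15
diff, combined, separated = compare(3261, 18, 3266, 15)
print(f"difference {diff:+d} Elo, combined error +-{combined:.1f}, separated: {separated}")
```

A 5 Elo difference against a combined error of roughly 23 Elo says nothing about which version is stronger.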

Dann Corbit · Post by **Dann Corbit** » Fri Apr 10, 2020 8:39 pm

In this archive:

is a small subset of the temere.epd file (1440 records) which I analyzed with LC0 at 2 minutes each.
I had multi-pv set to 5

This is the output for the first record:

5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -23; bm Re3; pv Re3 Rf2 Rc3 Ng6 Re1 Nf4 h3 h5 Rg3 Ng6 Rc3 h4 Kg1 Rd2 Rf1 Ne5 Kh2 Rd4 Bb5 Kg7 a4 Rd2 Rf4 Raxc2 Rxc2 Rxc2 Kg1 Ng6 Rf5;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -25; bm Re8+; pv Re8+ Kg7 Re3 Rf2 h3 h5 Rc3 Nf5 Re1 Nd4 Kh2 Nxc2 Re4 Nxa3 Rg3+ Kf6 Bd1 Rf5 Ra4 Rd2;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -57; bm Re4; pv Re4 Ng6 Ree1 Rxa3 Ra1 Rc3 Bb3 Rc5 Rgf1 Rxf1+ Rxf1 Ne5 Kg1 Kg7 Ra1 f5 Kf2;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -33; bm Bb5; pv Bb5 Rxa3 h3 Ng6 Kh2 Kg7 Rc1 Rc3 Ra1 Rc5 c4 Ne5 Re4 Rf2;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -34; bm h3; pv h3 Rxa3 Bb5 Ng6 Kh2 Kg7 Rc1 Rc3 Ra1 Rc5 c4 Ne5 Re4 Rf2;

I think it might be interesting to compare with the existing data (likely compiled by stockfish).
I have not done any analysis yet, but I guess we might see that LC0 does better on the quiet positions and SF fares better on the tactical ones (when they disagree, which will be a small percentage of the time).
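
For anyone who wants to do that comparison, the records above follow the usual EPD layout: four position fields, then `opcode operand;` operations. Here is a minimal parsing sketch. The function name is my own, the pv in the sample is shortened for readability, and the naive split on `;` would break on quoted string operands that contain a semicolon.

```python
def parse_epd(line: str) -> dict:
    """Split one EPD record into its position fields and its operations."""
    fields = line.strip().split(" ", 4)
    board, side, castling, ep = fields[:4]
    rest = fields[4] if len(fields) > 4 else ""
    ops = {}
    for chunk in rest.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # First word is the opcode, everything after it is the operand.
        opcode, _, operand = chunk.partition(" ")
        ops[opcode] = operand.strip()
    return {"board": board, "side": side, "castling": castling,
            "ep": ep, "ops": ops}


# First record from the output above, with the pv shortened.
record = ("5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - "
          "acd 13/33; acs 116; acn 952676; ce -23; bm Re3; pv Re3 Rf2 Rc3;")
parsed = parse_epd(record)
print(parsed["ops"]["bm"], parsed["ops"]["ce"])
```

With multi-pv 5 the same position appears in five records, so grouping the parsed records by the four position fields gives the five candidate moves and their `ce` values per position.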

Dann Corbit · Post by **Dann Corbit** » Fri Apr 10, 2020 8:40 pm

I should mention that I have removed the spurious e.p. flags.
It reduces my data set size, and prevents redundant analysis.

Rebel · Post by **Rebel** » Sat Apr 11, 2020 7:13 am

Dann Corbit wrote: ↑Fri Apr 10, 2020 8:39 pm In this archive:

is a small subset of the temere.epd file (1440 records) which I analyzed with LC0 at 2 minutes each.
I had multi-pv set to 5

This is the output for the first record:
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -23; bm Re3; pv Re3 Rf2 Rc3 Ng6 Re1 Nf4 h3 h5 Rg3 Ng6 Rc3 h4 Kg1 Rd2 Rf1 Ne5 Kh2 Rd4 Bb5 Kg7 a4 Rd2 Rf4 Raxc2 Rxc2 Rxc2 Kg1 Ng6 Rf5;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -25; bm Re8+; pv Re8+ Kg7 Re3 Rf2 h3 h5 Rc3 Nf5 Re1 Nd4 Kh2 Nxc2 Re4 Nxa3 Rg3+ Kf6 Bd1 Rf5 Ra4 Rd2;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -57; bm Re4; pv Re4 Ng6 Ree1 Rxa3 Ra1 Rc3 Bb3 Rc5 Rgf1 Rxf1+ Rxf1 Ne5 Kg1 Kg7 Ra1 f5 Kf2;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -33; bm Bb5; pv Bb5 Rxa3 h3 Ng6 Kh2 Kg7 Rc1 Rc3 Ra1 Rc5 c4 Ne5 Re4 Rf2;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -34; bm h3; pv h3 Rxa3 Bb5 Ng6 Kh2 Kg7 Rc1 Rc3 Ra1 Rc5 c4 Ne5 Re4 Rf2;
I think it might be interesting to compare with the existing data (likely compiled by stockfish).
I have not done any analysis yet, but I guess we might see that LC0 does better on the quiet positions and SF fares better on the tactical ones (when they disagree, which will be a small percentage of the time).

Or the other way around: with short opening lines, Lc0 is on top and Stockfish is in 7th place, even behind Gideon.

http://rebel13.nl/rebel13/oke.html

Rebel · Post by **Rebel** » Fri Apr 17, 2020 7:53 am

Recap from the OP -

MEA is a tool (written by Ferdinand Mosca) that analyses an EPD position set in STS-style assigning bonus points pre-defined in each EPD record.
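
To illustrate what STS-style scoring means, here is a minimal sketch. It assumes the bonus points sit in a `c0` operand of the form `move=points, move=points`, as in the STS test suites; I have not checked this against the MEA source, and the moves and point values in the example are invented. The function names are hypothetical too.

```python
def parse_bonus(c0: str) -> dict:
    """Turn 'f5=10, Be5+=2' into {'f5': 10, 'Be5+': 2}."""
    table = {}
    for item in c0.split(","):
        # Split at the last '=' so promotions such as 'e8=Q=30' survive.
        move, _, points = item.strip().rpartition("=")
        if move:
            table[move] = int(points)
    return table


def score(engine_moves, bonus_tables) -> int:
    """Sum the bonus each engine move earns; unlisted moves earn 0."""
    return sum(table.get(move, 0)
               for move, table in zip(engine_moves, bonus_tables))


# Invented example: two positions, engine answers 'Re8+' and 'd4'.
tables = [parse_bonus("Re3=30, Re8+=25"), parse_bonus("Nf3=30")]
print(score(["Re8+", "d4"], tables))
```

In this scheme an engine that misses the best move can still earn partial credit for a listed alternative, while anything not listed scores nothing.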

temere.epd (Latin for random) is a 4975-position set created in an intelligent way from a much bigger random EPD collection. The goal is to produce a reasonably reliable ranking list of engines, with an estimated error bar of -25/+25 Elo, at fast time controls like 100ms, 250ms, 500ms, etc.

http://rebel13.nl/misc/mea.html

----------

I think (without boasting) I succeeded reasonably well. The next step, trying the impossible, is to narrow the gap to -5/+5 Elo, so that it can serve as a second opinion next to regular high-volume eng-eng testing and/or as a tool to quickly find out whether a program change justifies starting the long high-volume eng-eng testing.

Two examples: 1) Stockfish 5, since it is the last version that has UCI support for tuneable evaluation parameters like Mobility and King Safety, and 2) the free Komodo 10 with the tuneable 'Dynamism' setting.

Komodo 'Dynamism' first. The default setting is 110. The assumption is that 110 is a well tuned value and that the system will report the 110 setting as best. With the first set (about 10,000 positions) this is the case.

    EPD  : epd\set1.epd
    Time : 1000 ms
                                                      Solving     Max   Total   Time   Hash          
    Engine           Score   Used Time  Found   Pos     Time     Score   Rate    ms     Mb   Cpu  CCRL
 1  K10-default      151009  02:44:00.6  5248  9706  00:13:47.8  291180  0.519   1000   128    1  3266
 2  K10-Dyn-115      150887  02:44:00.7  5231  9706  00:13:03.8  291180  0.518   1000   128    1  3266
 3  K10-Dyn-100      149507  02:44:00.6  5205  9706  00:14:17.6  291180  0.513   1000   128    1  3266
 4  K10-Dyn-120      149096  02:44:00.7  5168  9706  00:12:50.4  291180  0.512   1000   128    1  3266
 5  K10-Dyn-105      148766  02:44:00.7  5171  9706  00:13:41.9  291180  0.511   1000   128    1  3266
 6  K10-Dyn-125      148704  02:44:00.6  5149  9706  00:12:33.1  291180  0.511   1000   128    1  3266
 7  K10-Dyn-90       147766  02:44:00.7  5129  9706  00:13:10.9  291180  0.507   1000   128    1  3266
 8  K10-Dyn-75       145564  02:44:00.6  5059  9706  00:13:28.7  291180  0.500   1000   128    1  3266

Or in more eye-pleasing html - http://rebel13.nl/mea/k10-dyna-1000ms.html

Next, Stockfish 5. We test 8 mobility settings in one run (8 threads); once again the assumption is that the default setting of 100 is well tuned.

    EPD  : epd\set1.epd
    Time : 1000 ms
                                                      Solving     Max    Total   Time   Hash          
    Engine           Score   Used Time Found   Pos     Time      Score    Rate    ms    Mb   Cpu CCRL
 1  sf5-default      150273  02:44:01.0  5189  9706  00:12:29.6  291180  0.516   1000   128   1  3200
 2  sf5-mob-80       148315  02:44:00.6  5125  9706  00:12:38.4  291180  0.509   1000   128   1  3200
 3  sf5-mob-95       148172  02:44:00.7  5120  9706  00:12:33.1  291180  0.509   1000   128   1  3200
 4  sf5-mob-105      147959  02:44:00.7  5108  9706  00:12:12.0  291180  0.508   1000   128   1  3200
 5  sf5-mob-150      147807  02:44:00.6  5097  9706  00:11:50.2  291180  0.508   1000   128   1  3200
 6  sf5-mob-125      146920  02:44:00.6  5071  9706  00:12:07.7  291180  0.505   1000   128   1  3200
 7  sf5-mob-115      145874  02:44:00.9  5037  9706  00:12:10.2  291180  0.501   1000   128   1  3200
 8  sf5-mob-60       145792  02:44:00.8  5031  9706  00:11:52.1  291180  0.501   1000   128   1  3200

And once again the default setting nicely on top.

more eye-pleasing - http://rebel13.nl/mea/sf5-mob-1000ms.html

In both cases, 8 settings were tested in one run in 90-95 minutes. How long do 7 eng-eng matches of 20,000-30,000 bullet games take?

--------

And now I am looking for 2 more top engines with tuneable parameters, to make sure these 2 cases are not lucky random cases that fool me.

Any good suggestions?

Alayan · Post by **Alayan** » Fri Apr 17, 2020 8:40 am

Can you compile yourself if provided with code ? I could have a look at creating a branch with some tunable parameter in Ethereal if so.

Rebel · Post by **Rebel** » Fri Apr 17, 2020 9:12 am

Alayan wrote: ↑Fri Apr 17, 2020 8:40 am Can you compile yourself if provided with code ? I could have a look at creating a branch with some tunable parameter in Ethereal if so.

Oh great, but I am a GitHub dummy, so please provide a Windows executable. I will PM you my mail address.

xr_a_y · Post by **xr_a_y** » Fri Apr 17, 2020 9:13 am

xr_a_y wrote: ↑Tue Apr 07, 2020 2:05 pm Not using the given scores for alternative moves, Minic @1sec per position is finding 3147 best moves (there are 4975 FENs in the file). Isn't that too good a result for Minic?

Sorry to come back with the same question, but is it normal that a 1-second Minic search on 1 thread scores 3127/4975 when counting best moves only? Doesn't that break the +/-25 Elo estimate?
