I created another set: 5000 random positions from a database of 9.7 million EPD records. The pattern remains the same.
http://rebel13.nl/mea/dc1-100ms.html
http://rebel13.nl/mea/dc1-250ms.html
http://rebel13.nl/mea/dc1-500ms.html
I will create 2 more sets of 5000, and if the pattern remains the same we end up with 20,000 positions, which maybe (emphasis added) can serve as a second opinion for engine changes. 5000 positions at 100ms take about 10 minutes; 20,000 take (just) 40-45 minutes.
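As a rough check on those timings, here is a minimal Python sketch. The function name is hypothetical and the calculation counts search time only, so it gives a lower bound; the quoted 10 and 40-45 minutes sit somewhat above it, presumably because of engine and I/O overhead.

```python
def pure_search_minutes(positions: int, ms_per_position: int) -> float:
    """Lower bound on run time: search time only, no startup or I/O overhead."""
    return positions * ms_per_position / 1000 / 60


for n in (5000, 20000):
    # 5000 positions at 100ms is about 8.3 minutes of pure search,
    # 20,000 positions about 33.3 minutes.
    print(n, round(pure_search_minutes(n, 100), 1))
```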
MEA and temere.epd
Moderators: hgm, Rebel, chrisw
-
- Posts: 6995
- Joined: Thu Aug 18, 2011 12:04 pm
Re: MEA and temere.epd
90% of coding is debugging, the other 10% is writing bugs.
-
- Posts: 6995
- Joined: Thu Aug 18, 2011 12:04 pm
Re: MEA and temere.epd
I now have 3 x 5000 sets. First I will investigate whether the system can serve as a second opinion for eng-eng testing.
Ethereal 11.75 vs Ethereal 12 is interesting.
Code: Select all
Time control | ET11.75 | ET12
CCRL blitz | 3348 | 3374
CCRL 40/15 | 3266 | 3261
I suppose Andrew is not happy with the 40/15 result.
Three runs with MEA, each 5000 positions; in all 3 cases Ethereal 12 scored better than 11.75.
Code: Select all
EPD : epd\temere.epd
Time : 100ms
# | Engine | Score | Used Time | Found | Pos | Solving Time | Max Score | Total Rate | Time ms | Hash Mb | Cpu | CCRL
1 | Ethereal 12 | 79920 | 00:10:21.8 | 2664 | 4975 | 00:00:33.3 | 149250 | 0.535 | 100 | 128 | 1 | 3261
2 | Ethereal 11.75 | 78660 | 00:10:21.3 | 2622 | 4975 | 00:00:33.0 | 149250 | 0.527 | 100 | 128 | 1 | 3266
Code: Select all
EPD : epd\dc1.epd
Time : 100ms
# | Engine | Score | Used Time | Found | Pos | Solving Time | Max Score | Total Rate | Time ms | Hash Mb | Cpu | CCRL
1 | Ethereal 12 | 92850 | 00:10:25.2 | 3095 | 5000 | 00:00:25.4 | 150000 | 0.619 | 100 | 128 | 1 | 3261
2 | Ethereal 11.75 | 91500 | 00:10:25.0 | 3050 | 5000 | 00:00:24.2 | 150000 | 0.610 | 100 | 128 | 1 | 3266
Code: Select all
EPD : epd\dc-112.epd
Time : 100ms
# | Engine | Score | Used Time | Found | Pos | Solving Time | Max Score | Total Rate | Time ms | Hash Mb | Cpu | CCRL
1 | Ethereal 12 | 93450 | 00:10:25.4 | 3115 | 5000 | 00:00:23.4 | 150000 | 0.623 | 100 | 128 | 1 | 3261
2 | Ethereal 11.75 | 91470 | 00:10:25.3 | 3049 | 5000 | 00:00:23.6 | 150000 | 0.610 | 100 | 128 | 1 | 3266
Next: Vajolet 2.4, 2.5, 2.6, 2.7 and 2.8.
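The Total Rate column appears to be Score divided by Max Score (for example 79920 / 149250 ≈ 0.535). A minimal Python sketch, using only the Score and Max Score figures from the three runs above, recomputes the rates and the gap between the two versions per set. The dictionary layout and function names are illustrative assumptions, not part of MEA.

```python
# (Score, Max Score) per engine, copied from the three 100ms runs above.
runs = {
    "temere": {"Ethereal 12": (79920, 149250), "Ethereal 11.75": (78660, 149250)},
    "dc1":    {"Ethereal 12": (92850, 150000), "Ethereal 11.75": (91500, 150000)},
    "dc-112": {"Ethereal 12": (93450, 150000), "Ethereal 11.75": (91470, 150000)},
}


def rate(score: int, max_score: int) -> float:
    """Fraction of the maximum attainable points (assumed meaning of Total Rate)."""
    return score / max_score


def rate_gap(run: dict) -> float:
    """Rate of Ethereal 12 minus rate of Ethereal 11.75 on one position set."""
    return rate(*run["Ethereal 12"]) - rate(*run["Ethereal 11.75"])


gaps = {name: round(rate_gap(run), 4) for name, run in runs.items()}
print(gaps)  # a positive gap means Ethereal 12 scored better on that set
```

The gap is positive on all three sets, but it is about one percentage point each time, which is why the question of error bars matters.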
-
- Posts: 347
- Joined: Tue Nov 19, 2019 4:34 am
- Location: https://github.com/TerjeKir/weiss
- Full name: Terje Kirstihagen
Re: MEA and temere.epd
Rebel wrote: Wed Apr 08, 2020 10:24 am
I have now 3 x 5000 sets. First try to research if the system can serve as a second opinion for eng-eng testing.
Ethereal 11.75 vs Ethereal 12 is interesting.
Code: Select all
Time control | ET11.75 | ET12
CCRL blitz | 3348 | 3374
CCRL 40/15 | 3266 | 3261
I suppose Andrew is not happy with the 40/15 result.
...
Current CCRL 40/15:
3266 +-15
3261 +-18
These error bars are so big that comparing the two scores is pointless; I doubt Andrew thinks much of it. 12.00 has scored ~20 Elo better in most lists.
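To make the point about the error bars concrete, here is a small Python sketch. It assumes the two figures are listed in the same order as the table (3266 +-15 for Ethereal 11.75, 3261 +-18 for Ethereal 12) and that the two errors are independent, so they combine in quadrature; both are assumptions, and the function name is hypothetical.

```python
import math


def difference_with_error(r_a, err_a, r_b, err_b):
    """Rating difference a - b and its combined error, assuming independent errors."""
    diff = r_a - r_b
    combined = math.hypot(err_a, err_b)  # sqrt(err_a**2 + err_b**2)
    return diff, combined


# Assumed mapping: Ethereal 12 = 3261 +-18, Ethereal 11.75 = 3266 +-15.
diff, err = difference_with_error(3261, 18, 3266, 15)
print(f"{diff:+d} +/- {err:.1f}")
```

The difference of 5 points is far smaller than the combined error of roughly 23 points, so the 40/15 numbers do not separate the two versions.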
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: MEA and temere.epd
In this archive:
is a small subset of the temere.epd file (1440 records) which I analyzed with LC0 at 2 minutes each.
I had multi-pv set to 5
This is the output for the first record:
Code: Select all
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -23; bm Re3; pv Re3 Rf2 Rc3 Ng6 Re1 Nf4 h3 h5 Rg3 Ng6 Rc3 h4 Kg1 Rd2 Rf1 Ne5 Kh2 Rd4 Bb5 Kg7 a4 Rd2 Rf4 Raxc2 Rxc2 Rxc2 Kg1 Ng6 Rf5;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -25; bm Re8+; pv Re8+ Kg7 Re3 Rf2 h3 h5 Rc3 Nf5 Re1 Nd4 Kh2 Nxc2 Re4 Nxa3 Rg3+ Kf6 Bd1 Rf5 Ra4 Rd2;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -57; bm Re4; pv Re4 Ng6 Ree1 Rxa3 Ra1 Rc3 Bb3 Rc5 Rgf1 Rxf1+ Rxf1 Ne5 Kg1 Kg7 Ra1 f5 Kf2;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -33; bm Bb5; pv Bb5 Rxa3 h3 Ng6 Kh2 Kg7 Rc1 Rc3 Ra1 Rc5 c4 Ne5 Re4 Rf2;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -34; bm h3; pv h3 Rxa3 Bb5 Ng6 Kh2 Kg7 Rc1 Rc3 Ra1 Rc5 c4 Ne5 Re4 Rf2;
I think it might be interesting to compare with the existing data (likely compiled by stockfish).
I have not done any analysis yet, but I guess we might see that the quiet positions LC0 will do better and the tactical ones SF will fare better (when they disagree, which will be a small percentage of the time).
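For anyone who wants to compare such records programmatically, here is a minimal Python sketch of splitting an EPD line into its position fields and its opcode/operand pairs (acd, acs, acn, ce, bm, pv). The function name is hypothetical, the pv in the sample is shortened for brevity, and splitting on ";" is a simplification that would break on quoted operands containing a semicolon.

```python
def parse_epd(line: str):
    """Split an EPD record into (position, {opcode: operand})."""
    # The first four fields are piece placement, side to move,
    # castling rights and en passant square; the rest are operations.
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    ops = {}
    if len(fields) > 4:
        for chunk in fields[4].split(";"):  # naive: ignores quoted semicolons
            chunk = chunk.strip()
            if not chunk:
                continue
            opcode, _, operand = chunk.partition(" ")
            ops[opcode] = operand.strip()
    return position, ops


# First record from the output above, with the pv shortened.
record = ("5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - "
          "acd 13/33; acs 116; acn 952676; ce -23; bm Re3; pv Re3 Rf2 Rc3;")
position, ops = parse_epd(record)
print(position)
print(ops["bm"], ops["ce"])
```

Collecting the `bm` and `ce` values of the five multi-PV lines this way gives a per-position list of candidate moves and evaluations that can be set against the existing data.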
Last edited by Dann Corbit on Fri Apr 10, 2020 9:04 pm, edited 1 time in total.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: MEA and temere.epd
I should mention that I have removed the spurious e.p. flags.
It reduces my data set size, and prevents redundant analysis.
-
- Posts: 6995
- Joined: Thu Aug 18, 2011 12:04 pm
Re: MEA and temere.epd
Dann Corbit wrote: Fri Apr 10, 2020 8:39 pm
In this archive:
is a small subset of the temere.epd file (1440 records) which I analyzed with LC0 at 2 minutes each.
I had multi-pv set to 5
This is the output for the first record:
Code: Select all
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -23; bm Re3; pv Re3 Rf2 Rc3 Ng6 Re1 Nf4 h3 h5 Rg3 Ng6 Rc3 h4 Kg1 Rd2 Rf1 Ne5 Kh2 Rd4 Bb5 Kg7 a4 Rd2 Rf4 Raxc2 Rxc2 Rxc2 Kg1 Ng6 Rf5;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -25; bm Re8+; pv Re8+ Kg7 Re3 Rf2 h3 h5 Rc3 Nf5 Re1 Nd4 Kh2 Nxc2 Re4 Nxa3 Rg3+ Kf6 Bd1 Rf5 Ra4 Rd2;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -57; bm Re4; pv Re4 Ng6 Ree1 Rxa3 Ra1 Rc3 Bb3 Rc5 Rgf1 Rxf1+ Rxf1 Ne5 Kg1 Kg7 Ra1 f5 Kf2;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -33; bm Bb5; pv Bb5 Rxa3 h3 Ng6 Kh2 Kg7 Rc1 Rc3 Ra1 Rc5 c4 Ne5 Re4 Rf2;
5k2/5p1p/3p1r2/p2P4/B6n/P7/r1P1R1PP/6RK w - - acd 13/33; acs 116; acn 952676; ce -34; bm h3; pv h3 Rxa3 Bb5 Ng6 Kh2 Kg7 Rc1 Rc3 Ra1 Rc5 c4 Ne5 Re4 Rf2;
I think it might be interesting to compare with the existing data (likely compiled by stockfish).
I have not done any analysis yet, but I guess we might see that the quiet positions LC0 will do better and the tactical ones SF will fare better (when they disagree, which will be a small percentage of the time).
Or the reverse way: with short opening lines, Lc0 is on top and Stockfish is in 7th place, even after Gideon.
http://rebel13.nl/rebel13/oke.html
-
- Posts: 6995
- Joined: Thu Aug 18, 2011 12:04 pm
Re: MEA and temere.epd
Recap from the OP -
MEA is a tool (written by Ferdinand Mosca) that analyses an EPD position set in STS style, assigning the bonus points pre-defined in each EPD record.
temere.epd (Latin for random) is a 4975-position set created from a much bigger random EPD collection in an intelligent way, with the goal of producing a reasonably reliable ranking list of engines with an estimated error bar of -25/+25 Elo at fast time controls like 100ms, 250ms, 500ms etc.
http://rebel13.nl/misc/mea.html
----------
I think (without boasting) I succeeded reasonably well. Next step, trying the impossible: narrow the gap to -5/+5 Elo so that it can serve as a second opinion alongside regular volume eng-eng testing and/or as a tool to quickly find out whether a program change justifies starting the long volume eng-eng testing.
Two examples: 1) Stockfish 5, since it is the last version that has UCI support for tuneable evaluation parameters like Mobility and King Safety, and 2) the free Komodo 10 with the tuneable 'Dynamism' setting.
Komodo 'Dynamism' first. The default setting is 110. The assumption is that 110 is a well-tuned value and that the system will report the 110 setting as best. With the first set (about 10,000 positions) this is the case.
Code: Select all
EPD : edp\set1.epd
Time : 1000 ms
# | Engine | Score | Used Time | Found | Pos | Solving Time | Max Score | Total Rate | Time ms | Hash Mb | Cpu | CCRL
1 | K10-default | 151009 | 02:44:00.6 | 5248 | 9706 | 00:13:47.8 | 291180 | 0.519 | 1000 | 128 | 1 | 3266
2 | K10-Dyn-115 | 150887 | 02:44:00.7 | 5231 | 9706 | 00:13:03.8 | 291180 | 0.518 | 1000 | 128 | 1 | 3266
3 | K10-Dyn-100 | 149507 | 02:44:00.6 | 5205 | 9706 | 00:14:17.6 | 291180 | 0.513 | 1000 | 128 | 1 | 3266
4 | K10-Dyn-120 | 149096 | 02:44:00.7 | 5168 | 9706 | 00:12:50.4 | 291180 | 0.512 | 1000 | 128 | 1 | 3266
5 | K10-Dyn-105 | 148766 | 02:44:00.7 | 5171 | 9706 | 00:13:41.9 | 291180 | 0.511 | 1000 | 128 | 1 | 3266
6 | K10-Dyn-125 | 148704 | 02:44:00.6 | 5149 | 9706 | 00:12:33.1 | 291180 | 0.511 | 1000 | 128 | 1 | 3266
7 | K10-Dyn-90 | 147766 | 02:44:00.7 | 5129 | 9706 | 00:13:10.9 | 291180 | 0.507 | 1000 | 128 | 1 | 3266
8 | K10-Dyn-75 | 145564 | 02:44:00.6 | 5059 | 9706 | 00:13:28.7 | 291180 | 0.500 | 1000 | 128 | 1 | 3266
Or in more eye-pleasing html - http://rebel13.nl/mea/k10-dyna-1000ms.html
Next Stockfish 5: we tune 8 mobility settings in one run (8 threads); once again the assumption is that the default setting of 100 is well tuned.
Code: Select all
EPD : epd\set1.epd
Time : 1000 ms
# | Engine | Score | Used Time | Found | Pos | Solving Time | Max Score | Total Rate | Time ms | Hash Mb | Cpu | CCRL
1 | sf5-default | 150273 | 02:44:01.0 | 5189 | 9706 | 00:12:29.6 | 291180 | 0.516 | 1000 | 128 | 1 | 3200
2 | sf5-mob-80 | 148315 | 02:44:00.6 | 5125 | 9706 | 00:12:38.4 | 291180 | 0.509 | 1000 | 128 | 1 | 3200
3 | sf5-mob-95 | 148172 | 02:44:00.7 | 5120 | 9706 | 00:12:33.1 | 291180 | 0.509 | 1000 | 128 | 1 | 3200
4 | sf5-mob-105 | 147959 | 02:44:00.7 | 5108 | 9706 | 00:12:12.0 | 291180 | 0.508 | 1000 | 128 | 1 | 3200
5 | sf5-mob-150 | 147807 | 02:44:00.6 | 5097 | 9706 | 00:11:50.2 | 291180 | 0.508 | 1000 | 128 | 1 | 3200
6 | sf5-mob-125 | 146920 | 02:44:00.6 | 5071 | 9706 | 00:12:07.7 | 291180 | 0.505 | 1000 | 128 | 1 | 3200
7 | sf5-mob-115 | 145874 | 02:44:00.9 | 5037 | 9706 | 00:12:10.2 | 291180 | 0.501 | 1000 | 128 | 1 | 3200
8 | sf5-mob-60 | 145792 | 02:44:00.8 | 5031 | 9706 | 00:11:52.1 | 291180 | 0.501 | 1000 | 128 | 1 | 3200
And once again the default setting is nicely on top.
More eye-pleasing - http://rebel13.nl/mea/sf5-mob-1000ms.html
In both cases, 8 settings were tested in one run in 90-95 minutes. How long do 7 eng-eng matches of 20,000-30,000 bullet games take?
--------
And now I am looking for 2 more top engines with tuneable parameters to make sure these 2 cases are not a lucky random result that fools me.
Any good suggestions?
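For readers unfamiliar with STS-style scoring, here is a minimal Python sketch of the idea: each record lists moves with pre-defined points, and the engine earns the points of the move it plays. The record strings, the moves and the function names are hypothetical illustrations and not MEA's actual code or the real content of any EPD file. The only figure taken from the tables above is that Max Score divided by Pos is 30 in every run.

```python
def parse_points(spec: str) -> dict:
    """Parse a move/points list such as 'Re3=30, Re8+=20' into a dict."""
    points = {}
    for item in spec.split(","):
        # rpartition keeps promotion moves such as 'e8=Q' intact.
        move, _, value = item.strip().rpartition("=")
        if move and value:
            points[move] = int(value)
    return points


def score_set(specs, engine_moves):
    """Return (score, max_score) for one engine over a position set."""
    score = max_score = 0
    for spec, move in zip(specs, engine_moves):
        points = parse_points(spec)
        max_score += max(points.values())  # best attainable for this position
        score += points.get(move, 0)       # unlisted moves earn nothing
    return score, max_score


# Hypothetical three-position set and the moves a hypothetical engine chose.
specs = ["Re3=30, Re8+=20", "Nf3=30", "e4=30, d4=25, c4=10"]
engine_moves = ["Re8+", "Nf3", "Nc3"]
score, max_score = score_set(specs, engine_moves)
print(score, max_score, round(score / max_score, 3))
```

Ranking the settings in the tables above then amounts to sorting the engines by this score on the same position set.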
-
- Posts: 550
- Joined: Tue Nov 19, 2019 8:48 pm
- Full name: Alayan Feh
Re: MEA and temere.epd
Can you compile it yourself if provided with code? If so, I could look into creating a branch with some tunable parameters in Ethereal.
-
- Posts: 6995
- Joined: Thu Aug 18, 2011 12:04 pm
Re: MEA and temere.epd
Oh great, but I am a GitHub dummy, so please provide a Windows executable. I will PM you my mail address.
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
Re: MEA and temere.epd
Sorry to come back with the same question, but is it normal that Minic, with a 1-second search on 1 thread, finds only 3127/4975 best moves? Doesn't that break the +/-25 Elo estimate?