MEA and temere.epd

xr_a_y · Post by **xr_a_y** » Thu Apr 30, 2020 9:09 pm

Be careful: if your copy-paste is right, then your code is wrong because of a spurious comma:

2.1,  2,0,  1.9,

Look at the 2,0 instead of 2.0!

Rebel · Post by **Rebel** » Thu Apr 30, 2020 10:25 pm

xr_a_y wrote: ↑Thu Apr 30, 2020 9:09 pm Be careful: if your copy-paste is right, then your code is wrong because of a spurious comma:
2.1,  2,0,  1.9,
Look at the 2,0 instead of 2.0!

The code is OK, and so is the copy & paste; the error crept in when I made it more readable for the post. Thanks for pointing it out.

abulmo2 · Post by **abulmo2** » Tue May 05, 2020 12:31 pm

First, thank you for all your clarifications.

Rebel wrote: ↑Thu Apr 30, 2020 5:57 pm The idea behind the tool is that one runs it at fast time control since you need so many positions to get a reliable result, hence the tool concentrates itself at time controls between 100-500ms, 1000ms max. This gives the following two tables, ms (milliseconds) and a corresponding table pt (points).
int ms [] =    {  10,  20,  30,  40,  50,  60,  70,  80,  90,  100,  125,  150,  175,  200,  250,  300,  400,  500,  600,  700,  800,  900,   -1 };
double pt [] = { 3.0, 2.9, 2.8, 2.7, 2.6, 2.5, 2.4, 2.3, 2.2,  2.1,  2.0,  1.9,  1.8,  1.7,  1.6,  1.5,  1.4,  1.3,  1.2,  1.1,  1.0,  1.0,  0.0 };
Example: c0 "Qa4=10, Rxc2=9, f6=5, Kg7=1";

If your engine plays none of the above moves, no points.

Suppose it plays Qa4 it gets 10 points * pt[x]; where x is the index found in the ms table when the engine had Qa4 as best move and remained stable to the end.

And so we may get:

Qa4 found at 8ms and stable, 10 * 3.0 = 30 points, the maximum.
Qa4 found at 190ms, 10 * 1.7 = 17 points.

Rxc2 found at 70ms, 9 * 2.4 = 21 points.

f6 found at 5ms, 5 * 3.0 = 15 points.
f6 found at 100ms, 5 * 2.1 = 10 points.

Kg7 found at 5ms, 1 * 3.0 = 3 points.
Kg7 found at 100ms, 1 * 2.1 = 2 points.

So, if an engine gets stuck on the 2nd-best move (Rxc2), found early (at 70 ms), it will get a better score than another engine that switches later (at 190 ms) from Rxc2 to Qa4?! I am doubtful about this time bonus...
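
To make the comparison concrete, here is a minimal C sketch of the lookup as described in the quoted post. The function names `time_bonus` and `mea_score` are made up for illustration. The bucket rule (first `ms` entry that is not smaller than the solve time) and the truncation to whole points are assumptions inferred from the quoted examples, and the 125 ms entry of `pt` is written as 2.0.

```c
#include <assert.h>

/* Time limits in ms and the matching bonus multipliers, as quoted above
   (the 125 ms entry written as 2.0). -1 marks the end of the table. */
static const int    ms[] = {  10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 125,
                             150, 175, 200, 250, 300, 400, 500, 600, 700, 800, 900, -1 };
static const double pt[] = { 3.0, 2.9, 2.8, 2.7, 2.6, 2.5, 2.4, 2.3, 2.2, 2.1, 2.0,
                             1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1, 1.0, 1.0, 0.0 };

/* Assumption: a solve time falls into the first bucket whose limit is >= t_ms;
   anything slower than the last limit gets the sentinel multiplier 0.0. */
static double time_bonus(int t_ms)
{
    int i = 0;
    assert(t_ms >= 0);
    while (ms[i] != -1 && t_ms > ms[i])
        i++;
    return pt[i];
}

/* Points for a move worth move_points in c0 that has been stable since t_ms.
   Assumption: fractions are dropped, as in "9 * 2.4 = 21". */
static int mea_score(int move_points, int t_ms)
{
    return (int)(move_points * time_bonus(t_ms));
}
```

Under these assumptions `mea_score(9, 70)` gives 21 and `mea_score(10, 190)` gives 17, which is the case described above.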

Rebel · Post by **Rebel** » Tue May 05, 2020 12:41 pm

abulmo2 wrote: ↑Tue May 05, 2020 12:31 pm first, thank you for all your clarifications.
Rebel wrote: ↑Thu Apr 30, 2020 5:57 pm The idea behind the tool is that one runs it at fast time control since you need so many positions to get a reliable result, hence the tool concentrates itself at time controls between 100-500ms, 1000ms max. This gives the following two tables, ms (milliseconds) and a corresponding table pt (points).
int ms [] =    {  10,  20,  30,  40,  50,  60,  70,  80,  90,  100,  125,  150,  175,  200,  250,  300,  400,  500,  600,  700,  800,  900,   -1 };
double pt [] = { 3.0, 2.9, 2.8, 2.7, 2.6, 2.5, 2.4, 2.3, 2.2,  2.1,  2.0,  1.9,  1.8,  1.7,  1.6,  1.5,  1.4,  1.3,  1.2,  1.1,  1.0,  1.0,  0.0 };
Example: c0 "Qa4=10, Rxc2=9, f6=5, Kg7=1";

If your engine plays none of the above moves, no points.

Suppose it plays Qa4 it gets 10 points * pt[x]; where x is the index found in the ms table when the engine had Qa4 as best move and remained stable to the end.

And so we may get:

Qa4 found at 8ms and stable, 10 * 3.0 = 30 points, the maximum.
Qa4 found at 190ms, 10 * 1.7 = 17 points.

Rxc2 found at 70ms, 9 * 2.4 = 21 points.

f6 found at 5ms, 5 * 3.0 = 15 points.
f6 found at 100ms, 5 * 2.1 = 10 points.

Kg7 found at 5ms, 1 * 3.0 = 3 points.
Kg7 found at 100ms, 1 * 2.1 = 2 points.
So, if an engine gets stuck on the 2nd-best move (Rxc2), found early (at 70 ms), it will get a better score than another engine that switches later (at 190 ms) from Rxc2 to Qa4?! I am doubtful about this time bonus...

Yes, but note that second (third/fourth) best moves with 9 points are almost as good as the best move. And volume (10,000 positions at least) is extremely important to weed out randomness.

abulmo2 · Post by **abulmo2** » Thu May 07, 2020 2:11 pm

Rebel wrote: ↑Tue May 05, 2020 12:41 pm
abulmo2 wrote: ↑Tue May 05, 2020 12:31 pm I am doubtful about this time bonus...
Yes, but note that second (third/fourth) best moves with 9 points are almost as good as the best move. And volume (10,000 positions at least) is extremely important to weed out randomness.

I just tested the impact of the time bonus on various versions of Amoeba (1.0 to 3.2 in development), using my own MEA implementation¹. It seems to just add noise to the score.
Testing conditions: lcx-1.epd file, 0.1 s per move, 1 thread, 256 MB hash, Amoeba versions 1.0 - 1.4, 2.0 - 2.8, 3.0, 3.1, 3.2dev. All these engines have played gauntlet games against various chess engines at 40/1 with ponder on. Their ratings (measured by Ordo) range from 2653 to 3094. I then ran them on the lcx-1.epd file and tried to predict their Elo using a linear model from: 1) the count of best moves found compared to the bm field, 2) the MEA score using the c0 field, and 3) the MEA score with time bonus. I got the following r² (coefficient of determination):

 - best move found count: r² = 0.87
 - MEA score: r² = 0.91
 - MEA score with time bonus: r² = 0.46

So the MEA score is indeed an improvement over just the best move found count. The time bonus, on the other hand, looks like a regression.
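
For reference, r² for a linear model with a single predictor equals the squared Pearson correlation between the test score and the Elo. A minimal C sketch follows; `r_squared` is a name chosen here for illustration and is not taken from any tool in this thread, and none of the data from the experiment is reproduced.

```c
#include <stddef.h>

/* Coefficient of determination of the least-squares line y = a + b*x,
   computed as the squared Pearson correlation of x and y.
   Returns 0.0 when either series is constant (r² is undefined there). */
static double r_squared(const double *x, const double *y, size_t n)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
    double cov, vx, vy;
    size_t i;

    for (i = 0; i < n; i++) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
        sxy += x[i] * y[i];
    }
    cov = (double)n * sxy - sx * sy;   /* n^2 times the covariance    */
    vx  = (double)n * sxx - sx * sx;   /* n^2 times the variance of x */
    vy  = (double)n * syy - sy * sy;   /* n^2 times the variance of y */
    if (vx == 0.0 || vy == 0.0)
        return 0.0;
    return (cov * cov) / (vx * vy);
}
```

Here `x` would hold one test score per engine version (best move count, MEA score, or MEA score with time bonus) and `y` the matching Ordo rating.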

¹ For the time bonus, I do not use an array but a logarithmic formula that reproduces the values of your array. My program also does not assume the best move is scored 10, but checks for the highest value in the set of best moves.
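
Looking up a move and finding the highest value in a c0 string could look roughly like the C sketch below. It is not the code of either program discussed here: the helper names `c0_points` and `c0_max_points` are hypothetical, and the only format assumed is the `move=points` list shown in the quoted example, separated by commas.

```c
#include <stdlib.h>
#include <string.h>

/* Points listed for `move` in a c0 string such as
   "Qa4=10, Rxc2=9, f6=5, Kg7=1"; 0 when the move is not listed. */
static int c0_points(const char *c0, const char *move)
{
    size_t len = strlen(move);
    const char *p = c0;

    while (*p) {
        while (*p == ' ' || *p == ',')
            p++;                                /* skip separators             */
        const char *end = p + strcspn(p, ",");  /* end of "move=points" token  */
        const char *eq = end;
        while (eq > p && *eq != '=')
            eq--;                               /* last '=' so "e8=Q=10" works */
        if (*eq == '=' && (size_t)(eq - p) == len && strncmp(p, move, len) == 0)
            return atoi(eq + 1);
        p = end;
    }
    return 0;
}

/* Highest points value in the c0 string, instead of assuming it is 10. */
static int c0_max_points(const char *c0)
{
    int best = 0;
    const char *p = c0;

    while (*p) {
        while (*p == ' ' || *p == ',')
            p++;
        const char *end = p + strcspn(p, ",");
        const char *eq = end;
        while (eq > p && *eq != '=')
            eq--;
        if (*eq == '=' && atoi(eq + 1) > best)
            best = atoi(eq + 1);
        p = end;
    }
    return best;
}
```

For the example string, `c0_points(s, "Rxc2")` is 9, a move that is not listed scores 0, and `c0_max_points(s)` is 10.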

Rebel · Post by **Rebel** » Fri May 08, 2020 7:07 am

abulmo2 wrote: ↑Thu May 07, 2020 2:11 pm
Rebel wrote: ↑Tue May 05, 2020 12:41 pm
abulmo2 wrote: ↑Tue May 05, 2020 12:31 pm I am doubtful about this time bonus...
Yes, but note that second (third/fourth) best moves with 9 points are almost as good as the best move. And volume (10,000 positions at least) is extremely important to weed out randomness.
I just tested the impact of the time bonus on various versions of Amoeba (1.0 to 3.2 in development), using my own MEA implementation¹. It seems to just add noise to the score.

Let's check if we are talking about the same thing. Hypothetical example from the start position:

depth-1 bm Nf3 time 5 ms
depth-2 bm Nf3 time 8 ms
depth-3 bm e4 time 15 ms
depth-4 bm e4 time 25 ms
depth-5 bm d4 time 50 ms
depth-6 bm d4 time 100 ms
depth-7 bm d4 time 200 ms
depth-8 bm e4 time 300 ms
depth-9 bm e4 time 400 ms
depth 10 final move e4 600 ms

For applying a time bonus, which millisecond value do you use? The blue or the red?
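
Two possible readings of such a log can be sketched in C as follows. The `struct iteration` layout and the function names are hypothetical, and the sketch makes no claim about which of the two values corresponds to which color in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One line of the search log: depth, best move so far, elapsed time. */
struct iteration { int depth; const char *bm; int time_ms; };

/* The hypothetical log from the start position shown above. */
static const struct iteration example_log[] = {
    { 1, "Nf3",   5 }, { 2, "Nf3",   8 }, { 3, "e4",  15 }, { 4, "e4",  25 },
    { 5, "d4",   50 }, { 6, "d4",  100 }, { 7, "d4", 200 }, { 8, "e4", 300 },
    { 9, "e4",  400 }, { 10, "e4", 600 },
};

/* Reading 1: the first time the final move was ever reported as best. */
static int first_seen_ms(const struct iteration *it, size_t n)
{
    size_t i = 0;
    assert(n > 0);
    while (strcmp(it[i].bm, it[n - 1].bm) != 0)
        i++;                      /* stops at n - 1 at the latest */
    return it[i].time_ms;
}

/* Reading 2: the time from which the final move stayed best until the end. */
static int stable_since_ms(const struct iteration *it, size_t n)
{
    size_t i;
    assert(n > 0);
    i = n - 1;
    while (i > 0 && strcmp(it[i - 1].bm, it[n - 1].bm) == 0)
        i--;                      /* walk back while the move is unchanged */
    return it[i].time_ms;
}
```

For this log the first reading gives 15 ms and the second gives 300 ms, while the final iteration itself ended at 600 ms.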

Testing conditions: lcx-1.epd file, 0.1 s per move, 1 thread, 256 MB hash, Amoeba versions 1.0 - 1.4, 2.0 - 2.8, 3.0, 3.1, 3.2dev. All these engines have played gauntlet games against various chess engines at 40/1 with ponder on. Their ratings (measured by Ordo) range from 2653 to 3094. I then ran them on the lcx-1.epd file and tried to predict their Elo using a linear model from: 1) the count of best moves found compared to the bm field, 2) the MEA score using the c0 field, and 3) the MEA score with time bonus. I got the following r² (coefficient of determination):
 - best move found count: r² = 0.87
 - MEA score: r² = 0.91
 - MEA score with time bonus: r² = 0.46
So the MEA score is indeed an improvement over just the best move found count.

Just for the record, the "bm" can't be trusted: the EPDs come from various sources, and the MEA moves in "c0" are what matters. Secondly, my experience with the sfx sets is better than with the lcx sets.

¹ For the time bonus, I do not use an array but a logarithmic formula that reproduces the values of your array. My program also does not assume the best move is scored 10, but checks for the highest value in the set of best moves.

Well, if it works, it works. I don't pretend to have the best formula, just that the system has potential.

Alayan · Post by **Alayan** » Fri May 08, 2020 6:55 pm

Rebel wrote: ↑Fri May 08, 2020 7:07 am Let's check if we are talking about the same thing. Hypothetical example from the start position:

depth-1 bm Nf3 time 5 ms
depth-2 bm Nf3 time 8 ms
depth-3 bm e4 time 15 ms
depth-4 bm e4 time 25 ms
depth-5 bm d4 time 50 ms
depth-6 bm d4 time 100 ms
depth-7 bm d4 time 200 ms
depth-8 bm e4 time 300 ms
depth-9 bm e4 time 400 ms
depth 10 final move e4 600 ms

For applying a time bonus, which millisecond value do you use? The blue or the red?

This example is a good argument against the time bonus. Nf3, e4 and d4 are extremely close, pretty much equivalent in performance. How is a strong preference for one (so that it stays preferred at all iterations) a sign of superiority compared to an engine switching back and forth?

An engine switching back and forth between the 1st and 2nd best moves in the list could get a worse score than an engine sticking to the 2nd best move all along. How does that make sense?

Now, maybe this doesn't happen on enough positions to have a big impact, but I don't like it much. In your tests, was the time bonus really useful for improving the ordering of engines?

Rebel · Post by **Rebel** » Fri May 08, 2020 7:15 pm

Alayan wrote: ↑Fri May 08, 2020 6:55 pm
Rebel wrote: ↑Fri May 08, 2020 7:07 am Let's check if we are talking about the same thing. Hypothetical example from the start position:

depth-1 bm Nf3 time 5 ms
depth-2 bm Nf3 time 8 ms
depth-3 bm e4 time 15 ms
depth-4 bm e4 time 25 ms
depth-5 bm d4 time 50 ms
depth-6 bm d4 time 100 ms
depth-7 bm d4 time 200 ms
depth-8 bm e4 time 300 ms
depth-9 bm e4 time 400 ms
depth 10 final move e4 600 ms

For applying a time bonus, which millisecond value do you use? The blue or the red?
This example is a good argument against the time bonus. Nf3, e4 and d4 are extremely close, pretty much equivalent in performance. How is a strong preference for one (so that it stays preferred at all iterations) a sign of superiority compared to an engine switching back and forth?

An engine switching back and forth between the 1st and 2nd best moves in the list could get a worse score than an engine sticking to the 2nd best move all along. How does that make sense?

Now, maybe this doesn't happen on enough positions to have a big impact, but I don't like it much. In your tests, was the time bonus really useful for improving the ordering of engines?

It was a big help, provided the blue time was taken. I do understand your reasoning, but as you already pointed out yourself, volume (as in engine-vs-engine testing) is a sharp razor. Nevertheless, I will add an option that excludes the time bonus and see what happens, since much has changed for the better in the meantime.

Rebel · Post by **Rebel** » Sat May 09, 2020 11:37 am

Alayan wrote: ↑Fri May 08, 2020 6:55 pm
Rebel wrote: ↑Fri May 08, 2020 7:07 am Let's check if we are talking about the same thing. Hypothetical example from the start position:

depth-1 bm Nf3 time 5 ms
depth-2 bm Nf3 time 8 ms
depth-3 bm e4 time 15 ms
depth-4 bm e4 time 25 ms
depth-5 bm d4 time 50 ms
depth-6 bm d4 time 100 ms
depth-7 bm d4 time 200 ms
depth-8 bm e4 time 300 ms
depth-9 bm e4 time 400 ms
depth 10 final move e4 600 ms

For applying a time bonus, which millisecond value do you use? The blue or the red?
Now, maybe this doesn't happen on enough positions to have a big impact, but I don't like it much. In your tests, was the time bonus really useful for improving the ordering of engines?

I ran RubiChess on 84.xxx positions at 250 ms, once with time-bonus=on and once with time-bonus=off:

    EPD  : epd\ProDeo.epd
    Time : 250ms
                                                       Solving       Max    Total  Time  Hash          
    Engine           Score   Used Time   Found   Pos     Time       Score    Rate   ms    Mb  Cpu CCRL
 1  time-bonus=on   1710890  06:43:40.3  40733  84617  00:19:22.2  2538510  67.4%   250   128  1  3200
 2  time-bonus=off   603229  06:43:40.3  40733  84617  00:19:22.2   846170  71.3%   250   128  1  3200

So, the way I coded things, a difference of almost 4% is not noise.
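
The Max Score and Total Rate columns can be reproduced with a little arithmetic. This is only a sketch, assuming the top move is worth 10 points on every position and the largest multiplier is 3.0 with the bonus on (1.0 with it off); `max_score` and `total_rate` are illustrative names, not functions of the tool.

```c
/* Highest reachable total: every position solved with the top move
   at the top multiplier. */
static long max_score(long positions, int top_points, int top_multiplier)
{
    return positions * top_points * top_multiplier;
}

/* Share of the maximum that was actually scored, in percent. */
static double total_rate(long score, long max)
{
    return 100.0 * (double)score / (double)max;
}
```

`max_score(84617, 10, 3)` is 2538510 and `max_score(84617, 10, 1)` is 846170, matching the table, and the two rates come out at about 67.4% and 71.3%.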

The time bonus is based on the simple observation that Ethereal 12 will find moves much faster than Ethereal 8 because 12 is stronger, and thus it should be rewarded.

abulmo2 · Post by **abulmo2** » Sat May 09, 2020 12:03 pm

Rebel wrote: ↑Fri May 08, 2020 7:07 am
abulmo2 wrote: ↑Thu May 07, 2020 2:11 pm I just tested the impact of the time bonus on various versions of Amoeba (1.0 to 3.2 in development), using my own MEA implementation¹. It seems to just add noise to the score.
Let's check if we are talking about the same thing. Hypothetical example from the start position:

depth-1 bm Nf3 time 5 ms
depth-2 bm Nf3 time 8 ms
depth-3 bm e4 time 15 ms
depth-4 bm e4 time 25 ms
depth-5 bm d4 time 50 ms
depth-6 bm d4 time 100 ms
depth-7 bm d4 time 200 ms
depth-8 bm e4 time 300 ms
depth-9 bm e4 time 400 ms
depth 10 final move e4 600 ms

For applying a time bonus, which millisecond value do you use? The blue or the red?

The last blue one.

Just for the record, the "bm" can't be trusted: the EPDs come from various sources, and the MEA moves in "c0" are what matters. Secondly, my experience with the sfx sets is better than with the lcx sets.

In my experience, the bm moves looked better than the MEA scores with time bonus.
I repeated the experiment with sfx-1.epd and got similar results.
Note that Amoeba is a strange beast, much better at playing games than at solving positions. Maybe the problem is in my program?
