The code is OK, and so was the copy & paste; it went wrong when I made it more readable for the post. But thanks for pointing it out.
xr_a_y wrote: ↑Thu Apr 30, 2020 9:09 pm
Be careful: if your copy-paste is right, then your code is wrong because of a spurious comma.
Code:
2.1, 2,0, 1.9,
Look at the 2,0 instead of 2.0!
So, if an engine gets stuck on the 2nd best move (Rxc2) found early (at 70 ms), it will get a better score than another engine switching later (at 190 ms) from Rxc2 to Qa4?! I am doubtful about this time bonus...
Rebel wrote: ↑Thu Apr 30, 2020 5:57 pm
The idea behind the tool is that one runs it at a fast time control, since you need so many positions to get a reliable result; hence the tool concentrates on time controls between 100 and 500 ms, 1000 ms max. This gives the following two tables: ms (milliseconds) and a corresponding table pt (points).
Code:
int ms [] = { 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, 300, 400, 500, 600, 700, 800, 900, -1 };   /* time thresholds in ms; -1 ends the table */
double pt [] = { 3.0, 2.9, 2.8, 2.7, 2.6, 2.5, 2.4, 2.3, 2.2, 2.1, 2.0, 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1, 1.0, 1.0, 0.0 };   /* multiplier for the matching ms entry */
Example: c0 "Qa4=10, Rxc2=9, f6=5, Kg7=1";
If your engine plays none of the above moves, no points.
Suppose it plays Qa4: it gets 10 points * pt[x], where x is the index found in the ms table for the time at which the engine had Qa4 as best move and kept it stable to the end.
And so we may get:
Qa4 found at 8ms and stable, 10 * 3.0 = 30 points, the maximum.
Qa4 found at 190ms, 10 * 1.7 = 17 points.
Rxc2 found at 70ms, 9 * 2.4 = 21.6, rounded down to 21 points.
f6 found at 5ms, 5 * 3.0 = 15 points.
f6 found at 100ms, 5 * 2.1 = 10.5, rounded down to 10 points.
Kg7 found at 5ms, 1 * 3.0 = 3 points.
Kg7 found at 100ms, 1 * 2.1 = 2.1, rounded down to 2 points.
Yes, but note that second (third/fourth) best moves with 9 points are almost as good as the best move. And volume (10,000 positions at least) is extremely important to weed out randomness.
abulmo2 wrote: ↑Tue May 05, 2020 12:31 pm
First, thank you for all your clarifications.
So, if an engine gets stuck on the 2nd best move (Rxc2) found early (at 70 ms), it will get a better score than another engine switching later (at 190 ms) from Rxc2 to Qa4?! I am doubtful about this time bonus...
I just tested the impact of the time bonus on various Amoeba versions (1.0 to 3.2 in development), using my own MEA implementation¹. It seems to just add noise to the score.
Code:
- best move found count: r² = 0.87
- MEA score: r² = 0.91
- MEA score with time bonus: r² = 0.46
Let's check if we are talking about the same thing. Hypothetical example from the start position:
abulmo2 wrote: ↑Thu May 07, 2020 2:11 pm
I just tested the impact of the time bonus on various Amoeba versions (1.0 to 3.2 in development), using my own MEA implementation¹. It seems to just add noise to the score.
Just for the record, the "bm" can't be trusted: the EPDs come from various sources, and the MEA moves in "c0" are what matters. Secondly, my experience with the sfx sets has been better than with the lcx sets.
Testing conditions: lcx-1.epd file, 0.1 s per move, 1 thread, 256 MB hash, Amoeba versions 1.0 - 1.4, 2.0 - 2.8, 3.0, 3.1 and 3.2dev. All these engines have played gauntlet games against various chess engines at 40/1 with ponder on. Their ratings (measured by Ordo) ranged from 2653 to 3094. I then ran them on the lcx-1.epd file and tried to predict their Elo using a linear model from: 1) the count of best moves found compared to the bm field, 2) the MEA score using the c0 field, and 3) the MEA score with time bonus. I got the following r² (coefficient of determination):
Code:
- best move found count: r² = 0.87
- MEA score: r² = 0.91
- MEA score with time bonus: r² = 0.46
So the MEA score is indeed an improvement over just the best move found count.
Well, if it works, it works. I don't pretend to have the best formula, just that the system has potential.
¹ For the time bonus, I do not use an array but a logarithmic formula that reproduces the values of your array. My program also does not assume the best move is scored 10, but checks for the highest value in the set of best moves.
This example is a good argument against the time bonus. Nf3, e4 and d4 are extremely close, pretty much equivalent in performance. How is a strong preference for one (so it stays preferred at all iterations) a sign of superiority compared to an engine switching back and forth?
Rebel wrote: ↑Fri May 08, 2020 7:07 am
Let's check if we are talking about the same thing. Hypothetical example from the start position:
depth-1 bm Nf3 time 5 ms
depth-2 bm Nf3 time 8 ms
depth-3 bm e4 time 15 ms
depth-4 bm e4 time 25 ms
depth-5 bm d4 time 50 ms
depth-6 bm d4 time 100 ms
depth-7 bm d4 time 200 ms
depth-8 bm e4 time 300 ms
depth-9 bm e4 time 400 ms
depth-10 final move e4 time 600 ms
For applying a time bonus, which millisecond value do you use? The blue or the red?
It was a bug, provided the blue color time was taken. I do understand your reasoning, but as you already point out yourself, volume (like in engine-engine testing) is a sharp razor. Nevertheless, I will make an option that excludes the time bonus and see what happens; much has changed for the better in the meantime.
Alayan wrote: ↑Fri May 08, 2020 6:55 pm
This example is a good argument against the time bonus. Nf3, e4 and d4 are extremely close, pretty much equivalent in performance. How is a strong preference for one (so it stays preferred at all iterations) a sign of superiority compared to an engine switching back and forth?
An engine switching back and forth between the 1st and 2nd best moves in the list could get a worse score than an engine sticking to the 2nd best move all along. How does that make sense?
Now, maybe this doesn't happen on enough positions to have a big impact, but I don't like it much. In your tests, was the time bonus really useful to improve the ordering of engines?
I ran Rubichess on 84.xxx positions at 250 ms, once with time-bonus=on and once with time-bonus=off.
Alayan wrote: ↑Fri May 08, 2020 6:55 pm
Now, maybe this doesn't happen on enough positions to have a big impact, but I don't like it much. In your tests, was the time bonus really useful to improve the ordering of engines?
Code:
EPD : epd\ProDeo.epd
Time : 250ms
Solving Max Total Time Hash
Engine Score Used Time Found Pos Time Score Rate ms Mb Cpu CCRL
1 time-bonus=on 1710890 06:43:40.3 40733 84617 00:19:22.2 2538510 67.4% 250 128 1 3200
2 time-bonus=off 603229 06:43:40.3 40733 84617 00:19:22.2 846170 71.3% 250 128 1 3200
The last blue one.
Rebel wrote: ↑Fri May 08, 2020 7:07 am
Let's check if we are talking about the same thing. Hypothetical example from the start position:
For applying a time bonus, which millisecond value do you use? The blue or the red?
In my experience, bm moves looked better than the MEA scores with time bonus.
Just for the record, the "bm" can't be trusted: the EPDs come from various sources, and the MEA moves in "c0" are what matters. Secondly, my experience with the sfx sets has been better than with the lcx sets.