Thanks Dann, Ferdy, I will look into it.
In the meantime -
Fun with Komodo 9 and King Safety - http://rebel13.nl/k9.html
Komodo 9.42 and Dynamism - http://rebel13.nl/k942.html
ProDeo - http://rebel13.nl/mea.html
So 1 to 5 different time controls are possible; the second overview then gives the overall (average) rating, shown as the blue ratings.
Feedback welcome.
MRL - The MEA Rating List
-
- Posts: 6995
- Joined: Thu Aug 18, 2011 12:04 pm
Re: MRL - The MEA Rating List
90% of coding is debugging, the other 10% is writing bugs.
-
- Posts: 3019
- Joined: Wed Mar 08, 2006 9:57 pm
- Location: Rio de Janeiro, Brazil
I love the idea, but from my own experience having to clean out the WAC test suite, I know how important it is for all positions to be correct and have unique solutions. Otherwise seeing it not find 500 positions, or whatever the number, will be meaningless if those official solutions were wrong in the first place.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Yes, my formula is quite complicated.
But, I think that (for instance) it is crucial to include depth.
And a dominant score at a lower depth should be considered (and probably reanalyzed).
However, your efforts are quite interesting.
I have two computer programs I use to analyze the data to produce the scores.
I try to normalize to 10, 9, 8, ..., 2, 1 scoring.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 1563
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Albert Silver wrote: ↑Thu Jun 21, 2018 7:36 pm
I love the idea, but from my own experience having to clean out the WAC test suite, I know how important it is for all positions to be correct and have unique solutions. Otherwise seeing it not find 500 positions, or whatever the number, will be meaningless if those official solutions were wrong in the first place.

Why is it important for test positions to have only one unique solution? When there are several moves leading to a significant advantage and your engine chooses one of them, there is nothing wrong with that. The WAC test, as it is, is very useful for quickly checking whether an engine is tactically broken or not. For instance, Leela scores incredibly badly at WAC, but this doesn't mean that WAC is broken; it means that Leela is tactically broken. Maybe it can compensate for this weakness with superior positional play, I don't know, but you certainly don't need to clean out or repair WAC just to make Leela look better.
-
- Posts: 3019
- Joined: Wed Mar 08, 2006 9:57 pm
- Location: Rio de Janeiro, Brazil
Joost Buijs wrote: ↑Fri Jun 22, 2018 8:43 am
Why is it important for test positions to have only one unique solution? When there are several moves leading to a significant advantage and your engine chooses one of them, there is nothing wrong with that. The WAC test, as it is, is very useful for quickly checking whether an engine is tactically broken or not. For instance, Leela scores incredibly badly at WAC, but this doesn't mean that WAC is broken; it means that Leela is tactically broken. Maybe it can compensate for this weakness with superior positional play, I don't know, but you certainly don't need to clean out or repair WAC just to make Leela look better.
Albert Silver wrote: ↑Thu Jun 21, 2018 7:36 pm
I love the idea, but from my own experience having to clean out the WAC test suite, I know how important it is for all positions to be correct and have unique solutions. Otherwise seeing it not find 500 positions, or whatever the number, will be meaningless if those official solutions were wrong in the first place.

Make Leela look better? What are you talking about?
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Ferdy wrote: ↑Thu Jun 21, 2018 5:46 am
{snip}
I had been studying how engine score can be converted to a point system. One method to handle the negative scores is by using logistic function.
Code: Select all
scoring_rate = 1/[1 + 10 ^(-score_cp/400)]
Example from reb-glob.epd
Code: Select all
rnbq1rk1/pp2bppp/5n2/2pN4/2Pp4/3Q1NP1/PP2PPBP/R1B2RK1 b - - bm Nc6; c3 Nc6; acd 34; ce -3;
rnbq1rk1/pp2bppp/5n2/2pN4/2Pp4/3Q1NP1/PP2PPBP/R1B2RK1 b - - bm Nxd5; c3 Nc6; acd 26; ce -37;
rnbq1rk1/pp2bppp/5n2/2pN4/2Pp4/3Q1NP1/PP2PPBP/R1B2RK1 b - - bm Re8; c3 Nc6; acd 26; ce -49;
rnbq1rk1/pp2bppp/5n2/2pN4/2Pp4/3Q1NP1/PP2PPBP/R1B2RK1 b - - bm Be6; c3 Nc6; acd 26; ce -79;
rnbq1rk1/pp2bppp/5n2/2pN4/2Pp4/3Q1NP1/PP2PPBP/R1B2RK1 b - - bm Na6; c3 Nc6; acd 26; ce -85;
rnbq1rk1/pp2bppp/5n2/2pN4/2Pp4/3Q1NP1/PP2PPBP/R1B2RK1 b - - bm a5; c3 Nc6; acd 26; ce -104;
rnbq1rk1/pp2bppp/5n2/2pN4/2Pp4/3Q1NP1/PP2PPBP/R1B2RK1 b - - bm h6; c3 Nc6; acd 26; ce -106;
rnbq1rk1/pp2bppp/5n2/2pN4/2Pp4/3Q1NP1/PP2PPBP/R1B2RK1 b - - bm Bg4; c3 Nc6; acd 26; ce -122;
rnbq1rk1/pp2bppp/5n2/2pN4/2Pp4/3Q1NP1/PP2PPBP/R1B2RK1 b - - bm Bd7; c3 Nc6; acd 26; ce -140;
Example calculations:
rnbq1rk1/pp2bppp/5n2/2pN4/2Pp4/3Q1NP1/PP2PPBP/R1B2RK1 b - - bm Nc6; c3 Nc6; acd 34; ce -3;
ce = -3
sr (scoring_rate) = 1/(1+10^(-(-3)/400)) = 0.49568
rnbq1rk1/pp2bppp/5n2/2pN4/2Pp4/3Q1NP1/PP2PPBP/R1B2RK1 b - - bm Bd7; c3 Nc6; acd 26; ce -140;
ce = -140
sr = 1/(1+10^(-(-140)/400)) = 0.30876
If you want the top to get 10 points,
factor = 10/0.49568 = 20.174
bm Nc6, ce = -3, sr = 0.49568
pt = factor * sr = 10
bm Bd7, ce = -140, sr = 0.30876
pt = factor * sr = 6
{snip}

I tried coding it up, but I felt horrified when I looked at the scores. I am sure it is purely psychological, but in the example above Bd7 gets 6 points.
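For reference, the quoted calculation can be sketched in a few lines of Python. This is only an illustration, not the code either poster actually used: the function names are made up here, and truncating the points to a whole number is an assumption, chosen because it reproduces the 6 points quoted for Bd7 (the unrounded value is about 6.2).

```python
import math


def scoring_rate(score_cp: float) -> float:
    """Logistic conversion of a centipawn score to a rate between 0 and 1."""
    return 1.0 / (1.0 + 10.0 ** (-score_cp / 400.0))


def points(score_cp: float, best_cp: float, top: int = 10) -> int:
    """Scale so that the best move gets `top` points.

    Dividing by the best move's rate first keeps the best move at exactly
    `top`; the result is truncated to an integer (assumption, see above).
    """
    ratio = scoring_rate(score_cp) / scoring_rate(best_cp)
    return math.floor(top * ratio)


if __name__ == "__main__":
    best = -3  # ce of Nc6 in the quoted example
    for move, ce in (("Nc6", -3), ("Bd7", -140)):
        print(move, ce, round(scoring_rate(ce), 5), points(ce, best))
```

With the two quoted evaluations this gives rates of 0.49568 and 0.30876, and 10 and 6 points respectively.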
Bd7 is truly an awful move, considering all the better alternatives.
In the high end games in my collection, only two moves have been played: Nc6 and Nxd5.
Now, I realize that mathematically, the logistic fit makes perfect sense, especially considering how the Elo values are calculated.
But the artist in me cannot stomach those values.
The way that I perform the calculations is rather arbitrary, and the code for doing it is attached.
The view that I use looks like this:
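USE [Chess];
GO  -- batch separator (SSMS/sqlcmd); CREATE VIEW must be the first statement in its batch
-- Alternate-move rows (abm, ace, aacd, apv) plus primary rows (bm, ce, acd, pv)
-- for the Rebel positions; output columns take the names of the first SELECT.
CREATE VIEW [dbo].[Reb] AS
SELECT Epd, abm, c3, ace, aacd, apv
FROM [Chess].[dbo].[epd_and_alternates]
--WHERE Epd IN (select Epd from BareEPD)
WHERE Epd IN (SELECT Epd FROM Rebel)
UNION
SELECT Epd, bm, c3, ce, acd, pv
FROM Epd
--WHERE Epd IN (select Epd from BareEPD)
WHERE Epd IN (SELECT Epd FROM Rebel);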
and the view epd_and_alternates looks like this:
USE [Chess];
GO  -- batch separator (SSMS/sqlcmd); CREATE VIEW must be the first statement in its batch
-- Each alternate evaluation joined to the primary record of its position.
CREATE VIEW [dbo].[epd_and_alternates]
AS
SELECT dbo.AlternateEvals.ace, dbo.AlternateEvals.apv, dbo.AlternateEvals.aacd, dbo.AlternateEvals.aacn, dbo.AlternateEvals.aam, dbo.AlternateEvals.abm,
       dbo.AlternateEvals.apm, dbo.AlternateEvals.aid, dbo.AlternateEvals.aacs, dbo.Epd.Epd, dbo.Epd.acn, dbo.Epd.acd, dbo.Epd.ce, dbo.Epd.pv, dbo.Epd.am, dbo.Epd.bm,
       dbo.Epd.dm, dbo.Epd.pm, dbo.Epd.id, dbo.Epd.CheckoutCount, dbo.Epd.c0, dbo.Epd.c1, dbo.Epd.c2, dbo.Epd.c3, dbo.Epd.c4, dbo.Epd.acs
FROM dbo.AlternateEvals
INNER JOIN dbo.Epd ON dbo.AlternateEvals.EpdID = dbo.Epd.EpdID;
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Dann Corbit wrote: ↑Thu Jun 21, 2018 8:21 pm
Yes, my formula is quite complicated.
But, I think that (for instance) it is crucial to include depth.
And a dominant score at a lower depth should be considered (and probably reanalyzed).
However, your efforts are quite interesting.
I have two computer programs I use to analyze the data to produce the scores.
I try to normalize to 10, 9, 8, ..., 2, 1 scoring.

One issue with the 1-to-10 scoring is that if only 8 solutions are provided, the eighth-ranked move is scored 1, and the position has more than 8 legal moves, then the scale is fitted only to the top 8. This scoring range may help measure the top engines, since those are capable of finding moves within the top 8; for weaker engines it would be a different story.
Example from original rebel.epd
Code: Select all
r3r1k1/1p3nqp/2pp4/p4p2/Pn3P1Q/2N4P/1PPR2P1/3R1BK1 w - - bm Ne2; c0 "positional scores are: Ne2=10, g4=6, Bd3=5, Rxd6=2, Re1=2, Qh5=1, Kh2=1, Be2=1"; id "rebel.pos.01";
To solve these issues, use a 100-point system, i.e. the top move is set to 100 and the other moves get a score based on the logistic function, score_rate = 1/(1+10^(-score/400)).
Below, the above position is analyzed with SF, with multipv set to the number of legal moves (legal_move), using the 100-point system.
ScoreRate = 1/(1+10^(-engce/400))
factor = 100/ScoreRate_1 = 100/0.68382
pts_1 = factor * ScoreRate_1 = 100
pts_2 = factor * ScoreRate_2 = 95
The dep column is the depth searched by SF. To improve the correctness of the analysis of all moves, SF can be run to a depth beyond 24.
legal_move: 38
Code: Select all
num  move   engce  dep  ScoreRate  pts
1    Ne2      134   24    0.68382  100
2    Nb1      110   24    0.65322   95
3    Re1       77   24    0.60903   89
4    Be2       50   24    0.57146   83
5    Kh2       46   24    0.56582   82
6    Bd3       42   24    0.56015   81
7    Qh5       40   24    0.55731   81
8    Rc1       36   24    0.55162   80
9    Re2       27   24    0.53878   78
10   g4        27   24    0.53878   78
11   Kh1       10   24    0.51439   75
12   Bc4        0   24    0.50000   73
13   g3         0   24    0.50000   73
14   Qf2        0   24    0.50000   73
15   Rb1       -6   24    0.49137   71
16   Ra1       -6   24    0.49137   71
17   Rf2       -8   24    0.48849   71
18   Kf2      -28   24    0.45979   67
19   Rd4      -74   24    0.39509   57
20   Nb5     -110   24    0.34678   50
21   Rd3     -138   24    0.31123   45
22   Rxd6    -152   24    0.29422   43
23   Bb5     -173   24    0.26975   39
24   Na2     -178   24    0.26412   38
25   b3      -400   24    0.09091   13
26   Ba6     -428   24    0.07844   11
27   Rd5     -432   24    0.07679   11
28   Nd5     -438   24    0.07438   10
29   Ne4     -463   24    0.06506    9
30   Qe1     -742   24    0.01377    2
31   Qg5     -945   24    0.00432    0
32   Qxh7+  -1161   24    0.00125    0
33   Qh6    -1239   24    0.00080    0
34   Qf6    -1240   24    0.00079    0
35   Qg4    -1259   24    0.00071    0
36   Qg3    -1267   24    0.00068    0
37   Qe7    -1267   24    0.00068    0
38   Qd8    -1280   24    0.00063    0
Code: Select all
r3r1k1/1p3nqp/2pp4/p4p2/Pn3P1Q/2N4P/1PPR2P1/3R1BK1 w - - bm Ne2; id "rebel.pos.01"; c0 "Ne2=100, Nb1=95, Re1=89, Be2=83, Kh2=82, Bd3=81, Qh5=81, Rc1=80, Re2=78, g4=78, Kh1=75, Bc4=73, g3=73, Qf2=73, Rb1=71, Ra1=71, Rf2=71, Kf2=67, Rd4=57, Nb5=50, Rd3=45, Rxd6=43, Bb5=39, Na2=38, b3=13, Ba6=11, Rd5=11, Nd5=10, Ne4=9, Qe1=2";
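One way a c0 comment like the one above could be produced from a list of (move, centipawn) pairs is sketched below in Python. The helper name `c0_points`, the truncation of points to whole numbers, and the dropping of moves that end up with 0 points are all assumptions made to match the table and the EPD line shown here; this is not Ferdy's actual script.

```python
import math


def scoring_rate(score_cp: float) -> float:
    """ScoreRate = 1/(1+10^(-engce/400))."""
    return 1.0 / (1.0 + 10.0 ** (-score_cp / 400.0))


def c0_points(evals, top=100):
    """Build a 'Ne2=100, Nb1=95, ...' string from (move, centipawn) pairs.

    Moves are ranked best-first (ties keep their input order), the best
    move gets `top` points, points are truncated, and moves that
    truncate to 0 are left out (assumptions, see the note above).
    """
    if not evals:
        return ""
    ranked = sorted(evals, key=lambda mc: mc[1], reverse=True)
    best_rate = scoring_rate(ranked[0][1])
    parts = []
    for move, cp in ranked:
        pts = math.floor(top * (scoring_rate(cp) / best_rate))
        if pts > 0:
            parts.append(f"{move}={pts}")
    return ", ".join(parts)


if __name__ == "__main__":
    # A few rows taken from the table above.
    sample = [("Qg5", -945), ("Ne2", 134), ("b3", -400),
              ("Nb1", 110), ("Qe1", -742), ("Re1", 77)]
    print('c0 "' + c0_points(sample) + '";')
```

For the six sample rows this prints `c0 "Ne2=100, Nb1=95, Re1=89, b3=13, Qe1=2";`, with Qg5 omitted because it truncates to 0 points.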