The future of chess and elo ratings

lkaufman · Post by **lkaufman** » Sun Oct 04, 2015 4:38 pm

Laskos wrote:

Laskos wrote:
Michel wrote:
Code: Select all
Summary:
                 players,      min,      max,     Gcnt,     Wcnt,     Lcnt,     Dcnt,  perf(%),
               TexelHead,     0.00,     0.40,    12822,     3968,     3009,     5845,   53.74,
             Texel105a32,     0.00,     0.40,    12761,     3813,     3144,     5804,   52.62,

Total real games: 37100
Eval window: 0.00 to 0.40
Divisor: 1
Move range: 8 to 8
Elpased time: 1.36m


Summary:
                 players,      min,      max,     Gcnt,     Wcnt,     Lcnt,     Dcnt,  perf(%),
               TexelHead,    -0.40,     0.00,    12248,     3020,     3676,     5552,   47.32,
             Texel105a32,    -0.40,     0.00,    12407,     2895,     3866,     5646,   46.09,

Total real games: 37100
Eval window: -0.40 to 0.00
Divisor: 1
Move range: 8 to 8
Elpased time: 0.73m
I am still confused here. Wouldn't there be a large overlap between the
games evaluated by TexelHead as [0.00,0.40] and those evaluated
by Texel105a32 as [-0.40,0.00] (they are playing each other, right?).

So if I want independent games shouldn't I take only the games by TexelHead (i.e. the first row in both results matrices)?
Yes, there is an overlap, although not a perfect one. And yes, this is actually almost double counting. Now, if you take only one of them, I am afraid the result may lose statistical significance. There should be more games in the [0.80,1.20] range, but I cannot find a database with sufficiently many independent unbalanced openings.

I found a large database (60,000 games) with two related Stockfishes. Now, that seems really significant, at least by my sloppy derivation:

Balanced:

Code: Select all

Summary:
                 players,      min,      max,     Gcnt,     Wcnt,     Lcnt,     Dcnt,  perf(%),
                      S2,     0.00,     0.30,    11538,     3214,     2167,     6157,   54.54,
                      S1,     0.00,     0.30,    10726,     2406,     2654,     5666,   48.84,

Total real games: 60000
Eval window: 0.00 to 0.30
Divisor: 1
Move range: 8 to 8
Elpased time: 0.76m

Summary:
                 players,      min,      max,     Gcnt,     Wcnt,     Lcnt,     Dcnt,  perf(%),
                      S2,    -0.30,     0.00,    10605,     2357,     2605,     5643,   48.83,
                      S1,    -0.30,     0.00,    10333,     1791,     3104,     5438,   43.65,

Total real games: 60000
Eval window: -0.30 to 0.00
Divisor: 1
Move range: 8 to 8
Elpased time: 0.81m

Unbalanced:

Code: Select all

Summary:
                 players,      min,      max,     Gcnt,     Wcnt,     Lcnt,     Dcnt,  perf(%),
                      S2,     0.90,     1.30,     1115,      544,      144,      427,   67.94,
                      S1,     0.90,     1.30,     1650,      600,      370,      680,   56.97,

Total real games: 60000
Eval window: 0.90 to 1.30
Divisor: 1
Move range: 8 to 8
Elpased time: 0.71m

Summary:
                 players,      min,      max,     Gcnt,     Wcnt,     Lcnt,     Dcnt,  perf(%),
                      S2,    -1.30,    -0.90,      970,      186,      359,      425,   41.08,
                      S1,    -1.30,    -0.90,      964,      127,      472,      365,   32.11,

Total real games: 60000
Eval window: -1.30 to -0.90
Divisor: 1
Move range: 8 to 8
Elpased time: 0.71m

My basic idea was that by using positions near the threshold of win/draw, the draw percentage can be pulled closer to 50%. But in this sample, due to fast time limit, it is only about 53% anyway, so I wouldn't expect very much benefit from using unbalanced positions. You need games at a much longer time limit, where the draw percentage is much higher, to test the idea properly.
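
For anyone who wants to check the draw percentage mentioned here against the quoted summaries, the following is a minimal sketch. It assumes that perf(%) in those tables is simply (wins + draws/2) / games; the function names and the row labels are made up for illustration and are not part of the tool that produced the tables.

```python
# Rows copied from the "Balanced" and "Unbalanced" summaries quoted above:
# (label, games, wins, losses, draws, reported perf %)
ROWS = [
    ("S2 [0.00, 0.30]", 11538, 3214, 2167, 6157, 54.54),
    ("S1 [0.00, 0.30]", 10726, 2406, 2654, 5666, 48.84),
    ("S2 [0.90, 1.30]", 1115, 544, 144, 427, 67.94),
    ("S1 [0.90, 1.30]", 1650, 600, 370, 680, 56.97),
]


def perf_percent(wins, losses, draws):
    """Score percentage, counting a draw as half a point (assumed definition)."""
    return 100.0 * (wins + 0.5 * draws) / (wins + losses + draws)


def draw_rate(wins, losses, draws):
    """Fraction of games that ended in a draw."""
    return draws / (wins + losses + draws)


for label, games, wins, losses, draws, reported in ROWS:
    # Sanity check: the W/L/D columns should add up to the game count.
    assert wins + losses + draws == games
    print(f"{label}: perf={perf_percent(wins, losses, draws):.2f}% "
          f"(table says {reported}), draws={100 * draw_rate(wins, losses, draws):.1f}%")
```

The balanced S2 row gives a draw rate a little above 53%, which is where the "about 53%" figure comes from.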

Laskos · Post by **Laskos** » Mon Oct 05, 2015 10:04 pm

lkaufman wrote:
My basic idea was that by using positions near the threshold of win/draw, the draw percentage can be pulled closer to 50%. But in this sample, due to fast time limit, it is only about 53% anyway, so I wouldn't expect very much benefit from using unbalanced positions. You need games at a much longer time limit, where the draw percentage is much higher, to test the idea properly.

It seems not as clear-cut as 50% W and 50% D, because what we really have are W, D, L. If I go to a longer 15''+0.15'' time control, I get a higher draw rate, but in this case, even if it suggests a gain, the result cannot reach statistical significance; it's hard for me to play the necessary number of games.

Code: Select all

Unbalance 0.00-0.20
Score of S2 vs S1: 341 - 291 - 1368  [0.512] 2000
ELO difference: 9
Finished match

Unbalance 0.70-0.80
Score of S2 vs S1: 628 - 510 - 862  [0.529] 2000
ELO difference: 21
Finished match
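
As a rough cross-check of the two results above, here is a minimal sketch that recomputes the Elo difference with the usual logistic formula and also reports how many standard errors the score lies from 50%. It assumes the three numbers in each score line are wins, losses and draws for S2, and it treats all games as independent, which is only an approximation when openings are played with both colours. The function names are illustrative.

```python
import math


def score_fraction(wins, losses, draws):
    """Points scored divided by games played, a draw counting as half a point."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def elo_difference(score):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def z_score(wins, losses, draws):
    """Distance of the score from 50%, in standard errors (independent-games model)."""
    n = wins + losses + draws
    s = score_fraction(wins, losses, draws)
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    return (s - 0.5) / math.sqrt(variance / n)


MATCHES = {
    "Unbalance 0.00-0.20": (341, 291, 1368),
    "Unbalance 0.70-0.80": (628, 510, 862),
}

for label, (w, l, d) in MATCHES.items():
    s = score_fraction(w, l, d)
    print(f"{label}: score={s:.4f} elo={elo_difference(s):+.1f} z={z_score(w, l, d):.2f}")
```

Comparing the z values rather than the raw Elo differences is one way to see whether the unbalanced openings separate the two engines more clearly for the same number of games.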

Uri Blass · Post by **Uri Blass** » Tue Oct 06, 2015 12:57 am

A different idea may be to find complex balanced positions.

It is possible to define the complexity of a chess position (relative to some set of engines and the conditions under which they play) by the following experiment:

Run a tournament among all the engines in the set from the relevant position (every pair of engines plays 2 games, one as White and one as Black) and find the difference in rating between the best engine and the worst engine.

The difference in rating is the complexity of the position.

A simple draw is going to have complexity 0, because all engines are going to get draws.

The same goes for a simple win for White, because every pair of engines is going to score 1-1.

Now the question is whether we can find chess positions whose complexity is significantly higher than that of the opening position.
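
The experiment can be sketched in a few lines. This is only an illustration under simplifying assumptions: each engine's rating is taken from its overall score percentage via the logistic Elo formula (a real rating tool would do this more carefully), extreme scores are clamped so the result stays finite, and the engine names and function names are hypothetical.

```python
import math
from itertools import combinations


def elo_from_score(score, clamp=0.01):
    """Logistic Elo offset for a score fraction; clamped so 0% or 100% stays finite."""
    score = min(max(score, clamp), 1.0 - clamp)
    return -400.0 * math.log10(1.0 / score - 1.0)


def position_complexity(engines, pair_points):
    """Rating spread (best minus worst) from a double round robin on one position.

    pair_points maps each pair (a, b), in the order produced by
    itertools.combinations(engines, 2), to the points a scored against b
    over the two games (one with each colour), so each value is 0 to 2.
    """
    points = {name: 0.0 for name in engines}
    for a, b in combinations(engines, 2):
        a_points = pair_points[(a, b)]
        points[a] += a_points
        points[b] += 2.0 - a_points
    games_each = 2 * (len(engines) - 1)
    ratings = [elo_from_score(p / games_each) for p in points.values()]
    return max(ratings) - min(ratings)


engines = ["A", "B", "C"]  # hypothetical engine names
# A dead draw and a forced win for White both give every pair a 1-1 result.
all_even = {pair: 1.0 for pair in combinations(engines, 2)}
print(position_complexity(engines, all_even))
```

With every mini-match ending 1-1 the spread is zero, which matches the two "simple" cases described above; any position where some engine outscores another produces a positive value.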

lkaufman · Post by **lkaufman** » Tue Oct 06, 2015 4:28 am

Uri Blass wrote: A different idea may be to find complex balanced positions.

It is possible to define the complexity of a chess position (relative to some set of engines and the conditions under which they play) by the following experiment:

Run a tournament among all the engines in the set from the relevant position (every pair of engines plays 2 games, one as White and one as Black) and find the difference in rating between the best engine and the worst engine.

The difference in rating is the complexity of the position.

A simple draw is going to have complexity 0, because all engines are going to get draws.

The same goes for a simple win for White, because every pair of engines is going to score 1-1.

Now the question is whether we can find chess positions whose complexity is significantly higher than that of the opening position.

That is a very clever idea! To get a big enough sample from a small number of engines, just run the event at multiple time controls. Finding chess positions with higher complexity than the opening position is extremely easy, because the opening position is symmetrical and therefore automatically at least somewhat drawish. Just picking positions from an opening database based on stats should do the job while sticking to "reasonable" chess positions. I'll leave it to the mathematicians here to figure out the optimum formula for doing that.

Nelson Hernandez · Post by **Nelson Hernandez** » Tue Oct 06, 2015 6:51 pm

Larry,

I wish I had seen your comments when you first published them. It seems we independently came to exactly the same conclusion.

Stage 3 of the present season of TCEC will exclusively consist of unbalanced openings for precisely the reasons you state. This has been my plan for the past year. I expect this change of approach to be so controversial that an explanation has been prepared that will be released prior to the start of the next stage. Look for it.

The Superfinal is going to be two-thirds balanced, like last season, and one-third like Stage 3--unbalanced. The Superfinal will vary considerably from season to season as it is our intention to bring in a rotating cast of guest openings experts to select 50 opening positions in future seasons.

Nelson

Frank Quisinsky · Post by **Frank Quisinsky** » Tue Oct 06, 2015 7:47 pm

Hi Larry,

I found my way (but it's a lot of work; I have been doing this for years).

Step 1:
I collected all the games of the stronger GMs and the best correspondence games in two groups.

Group 1:
Well-known opening-theory games up to move 10 (for some openings, E99 for example, up to move 12 or a bit higher).

Group 2:
Critical games, up to move number 6.

Step 2:
Now the book needs priorities (otherwise A01 is treated the same as B33). A good indication for priorities is, of course, super-GM theory. But I am also using the Encyclopaedia of Chess Openings (and various printed books too).

Step 3:
"Ready" and the work can start with eng-eng games ...

Each bad line (produced by engines in eng-eng games) must be sorted out of the opening book. All newly produced games are added to the book again and again (the lines with only 6 moves are filled out by engines up to move 12) ... that's the deal!

I created a mix ...

1. Lines found by engines (move numbers 6-12),
2. GM / correspondence games, and
3. a lot of material added from my printed opening books.

And the result is an opening book I am using for my FCT Rating List. I now have around 90,000 balanced lines in my book. Around 10,000 were found by engines. These lines you will never find in GM-theory databases or printed books.

That is perfect for my work and believe me ... I know so many opening books but no other book will give me the following results:

Example:
2050 games test-run ...
Komodo vs. place 2 - 21.

You will find maybe 2-3 duplicated lines / systems in the 50-game matches, and all ECO codes will be played for sure: A01 rarely and B33 often, because the priorities are working.
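
How the Shredder GUI applies book priorities internally is not described in this thread. Purely as an illustration of the effect (rare codes such as A01 come up seldom, popular ones such as B33 often), priorities can be pictured as weights in a weighted random choice. The weights, the selection of ECO codes and the function name below are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical priority weights per ECO code; the real book values are not given.
PRIORITIES = {"A01": 1, "B33": 20, "E99": 10, "C30": 3}


def pick_openings(priorities, n_games, seed=0):
    """Draw one ECO code per game, with probability proportional to its priority."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    codes = list(priorities)
    weights = [priorities[code] for code in codes]
    return rng.choices(codes, weights=weights, k=n_games)


counts = Counter(pick_openings(PRIORITIES, 1000))
print(counts.most_common())
```

Every code keeps a nonzero chance of being played, so all of them appear over a long enough run, while the frequencies follow the priorities.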

In my opinion the best way but again ...

All this is a lot of work because 18.5% of all GM games / best correspondence games are deactivated with "F" in my book ... not good ... because each of my TOP-50 engines must work with the lines. Some of the engines don't understand complicated lines, and that is really a problem for the kind of book I am trying to create.

Each of the TOP-50 engines should find balanced positions. That is the next deal and it works.

Best
Frank

PS:
And all the tools Ferdinand created for us help a lot in creating such a "Balanced-Monster-Opening-Book".

Frank Quisinsky · Post by **Frank Quisinsky** » Tue Oct 06, 2015 8:05 pm

I forgot ...
This opening book was optimized with 302,000 eng-eng games played on my systems, with time controls a bit better than CEGT 40 in 20.

Shredder GUI *.bkt Format:
http://www.amateurschach.de/download/_f ... 151006.zip

If you are interested, meaning if you try out the Shredder GUI for testing engines, you will have a lot of fun with this work ... I am sure!

Working daily on it ...
Of course not perfect, but close to it ...
According to my opening book stats it is 98.8% perfect (of 100 new FCT games, 1.2% of the lines my book produced are critical). When I started this work it was ~78.5%.

Best
Frank

lkaufman · Post by **lkaufman** » Tue Oct 06, 2015 8:57 pm

Nelson Hernandez wrote:Larry,

I wish I had seen your comments when you first published them. It seems we independently came to exactly the same conclusion.

Stage 3 of the present season of TCEC will exclusively consist of unbalanced openings for precisely the reasons you state. This has been my plan for the past year. I expect this change of approach to be so controversial that an explanation has been prepared that will be released prior to the start of the next stage. Look for it.

The Superfinal is going to be two-thirds balanced, like last season, and one-third like Stage 3--unbalanced. The Superfinal will vary considerably from season to season as it is our intention to bring in a rotating cast of guest openings experts to select 50 opening positions in future seasons.

Nelson

That's good to hear, I totally approve. There is more need for unbalanced positions in the superfinal than in Stage 3, but I won't quibble about that.

carldaman · Post by **carldaman** » Wed Oct 07, 2015 1:41 am

Frank Quisinsky wrote:Hi Larry,

I found my way (but it's a lot of work; I have been doing this for years).

Step 1:
I collected all the games of the stronger GMs and the best correspondence games in two groups.

Group 1:
Well-known opening-theory games up to move 10 (for some openings, E99 for example, up to move 12 or a bit higher).

Group 2:
Critical games, up to move number 6.

Step 2:
Now the book needs priorities (otherwise A01 is treated the same as B33). A good indication for priorities is, of course, super-GM theory. But I am also using the Encyclopaedia of Chess Openings (and various printed books too).

Step 3:
"Ready" and the work can start with eng-eng games ...

Each bad line (produced by engines in eng-eng games) must be sorted out of the opening book. All newly produced games are added to the book again and again (the lines with only 6 moves are filled out by engines up to move 12) ... that's the deal!

I created a mix ...

1. Lines found by engines (move numbers 6-12),
2. GM / correspondence games, and
3. a lot of material added from my printed opening books.

And the result is an opening book I am using for my FCT Rating List. I now have around 90,000 balanced lines in my book. Around 10,000 were found by engines. These lines you will never find in GM-theory databases or printed books.

That is perfect for my work and believe me ... I know so many opening books but no other book will give me the following results:

Example:
2050 games test-run ...
Komodo vs. place 2 - 21.

You will find maybe 2-3 duplicated lines / systems in the 50-game matches, and all ECO codes will be played for sure: A01 rarely and B33 often, because the priorities are working.

In my opinion the best way but again ...

All this is a lot of work because 18.5% of all GM games / best correspondence games are deactivated with "F" in my book ... not good ... because each of my TOP-50 engines must work with the lines. Some of the engines don't understand complicated lines, and that is really a problem for the kind of book I am trying to create.

Each of the TOP-50 engines should find balanced positions. That is the next deal and it works.

Best
Frank

PS:
And all the tools Ferdinand created for us help a lot in creating such a "Balanced-Monster-Opening-Book".

Hi Frank,

In the boldfaced part, you seem to imply that each opening must be handled, or understood as playable, by every engine, before it can be called fair or balanced.

However, engine chess doesn't always work that way. Sometimes one or several engines will understand one opening far better than other engines and thus benefit from it by outplaying the opposition. It would be unfair to such engines to omit these opening lines since doing so would punish them for being better than other engines in that area.

That is the whole point of testing - to find out which engines are better/worse and by how much. If one levels the playing field too much, then these differences will be blurred and covered up, which runs counter to good testing methods.

Sorry if I misconstrued your comments and please correct me if I'm wrong.

Regards,
CL

Frank Quisinsky · Post by **Frank Quisinsky** » Wed Oct 07, 2015 2:15 am

Hi Chris,

you know that all available engines - newest versions (TOP-50) - are playing on FCT Rating List.

Only a handful of engines don't understand various main lines. This can be seen very easily in the eval (good examples: Alfil, Vajolet, SmarThink, Fizbo, in most cases in open positions). Various other engines can play the ECO codes very accurately and produce much better results (Texel is a good example).

Believe me, I have been working on this book for so long now that I know the openings and the engines' styles. In most cases I know whether I have to check variations in detail when I see something like this ...

Example:
SmarThink - Fizbo, and both give -0.8 after the opening book moves. Most other engines give -0.2 for the same position. It is always the same openings that I must check in detail ... A57, A99, E99, C30-C39 and so on.

I deactivated over 2,000 lines in my opening book. It is possible that I am wrong in maybe 100 cases, I don't know. You can see the lines I deactivated (f = bad line, e = fast draw up to move 20).

I checked the ECO codes again and again ... all main lines from all 500 ECO codes are active.

I think an opening book for all engines is the most complicated. Creating an opening book for one engine is very easy.

With the chess knowledge I have I can't make it better, and the system I am using for my book optimizations works fine.

Try out my book.
1,000 games between two engines you like.
After such a test-run you have to check the database and you can see ...

1. Optimal priorities for the most important ECO codes.
2. No duplicate games in 99.85% of cases.
3. You will find perhaps only 10 games with critical lines.

The result of all this is a very accurate rating list. At the moment there are 400 games with critical lines in my 120,800-game database (all are corrected in my book).

The only bad thing ... the 400 games haven't been replayed. But I don't think that is important. It is easy: I review the games (I check each of them). The new tool by Ferdinand helps a lot.

But I was really shocked to find 400 bad lines in my book after I checked the 120,800-game database with the new tool by Ferdinand.

Best
Frank
