ADRL 40/120

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Rebel
Posts: 7297
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

ADRL 40/120

Post by Rebel »

https://rebel7775.wixsite.com/rebel/kop ... adrl-blitz

ADRL 40/120

. Advantage SF dropped from +192 to +154
. Draw rate increased from 51.2% to 56.7%

Unfortunately I could not finish all 3640 games because a blip in the power grid shut off the PC. Cutechess -recovery did not work.
90% of coding is debugging, the other 10% is writing bugs.
Frank Quisinsky
Posts: 6888
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: ADRL 40/120

Post by Frank Quisinsky »

Hi Ed,

In the past I made comparable stats on Elo differences with my FEOBOS balanced book.
Same opponents in a group, with 40 moves in 4, 8, 12, 20 and 40 minutes.

The draw rate goes up, the average number of moves goes up, and the Elo differences between first place and the others fall.
Going from 40 moves in 4 minutes to 40 moves in 40 minutes, the difference to Stockfish 14 (I ran the test when Stockfish 14 was available) was 46 Elo, with 24 engines in the field.

To your experiment:
With a bigger list of engines the difference will be much smaller than in your results. Clearly weaker engines make more draws vs. Stockfish at longer time controls.

I hope that with my balanced FEOBOS book I can produce more short games to optimize my test-set database. I have been working on it for a very long time. It is one of the reasons for my FCP tourneys. At the moment I have 324 balanced positions on which top engines produced a really high rate of 1:0 and 0:1 results. 500 positions would be great to have. The biggest problem I have: interesting balanced positions with good chances for a short win with the black pieces are rare. Furthermore, too many of the positions I have are from the same ECO codes. The 324 positions I have come from only 71 ECO codes.

Thanks for your test-results!!
I like such things!

Best
Frank
Modern Times
Posts: 3699
Joined: Thu Jun 07, 2012 11:02 pm

Re: ADRL 40/120

Post by Modern Times »

Great work Ed!
Rebel
Posts: 7297
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: ADRL 40/120

Post by Rebel »

Frank Quisinsky wrote: Wed Oct 11, 2023 9:56 pm To your experiment:
With a bigger list of engines the difference will be much smaller than in your results. Clearly weaker engines make more draws vs. Stockfish at longer time controls.
That's indeed one of those dilemmas we are currently facing. A draw vs a 300 elo weaker engine will cost SF elo. But the weaker engine is still a super strong engine rated 3400 elo that, from balanced positions, will hardly make mistakes, resulting in many draws that suppress the elo of SF.

When SF is fed positions where there is always something to play for, it becomes clear how superior the engine is and why it has won the last 7 TCEC tournaments in a row.
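
To put a rough number on why a draw against a 300 Elo weaker engine costs rating, here is a minimal sketch. It assumes the standard logistic Elo model; Ordo, BayesElo and EloStat each use their own model, so their figures will differ. The function names are made up for this illustration.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for the side that is elo_diff points stronger,
    under the standard logistic Elo model (an assumption; rating tools differ)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def break_even_draw_rate(elo_diff: float) -> float:
    """Draw rate at which the stronger side exactly meets its expected score,
    assuming it never loses (score = wins + draws / 2, wins = 1 - draws)."""
    return 2.0 * (1.0 - expected_score(elo_diff))


if __name__ == "__main__":
    diff = 300  # Elo gap taken from the post above
    print(f"expected score at +{diff}: {expected_score(diff):.3f}")
    print(f"break-even draw rate (no losses): {break_even_draw_rate(diff):.3f}")
```

Under this model the stronger side is expected to score about 0.849 per game at +300, so a single draw (0.5) falls well short of that. Even without a single loss, a draw rate above roughly 30% already pulls the measured gap below 300.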
Frank Quisinsky
Posts: 6888
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: ADRL 40/120

Post by Frank Quisinsky »

Hello Ed,

you wrote:
A draw vs a 300 elo weaker engine will cost SF elo ...

And the same holds for all other engines with a 300 Elo difference to weaker engines.
It is the main problem I have in my still-running tourney, and had in all past FCP tourneys.

Around 20 years ago we ran some experiments on this.
How big should the difference between first and last place in a tournament group of engines be? We compared the tourney results with our own rating systems. The result at that time was ~250 Elo. Maybe today, with neural networks and "super-Elo results", the situation has changed?

Fact is ...
Hiarcs 15.2 and Counter 5.0 NN aren't strong enough for my still-running tourney. Both are really great engines (Counter is a strong defensive engine that sees a draw early, and its move average is perfect for the strength Counter has).

I think in maybe 1-2 years we will have more than 50 engines with an Elo difference to Stockfish (at higher time controls) of no more than 250 Elo. Too many "new young talents" are on the way.

Best
Frank
Modern Times
Posts: 3699
Joined: Thu Jun 07, 2012 11:02 pm

Re: ADRL 40/120

Post by Modern Times »

Frank Quisinsky wrote: Thu Oct 12, 2023 5:58 pm Hello Ed,

Around 20 years ago we ran some experiments on this.
How big should the difference between first and last place in a tournament group of engines be? We compared the tourney results with our own rating systems. The result at that time was ~250 Elo. Maybe today, with neural networks and "super-Elo results", the situation has changed?
A 250 Elo spread measured with what software? Ordo, BayesElo, Glicko, EloStat ... all will give a different spread.
Frank Quisinsky
Posts: 6888
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: ADRL 40/120

Post by Frank Quisinsky »

EloStat used for the older experiments!
Modern Times
Posts: 3699
Joined: Thu Jun 07, 2012 11:02 pm

Re: ADRL 40/120

Post by Modern Times »

Frank Quisinsky wrote: Thu Oct 12, 2023 6:35 pm EloStat used for the older experiments!
Not sure what the result would be using EloStat, but for my Chess324 Top 15:

BayesElo - 231 Elo spread

Ordo - 282 Elo spread

Both are valid, but personally I'm not a fan of the greater spread that Ordo measures.
Frank Quisinsky
Posts: 6888
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: ADRL 40/120

Post by Frank Quisinsky »

Hi,

I think the differences you have are fully OK.

I have to think about it, more or less, for my still-running tournament. Around 300 is not OK. The Elo from the Shredder tournament table is around the same as what EloStat gives.

That is the reason I can't add engines like Stockfish, Komodo or Berserk ... too strong.
I don't have Ethereal.

What I can do is add non-NN versions of Stockfish, Komodo, Ethereal ... but honestly, I prefer to see engines that are newer to me in a tournament, not the strongest engines again and again.

Today a new Pawn 2.0 is available.
I tested a bit on my second and third system.
Very strong ... the new version is perfect for my field of engines!

I think I have to change my tournament again.

Keep up your good work.

Best
Frank
Modern Times
Posts: 3699
Joined: Thu Jun 07, 2012 11:02 pm

Re: ADRL 40/120

Post by Modern Times »

Modern Times wrote: Thu Oct 12, 2023 6:59 pm
Frank Quisinsky wrote: Thu Oct 12, 2023 6:35 pm EloStat used for the older experiments!
Not sure what the result would be using EloStat, but for my Chess324 Top 15:

BayesElo - 231 Elo spread

Ordo - 282 Elo spread

Both are valid, but personally I'm not a fan of the greater spread that Ordo measures.
EloStat - 270 Elo spread. So actually, it is closer to Ordo than it is to BayesElo.