https://rebel7775.wixsite.com/rebel/kop ... adrl-blitz
ADRL 40/120
. Advantage SF dropped from +192 to +154
. Draw rate increased from 51.2% to 56.7%
Unfortunately, I could not finish all 3640 games because a brief power outage shut off the PC. Cutechess -recovery did not work.
Moderator: Ras
-
- Posts: 7297
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
ADRL 40/120
90% of coding is debugging, the other 10% is writing bugs.
-
- Posts: 6888
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
Re: ADRL 40/120
Hi Ed,
In the past I produced comparable statistics on Elo differences with my balanced FEOBOS book.
The same opponents played in one group at 40 moves in 4, 8, 12, 20, and 40 minutes.
The draw rate goes up, the average number of moves per game goes up, and the Elo differences between first place and the others shrink.
From 40 moves in 4 minutes to 40 moves in 40 minutes, the difference to Stockfish 14 (I ran the test when Stockfish 14 was available) was 46 Elo, with 24 engines in the field.
To your experiment:
With a bigger list of engines, the difference will be much smaller than in your results. Clearly, weaker engines make more draws against Stockfish at longer time controls.
I hope that my balanced FEOBOS book lets me produce more short games to optimize my test-set database. I have been working on it for a very long time, and it is one of the reasons for my FCP tourneys. At the moment I have 324 balanced positions, from which the top engines produce a really high rate of 1:0 and 0:1 results. 500 positions would be great to have. My biggest problem is that interesting balanced positions with good chances for a short win with the black pieces are rare. Furthermore, too many of my positions come from the same ECO codes: the 324 positions cover only 71 ECO codes.
Thanks for your test-results!!
I like such things!
Best
Frank
-
- Posts: 7297
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: ADRL 40/120
Frank Quisinsky wrote: ↑Wed Oct 11, 2023 9:56 pm
To your experiment:
With a bigger list of engines, the difference will be much smaller than in your results. Clearly, weaker engines make more draws against Stockfish at longer time controls.

That's indeed one of those dilemmas we are currently facing. A draw vs a 300 elo weaker engine will cost SF elo. But the weaker engine is still a super strong 3400 elo rated engine, and from balanced positions it will hardly make mistakes, which results in many draws that suppress the elo of SF.
When SF is fed positions where there is always something to play for, it becomes clear how superior the engine is and why it has won the last 7 TCEC tournaments in a row.
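To put a number on why a draw costs the stronger engine rating, here is a minimal sketch using the standard logistic Elo expectation. The function names are made up for illustration, and rating tools such as Ordo, BayesElo, or EloStat use their own models, so their figures will differ. The last helper assumes, purely hypothetically, that the stronger engine never loses.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score of the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (requires 0 < score < 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_ceiling_without_losses(draw_rate: float) -> float:
    """Elo difference implied by a given draw rate if the stronger side
    never loses (hypothetical assumption): score = wins + draws / 2."""
    score = (1.0 - draw_rate) + draw_rate / 2.0
    return elo_from_score(score)


if __name__ == "__main__":
    # A 300 Elo favourite is expected to score well above the 0.5 of a draw.
    print(f"expected score at +300: {expected_score(300):.3f}")
    # Higher draw rates lower the measurable gap even with zero losses.
    for rate in (0.3, 0.5, 0.7):
        print(f"draw rate {rate:.0%}: at most {elo_ceiling_without_losses(rate):.0f} Elo")
```

Under this model a 300 Elo favourite is expected to score about 0.85 per game, so every draw (0.5) falls short of expectation, and a rising draw rate caps the rating gap that can be measured.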
90% of coding is debugging, the other 10% is writing bugs.
-
- Posts: 6888
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
Re: ADRL 40/120
Hello Ed,
you wrote:
A draw vs a 300 elo weaker engine will cost SF elo ...
The same holds for every other engine that faces opponents 300 Elo weaker.
That is the main problem in my still-running tourney and in all my earlier FCP tourneys.
Around 20 years ago we ran some experiments on this question:
How big should the difference be between first and last place in a tournament group of engines? We compared the tourney results with our own rating systems. The result at that time was ~250 Elo. Maybe the situation has changed today, with neural networks and "super-Elo results"?
The fact is ...
Hiarcs 15.2 and Counter 5.0 NN aren't strong enough for my still-running tourney. Both are really great engines (Counter is a strong defensive engine that sees a draw early, and its move average is perfect for its strength).
I think that in maybe 1-2 years we will have more than 50 engines within 250 Elo of Stockfish (at longer time controls). Too many "new young talents" are on the way.
Best
Frank
-
- Posts: 3699
- Joined: Thu Jun 07, 2012 11:02 pm
Re: ADRL 40/120
Frank Quisinsky wrote: ↑Thu Oct 12, 2023 5:58 pm
Hello Ed,
Around 20 years ago we ran some experiments on this question:
How big should the difference be between first and last place in a tournament group of engines? We compared the tourney results with our own rating systems. The result at that time was ~250 Elo. Maybe the situation has changed today, with neural networks and "super-Elo results"?

250 Elo spread measured with what software? Ordo, BayesElo, Glicko, EloStat ... all will give a different spread.
-
- Posts: 6888
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
Re: ADRL 40/120
EloStat was used for the older experiments!
-
- Posts: 3699
- Joined: Thu Jun 07, 2012 11:02 pm
Re: ADRL 40/120
Not sure what the result would be using EloStat, but for my Chess324 Top 15:
BayesElo - 231 Elo spread
Ordo - 282 Elo spread
Both are valid, but personally I'm not a fan of the greater spread that Ordo measures.
-
- Posts: 6888
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
Re: ADRL 40/120
Hi,
I think the differences you have are fully OK.
More or less, I have to think about this for my still-running tournament. Around 300 is not OK. The Elo from the Shredder tournament table is about the same as what EloStat gives.
That is the reason I can't add engines like Stockfish, Komodo, or Berserk ... too strong.
I don't have Ethereal.
What I can do is add non-NN versions of Stockfish, Komodo, Ethereal ... but honestly, I prefer to see engines that are newer to me in a tournament, not the strongest engines again and again.
Today a new Pawn 2.0 became available.
I tested it a bit on my second and third systems.
Very strong ... the new version is perfect for my field of engines!
I think I have to change my tournament again.
Keep up your good work.
Best
Frank
-
- Posts: 3699
- Joined: Thu Jun 07, 2012 11:02 pm
Re: ADRL 40/120
Modern Times wrote: ↑Thu Oct 12, 2023 6:59 pm
Not sure what the result would be using EloStat, but for my Chess324 Top 15:
BayesElo - 231 Elo spread
Ordo - 282 Elo spread
Both are valid, but personally I'm not a fan of the greater spread that Ordo measures.

EloStat - 270 Elo spread. So actually, it is closer to Ordo than it is to BayesElo.