In 100 games, if SF reaches 51 in TCEC, it should be stopped

Discussion of anything and everything relating to chess playing software and machines.


Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: In 100 games, if SF reaches 51 in TCEC, it should be stopped

Post by Zenmastur »

Ovyron wrote: Sat Oct 12, 2019 6:14 pm Are you saying that Leela winning Season 15 was a fluke?
NO! I'm saying that Leela NOT winning season 16 wasn't a fluke.

Regards,

Zenmastur
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Post by Ovyron »

Zenmastur wrote: Sat Oct 12, 2019 6:33 pm NO! I'm saying that Leela NOT winning season 16 wasn't a fluke.
I think you need to re-parse that, because it's a double negative, so if you remove the "NOT"s you're left with "I'm saying that Leela winning season 16 was a fluke", but Leela isn't winning season 16. Can you reword what you're saying without using any negatives?
Your beliefs create your reality, so be careful what you wish for.
Leo
Posts: 1080
Joined: Fri Sep 16, 2016 6:55 pm
Location: USA/Minnesota
Full name: Leo Anger

Post by Leo »

Dann Corbit wrote: Sat Oct 12, 2019 2:59 am
Chessqueen wrote: Sat Oct 12, 2019 12:16 am I do not know if it was luck that AllieStein v0.5-dev_7b41f8c-n11 got a better score than LC0, but AS did NOT do as well as LC0 against Stockfish 19092522. Probably next time around, AllieStein with an update might be as strong as Stockfish, unless there is something better than the RTX 2080 waiting around the corner. Anyway, in 100 games, if SF reaches 51 it should be stopped, or will they continue it anyway? www.tcec-chess.com/
From TCEC 16 Rules and Information:
"Superfinal

The Superfinal consists of 100 games at TC 120+10, with 50 different openings, among them once the normal start position, so that each engine plays both black and white of the same opening position. The match will be presented with opening 1 used in games 1 and 2, then opening 2 used in games 3 and 4 etc.
If the match is theoretically won for one side before game 100, the match will still continue until all 100 games have been played."
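The schedule in the quoted rules, and the point at which a 100-game match is "theoretically won", can be sketched in a few lines. This is a minimal Python sketch with hypothetical function names; it encodes only what the rules state (opening k is used in games 2k-1 and 2k) plus the arithmetic of clinching: every game hands out exactly 1 point, so a side is uncatchable once its score exceeds half the games.

```python
def opening_for_game(game: int) -> int:
    """Return the 1-based opening number for a 1-based game number.

    Per the quoted rules: opening 1 in games 1 and 2,
    opening 2 in games 3 and 4, and so on.
    """
    if game < 1:
        raise ValueError("game numbers start at 1")
    return (game + 1) // 2


def is_decided(score_a: float, score_b: float, total_games: int = 100) -> bool:
    """True once the leader can no longer be caught or tied.

    Each game distributes exactly 1 point, so the trailing side can reach at
    most total_games - leader_score. The leader is safe once its score is
    strictly greater than total_games / 2.
    """
    if score_a + score_b > total_games:
        raise ValueError("scores exceed the number of games")
    return max(score_a, score_b) > total_games / 2


print(opening_for_game(99), opening_for_game(100))  # 50 50
print(is_decided(51, 40))  # True: 51 > 50
print(is_decided(50, 45))  # False: a 50-50 tie is still possible
```

Since the rules say play continues to game 100 regardless, a check like `is_decided` would only matter for reporting the result, not for stopping the match.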

SF has already scored 51 points, and they are playing on, so the rules are being followed.
I like that, because the games produce really interesting data.
I think advertisers have paid for 100 games.
Advanced Micro Devices fan.
Leo
Posts: 1080
Joined: Fri Sep 16, 2016 6:55 pm
Location: USA/Minnesota
Full name: Leo Anger

Post by Leo »

What has SF done to improve so much? I am really surprised.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Post by Zenmastur »

Ovyron wrote: Sat Oct 12, 2019 6:43 pm
Zenmastur wrote: Sat Oct 12, 2019 6:33 pm NO! I'm saying that Leela NOT winning season 16 wasn't a fluke.
I think you need to re-parse that, because it's a double negative, so if you remove the "NOT"s you're left with "I'm saying that Leela winning season 16 was a fluke", but Leela isn't winning season 16. Can you reword what you're saying without using any negatives?
The two negatives don't refer to the same subject, i.e. "NOT winning" and "WASN'T a fluke" don't cancel out, since the subject isn't the same.

Regards,

Zenmastur
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Post by Ovyron »

But that leads to a contradiction:

A. Leela beat Stockfish in TCEC 15 because it was better (it wasn't a fluke.)
B. Allie advanced to the TCEC 16 superfinal because it was better than Leela (Leela finished 3rd or 4th.)
C. Stockfish beat Allie because it is better (because NNs aren't mature enough, etc.)
D. Stockfish hasn't been improved significantly since TCEC 15.

So how did Stockfish become better than Allie and Leela without improving much? To resolve this contradiction, one of these must be true:

a. Stockfish improved and is now better than Allie and Leela
b. Leela is still better than those but by a fluke it ended third before TCEC 16 superfinal.
c. Leela was never better than Stockfish and it won TCEC 15 by a fluke.
d. (something else that you're saying that I don't get)
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Post by Zenmastur »

Ovyron wrote: Sat Oct 12, 2019 7:03 pm But that leads to a contradiction:

A. Leela beat Stockfish in TCEC 15 because it was better (it wasn't a fluke.)
B. Allie advanced to the TCEC 16 superfinal because it was better than Leela (Leela finished 3rd or 4th.)
C. Stockfish beat Allie because it is better (because NNs aren't mature enough, etc.)
D. Stockfish hasn't been improved significantly since TCEC 15.

So how did Stockfish become better than Allie and Leela without improving much? To resolve this contradiction, one of these must be true:

a. Stockfish improved and is now better than Allie and Leela
b. Leela is still better than those but by a fluke it ended third before TCEC 16 superfinal.
c. Leela was never better than Stockfish and it won TCEC 15 by a fluke.
d. (something else that you're saying that I don't get)
First, three programs can be equal and yet A beats B, B beats C, and C beats A.
Second, D is true as far as raw Elo goes. But its move selection could have changed (and did) in subtle ways that affect its ability to defend and attack successfully against Leela while remaining basically neutral against other A/B engines.

Kai claims Leela is superior. Superiority is a moving target. Maybe it was, or is. I guess time will tell, if the target doesn't move too far in the meantime.

One fact remains: ALL NN engines are capable of producing gross blunders, and most do so at fairly regular intervals (say, 1 per 500 moves). This figure needs to be greatly reduced.
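To put the "1 per 500 moves" figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes every move independently carries the same blunder probability, which is a simplification, and the 60 moves per game used below is a hypothetical game length, not a TCEC statistic.

```python
def p_at_least_one(moves: int, rate: float = 1 / 500) -> float:
    """Probability of at least one gross blunder in `moves` moves,
    assuming each move blunders independently with probability `rate`."""
    return 1.0 - (1.0 - rate) ** moves


def expected_count(moves: int, rate: float = 1 / 500) -> float:
    """Expected number of gross blunders over `moves` moves."""
    return moves * rate


moves_per_game = 60  # hypothetical: moves the engine plays per game
print(f"per game: {p_at_least_one(moves_per_game):.1%}")  # about 11.3%
print(f"per 100-game match: {expected_count(100 * moves_per_game):.1f} expected")  # 12.0
```

Under these assumptions roughly one game in nine would contain a gross blunder, which is why even a small-looking per-move rate adds up over a long match.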

Regards,

Zenmastur
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Post by Ovyron »

Zenmastur wrote: Sat Oct 12, 2019 7:34 pm First, three programs can be equal and yet A beats B, B beats C, and C beats A.
In this case, whoever wins does so by a fluke, not by being better than the others.
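The cycle Zenmastur describes is easy to construct with made-up numbers. In this Python sketch the pairwise expected scores (0.6 for the favorite in each pairing) are purely hypothetical; the point is only that all three engines average exactly 50% in a round robin, yet every head-to-head pairing has a clear favorite.

```python
from itertools import permutations

# Hypothetical head-to-head expected scores: BEATS[(x, y)] is x's
# expected score per game against y. A > B, B > C, C > A.
BEATS = {("A", "B"): 0.6, ("B", "C"): 0.6, ("C", "A"): 0.6}
ENGINES = ["A", "B", "C"]


def expected(x: str, y: str) -> float:
    """x's expected score against y; the two scores in a pairing sum to 1."""
    if (x, y) in BEATS:
        return BEATS[(x, y)]
    return 1.0 - BEATS[(y, x)]


def round_robin_average(engine: str) -> float:
    """Average expected score of `engine` against every other engine."""
    opponents = [o for o in ENGINES if o != engine]
    return sum(expected(engine, o) for o in opponents) / len(opponents)


for e in ENGINES:
    print(e, round(round_robin_average(e), 3))  # every average is 0.5

# Every pairing has a favorite, so no engine dominates both others.
favorites = {(x, y) for x, y in permutations(ENGINES, 2) if expected(x, y) > 0.5}
print(sorted(favorites))  # [('A', 'B'), ('B', 'C'), ('C', 'A')]
```

Whether a head-to-head result in such a setup should be called a "fluke" is the point under debate here; the sketch only shows that equal overall strength and a lopsided pairing can coexist.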
Zenmastur wrote: Sat Oct 12, 2019 7:34 pm One fact remains: ALL NN engines are capable of producing gross blunders, and most do so at fairly regular intervals (say, 1 per 500 moves). This figure needs to be greatly reduced.
Okay, but if they still play at the level of A/B engines, it means they blunder less in general, which compensates for the gross blunders. Every time Stockfish lost in the TCEC 15 final, it was because it blundered, and it blundered more often than Leela, so I'd say it would be more fruitful to reduce the number of Stockfish's blunders than the NNs' blunders, even if Stockfish's are not as gross.

A blunder is a blunder: it will lose you the game even if it's not a gross one, so I don't see the difference between Stockfish blunders and NN blunders (a ?? move is bad enough to lose).
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Post by Zenmastur »

Ovyron wrote: Sat Oct 12, 2019 8:15 pm
Zenmastur wrote: Sat Oct 12, 2019 7:34 pm First, three programs can be equal and yet A beats B, B beats C, and C beats A.
In this case, whoever wins does so by a fluke, not by being better than the others.
If that's what you want to call it. For a given Elo difference between two opponents, you can statistically predict how often each should win a match of a given length. I don't call that a fluke; it's just the way it is.
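The kind of prediction described here can be sketched with the standard logistic Elo expectation, E = 1 / (1 + 10^(-D/400)), where D is the rating difference. Elo alone does not fix the draw rate, so `draw_rate` below is an assumption supplied by the caller; games are also treated as independent, ignoring TCEC's paired openings. The function names are hypothetical.

```python
def expected_score(elo_diff: float) -> float:
    """Logistic Elo expectation for the side rated `elo_diff` points higher."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))


def match_probabilities(elo_diff: float, draw_rate: float, games: int):
    """Return (P(match win), P(tied match), P(match loss)) for the first side.

    Assumes independent games with a fixed draw rate; per-game win and loss
    probabilities are chosen so that win + draw / 2 equals the Elo expectation.
    """
    e = expected_score(elo_diff)
    win = e - draw_rate / 2
    loss = 1.0 - win - draw_rate
    if win < 0 or loss < 0:
        raise ValueError("draw_rate is inconsistent with this Elo difference")

    # dist[k] = probability of holding exactly k half-points so far
    dist = [1.0]
    for _ in range(games):
        new = [0.0] * (len(dist) + 2)
        for k, p in enumerate(dist):
            new[k] += p * loss
            new[k + 1] += p * draw_rate
            new[k + 2] += p * win
        dist = new

    # More than games / 2 points means more than `games` half-points.
    p_win = sum(dist[games + 1:])
    p_tie = dist[games]
    p_loss = sum(dist[:games])
    return p_win, p_tie, p_loss


# Hypothetical example: a 30 Elo gap, 80% draws, a 100-game match.
print(match_probabilities(30, 0.8, 100))
```

Raising `games` narrows the spread of the match score, which is the sense in which a fixed Elo gap makes match outcomes predictable rather than arbitrary.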
Zenmastur wrote: Sat Oct 12, 2019 7:34 pm One fact remains: ALL NN engines are capable of producing gross blunders, and most do so at fairly regular intervals (say, 1 per 500 moves). This figure needs to be greatly reduced.
Okay, but if they still play at the level of A/B engines, it means they blunder less in general, which compensates for the gross blunders. Every time Stockfish lost in the TCEC 15 final, it was because it blundered, and it blundered more often than Leela, so I'd say it would be more fruitful to reduce the number of Stockfish's blunders than the NNs' blunders, even if Stockfish's are not as gross.

A blunder is a blunder: it will lose you the game even if it's not a gross one, so I don't see the difference between Stockfish blunders and NN blunders (a ?? move is bad enough to lose).
A/B engines tend toward micro-blunders when they don't "understand" the position, and they generally need many micro-blunders to lose a game: death by a thousand small cuts. NN engines can produce gross blunders at ANY time, EVEN when they fully "understand" the position and/or have a commanding lead. When an A/B engine has a commanding lead, it has an attainable goal and will rarely blunder it away. When an A/B engine "thinks" the position is about even OR has no clear goal, it tends to micro-blunder much more often. The nature, magnitude, and number of blunders tend to differ between the two types of engines.

Regards,

Zenmastur
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Post by Laskos »

Zenmastur wrote: Sat Oct 12, 2019 8:39 pm

A/B engines tend toward micro-blunders when they don't "understand" the position, and they generally need many micro-blunders to lose a game: death by a thousand small cuts. NN engines can produce gross blunders at ANY time, EVEN when they fully "understand" the position and/or have a commanding lead. When an A/B engine has a commanding lead, it has an attainable goal and will rarely blunder it away. When an A/B engine "thinks" the position is about even OR has no clear goal, it tends to micro-blunder much more often. The nature, magnitude, and number of blunders tend to differ between the two types of engines.

Regards,

Zenmastur
IMO you described the behavior of the two paradigms well. So you do agree that in most tactically quiet, fairly balanced positions, Leela is better (possibly much better)? Doesn't this lead to "take Leela as the base engine and SF as tactical backup" for analysis? We disagreed on that, IIRC.

I am not a Corr Chess player, and I might be wrong.