Where are the funs of Leela?


jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Where are the funs of Leela?

Post by jp »

Laskos wrote: Sun Jun 16, 2019 10:18 am I had a look now too. You have to be aware of the conditions. Right now they are playing starting from the move 22 (!) of some important game of the past. So, right from the middlegame, completely missing the opening and early midgame.
So again it points to the opening having a disproportionately large influence on the game result.
Guenther
Posts: 4605
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Where are the funs of Leela?

Post by Guenther »

MikeGL wrote: Sat Jun 15, 2019 3:49 pm I agree these NNs are very strong. But I paid a quick visit to the CCC site for the current tournament, and was surprised to see Brainfish was demolishing Lc0 left and right. Don't know what's wrong, but it looks like it is vengeance time for the A/B side.
+7 -3 =182!! (94.79% draw rate) is hardly 'demolishing'; this can only be called misinformation or mathematical ignorance.

Crosstable after 192 rounds:


1 Brainfish 98.0/192  51.04%
0½½½½½½½½½½½½½½½½1½½
½½½½½½½½½½½½½½½½½½½½
½½½½½½½½½½½½½½½½0½½0
½½½½½½1½½½½½½½½½½½½½
½½½½½½½½½½½½½½½½1½½½
½½½½½½½½½½½½½½½½½½½½
½½½½½½½½1½½½½½½½½½½½
½½½½½1½½½½½½½½½½1½½½
½½½½½½½½½½½1½½½½½½½½
½½½½½½½½½½½½

2 Lc0 94.0/192  48.96%
1½½½½½½½½½½½½½½½½0½½
½½½½½½½½½½½½½½½½½½½½
½½½½½½½½½½½½½½½½1½½1
½½½½½½0½½½½½½½½½½½½½
½½½½½½½½½½½½½½½½0½½½
½½½½½½½½½½½½½½½½½½½½
½½½½½½½½0½½½½½½½½½½½
½½½½½0½½½½½½½½½½0½½½
½½½½½½½½½½½0½½½½½½½½
½½½½½½½½½½½½
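
The percentages above follow directly from the win/loss/draw counts. A minimal Python sketch of the arithmetic; the function name `match_summary` is only an illustrative choice, and the Elo conversion assumes the usual logistic rating model rather than anything published on the CCC site:

```python
import math

def match_summary(wins, losses, draws):
    """Return (points, score fraction, draw rate, Elo difference) for one side."""
    games = wins + losses + draws
    points = wins + 0.5 * draws
    score = points / games
    draw_rate = draws / games
    # Logistic Elo model: score = 1 / (1 + 10 ** (-diff / 400)).
    # Undefined for a score of exactly 0 or 1, which is not the case here.
    elo_diff = -400.0 * math.log10(1.0 / score - 1.0)
    return points, score, draw_rate, elo_diff

# Brainfish's side of the match: +7 -3 =182
points, score, draw_rate, elo = match_summary(wins=7, losses=3, draws=182)
print(f"{points}/192  score {score:.2%}  draws {draw_rate:.2%}  Elo {elo:+.1f}")
```

Under that model a 51.04% score corresponds to an edge of only about 7 Elo.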
https://rwbc-chess.de

trollwatch:
Chessqueen + chessica + AlexChess + Eduard + Sylwy
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Where are the funs of Leela?

Post by Laskos »

jp wrote: Sun Jun 16, 2019 11:41 am
Laskos wrote: Sun Jun 16, 2019 10:18 am I had a look now too. You have to be aware of the conditions. Right now they are playing starting from the move 22 (!) of some important game of the past. So, right from the middlegame, completely missing the opening and early midgame.
So again it points to the opening having a disproportionately large influence on the game result.
Not sure. It points to Lc0 being much better than SF in the openings. About the weights of openings and middlegames, I had some results showing middlegames as more important. After all, the result starting from the middlegames is only +7 -3 and a bazillion of draws for BF, not an outrage, not even conclusive. I guess if one starts from the endgames, he can get an outrage of a result in favor of BF.
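
One common way to put a number on "not even conclusive" is the likelihood of superiority (LOS), which looks only at the decisive games. A rough sketch, assuming the usual normal approximation; the function name `los` is only illustrative:

```python
import math

def los(wins, losses):
    """Likelihood of superiority under the normal approximation.

    Draws say nothing about which side is stronger in this model,
    so only the decisive games enter the formula.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games: no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))

print(f"LOS for BF at +7 -3: {los(7, 3):.1%}")
```

With only ten decisive games this comes out at around 90%, short of the 95% that is often used as a threshold.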
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Where are the funs of Leela?

Post by jp »

Laskos wrote: Sun Jun 16, 2019 12:14 pm Not sure. It points to Lc0 being much better than SF in the openings. About the weights of openings and middlegames, I had some results showing middlegames as more important. After all, the result starting from the middlegames is only +7 -3 and a bazillion of draws for BF, not an outrage, not even conclusive.
I'm not sure if your first two sentences were meant to be one or two ("Not sure it points to Lc0 being much better in the openings" or "Not sure. It points to Lc0 being much better in the openings"), which changes the meaning, but I'll assume the second, i.e. you're saying Lc0 is much better in the openings.

Alpha & Leela's reliance on getting an opening advantage would be more commendable if they had anything to show human GMs about the opening, but so far AFAIK they have not produced (in their games) any real opening novelties (of a kind that's different from opening ideas produced by SF & traditional engines, which of course humans do use for opening preparation).

But, yes, +7 -3 and a bazillion of draws is not conclusive.

Laskos wrote: Sun Jun 16, 2019 12:14 pm I guess if one starts from the endgames, he can get an outrage of a result in favor of BF.
Yes, possibly, especially without TBs, though that might tell us more about Lc's weaknesses than BF's strengths.
Nay Lin Tun
Posts: 708
Joined: Mon Jan 16, 2012 6:34 am

Re: Where are the funs of Leela?

Post by Nay Lin Tun »

carldaman wrote: Fri Jun 14, 2019 4:51 pm As someone else suggested, the 'Dawn of a New Era' would be a better and somewhat less sensational way to express the same thought.
"End of an era" has been viral in the TCEC chat since 26 Sept 2018, when King Crusher uploaded a video about Leela beating Komodo.

At that time a lot of people did not believe it would become a reality. Some Leela supporters expected "6 months"; others joked about them with "6 moths", etc.
Nay Lin Tun
Posts: 708
Joined: Mon Jan 16, 2012 6:34 am

Re: Where are the funs of Leela?

Post by Nay Lin Tun »

jp wrote: Sat Jun 15, 2019 3:22 pm
Nay Lin Tun wrote: Sat Jun 15, 2019 1:19 pm In the Leela Discord, before Sufi 14 and Sufi 15, there were extensive numbers of games tested with the exact SF vs Leela ratio in TCEC, around 5,000 to 10,000 games before S14 and 20,000+ games before S15.
Those tests must have used much smaller nodes per move, though, if there were so many games. That would then be just bullet game testing. It's not just the ratio that matters.
A lot of tests were around 1 min bullet. But there were also 3+2 min games on a 2080 RTX, and a few people with 20xx RTX cards ran the exact TCEC time control with 2 hours per game.

The Leela project is massive (activities, number of volunteers, power consumption, etc., in comparison with SF).
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Where are the funs of Leela?

Post by jp »

Nay Lin Tun wrote: Mon Jun 17, 2019 10:17 am A lot of tests were around 1 min bullet. But there were also 3+2 min games on a 2080 RTX, and a few people with 20xx RTX cards ran the exact TCEC time control with 2 hours per game.
I wonder if there are any updated scaling graphs. For older versions, there looked like a problem with Leela's scaling, especially performance maxing out at a certain number of nodes per move (not a very large number). Preferably, the scaling graphs would be done by one person, because we've seen how sensitive results are to the exact conditions, including hardware and openings, etc. At that time, it looked like the best TC for Leela was bullet.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Where are the funs of Leela?

Post by Ovyron »

Dann Corbit wrote: Thu Jun 13, 2019 12:56 am All this does is introduce more uncertainty about which engine is the strongest.
Consider Sedat's book contests like this one:
https://sites.google.com/site/computers ... book-cs-24
All of the books are different and all of the engines (in this case asmFishWCP 130519 BMI2) and hardware are identical.
So we see that changing the book can change the rating by 113 Elo.
So we can conclude that AsmFish is 113 Elo stronger or weaker than itself, depending on which book is chosen. Does that make sense to you?
If AsmFish was human it could choose its own book (what we call "repertoire"), and it'd choose the book that made it play the strongest. It'd not pick the weaker book because it'd make no sense. I'd not play generic random openings, because nobody wanting to win would play them, ever.

Engines are at a tremendous disadvantage because they can't make choices for themselves; however, it shouldn't matter, because humans can make the best choices for them.

We are running around in circles like blindfolded chickens, but the question of who is the best, Leela or Stockfish, will only be answered if they use the strongest books that they can, because that will show who's best in the most relevant chess lines (as in human tourneys, where players prepare the strongest lines known at the time), not by playing into random lines that have never appeared in serious games.

Is this a valid test line?:

1. f3 e5 2. g4 Nf6 3. Bg2 Ng8 4. d3 Ne7 5. h4 f6 6. h5 Rg8 7. e3 Na6 8. a3

[d]r1bqkbr1/ppppn1pp/n4p2/4p2P/6P1/P2PPP2/1PP3B1/RNBQK1NR b KQq -

Black already failed to mate White at move 2 (2...Qh4#), which wasn't optimal. What follows is not even chess, and such a line shouldn't be used to test engines, because who cares how they perform after so many suboptimal moves in a row? Yet most lines from testing groups are like this (a bunch of suboptimal moves leading to a random position, used to prove who's best in a game where nobody would have played those moves).
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Where are the funs of Leela?

Post by Dann Corbit »

It does not really matter if there is a mate inside the book line on the way to the book exit point if both engines play the same opening and then reverse it.

First, we are not trying to discover the strongest chess lines. Contests like Sedat's book tournaments {attempt to} discover the best opening lines.

Second, it is almost always possible to reach a position like this:
[d]r1bqkbr1/ppppn1pp/n4p2/4p2P/6P1/P2PPP2/1PP3B1/RNBQK1NR b KQq -
through another pathway that does not offer the mate.

It seems a common practice to use book openings where the interior nodes have never been tested. These books always contain gaffes. And that is OK, because we are trying to find out how strong the engines are and not how strong the book lines are.

It is true that once in a while some horrible thing happens even at the exit point. When that occurs, what generally happens is that any existing games with that bad opening exit point get expunged and a new opening is substituted. (Happened in TCEC last year, IIRC, but it might have been the year before).

Now, I recognize that correspondence chess players want contests to debug opening lines for them. Perhaps the correspondence players should sponsor contests that do that. Currently the kind of data correspondence chess players would like is not really produced by most chess contests.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Nay Lin Tun
Posts: 708
Joined: Mon Jan 16, 2012 6:34 am

Re: Where are the funs of Leela?

Post by Nay Lin Tun »

jp wrote: Mon Jun 17, 2019 10:33 am
Nay Lin Tun wrote: Mon Jun 17, 2019 10:17 am A lot of tests were around 1 min bullet. But there were also 3+2 min games on a 2080 RTX, and a few people with 20xx RTX cards ran the exact TCEC time control with 2 hours per game.
I wonder if there are any updated scaling graphs. For older versions, there looked like a problem with Leela's scaling, especially performance maxing out at a certain number of nodes per move (not a very large number). Preferably, the scaling graphs would be done by one person, because we've seen how sensitive results are to the exact conditions, including hardware and openings, etc. At that time, it looked like the best TC for Leela was bullet.
There are about 6-10 Leela testers who are testing and streaming various Leela nets 24/7.

You can go to the Leela Discord > testing channel and ask those testers/streamers.