Fat Fritz destroyed Stockfish!

Ozymandias · Post by **Ozymandias** » Mon Nov 18, 2019 4:18 pm

Ovyron wrote: ↑Mon Nov 18, 2019 12:16 pm
Ozymandias wrote: ↑Mon Nov 18, 2019 11:56 am BTW, I haven't used my DB in a game since 2014, and it hasn't even been updated since 2017.
So the question is: why is it called a "good database" if there's no use for it? Has Nelson stopped updating his database as well, or what does he use it for?

I check my database every time I start a new game and update it on a regular basis, but few games make it in. I have found that the statistics become more accurate if I delete old, irrelevant games, so it has been steadily shrinking, to the point where training an NN with it would probably produce garbage results. But I don't expect an NN trained on games from 2017 and before to shine either. I wonder what kind of results Nelson's NN training would get. I'm afraid any DB would need a lot of non-trivial work to be usable as a training set, and unless we can take a look at Albert Silver's database, we can't know whether the zero approach is better, or whether someone finding the right set of games could produce the best weights file in much less time than the zero approach would allow.

In all areas I've seen, supervised learning has improved on the unsupervised kind, so I'd be greatly surprised if this were the exception, but how to supervise it well remains to be seen.

It's called "good" because it's well done; I'm assuming you didn't object to the improvements I was mentioning. Nelson still has a use for it: remember, he selects some of the openings for TCEC.

Whether the zero approach is better or not, we won't know until an Lc(non)0 project is conducted. I personally can't hazard a guess: can an NN learn something from games played by humans and AB engines that it can't find on its own? Won't it simply learn some things faster, while at the same time turning a blind eye to others?

corres · Post by **corres** » Mon Nov 18, 2019 5:05 pm

Nordlandia wrote: ↑Mon Nov 18, 2019 3:40 pm
corres wrote: ↑Mon Nov 18, 2019 9:36 am I think that, especially with a relatively weak GPU (weaker than an RTX 2080 Ti), Leela is weakened considerably in the endgame if the time increment is too low (< 1 sec). Because of this, it is more appropriate to give Leela a short time control of a few minutes plus an increment of more than 1 sec per move.
Another note: using ponder is correct only if the match is PC against PC, and incorrect if the match is engine against engine on the same machine, because then pondering disturbs the calculation of the other engine.
That is not necessarily true on a chip with many cores. I agree that Alpha Beta against another Alpha Beta means a direct downgrade in performance. That is not so apparent for a neural network against Alpha Beta, since an NN makes only modest demands on CPU resources.

I suppose your answer relates to my note about "ponder".
Indeed, the disturbing effect is greater when there are only a few cores.
But to use ponder correctly you would run PC-against-PC matches.

Ovyron · Post by **Ovyron** » Mon Nov 18, 2019 5:11 pm

Ozymandias wrote: ↑Mon Nov 18, 2019 4:18 pm It's called "good" because it's well done; I'm assuming you didn't object to the improvements I was mentioning.

My DB has 0% repeated games, but I never bothered to keep player names consistent, so one player may appear as 6 different ones, and I never bothered fixing their Elo ratings either, so a player known to be rated 2500 can still appear as a 1500 provisional, or whatever.

I guess that, to me, whether it's well made depends on its usage. The only uses I've found for a DB are to predict opponents' moves (some are just like parrots that will play the most-played move for as long as they can), to prepare traps (if the most common moves have a hole in them), and to know when to pay attention because we're near novelty territory (which turns out to be critical and can be the difference between winning and drawing, unless one was going to win anyway). And for this, it does its job.
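A rough sketch of the "parrot" prediction described above: for the current position, look up which move has been played most often in the database. This is a hypothetical Python illustration; the record layout (a position key such as a FEN string paired with the move played) and the function names are assumptions, not the format of any particular database program.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical record: (position key, move played from that position).
# The key could be a FEN string; this layout is an assumption for illustration.
Record = tuple[str, str]


def build_move_stats(records: list[Record]) -> dict[str, Counter]:
    """Count how often each move was played from each position."""
    stats: dict[str, Counter] = defaultdict(Counter)
    for position, move in records:
        stats[position][move] += 1
    return stats


def predict_parrot_move(stats: dict[str, Counter], position: str) -> Optional[str]:
    """Guess that a 'parrot' opponent plays the most-played move; None if the position is not in the DB."""
    counts = stats.get(position)  # .get() does not create empty entries in a defaultdict
    if not counts:
        return None
    move, _ = counts.most_common(1)[0]
    return move


# Tiny hand-made example data (not real games).
records = [
    ("start", "e4"), ("start", "e4"), ("start", "d4"),
    ("after 1.e4", "c5"),
]
stats = build_move_stats(records)
print(predict_parrot_move(stats, "start"))       # e4
print(predict_parrot_move(stats, "after 1.d4"))  # None
```

Pruning old, irrelevant games, as mentioned earlier in the thread, would simply mean filtering `records` before building the counts.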

But it follows the Pareto principle: 80% of its contents are crap, and 80% of the time checking it was a waste of time, but I don't know whether a statistic is crap, or whether I'm wasting my time, until after the fact. Conversely, I can download some random PGN of a 12+2 tournament on Infinity Chess and find that 20% of the games provide value. The problem is I never know if I'll ever use it in a game: finally someone plays the Pirc and I get to use a line I saw 2 years ago, but I don't know how long I'll have to wait to face the Italian where I'm Black...

Ozymandias wrote: ↑Mon Nov 18, 2019 4:18 pm Nelson still has a use for it: remember, he selects some of the openings for TCEC.

If I were him, I'd already have convinced them to drop playing each position with reversed colors and to simulate tournament conditions by providing the engines with updated books tweaked toward what they play best, like in a real world championship. If a position is an engine's Achilles heel, it makes no sense to have it play that position. As it is, TCEC turns out to be a thematic tournament of positions and not what I'd call real chess, where players are able to play what they're best at. Engines need books for this, and variety, so we just need to build those books as if we were their programmers and wanted them to win TCEC.

So yeah, I claim that I could outdo Nelson. And the problem is not the size of the knife or how sharp it is; it's that switching to a pair of scissors would be better.

Ozymandias wrote: ↑Mon Nov 18, 2019 4:18 pm Whether the zero approach is better or not, we won't know until an Lc(non)0 project is conducted. I personally can't hazard a guess: can an NN learn something from games played by humans and AB engines that it can't find on its own? Won't it simply learn some things faster, while at the same time turning a blind eye to others?

I used to believe that the zero approach was clearly best because it got rid of all human prejudice about chess and how it should be played. That was back when Leela was playing some outstanding chess, very unlike anything seen before, and winning from positions where her opponents thought they were winning, despite her having Bishops locked in and a Queen on the board but out of the game (nicknamed "queen in Siberia"), and it made sense. Introduce human games or engine games and you introduce biases.

But now that it has been shown that you can get to the same place by training from a DB, and that Leela had to get rid of those alien-looking concepts to improve her Elo and now plays much more normal chess, the zero approach looks a bit capricious. I predict that soon enough someone's bulb will light up and they will provide a supervised-learning NN that plays stronger than anything else available, though, as things tend to go, it'll only be about 20 Elo above second best and nobody will be impressed by that.

Ozymandias · Post by **Ozymandias** » Mon Nov 18, 2019 7:17 pm

Ovyron wrote: ↑Mon Nov 18, 2019 5:11 pm I predict that soon enough someone's bulb will light up and they will provide a supervised-learning NN that plays stronger than anything else available, though, as things tend to go, it'll only be about 20 Elo above second best and nobody will be impressed by that.

Nor should they be. Right now FF is 50 Elo points behind SF and Lc0. That doesn't invalidate it as an option, but the must-have aura of the top two doesn't come from a meagre superiority over their derivatives; it comes from the 100 Elo difference over everything else. That's what impresses users.
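For a sense of scale, the standard logistic Elo formula, E = 1 / (1 + 10^(-d/400)), turns a rating gap d into an expected score per game (win = 1, draw = 0.5). The sketch below assumes that model (individual rating lists may differ in details) and evaluates the 20, 50 and 100 Elo gaps mentioned in this thread.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score per game of the side that is elo_diff points stronger (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


for gap in (20, 50, 100):
    print(f"{gap:>3} Elo ahead -> expected score {expected_score(gap):.3f}")
```

Under this model, a 20 Elo edge is worth about 0.529 points per game, 50 Elo about 0.571 and 100 Elo about 0.640, which gives some idea of why a 20-point lead over second best is unlikely to impress anyone.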

Nordlandia · Post by **Nordlandia** » Tue Nov 19, 2019 4:16 am

corres wrote: ↑Mon Nov 18, 2019 5:05 pm
Nordlandia wrote: ↑Mon Nov 18, 2019 3:40 pm
corres wrote: ↑Mon Nov 18, 2019 9:36 am I think that, especially with a relatively weak GPU (weaker than an RTX 2080 Ti), Leela is weakened considerably in the endgame if the time increment is too low (< 1 sec). Because of this, it is more appropriate to give Leela a short time control of a few minutes plus an increment of more than 1 sec per move.
Another note: using ponder is correct only if the match is PC against PC, and incorrect if the match is engine against engine on the same machine, because then pondering disturbs the calculation of the other engine.
That is not necessarily true on a chip with many cores. I agree that Alpha Beta against another Alpha Beta means a direct downgrade in performance. That is not so apparent for a neural network against Alpha Beta, since an NN makes only modest demands on CPU resources.
I suppose your answer relates to my note about "ponder".
Indeed, the disturbing effect is greater when there are only a few cores.
But to use ponder correctly you would run PC-against-PC matches.

Let's say you play on a 10-core i7-6950X. The proper setup would be 7 cores for Stockfish, 2 for Leela and 1 for the OS as a background buffer. There isn't any particular disturbing effect, as the cores don't interfere with each other. Alternatively, 8 cores for Stockfish and 2 for Leela.

The slight reduction in speed is likely offset by being able to think on the opponent's time.

Nay Lin Tun · Post by **Nay Lin Tun** » Tue Nov 19, 2019 11:55 am

Ovyron wrote: ↑Mon Nov 18, 2019 5:11 pm So yeah, I claim that I could outdo Nelson. And the problem is not the size of the knife or how sharp it is; it's that switching to a pair of scissors would be better.

What is the point of allowing the best engines to choose the best opening line? Do you want to increase the draw rate from 95% to 99%?

I'd guess the draw rate of top engines without a book would be around 95% (similar to the ICCF rate).

A lot of people would get bored and TCEC would be dead.

Ovyron · Post by **Ovyron** » Tue Nov 19, 2019 12:48 pm

Nay Lin Tun wrote: ↑Tue Nov 19, 2019 11:55 am What is the point of allowing the best engines to choose the best opening line? Do you want to increase the draw rate from 95% to 99%?

No. Curiously enough, when you go ahead and do this, in practice the draw rate goes down and, surprisingly, Black's winning rate goes up!

What is happening? Well, suppose there are engine X and Stockfish. If you match them normally, from what people have deemed acceptable opening positions, where both engines are forced to play them from both sides, you get the same expected results that the rating lists predict; no surprise there.

When you do what I'm saying, it's possible that engine X plays as White a position that Stockfish doesn't know how to defend, and as Black a position that Stockfish doesn't know how to attack, and that the outcome is 2 games where engine X beats Stockfish! This would decrease the draw rate seen in the rating lists and TCEC!

Why does Stockfish allow X to get into this? Wouldn't it patch the hole and avoid these positions? The thing is that only engine X knows how to play those positions well; against other engines, Stockfish plays normally, or they pick their own variations, which are different.

Ask anybody who produced tournament books for chess engines back when the World Championship was important whether their custom private books aimed to increase draws or aimed for positions where their engine would shine, and whether the draw rate naturally decreased as their engine's winning chances increased. You'd rather have a win and a loss than two draws, just in case your engine can save the losing game and earn a half-point.

In the end, if these championship conditions were emulated, we might not see Stockfish or a Leela derivative winning TCEC. Engine X might be able to steer them into certain positions where it is better than they are (a rating is only an average over positions, and you can raise it by removing the bad ones) and prove supreme if it only plays what it's best at.
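The point that a rating is only an average over positions, and that dropping the bad ones raises it, can be illustrated with a small Python sketch. The per-opening scores below are invented numbers for illustration only, not measurements of any engine, and converting an average score back to Elo assumes the standard logistic Elo model.

```python
import math


def elo_from_score(score: float) -> float:
    """Invert the logistic Elo model: rating gap implied by an average score (0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def best_openings(scores: dict[str, float], k: int) -> dict[str, float]:
    """Keep the k openings where engine X scores best (the 'tournament book' selection)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


# Hypothetical per-opening average scores for engine X against Stockfish.
# Invented for illustration only; these are not measured results.
scores = {
    "opening A": 0.62,
    "opening B": 0.55,
    "opening C": 0.50,
    "opening D": 0.44,
    "opening E": 0.36,
}

all_avg = sum(scores.values()) / len(scores)
book = best_openings(scores, 2)
book_avg = sum(book.values()) / len(book)

print(f"all openings: score {all_avg:.3f}, Elo {elo_from_score(all_avg):+.0f}")
print(f"best 2 only:  score {book_avg:.3f}, Elo {elo_from_score(book_avg):+.0f}")
```

With these made-up numbers, the full set averages 0.494 (roughly level, about -4 Elo), while the best two openings average 0.585, about +60 Elo under the logistic model. The engine is the same; only the choice of positions changed.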

But we will never know, because old habits die hard.

Nay Lin Tun · Post by **Nay Lin Tun** » Tue Nov 19, 2019 3:47 pm

I agree there are certain positions where SF doesn't know how to defend, e.g. the French Winawer and some Catalans (the same applies to Leela).

But my point is that you have to put in that custom position or variation yourself. Otherwise, Leela never chooses an opening variation she doesn't know how to play well, and SF rarely does either.

Ovyron · Post by **Ovyron** » Tue Nov 19, 2019 5:02 pm

Of course, doing it would be hard work.

But imagine if someone went and prepared a tournament book for Stockfish that led it into the positions where it would get the biggest advantage against Leela (tactical positions where Leela is too slow to find the best moves), while someone else prepared a tournament book for Leela that led it into positions where it would get the biggest advantage against Stockfish (those positions where Stockfish claims a 0.00 score but doesn't have the time to find the necessary positional moves).

What kind of positions would actually be played? Who would win such a match? Would we get a series of games where both sides beat each other after reaching the positions? What if we allowed the book makers to tweak the books after each game?

All these questions look extremely interesting to me, but they can't be answered with TCEC's approach or on the rating lists. We just get generic results from generic openings, and nothing changes because people have become conformist. They see the strongest software and the strongest hardware, and don't really think about how it's used. A world champion of machines will never be seen again unless something changes.
