STS rating v13.1 for Lc0 0.21.2 with nodes = 1


Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: STS rating v13.1 for Lc0 0.21.2 with nodes = 1

Post by Laskos »

peter wrote: Wed Jun 19, 2019 10:04 pm
Laskos wrote: Wed Jun 19, 2019 9:19 pm Can you admit bluntly that Leela on an RTX GPU is objectively MUCH stronger positionally than any of these Stockfishes, Komodos, etc., even on a 64-core machine (no matter how many cores)? And that STS fails miserably in showing that, while it claimed to measure exactly that?
You don't want to understand that your definition of "positional strength" (which I hope, for your sake, is more than just the result of your "test suite") isn't mine. And my definition isn't simply any other single test suite either, not any single one, not even a suite I still consider better, like STS.

It wasn't me who demanded from you a test suite that would replace engine-engine games for showing some particular kind of playing strength, or even overall playing strength (which is even harder to define, since it demands even more single positions to be tested); what your suite shows is only the strength reflected by the single test you run.

Every test, whether by game-playing or by test suites, depends on positions: opening, middlegame and endgame positions.

Game-playing from early opening positions alone always tests the opening positions, and the engines' opening strength relative to each other, three times as much as endgame positions and one and a half times as much as middlegame positions, because the opening positions are carried through the opening, the middlegame and the endgame as the game progresses.

So if you want better measurements, you need better and more test positions: opening, middlegame and endgame positions.
By game-playing from certain opening positions you test the engines' ability to deal with those opening positions, by game-playing from middlegame positions you test their ability to deal with those middlegame positions, and the same holds for endgame positions.

If you think your positional test suite best represents your definition of positional strength, fine; so be it for you and your definition.
What you must not expect is that it will be anybody else's definition, or their one and only test suite of choice, too.

If I find the positional qualities tested by STS a better fit to my definition of positional strength, you'll have to be content with that as well, or call me whatever you want.
But remember, it wasn't and isn't me who claimed that any single test suite measures anybody else's definition of positional strength other than the one given by the author of the suite, a well-defined definition of its own, no more and no less.
Period.
I didn't claim anything about my suite, except maybe that, positionally, in the openings it didn't show itself as badly as STS does; STS claims to deal with overall positional play (and deals with it badly, being badly conceived). It is this very thread that treats STS results for 1-node Leelas as useful for something.

So, you don't admit that, aside from endgames, Lc0 with late 20b nets on an RTX GPU is vastly superior positionally to all these Stockfishes and Komodos? Because you have a very broad and deep understanding of "positional strength", while I understand "positional strength" only according to my suite, right? Well, there are many deep and broad geniuses on Talkchess; one of them, for some reason, has not been active for months now, Tsvetkov or something, and while I am only a 1700 Elo player, I can smell geniuses from a mile away.

All in all, I wanted to say that the STS results are not very useful when comparing Leela 20b nets. Even my crap suite is better.
Max
Posts: 247
Joined: Tue Apr 13, 2010 10:41 am

Re: STS rating v13.1 for Lc0 0.21.2 with nodes = 1

Post by Max »

Laskos wrote: Wed Jun 19, 2019 1:52 pm STS is not a great positional test suite, and this became clear precisely with Leela. I have my own 3-year-old positional opening suite containing 200 positions, which has withstood the Leela challenge: Leela outperforms all other engines by a wide margin. Here are the solved positions at 1 node for Leela and at depth=1 for the top regular engines:

Code: Select all

nodes=1

42611
score=128/200 

T40.T8.610
score=125/200

40b_131
score=118/200

11261
score=117/200

32930
score=116/200

--------------------
depth=1

Stockfish_dev
score=55/200

Komodo 13.02
score=46/200
A score of 128/200 is matched by SF only in some long time control tests (say 30 s per position on 4 cores, or more than 150 million nodes per position).
Kai, I don't get your point. Why are your 200 positions better? With the 1500 STS positions there is also a big margin to the Lc0 networks tested with node=1.

Running STS rating v13.1 with depth = 1

Code: Select all

STS(elo) 	Engine
--------------------------------
1405		Hakkapeliitta TCEC v2
1473		Arasan 21.3
1555		Texel 1.07
1602		Vajolet 2.6.2
1705		Stockfish 10
1754		Komodo 10
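For reference, a minimal sketch of how such a suite run at a fixed node or depth limit could be scored; this is not the script actually used for any of the numbers above, it assumes python-chess, an EPD file with "bm" opcodes, and local engine binaries, and the file names, paths and limits are placeholders:

Code: Select all

import chess
import chess.engine

SUITE = "suite200.epd"               # hypothetical EPD file with "bm" opcodes
ENGINE = "/path/to/engine"           # lc0, stockfish, komodo, ...
LIMIT = chess.engine.Limit(nodes=1)  # or chess.engine.Limit(depth=1) for the AB engines

def score_suite(suite_path, engine_path, limit):
    """Count positions where the engine's move matches a 'bm' best move."""
    solved = total = 0
    with chess.engine.SimpleEngine.popen_uci(engine_path) as eng:
        with open(suite_path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                board = chess.Board()
                ops = board.set_epd(line)       # parses the FEN fields plus EPD opcodes
                best_moves = ops.get("bm", [])  # list of chess.Move objects
                if not best_moves:
                    continue
                total += 1
                played = eng.play(board, limit).move
                if played in best_moves:
                    solved += 1
    return solved, total

if __name__ == "__main__":
    solved, total = score_suite(SUITE, ENGINE, LIMIT)
    print(f"score={solved}/{total}")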
Hope we're not just the biological boot loader for digital super intelligence. Unfortunately, that is increasingly probable - Elon Musk
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: STS rating v13.1 for Lc0 0.21.2 with nodes = 1

Post by Laskos »

Max wrote: Wed Jun 19, 2019 11:16 pm
snip...
Kai, I don't get your point. Why are your 200 positions better? With the 1500 STS positions there is also a big margin to the Lc0 networks tested with node=1.
snip...
Yes, but I think comparing Lc0 20b nets among themselves using STS is wrong. I have no reason to believe that a suite which gives weird results at some finite time (say 1 s/position) will show something interesting at 1 node. Yes, your initial results in this thread are not that far off, but they are probably still a bit scrambled, and conclusions based on them are useless.

I think it's easy to play many games at nodes=1 with Leela, and that supports much more far-reaching conclusions. I recently had the following:

Code: Select all

Rank Name                           Elo     +/-   Games   Score   Draws
   1 Lc0 42580                       49      30     400   57.0%   24.0%
   2 Lc0 11248                       18      30     400   52.6%   23.8%
   3 Texel Elo 2100                   3      32     400   50.5%   14.5%
   4 Lc0 40b 119                     -8      29     400   48.9%   26.3%
   5 Lc0 32930                      -63      29     400   41.0%   30.5%
Finished match
Starting positions were human 4-movers.
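A minimal sketch of how one such fixed-node game could be played with python-chess; the engine paths, the opening line and the node counts are placeholder assumptions, not the settings of the actual match:

Code: Select all

import chess
import chess.engine

# Placeholder 4-mover opening (Ruy Lopez); the actual match used a set of human 4-movers.
OPENING = ["e2e4", "e7e5", "g1f3", "b8c6", "f1b5", "a7a6", "b5a4", "g8f6"]

def play_game(white, black, white_limit, black_limit, opening_moves):
    """Play one game between two already-started UCI engines and return the result string."""
    board = chess.Board()
    for uci in opening_moves:
        board.push_uci(uci)
    while not board.is_game_over(claim_draw=True):
        eng, limit = (white, white_limit) if board.turn == chess.WHITE else (black, black_limit)
        board.push(eng.play(board, limit).move)
    return board.result(claim_draw=True)

if __name__ == "__main__":
    with chess.engine.SimpleEngine.popen_uci("/path/to/lc0") as lc0, \
         chess.engine.SimpleEngine.popen_uci("/path/to/texel") as texel:
        # nodes=1 makes Lc0 play on raw policy; the Texel node count here is an arbitrary handicap.
        result = play_game(lc0, texel,
                           chess.engine.Limit(nodes=1),
                           chess.engine.Limit(nodes=20000),
                           OPENING)
        print("Result:", result)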
peter
Posts: 3186
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: STS rating v13.1 for Lc0 0.21.2 with nodes = 1

Post by peter »

Laskos wrote: Wed Jun 19, 2019 10:45 pm Even my crap suite is better.
Ah, now I finally got your point.
Yes, of course, you're right: your crap suite is the very best for crap results, but then again, I never doubted that.
:)
By the way, why are you talking about geniuses like Tsvetkov?
I thought we were talking about people like Swaminathan and Corbit; and remind me, who were the authors of "your suite" again?
Some special geniuses in their own right, for sure, too.
:)
Peter.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: STS rating v13.1 for Lc0 0.21.2 with nodes = 1

Post by lkaufman »

I just want to say that I think the distinction between "tactical" and "positional" problems is rather arbitrary and not so useful, because in real chess games good positional moves are found by tactical details. For example, let's say that rook on the 7th rank is usually good (with whatever conditions you want to specify). One engine may find some odd-looking move that after a deep search results in getting a rook to the 7th rank because preventing it loses material. Is this tactical or positional? I wonder if there is some set of problems taken from high level human games where the right move is difficult but 90% or so agreed upon as best, without distinguishing between tactical and positional problems? That might be a test with some predictive power for elo ratings.
Komodo rules!
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: STS rating v13.1 for Lc0 0.21.2 with nodes = 1

Post by Rebel »

peter wrote: Thu Jun 20, 2019 12:18 am
Laskos wrote: Wed Jun 19, 2019 10:45 pm Even my crap suite is better.
snip...
Well, Kai is right. STS was developed with the help of the top engines of its time, Rybka and friends. When you run STS nowadays Rybka and friends will top the list and not the engines that are 200-300 elo stronger. That should make you think. It's outdated and served its purpose at the time. It's still a good test for starters.
90% of coding is debugging, the other 10% is writing bugs.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: STS rating v13.1 for Lc0 0.21.2 with nodes = 1

Post by Laskos »

lkaufman wrote: Thu Jun 20, 2019 1:11 am I just want to say that I think the distinction between "tactical" and "positional" problems is rather arbitrary and not so useful, because in real chess games good positional moves are found by tactical details. For example, let's say that rook on the 7th rank is usually good (with whatever conditions you want to specify). One engine may find some odd-looking move that after a deep search results in getting a rook to the 7th rank because preventing it loses material. Is this tactical or positional? I wonder if there is some set of problems taken from high level human games where the right move is difficult but 90% or so agreed upon as best, without distinguishing between tactical and positional problems? That might be a test with some predictive power for elo ratings.
I am not sure I would agree, especially after seeing the play of this weirdo called Leela. We do know what a "tactical" problem or puzzle is, don't we? In fact, so much emphasis in this forum is put on tactical puzzles that to me it has become a clear, albeit often a bit obnoxious, topic. We can find positions which per se don't pose any tactical complications, and aren't they, brushing aside elementary tactics, the majority of in-game chess positions? I do not know a strong human's perspective on that; maybe there are few "quiet" moves for humans, and even a strong human is wary of hidden tactics move after move. But this notion that "tactical" and "positional" are hard to separate came to me with the top regular AB engines, where I can clearly see that what I call "positional" strength is due to deeper search and deeper tactics. In the case of regular engines, "positional" strength came mostly as a side effect of deeper, tactically accurate search. But Leela doesn't play this game. Leela can easily miss a three-mover shot, but in real games that is not what usually happens.

If we know what "tactics" is, then we know that WAC suite is a very tactical test-suite. I trimmed it from 300 to 145 positions which have a unique, game-changing solution. Komodo solves all of them in under 5 seconds/position, and the vast majority of them at depths 1-12 (125/145 solved), literally in 1-30 milliseconds.

Code: Select all

Engine: Komodo 13.02 64-bit (192 MB)
by Don Dailey, Larry Kaufman, Mark Lefler

1      sec    ->       142/145 
2      sec    ->       143/145 
3      sec    ->       143/145 
4      sec    ->       144/145 
5      sec    ->       145/145 

  n/s: 7.202.528  
  TotTime: 2:33m    SolTime: 11s
  Ply: 0   Positions:145   Avg Nodes:       0   Branching = 0.00
  Ply: 1   Positions:113   Avg Nodes:    1956   Branching = 0.00
  Ply: 2   Positions: 97   Avg Nodes:    4834   Branching = 2.47
  Ply: 3   Positions: 82   Avg Nodes:    6860   Branching = 1.42
  Ply: 4   Positions: 74   Avg Nodes:    9254   Branching = 1.35
  Ply: 5   Positions: 66   Avg Nodes:   14131   Branching = 1.53
  Ply: 6   Positions: 57   Avg Nodes:   19534   Branching = 1.38
  Ply: 7   Positions: 50   Avg Nodes:   26732   Branching = 1.37
  Ply: 8   Positions: 43   Avg Nodes:   36925   Branching = 1.38
  Ply: 9   Positions: 39   Avg Nodes:   50848   Branching = 1.38
  Ply:10   Positions: 32   Avg Nodes:   91416   Branching = 1.80
  Ply:11   Positions: 26   Avg Nodes:  134044   Branching = 1.47
  Ply:12   Positions: 20   Avg Nodes:  235666   Branching = 1.76
  Ply:13   Positions: 16   Avg Nodes:  372846   Branching = 1.58
  Ply:14   Positions: 12   Avg Nodes:  529047   Branching = 1.42
  Ply:15   Positions: 10   Avg Nodes:  794787   Branching = 1.50
  Ply:16   Positions:  7   Avg Nodes: 1289637   Branching = 1.62
  Ply:17   Positions:  6   Avg Nodes: 1792729   Branching = 1.39
  Ply:18   Positions:  5   Avg Nodes: 2959558   Branching = 1.65
  Ply:19   Positions:  4   Avg Nodes: 4611414   Branching = 1.56
  Ply:20   Positions:  4   Avg Nodes: 5795014   Branching = 1.26
  Ply:21   Positions:  3   Avg Nodes: 9113716   Branching = 1.57
  Ply:22   Positions:  2   Avg Nodes: 4529585   Branching = 0.50
  Ply:23   Positions:  1   Avg Nodes:16790212   Branching = 3.71
Here is the number of new solutions by depth of Komodo:

WAC_depth.jpg

Leela (42620) has big, irrecoverable trouble with this tactical suite, which is very easy for Komodo:

1s/position
score=98/145 [averages on correct positions: depth=4.1 time=0.07 nodes=498]

10s/position
score=105/145 [averages on correct positions: depth=4.4 time=0.14 nodes=1620]


In 10 seconds per position (on a strong GPU), Leela fares worse than Komodo does in milliseconds at depth 9. Leela misses tactical 2-3-4-mover shots quite easily. Can we say that "Leela is weak tactically"? And yet, all in all, it beats the crap out of Komodo on my PC due to "something else"? If I call this "something else" roughly "positional play", am I outside the usual terminology? Yes, as a patzer human player, I can hardly grasp what exactly "positional play" means, so I use engines, and for example these extreme positional/tactical test suites. If two years ago, with the usual engines, the tactical/positional separation was indeed unclear, with Leela this separation is quite extreme. So I am not prepared now to blur the tactical/positional separation (I was more prepared two years ago).
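A minimal sketch of how a "new solutions by depth" tally like the one above could be produced with python-chess; this is an assumed helper, not the tool that generated the Komodo table or the chart, and the file names and engine path are placeholders:

Code: Select all

import collections
import chess
import chess.engine

def first_solution_depths(suite_path, engine_path, max_depth=12):
    """For each EPD position, find the lowest fixed depth whose move matches 'bm'."""
    histogram = collections.Counter()
    with chess.engine.SimpleEngine.popen_uci(engine_path) as eng:
        with open(suite_path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                board = chess.Board()
                best_moves = board.set_epd(line).get("bm", [])
                if not best_moves:
                    continue
                for depth in range(1, max_depth + 1):
                    if eng.play(board, chess.engine.Limit(depth=depth)).move in best_moves:
                        histogram[depth] += 1   # first depth at which this position is solved
                        break
    return histogram

if __name__ == "__main__":
    hist = first_solution_depths("wac_trimmed_145.epd", "/path/to/komodo")  # hypothetical paths
    for depth in sorted(hist):
        print(f"depth {depth:2d}: {hist[depth]:3d} new solutions")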
peter
Posts: 3186
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: STS rating v13.1 for Lc0 0.21.2 with nodes = 1

Post by peter »

Rebel wrote: Thu Jun 20, 2019 9:31 am
snip...
Well, Kai is right. STS was developed with the help of the top engines of its time, Rybka and friends. When you run STS nowadays Rybka and friends will top the list and not the engines that are 200-300 elo stronger. That should make you think. It's outdated and served its purpose at the time. It's still a good test for starters.
Well, Ed, of course I too could still imagine better test suites than STS, but do you really think Kai's 200 positions are anything like that?
Did you ever have a look at the positions?

Do you remember the discussions many years ago, when Swaminathan and Corbit came along with the early versions of STS?

At least as far as this kind of discussion goes, not much has changed since then, don't you think?
:)
And did you also notice, once again, that in such cases there is almost never a discussion of particular single positions from the suites, only ever of the "results", most of the time without even mentioning the better or worse conditions of hardware, time, and the pool of engines being compared under different conditions?

There are other discussions about single positions of interest often enough, but most of the time it's other people discussing them; did you notice that too?

I'm out of here again; it has been high time for that for quite a while. I shouldn't even have started to try one more time to touch one of the holy cows, but now and then I just can't resist.
:)
Peter.
chrisw
Posts: 4317
Joined: Tue Apr 03, 2012 4:28 pm

Re: STS rating v13.1 for Lc0 0.21.2 with nodes = 1

Post by chrisw »

Laskos wrote: Thu Jun 20, 2019 10:10 am
snip...
Tactical-positional is a falser than false dichotomy. They are both words associated with weak human players learning chess. First, the weak player learns about material. Then he blunders around, leaves pieces en prise and learns to guard his material. Then he learns there are things called tactics, maybe a knight fork or something. From then on he plays tactics, always looking for a trick. A few then learn there is a bit more, called positional play, like doubled pawns. These few then play tactics to try to get some positional advantage. That's where it stops, also with most chess programmers. Everything is a combination of positional features and some lookahead tactics, the belief being that chess is entirely tactics with a bit of positional knowledge. E.g., chess is won by tactics.
Fast forward to 2018. Whoops. Everything everybody knew was wrong. But they are not really very sure why, so they carry on babbling in language that doesn't actually fit: tactics this, positional that, bla bla. Like Sisyphus, they want to progress up the hill, but they have this giant tactics-positional dichotomy stone that keeps rolling backwards. Worst, probably, are these 75-move-deep SF-lines people, who want to prove everything but can't.
Throw away the words, they were useful in learning, but are a handicap to understanding deeper. There is no tactics. Everything is positional. Except for beginners.
Paloma
Posts: 1167
Joined: Thu Dec 25, 2008 9:07 pm
Full name: Herbert L

Re: STS rating v13.1 for Lc0 0.21.2 with nodes = 1

Post by Paloma »

Laskos wrote: Thu Jun 20, 2019 10:10 am snip...

If we know what "tactics" is, then we know that WAC suite is a very tactical test-suite. I trimmed it from 300 to 145 positions which have a unique, game-changing solution. Komodo solves all of them in under 5 seconds/position, and the vast majority of them at depths 1-12 (125/145 solved), literally in 1-30 milliseconds.
Can we have your trimmed 145 positions? That would be nice.