Progress of Stockfish in 6 days

mwyoung · Post by **mwyoung** » Fri Aug 14, 2020 4:03 am

Jouni wrote: ↑Wed Aug 12, 2020 9:36 pm Yes SF NNUE is equal to quadruple your CPU cores for free. Incredible.

lkaufman wrote: ↑Thu Aug 13, 2020 7:22 am I actually got a result that SFNNUE (a couple days ago) on one thread beat Stockfish 11 on seven threads, at 2' + 1", by 90 to 80! So you may be understating it!

mwyoung wrote: ↑Thu Aug 13, 2020 9:54 pm I also achieved this kind of result. But I hope to God you don't think it can achieve these results with more threads and more time. Because it does not; it is only marginally better than the best standard Stockfish and Lc0. The strongest program, yes; crushing the other engines, no!

lkaufman wrote: ↑Thu Aug 13, 2020 10:51 pm Do you have results that show that normal Stockfish with, say, 64 threads can beat current SFNNUE on 16 threads at some time control (or any four-to-one ratio of threads at any time limit), or do you just mean that there are a lot more draws with longer time controls and more threads when playing on equal threads?

mwyoung wrote: ↑Fri Aug 14, 2020 12:03 am We have a match request for Stockfish NNUE vs Lc0 on the Match and Tournament page.

"How does perform SF-NNUE vs Lc0 ?
Post by Vinvin » Tue Aug 11, 2020 10:09 pm

I saw very few matches between these 2 engines.

Conditions like :
3min+2sec
200 games
book : short lines (where Lc0 was the best over A/B engines)

RTX 2080 Ti for Lc0
16 cores for SF-NNUE (latest exe + latest NN file)

Could someone run a match like this ?"

Why speculate? This is an easy test to run! Let's see how the best Stockfish NNUE on 32 threads performs at 3m+2s vs the best Lc0 26.1 on a 2080 Ti.

"I actually got a result that SFNNUE (a couple days ago) on one thread beat Stockfish 11 on seven threads, at 2' + 1", by 90 to 80! So you may be understating it!"

Let's see how close we get to your Elo rating....

lkaufman wrote: ↑Fri Aug 14, 2020 1:20 am So far all draws. While I do appreciate this match, it is not very relevant to the question of whether SFNNUE can give (for example) a 64-to-16 thread handicap to SF11. If it can give four threads to one successfully (as appears to be the case) but can't give 64 to 16, that may indicate poor scaling. The Elo gap with equal threads will of course decline with more threads and time; that has nothing to do with the specific engine.

mwyoung wrote: ↑Fri Aug 14, 2020 1:31 am Let's see what it can do here.

And stop the hype, you should know better without testing.

"Yes SF NNUE is equal to quadruple your CPU cores for free. Incredible."

"I actually got a result that SFNNUE (a couple days ago) on one thread beat Stockfish 11 on seven threads, at 2' + 1", by 90 to 80! So you may be understating it!"

As testers we should give the whole truth, not cherry-picked results!

After 24 games, SF NNUE vs Lc0 at 3m+2s is tied: +1 =22 -1.


Nay Lin Tun · Post by **Nay Lin Tun** » Fri Aug 14, 2020 6:41 am

You may want to consider the draw tendency of your opening book.
If 4-core SF is able to score 40/100 vs 32-core SF with your opening book and time control, you may want to consider discarding your opening book. In chess, some lines are pretty drawish (Nimzo and QGD).

mwyoung · Post by **mwyoung** » Fri Aug 14, 2020 6:50 am

Nay Lin Tun wrote: ↑Fri Aug 14, 2020 6:41 am You may want to consider the draw tendency of your opening book.
If 4-core SF is able to score 40/100 vs 32-core SF with your opening book and time control, you may want to consider discarding your opening book. In chess, some lines are pretty drawish (Nimzo and QGD).

So what is your point? That I should use bad openings? The same book and the same book settings did not look drawish here.

Stockfish NNUE (Sergio 2138) vs Komodo 14 (1 Core Test) (TC = 1m+0.5s.)

Hardware 2950x, RTX 2080 ti

TC=1m+0.5s.
Ponder off.
1 thread.
1 GB hash.
6-man TB.
6 move book
Default settings

DESKTOP-CORSAIR, Blitz 1.0min+0.5sec 0

1 SF+NNUE PO 290720 x64 popc +241 +24/=16/-0 80.00% 32.0/40
2 Komodo 14 64-bit -241 +0/=16/-24 20.00% 8.0/40
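
As a sanity check on the table above, the +241 figure is consistent with the standard logistic Elo formula applied to an 80% score. A minimal Python sketch, assuming that formula is what the GUI uses (the helper names are illustrative and not taken from any GUI or rating tool):

```python
import math

def score_fraction(wins, draws, losses):
    """Fraction of available points scored, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)

def elo_diff(fraction):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(fraction / (1.0 - fraction))

# Match result from the table: +24 =16 -0 for SF+NNUE over 40 games.
sf = score_fraction(24, 16, 0)
print(f"score = {sf:.2%}, Elo difference = {elo_diff(sf):+.0f}")
```

The formula is undefined at 0% or 100%, and with only 40 games the implied rating difference carries a wide error margin.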

Here is a theory: maybe it is their results that are flawed, or their book. I posted my book and settings, and recorded the match.

MMarco · Post by **MMarco** » Fri Aug 14, 2020 8:21 am

Your results are not flawed, but you'll get tons of draws with regular openings as opposed to low-draw openings like in TCEC. With their hardware, if TCEC were to use regular openings, superfinals would end up like +2, -1, =97 instead of +23, -16, =61 like in TCEC 18.

mwyoung · Post by **mwyoung** » Fri Aug 14, 2020 8:28 am

MMarco wrote: ↑Fri Aug 14, 2020 8:21 am Your results are not flawed, but you'll get tons of draws with regular openings as opposed to low-draw openings like in TCEC. With their hardware, if TCEC were to use regular openings, superfinals would end up like +2, -1, =97 instead of +23, -16, =61 like in TCEC 18.

Yes. It is not in the interest of TCEC to have draws, because TCEC is not an engine testing site. TCEC is a chess engine exhibition.

And the book is not an issue. The book goes only 6 moves deep. There is no deep theory. The reason you see draws is that you have 2 very strong engines playing on good hardware. If I use a weaker engine vs Lc0 or Stockfish NNUE, the draws go away.

Laskos · Post by **Laskos** » Fri Aug 14, 2020 10:36 am

-mwyoung- is known to not change his mind at any cost, even in the face of overwhelming evidence that he could do much better. Leave him alone. I never follow his tests; he used to use some 36 threads on a 16-core machine, and I don't know what other silly things he is doing.

marsell · Post by **marsell** » Fri Aug 14, 2020 1:28 pm

-mwyoung- You're doing the right thing. Draws are part of the game; if anyone doesn't like that, they should not play the game. Especially when testing, draws are essential.

Jouni · Post by **Jouni** » Fri Aug 14, 2020 1:45 pm

SF NNUE is now beating Lc0 badly: +53 Elo. No need to use 90/176 cores like CCC or TCEC. All next finals SF vs SF NNUE!?

Laskos · Post by **Laskos** » Fri Aug 14, 2020 1:52 pm

marsell wrote: ↑Fri Aug 14, 2020 1:28 pm -mwyoung- You're doing the right thing. Draws are part of the game; if anyone doesn't like that, they should not play the game. Especially when testing, draws are essential.

I am attaching a very useful 2-mover unbalanced EPD opening file for cases where the draw rate from balanced openings is above 70%. This 2-mover opening suite will decrease the draw rate to some 50%, even from a 95% draw rate before. The Elo differences will be large even at LTC and on strong hardware. Real error margins are calculated using pentanomial variance, and are often much smaller than those shown in the UI or Ordo (up to 2 times smaller than those shown by trinomial rating calculators). The suite contains ~1750 different 2-mover positions.

2moves_80_100.rar
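
The gap between trinomial and pentanomial error margins mentioned above can be illustrated with a small Python sketch. It assumes every opening is played twice with colors reversed, so that results can be grouped into game pairs. The pair counts below are made up purely for illustration and do not come from any match in this thread; the function names are likewise hypothetical.

```python
import math

def stderr_trinomial(pairs):
    """Standard error of the mean score, treating every game as independent."""
    games = [result for pair in pairs for result in pair]
    n = len(games)
    mean = sum(games) / n
    variance = sum((r - mean) ** 2 for r in games) / n
    return math.sqrt(variance / n)

def stderr_pentanomial(pairs):
    """Standard error of the mean score, treating each game pair as one sample.

    A pair can score 0, 0.5, 1, 1.5 or 2 points (five outcomes); here the
    pair score is averaged per game so it is comparable to the trinomial mean.
    """
    pair_means = [(a + b) / 2 for a, b in pairs]
    n = len(pair_means)
    mean = sum(pair_means) / n
    variance = sum((m - mean) ** 2 for m in pair_means) / n
    return math.sqrt(variance / n)

# Hypothetical 200-game match from unbalanced openings (100 pairs):
# many pairs are a win plus a loss, because the favored side of the opening wins.
pairs = (
    [(1.0, 0.0)] * 30    # win with the good side, loss with the bad side
    + [(0.5, 0.5)] * 40  # two draws
    + [(1.0, 0.5)] * 20  # win plus draw
    + [(0.5, 0.0)] * 10  # draw plus loss
)

tri = stderr_trinomial(pairs)
penta = stderr_pentanomial(pairs)
print(f"trinomial stderr   = {tri:.4f}")
print(f"pentanomial stderr = {penta:.4f}")
print(f"ratio              = {tri / penta:.2f}")
```

The win-plus-loss pairs inflate the per-game variance but contribute almost nothing to the per-pair variance, which is why the pair-based margin comes out smaller on unbalanced openings.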

mwyoung · Post by **mwyoung** » Fri Aug 14, 2020 3:48 pm

marsell wrote: ↑Fri Aug 14, 2020 1:28 pm -mwyoung- You're doing the right thing. Draws are part of the game; if anyone doesn't like that, they should not play the game. Especially when testing, draws are essential.

Yes. And anyone who does not like the results needs to complain about CCRL and other testing sites, as this is a pretty standard setup for testing chess engines.

The issue for some here is the rapid falloff of the Stockfish+NNUE rating advantage as time controls get longer. And this only gets worse for Stockfish+NNUE with more time and threads. They want to hype Stockfish+NNUE to the point of ridiculousness, claiming 4x CPU power.....
