Leela Chess Zero 42565 vs Stockfish 140619

hgm
Posts: 23477
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Leela Chess Zero 42565 vs Stockfish 140619

Post by hgm » Sun Jun 16, 2019 9:51 pm

Pondering makes sense when playing against Leela on a many-core machine, as Leela wouldn't use many CPU threads while thinking, and all other threads would then go to waste. Likewise the GPUs would be idle during Stockfish' turn when Leela is not pondering.

If Leela needs two unshared cores, you can just set an affinity for it for hyper-threads 0-3. You can then set affinity for Stockfish to the remaining HT. That way they won't compete for cores. They might still compete for memory bandwidth, though; not sure how important that is for Leela.

I don't see why Leela couldn't run its two threads on the same physical core. That could of course be a disadvantage compared to running 2 threads on 2 physical cores, but that also holds for Stockfish' threads sharing physical cores. Yet some people claim that hyper-threading is beneficial compared to running 1 thread per physical core, and testing both under conditions with 2 active HT per core does not seem particularly unfair. It is just like all HT are somewhat slower physical cores. This doesn't involve any scheduling, so it should not be noisy.

Of course if you don't also reserve some cores for the OS, that would cause noise.
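A minimal sketch of the affinity split described above, in Python. The function names `plan_affinity` and `pin` are made up for illustration, and the layout assumes that logical CPUs 2k and 2k+1 are the two hyper-threads of physical core k, which is not true on every OS (check your topology first). `os.sched_setaffinity` only exists on Linux; on Windows you would set affinity another way.

```python
import os

def plan_affinity(total_ht, leela_ht=4, buffer_ht=0):
    """Split logical CPUs 0..total_ht-1 into Leela / Stockfish / buffer sets.

    Hypothetical helper: Leela gets the first `leela_ht` hyper-threads,
    the last `buffer_ht` are left free for the OS, Stockfish gets the rest.
    """
    if leela_ht + buffer_ht >= total_ht:
        raise ValueError("nothing left for Stockfish")
    return {
        "leela": set(range(0, leela_ht)),
        "stockfish": set(range(leela_ht, total_ht - buffer_ht)),
        "buffer": set(range(total_ht - buffer_ht, total_ht)),
    }

def pin(pid, cpus):
    """Pin process `pid` to the given logical CPUs (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(pid, cpus)
        return True
    return False  # platform without sched_setaffinity: nothing done

if __name__ == "__main__":
    plan = plan_affinity(total_ht=16, leela_ht=4, buffer_ht=2)
    for name, cpus in plan.items():
        print(name, sorted(cpus))
```

You would call `pin(engine_pid, plan["leela"])` and `pin(other_pid, plan["stockfish"])` after starting each engine; the buffer set is simply never assigned to either.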

mwyoung
Posts: 1600
Joined: Wed May 12, 2010 8:00 pm

Re: Leela Chess Zero 42565 vs Stockfish 140619

Post by mwyoung » Sun Jun 16, 2019 9:51 pm

Nordlandia wrote:
Sun Jun 16, 2019 9:32 pm
Then again, the question that arises is whether ponder is a good trade-off for more time?
Exactly.

Is a 6% CPU cost to Stockfish worth pondering? You tell me...

Current move prediction rate of this current ponder match is 57.9%.

And Stockfish likes it very much.
Professing themselves to be wise, they became fools,
Take on me. foes 0
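A rough back-of-the-envelope version of that trade-off, in Python. The model is an assumption, not something stated in the post: it treats the 6% as a flat loss of search speed, assumes a ponder hit turns the opponent's whole thinking time into useful search, and assumes both sides think about equally long per move (`opp_time_ratio=1.0`). The function names are made up for illustration.

```python
def effective_time_factor(hit_rate, cpu_cost, opp_time_ratio=1.0):
    """Useful search per move with ponder on, relative to ponder off.

    hit_rate:       fraction of opponent moves predicted correctly
    cpu_cost:       fraction of speed lost to the pondering opponent
    opp_time_ratio: opponent think time / own think time (assumed 1.0)
    """
    return (1.0 - cpu_cost) * (1.0 + hit_rate * opp_time_ratio)

def break_even_hit_rate(cpu_cost, opp_time_ratio=1.0):
    """Hit rate at which pondering exactly pays for its CPU cost."""
    return cpu_cost / ((1.0 - cpu_cost) * opp_time_ratio)

if __name__ == "__main__":
    print(effective_time_factor(hit_rate=0.579, cpu_cost=0.06))
    print(break_even_hit_rate(cpu_cost=0.06))
```

Under these assumptions the break-even hit rate is far below the 57.9% reported, so the sign of the answer does not depend much on the exact numbers; the size of the gain does.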

mwyoung
Posts: 1600
Joined: Wed May 12, 2010 8:00 pm

Re: Leela Chess Zero 42565 vs Stockfish 140619

Post by mwyoung » Sun Jun 16, 2019 9:53 pm

hgm wrote:
Sun Jun 16, 2019 9:51 pm
Pondering makes sense when playing against Leela on a many-core machine, as Leela wouldn't use many CPU threads while thinking, and all other threads would then go to waste. Likewise the GPUs would be idle during Stockfish' turn when Leela is not pondering.

If Leela needs two unshared cores, you can just set an affinity for it for hyper-threads 0-3. You can then set affinity for Stockfish to the remaining HT. That way they won't compete for cores. They might still compete for memory bandwidth, though; not sure how important that is for Leela.

I don't see why Leela couldn't run its two threads on the same physical core. That could of course be a disadvantage compared to running 2 threads on 2 physical cores, but that also holds for Stockfish' threads sharing physical cores. Yet some people claim that hyper-threading is beneficial compared to running 1 thread per physical core, and testing both under conditions with 2 active HT per core does not seem particularly unfair. It is just like all HT are somewhat slower physical cores. This doesn't involve any scheduling, so it should not be noisy.

Of course if you don't also reserve some cores for the OS, that would cause noise.
Ding ding ding, a man with a brain. Happy Father's Day. And there are even more effective ways to configure that, which also take care of the operating system.

Laskos
Posts: 9312
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Leela Chess Zero 42565 vs Stockfish 140619

Post by Laskos » Mon Jun 17, 2019 3:26 am

hgm wrote:
Sun Jun 16, 2019 9:51 pm
Pondering makes sense when playing against Leela on a many-core machine, as Leela wouldn't use many CPU threads while thinking, and all other threads would then go to waste. Likewise the GPUs would be idle during Stockfish' turn when Leela is not pondering.

If Leela needs two unshared cores, you can just set an affinity for it for hyper-threads 0-3. You can then set affinity for Stockfish to the remaining HT. That way they won't compete for cores. They might still compete for memory bandwidth, though; not sure how important that is for Leela.

I don't see why Leela couldn't run its two threads on the same physical core. That could of course be a disadvantage compared to running 2 threads on 2 physical cores, but that also holds for Stockfish' threads sharing physical cores. Yet some people claim that hyper-threading is beneficial compared to running 1 thread per physical core, and testing both under conditions with 2 active HT per core does not seem particularly unfair. It is just like all HT are somewhat slower physical cores. This doesn't involve any scheduling, so it should not be noisy.

Of course if you don't also reserve some cores for the OS, that would cause noise.
This creature is running 34 threads on a 16-core machine, and it IS heavily affecting the engines, Leela much more than SF. I don't know, I have to perform a plain experiment to show in black and white what is obvious:

4 cores i7 CPU, 8 logical cores.
Leela on RTX 2070 GPU using 2 threads.

1/ Leela to 1 million nodes on 2 threads (using one of the latest nets) without the interference from SF:

info depth 16 seldepth 52 time 35832 nodes 1021332 score cp 27 hashfull 481 nps 28503
info depth 16 seldepth 51 time 35866 nodes 1017115 score cp 27 hashfull 481 nps 28358
info depth 16 seldepth 52 time 36044 nodes 1027829 score cp 27 hashfull 483 nps 28515

Very high stability: the speeds stay within about 0.5% of each other.


2/ SF on 8 threads AND Leela to 1 million nodes on 2 threads:

info depth 16 seldepth 53 time 43960 nodes 1058510 score cp 27 hashfull 498 nps 24078
info depth 16 seldepth 52 time 48555 nodes 1017708 score cp 27 hashfull 480 nps 20959
info depth 16 seldepth 52 time 43277 nodes 999935 score cp 27 hashfull 474 nps 23105
info depth 16 seldepth 52 time 58460 nodes 1063476 score cp 27 hashfull 498 nps 18191

The speeds on average are some 30% lower and they are very erratic, some 20-30% deviation on average from one run to another.
The issue would be even graver with his 2080 Ti, as it uses a bit more CPU resources than my 2070. And it would be more severe in shorter runs, as there are bursts of slowdowns. All in all, this creature's tests and posts here are plain garbage.
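For anyone who wants to redo the arithmetic on the two sets of `info` lines above, here is a small Python sketch. It only pulls the `nps` field out of the lines as posted; the helper names and the choice of "max minus min over mean" as the spread measure are my own, not part of the original test.

```python
import re
from statistics import mean

BASELINE = """\
info depth 16 seldepth 52 time 35832 nodes 1021332 score cp 27 hashfull 481 nps 28503
info depth 16 seldepth 51 time 35866 nodes 1017115 score cp 27 hashfull 481 nps 28358
info depth 16 seldepth 52 time 36044 nodes 1027829 score cp 27 hashfull 483 nps 28515
"""

LOADED = """\
info depth 16 seldepth 53 time 43960 nodes 1058510 score cp 27 hashfull 498 nps 24078
info depth 16 seldepth 52 time 48555 nodes 1017708 score cp 27 hashfull 480 nps 20959
info depth 16 seldepth 52 time 43277 nodes 999935 score cp 27 hashfull 474 nps 23105
info depth 16 seldepth 52 time 58460 nodes 1063476 score cp 27 hashfull 498 nps 18191
"""

def nps_values(log):
    """Extract the nps field from every UCI info line in `log`."""
    return [int(v) for v in re.findall(r"\bnps (\d+)", log)]

def summarize(log):
    """Return (mean nps, (max - min) / mean) for the runs in `log`."""
    values = nps_values(log)
    avg = mean(values)
    return avg, (max(values) - min(values)) / avg

base_avg, base_spread = summarize(BASELINE)
load_avg, load_spread = summarize(LOADED)
slowdown = 1 - load_avg / base_avg  # fraction of nps lost under load

if __name__ == "__main__":
    print(f"baseline: {base_avg:.0f} nps, spread {base_spread:.1%}")
    print(f"loaded:   {load_avg:.0f} nps, spread {load_spread:.1%}")
    print(f"mean nps lost under load: {slowdown:.1%}")
```

Note that the result depends on what you compare: measured on mean nps the loss is roughly a quarter, while the wall-clock time to reach about 1 million nodes grows by more than that.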

hgm
Posts: 23477
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Leela Chess Zero 42565 vs Stockfish 140619

Post by hgm » Mon Jun 17, 2019 5:50 am

Well, it doesn't seem to serve any purpose to use more threads in total than the number of HT provided by the hardware. So with 16 cores, 32 HT I would put Leela on 1 core (HT 0 and 1), Stockfish on 14 (HT 2-29), and reserve 1 core (HT 30-31) as a 'noise buffer' (as it seems to me it would make very little difference whether SF runs on 14 or 15 cores, and it is better to be safe than sorry). With only 8 logical cores (4 physical) a similar approach would just leave 2 cores / 4 HT for SF, and as adding a third core would probably make a significant difference I would be tempted to run without noise buffer. As I said, contention for memory bandwidth could still be a problem; I have really no idea whether CPU power or memory bandwidth would be the actual bottleneck, with so many cores.

Of course there are other ways to provide noise immunity, like running by node count rather than by wall-clock time.

This gives me an interesting idea. In principle pondering is still inefficient use of the CPU/GPU time that would be idle in a non-ponder game. It would be better to only use it for thinking. This is not possible in a single game, as there is no way to avoid that you have to wait for the opponent's move. But it would be possible when you played multiple (ponder-off) games in parallel. Then you could run the CPU and GPU in 'batch mode', having the engines that are on move just queue up for the resource they need (i.e. one Leela queue for the GPU + 2 or 4 HT, one Stockfish queue for 28 or 26 HT), and only let their clocks count down when they are actually thinking. By running some 8 games in parallel it should be quite rare that one engine thinks so much longer than average that the other queue is completely serviced, and its opponent would have to wait. The more games you run, the smaller that probability gets, and there really is no reason not to run all games of the match in parallel. (Well, perhaps hash memory.)
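To get a feel for how often a queue would run dry, here is a toy event-driven simulation of that batch-mode scheme in Python. Everything about it is an assumption made for illustration: think times are drawn uniformly between 0.5 and 1.5 time units for both engines, Leela is on move first in every game, and each resource serves one game at a time in FIFO order. Queue waiting is not charged to anyone's clock; only the think time counts as busy time.

```python
import heapq
import random
from collections import deque

def simulate(n_games, plies_per_game=200, mean_think=1.0, seed=1):
    """Return the busy fraction of the Leela and Stockfish resources."""
    rng = random.Random(seed)
    other = {"leela": "stockfish", "stockfish": "leela"}
    waiting = {"leela": deque(range(n_games)), "stockfish": deque()}
    idle = {"leela": True, "stockfish": True}
    busy_time = {"leela": 0.0, "stockfish": 0.0}
    plies_left = [plies_per_game] * n_games
    events = []  # heap of (finish_time, resource, game)
    now = 0.0

    def start_next(resource):
        # Give an idle resource to the game at the head of its queue.
        if idle[resource] and waiting[resource]:
            game = waiting[resource].popleft()
            think = mean_think * rng.uniform(0.5, 1.5)
            busy_time[resource] += think
            idle[resource] = False
            heapq.heappush(events, (now + think, resource, game))

    start_next("leela")
    while events:
        now, resource, game = heapq.heappop(events)
        idle[resource] = True
        plies_left[game] -= 1
        if plies_left[game] > 0:
            # The move is played; the game now queues for the other engine.
            waiting[other[resource]].append(game)
        start_next("leela")
        start_next("stockfish")
    return {name: busy_time[name] / now for name in busy_time}

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, simulate(n))
```

With a single game each resource is busy about half the time, which is just the ponder-off situation; with more games in parallel the busy fractions should climb towards 1. Real think times are less regular than this, so treat the numbers as a trend, not a prediction.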

Nordlandia
Posts: 2374
Joined: Fri Sep 25, 2015 7:38 pm
Location: Sortland, Norway

Re: Leela Chess Zero 42565 vs Stockfish 140619

Post by Nordlandia » Mon Jun 17, 2019 6:39 am

Graham Banks mentioned that it's a good idea to place EGTB on two different drives to avoid throttling in ponder matches. It's better than nothing.

In my case I have an i7-5960X at 4.5GHz, 8 cores (HT off), and a GTX 1070 Ti. One guy told me Leela needs only 1 thread for my card.

I allocated 6 cores to SF, 1 core to Lc0 and 1 core as a "noise buffer", with 5-men Syzygy on separate SSDs.

Time control: 30m+30s + ponder.




mwyoung
Posts: 1600
Joined: Wed May 12, 2010 8:00 pm

Re: Leela Chess Zero 42565 vs Stockfish 140619

Post by mwyoung » Mon Jun 17, 2019 12:29 pm

hgm wrote:
Mon Jun 17, 2019 5:50 am
Well, it doesn't seem to serve any purpose to use more threads in total than the number of HT provided by the hardware. So with 16 cores, 32 HT I would put Leela on 1 core (HT 0 and 1), Stockfish on 14 (HT 2-29), and reserve 1 core (HT 30-31) as a 'noise buffer' (as it seems to me it would make very little difference whether SF runs on 14 or 15 cores, and it is better to be safe than sorry). With only 8 logical cores (4 physical) a similar approach would just leave 2 cores / 4 HT for SF, and as adding a third core would probably make a significant difference I would be tempted to run without noise buffer. As I said, contention for memory bandwidth could still be a problem; I have really no idea whether CPU power or memory bandwidth would be the actual bottleneck, with so many cores.

Of course there are other ways to provide noise immunity, like running by node count rather than by wall-clock time.

This gives me an interesting idea. In principle pondering is still inefficient use of the CPU/GPU time that would be idle in a non-ponder game. It would be better to only use it for thinking. This is not possible in a single game, as there is no way to avoid that you have to wait for the opponent's move. But it would be possible when you played multiple (ponder-off) games in parallel. Then you could run the CPU and GPU in 'batch mode', having the engines that are on move just queue up for the resource they need (i.e. one Leela queue for the GPU + 2 or 4 HT, one Stockfish queue for 28 or 26 HT), and only let their clocks count down when they are actually thinking. By running some 8 games in parallel it should be quite rare that one engine thinks so much longer than average that the other queue is completely serviced, and its opponent would have to wait. The more games you run, the smaller that probability gets, and there really is no reason not to run all games of the match in parallel. (Well, perhaps hash memory.)
There is a purpose to using a 32+2 thread configuration. Remember, threads are not the same as utilization. And Lc0 is not an AB engine. And there is more than one way to manage threads. And it drives the trolls crazy.

Nordlandia
Posts: 2374
Joined: Fri Sep 25, 2015 7:38 pm
Location: Sortland, Norway

Re: Leela Chess Zero 42565 vs Stockfish 140619

Post by Nordlandia » Mon Jun 17, 2019 1:46 pm

Please explain why you choose to overload your system with 32+2 threads?

mwyoung
Posts: 1600
Joined: Wed May 12, 2010 8:00 pm

Re: Leela Chess Zero 42565 vs Stockfish 140619

Post by mwyoung » Mon Jun 17, 2019 1:52 pm

Nordlandia wrote:
Mon Jun 17, 2019 1:46 pm
Please explain why you choose to overload your system with 32+2 threads?
First, I am not overloading my system, as you can see from the live stream. Let's see if you can figure this out on your own.

I will give you a hint: how many threads is your computer running right now?

Nordlandia
Posts: 2374
Joined: Fri Sep 25, 2015 7:38 pm
Location: Sortland, Norway

Re: Leela Chess Zero 42565 vs Stockfish 140619

Post by Nordlandia » Mon Jun 17, 2019 2:16 pm

mwyoung wrote:
Mon Jun 17, 2019 1:52 pm
Nordlandia wrote:
Mon Jun 17, 2019 1:46 pm
Please explain why you choose to overload your system with 32+2 threads?
First, I am not overloading my system, as you can see from the live stream. Let's see if you can figure this out on your own.

I will give you a hint: how many threads is your computer running right now?
8-core at 4.5GHz.
