Is the 320x24b larger net the strongest around for RTX GPU?


zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Is the 320x24b larger net the strongest around for RTX GPU?

Post by zullil »

mwyoung wrote: Wed Jul 24, 2019 3:23 am
zullil wrote: Wed Jul 24, 2019 1:43 am
mwyoung wrote: Wed Jul 24, 2019 12:48 am

I cannot find any kind of memory leak on my system. Here is a screenshot of Lc0 playing Stockfish. Each program was given 16 GB of RAM. I see no unbounded memory usage in game play at all, and everything resets normally after each game. I am confused as to what others are seeing.
What does it mean to give Lc0 16 GB of RAM? How did you do this?

Also, I think the question is what happens if you search a single position for a long time, like you did earlier. Based on the number of nodes searched, that 5.5 hour search should have required 150 GB of memory. At least, according to the post I linked to. So why didn't you run into issues? I'm puzzled.

I don't know either, other than that I never had an issue come up when running Lc0. Below is how I give Lc0 16 GB of RAM, or any amount I wish to use for hash.

Look here.jpg
But Lc0 doesn't seem to support the Hash option at all. So I'm wondering what the setting does.

uci
id name Lc0 v0.21.2
id author The LCZero Authors.
option name WeightsFile type string default <autodiscover>
option name Backend type combo default blas var blas var check var random var roundrobin var multiplexing var demux
option name BackendOptions type string default 
option name Threads type spin default 2 min 1 max 128
option name NNCacheSize type spin default 200000 min 0 max 999999999
option name MinibatchSize type spin default 256 min 1 max 1024
option name MaxPrefetch type spin default 32 min 0 max 1024
option name CPuct type string default 3.000000
option name CPuctBase type string default 19652.000000
option name CPuctFactor type string default 2.000000
option name Temperature type string default 0.000000
option name TempDecayMoves type spin default 0 min 0 max 100
option name TempCutoffMove type spin default 0 min 0 max 1000
option name TempEndgame type string default 0.000000
option name TempValueCutoff type string default 100.000000
option name TempVisitOffset type string default 0.000000
option name DirichletNoise type check default false
option name VerboseMoveStats type check default false
option name SmartPruningFactor type string default 1.330000
option name FpuStrategy type combo default reduction var reduction var absolute
option name FpuValue type string default 1.200000
option name FpuStrategyAtRoot type combo default same var reduction var absolute var same
option name FpuValueAtRoot type string default 1.000000
option name CacheHistoryLength type spin default 0 min 0 max 7
option name PolicyTemperature type string default 2.200000
option name MaxCollisionEvents type spin default 32 min 1 max 1024
option name MaxCollisionVisits type spin default 9999 min 1 max 1000000
option name OutOfOrderEval type check default true
option name StickyEndgames type check default true
option name SyzygyFastPlay type check default true
option name MultiPV type spin default 1 min 1 max 500
option name ScoreType type combo default centipawn var centipawn var centipawn_2018 var win_percentage var Q
option name HistoryFill type combo default fen_only var no var fen_only var always
option name KLDGainAverageInterval type spin default 100 min 1 max 10000000
option name MinimumKLDGainPerNode type string default 0.000000
option name Slowmover type string default 1.000000
option name MoveOverheadMs type spin default 200 min 0 max 100000000
option name SyzygyPath type string default 
option name Ponder type check default true
option name ImmediateTimeUse type string default 1.000000
option name RamLimitMb type spin default 0 min 0 max 100000000
option name ConfigFile type string default lc0.config
option name LogFile type string default 
uciok
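
For anyone who wants to check this mechanically, here is a minimal Python sketch that pulls the option names out of an engine's reply to "uci" and tests whether a Hash option is advertised. The helper name is made up for this example, and the sample is a shortened copy of the Lc0 v0.21.2 output above, not the full list.

```python
# Sketch: list the option names an engine advertises in its reply to "uci".
# The sample lines are copied from the Lc0 v0.21.2 output above; the helper
# name is hypothetical and only used for this illustration.

def uci_option_names(uci_output):
    """Return the names from 'option name <Name> type <type> ...' lines."""
    names = []
    for line in uci_output.splitlines():
        line = line.strip()
        if line.startswith("option name ") and " type " in line:
            # The name is everything between "option name " and " type ".
            names.append(line[len("option name "):line.index(" type ")])
    return names


SAMPLE = """\
id name Lc0 v0.21.2
option name Threads type spin default 2 min 1 max 128
option name NNCacheSize type spin default 200000 min 0 max 999999999
option name RamLimitMb type spin default 0 min 0 max 100000000
uciok
"""

names = uci_option_names(SAMPLE)
print("Hash" in names)        # False
print("RamLimitMb" in names)  # True
```

If a GUI shows a hash setting for an engine whose list lacks it, the setting presumably lives only in the GUI.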
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Is the 320x24b larger net the strongest around for RTX GPU?

Post by mwyoung »

zullil wrote: Wed Jul 24, 2019 4:11 am
But Lc0 doesn't seem to support the Hash option at all. So I'm wondering what the setting does.

Interesting.... All I can tell you is I see the memory has been assigned to Lc0.exe in resource monitor and it is filling it up. :?:
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Is the 320x24b larger net the strongest around for RTX GPU?

Post by Ferdy »

mwyoung wrote: Tue Jul 23, 2019 7:11 pm
corres wrote: Tue Jul 23, 2019 4:17 pm
Hai wrote: Tue Jul 23, 2019 1:27 pm ...
I need, when using 2xRTX 2080 Ti GPUs, 64 GB RAM for 1 hour.
=640 GB for 10 hours
=1280 GB for 20 hours.
=1536 GB for 24 hours.
=Threadripper with 2 TB RAM makes sense :mrgreen:.
This is the sad reality...
I have been doing a run for 5 1/2 hours now on the same position. What is supposed to happen? I still have RAM, and Lc0 has not slowed down.

New game Line
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

Analysis by Lc0 v0.21.2:

1.e4 c6 2.d3 d5 3.Nf3 dxe4 4.dxe4 Qxd1+ 5.Kxd1 Nf6 6.Nfd2 Ng4 7.Ke1 e5 8.Nc3 f6 9.a4 Nh6 10.a5 Nf7 11.h4 h5 12.f3 g6 13.g3 Be6 14.Bc4 Nd8 15.Nd1 Na6 16.c3 Bh6 17.Bxe6 Nxe6 18.Ke2 Nac5 19.b4 Bxd2 20.Bxd2 Nb3 21.Ra2 Nxd2 22.Rxd2 Ke7 23.Nb2 Rad8 24.Nd3 Rhg8 25.Ke3 Rd7 26.f4 exf4+ 27.Nxf4 Rxd2 28.Kxd2 Nd8 29.a6 b6 30.b5 cxb5 31.Nd5+ Ke6 32.Nc7+ Ke5 33.Ke3 Nc6 34.Rd1 Rd8 35.Rxd8 Nxd8
White has an edge: = (0.29) Depth: 41/105 05:28:15 616MN, tb=6098
(, 23.07.2019)
When analyzing a single position, Lc0 consumes more memory as time goes on. I monitored the memory usage of Lc0 v0.21.2, using the blas backend with 2 threads.

Image

This is generally dangerous to a computer. To avoid undefined system behaviour when all of your computer's memory is consumed, manage Lc0's memory through the RamLimitMb option. The current default value of 0 is irresponsible: it is set without any warning message to the user, and Lc0 will consume your memory until your system dies.
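
As a rough illustration, here is a small Python sketch that builds the UCI command for setting such a limit. The bounds are the ones from the RamLimitMb line in the option list posted earlier in the thread (min 0, max 100000000); the helper name and the 16000 MB value are only assumptions for the example, so pick a value that suits your own machine.

```python
# Sketch: build a UCI "setoption" command for a spin option, range-checked.
# Bounds for RamLimitMb come from the posted option list; the helper name
# and the 16000 MB value are hypothetical, chosen only for illustration.

def setoption_spin(name, value, minimum, maximum):
    """Return a 'setoption' line, refusing values outside the spin range."""
    if not minimum <= value <= maximum:
        raise ValueError(f"{name} must be in [{minimum}, {maximum}], got {value}")
    return f"setoption name {name} value {value}"


command = setoption_spin("RamLimitMb", 16000, 0, 100000000)
print(command)  # setoption name RamLimitMb value 16000
```

The resulting line would be sent to the engine before "go", the same way as any other setoption command.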
crem
Posts: 177
Joined: Wed May 23, 2018 9:29 pm

Re: Is the 320x24b larger net the strongest around for RTX GPU?

Post by crem »

660 MNodes while taking 30 GB (or however much it was) of RAM does indeed look very weird. You probably have a large swap file on an SSD or something, and most of the search tree is there by that time.

Lc0 doesn't have a HashSize option. It seems to be something that the chess GUI adds for all engines, thinking that it's a "standard option". If that's not the case, I don't have an explanation for how it appeared.
Guenther
Posts: 4607
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Is the 320x24b larger net the strongest around for RTX GPU?

Post by Guenther »

crem wrote: Wed Jul 24, 2019 9:36 am 660 MNodes while taking 30 GB (or however much it was) of RAM does indeed look very weird. You probably have a large swap file on an SSD or something, and most of the search tree is there by that time.

Lc0 doesn't have a HashSize option. It seems to be something that the chess GUI adds for all engines, thinking that it's a "standard option". If that's not the case, I don't have an explanation for how it appeared.
If I read the first screenshot correctly (it cannot be enlarged and is much too small to see the numbers clearly), it seems to say 47% of a total of around 290.xxx GB of memory used. As this is probably calculated with some virtual memory and raw numbers for MB/GB, it would fit your estimate quite well, and he probably has 256 GB of RAM installed.

Of course, the 16 GB hash shown in the second screenshot is just a plain GUI setting (at the end of the real settings), which the CB software cannot enforce at all, but which the user believes in.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Is the 320x24b larger net the strongest around for RTX GPU?

Post by Ferdy »

Leak test of Lc0 v0.21.2 on blas: I ran an engine match involving Lc0 and monitored Lc0's memory usage. The engines are not restarted after a game.

Game 1 peaked at around 105 MB at around t = 350 sec, then usage dropped to around 40 MB at the start of game 2. Game 2 peaked at around 120 MB, and usage went back down to around 40 MB at the start of game 3. Three games were finished and game 4 is in progress.

There is no memory leak here.

Image
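
The check applied to the graph can be sketched in a few lines of Python: a leak would show up as a between-game baseline that keeps climbing, not as a high peak during a game. The readings are the approximate values described above; the function name and the 10 MB tolerance are arbitrary assumptions for this example.

```python
# Sketch: decide whether the between-game memory baseline keeps growing.
# Readings are the approximate MB values described above; the function name
# and the tolerance are assumptions made for this illustration.

def baseline_is_growing(baselines_mb, tolerance_mb=10.0):
    """True if the last between-game reading exceeds the first by more than the tolerance."""
    if len(baselines_mb) < 2:
        return False
    return baselines_mb[-1] - baselines_mb[0] > tolerance_mb


# Memory at the start of games 2 and 3, after the peaks of about 105 and 120 MB.
baselines_mb = [40, 40]

print(baseline_is_growing(baselines_mb))  # False
```

With more games in the match, the same comparison could be run over a longer list of baselines.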
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Is the 320x24b larger net the strongest around for RTX GPU?

Post by zullil »

Guenther wrote: Wed Jul 24, 2019 9:47 am
crem wrote: Wed Jul 24, 2019 9:36 am 660 MNodes while taking 30 GB (or however much it was) of RAM does indeed look very weird. You probably have a large swap file on an SSD or something, and most of the search tree is there by that time.

Lc0 doesn't have a HashSize option. It seems to be something that the chess GUI adds for all engines, thinking that it's a "standard option". If that's not the case, I don't have an explanation for how it appeared.
If I read the first screenshot correctly (it cannot be enlarged and is much too small to see the numbers clearly), it seems to say 47% of a total of around 290.xxx GB of memory used. As this is probably calculated with some virtual memory and raw numbers for MB/GB, it would fit your estimate quite well, and he probably has 256 GB of RAM installed.

Of course, the 16 GB hash shown in the second screenshot is just a plain GUI setting (at the end of the real settings), which the CB software cannot enforce at all, but which the user believes in.
It appears to read: Memory 29.8/63.9 GB (47%). I believe his system has 64 GB RAM and that most of the long search was swapped to storage.
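
A back-of-the-envelope calculation in Python, using only numbers from this thread: the 616 MN reported for the 5.5 hour search, the 150 GB estimate mentioned earlier, and the 29.8 GB reading from the screenshot. Treating 1 GB as 10^9 bytes is an assumption, and the 29.8 GB figure is total system memory in use, not Lc0 alone, so this is only a rough sketch.

```python
# Sketch: implied memory per node, from figures quoted in this thread.
# Assumptions: 1 GB = 1e9 bytes, and the 29.8 GB screenshot reading is
# treated as an upper bound on what Lc0 held in physical RAM.

GB = 1e9
nodes = 616e6         # "616MN" from the 5.5 hour analysis
estimated_gb = 150    # memory the search "should have required"
in_ram_gb = 29.8      # "Memory 29.8/63.9 GB (47%)"

estimated_bytes_per_node = estimated_gb * GB / nodes
in_ram_bytes_per_node = in_ram_gb * GB / nodes
max_fraction_in_ram = in_ram_gb / estimated_gb

print(f"{estimated_bytes_per_node:.1f} bytes/node if the estimate holds")  # 243.5
print(f"{in_ram_bytes_per_node:.1f} bytes/node if everything sat in RAM")  # 48.4
print(f"at most {max_fraction_in_ram:.0%} of the estimated tree in RAM")   # 20%
```

If the 150 GB estimate is anywhere near right, at most about a fifth of the tree could have been in physical RAM, which fits the swap explanation.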
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Is the 320x24b larger net the strongest around for RTX GPU?

Post by Laskos »

crem wrote: Wed Jul 24, 2019 9:36 am 660 MNodes while taking 30 GB (or however much it was) of RAM does indeed look very weird. You probably have a large swap file on an SSD or something, and most of the search tree is there by that time.

Lc0 doesn't have a HashSize option. It seems to be something that the chess GUI adds for all engines, thinking that it's a "standard option". If that's not the case, I don't have an explanation for how it appeared.
Oh, great news actually. It seems it's fine to use a swap file on an SSD with Lc0. I disabled my swap file long ago because Syzygy6 bloated the RAM and the page file (Syzygy behaves badly too) and slowed down the regular chess engines by factors of 10. But I have now reset the swap file to 128 GB of my 256 GB SSD, and with Lc0 it seems a great idea. Swapping slows Lc0 down only moderately: it runs at about 70% of its normal speed when using the swap file on the SSD, which is very reasonable. The GPU load with the swap file (after 100 million nodes) oscillates around 70% too.

My PC has 16 GB RAM, of which about 10-12 GB are available to Leela. I set the swap file to 128 GB on SSD. The GPU is RTX 2070.

The command line commands are here:

setoption name Backend value cudnn-fp16
setoption name MinibatchSize value 400 
setoption name NNCacheSize value 10000000
setoption name WeightsFile value .\weights_run1_42810.pb.gz  

go
Without swapping, up to some 25-30 million nodes, the speed was about 33000 nps. After a lot of swapping, at 100 million nodes (more than an hour in, with about 40 GB of the swap file on the SSD used), the speed was:

info depth 34 seldepth 92 time 4370404 nodes 100550067 score cp 28 hashfull 1000 nps 23007 tbhits 0 pv e2e4 e7e5 g1f3 b8c6 f1b5 g8f6 e1g1

That is reasonable. So even a low-RAM system can analyze with Leela for several hours at a decent speed by using a large swap file on an SSD.
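
For those who want to redo the arithmetic, a short Python sketch that reads the info line above and compares it with the pre-swap speed. The parsing helper is a made-up convenience, and the 33000 nps baseline is the approximate figure quoted above, so the ratio is only approximate as well.

```python
# Sketch: extract time/nodes/nps from the UCI info line quoted above and
# compare with the approximate pre-swap speed. parse_info is a hypothetical
# helper written for this example, not part of any engine or GUI.

def parse_info(line):
    """Return the integer values that follow the 'time', 'nodes' and 'nps' tokens."""
    tokens = line.split()
    return {key: int(tokens[tokens.index(key) + 1]) for key in ("time", "nodes", "nps")}


INFO = ("info depth 34 seldepth 92 time 4370404 nodes 100550067 score cp 28 "
        "hashfull 1000 nps 23007 tbhits 0 pv e2e4 e7e5 g1f3 b8c6 f1b5 g8f6 e1g1")

fields = parse_info(INFO)

# UCI reports time in milliseconds, so average nps = nodes * 1000 / time.
average_nps = fields["nodes"] * 1000 // fields["time"]
ratio = fields["nps"] / 33000  # 33000 nps is the approximate pre-swap speed

print(average_nps)     # 23007
print(f"{ratio:.0%}")  # 70%
```

Note that the reported nps is the average over the whole search, so the speed during the swapping phase alone would be somewhat lower than 70%.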
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Is the 320x24b larger net the strongest around for RTX GPU?

Post by corres »

mwyoung wrote: Wed Jul 24, 2019 3:23 am
zullil wrote: Wed Jul 24, 2019 1:43 am
mwyoung wrote: Wed Jul 24, 2019 12:48 am I cannot find any kind of memory leak on my system. Here is a screenshot of Lc0 playing Stockfish. Each program was given 16 GB of RAM. I see no unbounded memory usage in game play at all, and everything resets normally after each game. I am confused as to what others are seeing.
What does it mean to give Lc0 16 GB of RAM? How did you do this?
I don't know either, other than that I never had an issue come up when running Lc0. Below is how I give Lc0 16 GB of RAM, or any amount I wish to use for hash.
Look here.jpg
In the source of Leela ver. 0.21.x there is no "Hash size" parameter at all.
The only way I could put a "Hash size" parameter on Leela's UCI parameter list is with Photoshop.
Maybe you are using a personal version of Leela?
chrisw
Posts: 4319
Joined: Tue Apr 03, 2012 4:28 pm

Re: Is the 320x24b larger net the strongest around for RTX GPU?

Post by chrisw »

Ferdy wrote: Wed Jul 24, 2019 7:11 am
When analyzing a single position, Lc0 consumes more memory as time goes on. I monitored the memory usage of Lc0 v0.21.2, using the blas backend with 2 threads.

Image

This is generally dangerous to a computer. To avoid undefined system behaviour when all of your computer's memory is consumed, manage Lc0's memory through the RamLimitMb option. The current default value of 0 is irresponsible: it is set without any warning message to the user, and Lc0 will consume your memory until your system dies.
If some people are getting memory leaks and some are not, this might be down to the NN handling software and how it deals with situations where available GPU memory comes under pressure, either because there is too little GPU RAM or because the LCZero parameter settings are too greedy for GPU RAM. NN handler software tries to detect impending OOMs and tries to get clever, but I guess this cleverness is not necessarily fully stable.