Is the 320x24b larger net the strongest around for RTX GPU?

corres · Post by **corres** » Tue Jul 23, 2019 4:47 pm

crem wrote: ↑Tue Jul 23, 2019 4:25 pm
corres wrote: ↑Tue Jul 23, 2019 4:17 pm
Hai wrote: ↑Tue Jul 23, 2019 1:27 pm ...
I need, when using 2xRTX 2080 Ti GPUs, 64 GB RAM for 1 hour.
=640 GB for 10 hours
=1280 GB for 20 hours.
=1536 GB for 24 hours.
=Threadripper with 2 TB RAM makes sense .
This is the sad reality...
Those numbers are about right, but note that this is memory consumption per single search session (i.e. per move). When a new move starts, most of the memory is freed up for the next search.
So if the engine spends 1 hour on a single move, it can indeed require 64 GB of RAM.
But that's not true for a 1-hour game. A 1-hour game usually means about 1 minute per move, so memory usage will be about 1 GB even with 2x 2080 Ti GPUs.
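
The arithmetic behind crem's point can be sketched as follows. This is only a rough illustration: it assumes memory grows linearly with the time spent on a single search, at the 64 GB per hour quoted above for 2x RTX 2080 Ti. The function name and the linear model are assumptions for illustration, not anything taken from Lc0 itself.

```python
# Assumption: RAM use grows linearly with time spent on ONE search (one move),
# at the rate quoted in the thread (64 GB per hour on 2x RTX 2080 Ti).
GB_PER_HOUR = 64.0


def search_memory_gb(seconds_per_move: float, gb_per_hour: float = GB_PER_HOUR) -> float:
    """Estimated RAM (GB) for a single search lasting `seconds_per_move` seconds."""
    return gb_per_hour * seconds_per_move / 3600.0


# One minute per move (typical for a 1-hour game) versus one hour on one move.
print(f"1 minute per move: {search_memory_gb(60):.2f} GB")
print(f"1 hour per move:   {search_memory_gb(3600):.0f} GB")
```

Under that assumption, what matters is the time per move rather than the length of the game, since most of the tree is freed when the next search starts.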

Please explain to me why Leela needs more and more memory when I run a game.
Moreover, during subsequent games Leela's memory usage increases further until all virtual memory is exhausted.

Laskos · Post by **Laskos** » Tue Jul 23, 2019 5:00 pm

Hai wrote: ↑Tue Jul 23, 2019 1:21 pm I can do the tests but I'm doing it with CB, so I need .cbh and not epd or pgn.
And a link where I can download opening200revised and arasan21beta. Others are welcome too.

I am not experienced with CB database formats, but I may have gotten the CBH right for the human unbalanced 3-mover openings. I attached a ZIP file containing them in CBH and EPD formats, the Openings200revised.epd positional test suite, and the WAC145.epd and Arasan21beta.epd tactical test suites.

DB.zip

Hai · Post by **Hai** » Tue Jul 23, 2019 5:01 pm

Laskos wrote: ↑Tue Jul 23, 2019 11:05 am Jhorthos is building and training larger and smaller than the default 256x20b T40 nets, one can download them here:

https://github.com/jhorthos/lczero-trai ... a-Training

On several test suites which, combined, have been pretty faithful in assessing the strength of different Lc0 nets at longer time per position, the last 320x24b net from that place (320x24.J13-swa-410000) comes out as the best net at longer-than-blitz TC. Maybe someone can devote some time to playing, say, 50 games of that net against some of the latest T40 nets at, say, 30 minutes + 15 seconds TC on an RTX card or several of them. The openings should preferably be a bit unbalanced, to avoid 90%+ draw rates.

The test-suite results of that larger net at longer time per position (30 seconds) are pretty amazing, and better than anything I have seen with Leela (T30 and T40).

4 i7 fast cores
RTX 2070 GPU
Leela on 2 threads, cache = 1,000,000

My own positional test-suite "Openings200revised":

1s / position
Lc0 42810 : 152/200
Lc0 320x24b-410: 147/200

30s / position
Lc0 42810 : 157/200
Lc0 320x24b-410: 163/200

At longer time per position, the big-net surpasses the T40 nets on this positional test suite, achieving a record result for this suite at 30s/position.

Tactical "Arasan21beta":

1s / position
Lc0 42810 : 100/199
Lc0 320x24b-410: 93/199

30s / position
Lc0 42810 : 129/199
Lc0 320x24b-410: 133/199

To my surprise, the big-net surpasses the T40 nets on this tactical suite at 30s/position.

Also, the JH 320x24b big-nets are progressing steadily; for example, the latest "410" big-net performs significantly better on both these test suites than the "370" big-net from the same site. Needless to say, the big-net seems to scale significantly better to longer TC than the T40 nets. The big-net might well also be the best at analyzing positions.

Yes, it's a bit naive to rely on test suites to see how strength behaves at longer TC, but my experience with Lc0 shows that a combination of a positional and a tactical suite is a pretty faithful indicator of strength at otherwise inaccessible time controls. Again, maybe some would be curious to play actual games at these longer TC.

2x RTX 2080 Ti and 4 CPU cores
Multiplexing
No TB

ERET-TEST-SUITE:

1 second and 1000 mb:
24x320
0 / 111 = 0.0%
42810
0 / 111 = 0.0%
What's wrong?

1 second and 2000 mb:
24x320
62 / 111 = 55.8%
42810
64 / 111 = 57.6%

1 second and 3000 mb:
24x320
62 / 111 = 55.8%
42810
64 / 111 = 57.6%

1 second and 4000 mb:
24x320
62 / 111 = 55.8%
42810
62 / 111 = 55.8%

1 second and 5000 mb:
24x320
63 / 111 = 56.7%
42810
63 / 111 = 56.7%

1 second and 10000 mb:
24x320
65 / 111 = 58.5%
42810
65 / 111 = 58.5%
Both have the same + highest points

1 second and 15000 mb:
24x320
61 / 111 = 54.9%
42810
65 / 111 = 58.5%

1 second and 20000 mb:
24x320
64 / 111 = 57.6%
42810
65 / 111 = 58.5%

1 second and 30000 mb:
24x320
63 / 111 = 56.7%
42810
63 / 111 = 56.7%

1 second and 40000 mb:
24x320
62 / 111 = 55.8%
42810
67 / 111 = 60.3%

1 second and 50000 mb:
24x320
62 / 111 = 55.8%
42810
64 / 111 = 57.6%
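
When comparing these ERET scores, it may help to keep sampling noise in mind. The sketch below computes a rough binomial standard error for a score out of 111. It assumes each position is an independent solved/unsolved trial, which is a simplification introduced here (both nets are tested on the same positions, and a 1-second search is not perfectly repeatable); the function name is hypothetical.

```python
import math


def score_with_stderr(solved: int, total: int) -> tuple[float, float]:
    """Return (solve rate, binomial standard error) for `solved` out of `total`."""
    p = solved / total
    stderr = math.sqrt(p * (1.0 - p) / total)
    return p, stderr


# The two scores from the 40000 MB run above.
for label, solved in [("24x320", 62), ("42810", 67)]:
    p, se = score_with_stderr(solved, 111)
    # se * 111 expresses the same uncertainty in number of positions.
    print(f"{label}: {p:.1%} +/- {se:.1%} (about {se * 111:.1f} positions)")
```

With 111 positions, one standard error is roughly five positions, so gaps of two to five positions between the two nets, or between memory settings, are within what this simple model treats as noise.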

crem · Post by **crem** » Tue Jul 23, 2019 6:00 pm

corres wrote: ↑Tue Jul 23, 2019 4:47 pm
Please explain to me why Leela needs more and more memory when I run a game.
Moreover, during subsequent games Leela's memory usage increases further until all virtual memory is exhausted.

We have tried several times to reproduce the memory leak claims, but were never able to.
If Lc0 indeed grows its memory consumption without bound when you regularly start a new game, it's a bug. Could you run 10-20 games at a similar time control within one session and check memory usage after every game to show that it really grows?
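
One way to evaluate such a run is to record the engine's memory usage after each game and look at the trend. The sketch below fits a least-squares line through the readings; the function name and the sample readings are hypothetical and purely illustrative, not measurements from anyone in this thread.

```python
def memory_growth_per_game(readings_mb: list[float]) -> float:
    """Least-squares slope of memory usage (MB) versus game index (MB per game)."""
    n = len(readings_mb)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2.0
    mean_y = sum(readings_mb) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(readings_mb))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Hypothetical readings taken after each of 10 games (MB).
steady = [1020, 980, 1050, 1010, 990, 1030, 1000, 1040, 1015, 995]
growing = [1000 + 150 * i for i in range(10)]

print(f"steady:  {memory_growth_per_game(steady):+.1f} MB per game")
print(f"growing: {memory_growth_per_game(growing):+.1f} MB per game")
```

A slope near zero over 10-20 games would suggest that memory is released between games, while a consistently positive slope would support the report of unbounded growth.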

corres · Post by **corres** » Tue Jul 23, 2019 6:21 pm

crem wrote: ↑Tue Jul 23, 2019 6:00 pm
corres wrote: ↑Tue Jul 23, 2019 4:47 pm
Please explain to me why Leela needs more and more memory when I run a game.
Moreover, during subsequent games Leela's memory usage increases further until all virtual memory is exhausted.
We have tried several times to reproduce the memory leak claims, but were never able to.
If Lc0 indeed grows its memory consumption without bound when you regularly start a new game, it's a bug. Could you run 10-20 games at a similar time control within one session and check memory usage after every game to show that it really grows?

I think it is not a memory leak but inherent behavior of Leela.
Naturally, I checked Leela's memory usage during games, and the increase is obvious.
I think you, dear developers, should also sometimes run some custom tests with your binary to see the issues.
Especially since this has been reported on the lczero.org forum many times...

mwyoung · Post by **mwyoung** » Tue Jul 23, 2019 7:11 pm

corres wrote: ↑Tue Jul 23, 2019 4:17 pm
Hai wrote: ↑Tue Jul 23, 2019 1:27 pm ...
I need, when using 2xRTX 2080 Ti GPUs, 64 GB RAM for 1 hour.
=640 GB for 10 hours
=1280 GB for 20 hours.
=1536 GB for 24 hours.
=Threadripper with 2 TB RAM makes sense .
This is the sad reality...

I have been doing a run now for 5 1/2 hours on the same position. What is supposed to happen? I still have RAM, and Lc0 has not slowed down.

New game Line
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

Analysis by Lc0 v0.21.2:

1.e4 c6 2.d3 d5 3.Nf3 dxe4 4.dxe4 Qxd1+ 5.Kxd1 Nf6 6.Nfd2 Ng4 7.Ke1 e5 8.Nc3 f6 9.a4 Nh6 10.a5 Nf7 11.h4 h5 12.f3 g6 13.g3 Be6 14.Bc4 Nd8 15.Nd1 Na6 16.c3 Bh6 17.Bxe6 Nxe6 18.Ke2 Nac5 19.b4 Bxd2 20.Bxd2 Nb3 21.Ra2 Nxd2 22.Rxd2 Ke7 23.Nb2 Rad8 24.Nd3 Rhg8 25.Ke3 Rd7 26.f4 exf4+ 27.Nxf4 Rxd2 28.Kxd2 Nd8 29.a6 b6 30.b5 cxb5 31.Nd5+ Ke6 32.Nc7+ Ke5 33.Ke3 Nc6 34.Rd1 Rd8 35.Rxd8 Nxd8
White has an edge: = (0.29) Depth: 41/105 05:28:15 616MN, tb=6098
(, 23.07.2019)

zullil · Post by **zullil** » Tue Jul 23, 2019 7:50 pm

mwyoung wrote: ↑Tue Jul 23, 2019 7:11 pm

I have been doing a run now for 5 1/2 hours on the same position. What is supposed to happen? I still have RAM, and Lc0 has not slowed down.

New game Line
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

Analysis by Lc0 v0.21.2:

1.e4 c6 2.d3 d5 3.Nf3 dxe4 4.dxe4 Qxd1+ 5.Kxd1 Nf6 6.Nfd2 Ng4 7.Ke1 e5 8.Nc3 f6 9.a4 Nh6 10.a5 Nf7 11.h4 h5 12.f3 g6 13.g3 Be6 14.Bc4 Nd8 15.Nd1 Na6 16.c3 Bh6 17.Bxe6 Nxe6 18.Ke2 Nac5 19.b4 Bxd2 20.Bxd2 Nb3 21.Ra2 Nxd2 22.Rxd2 Ke7 23.Nb2 Rad8 24.Nd3 Rhg8 25.Ke3 Rd7 26.f4 exf4+ 27.Nxf4 Rxd2 28.Kxd2 Nd8 29.a6 b6 30.b5 cxb5 31.Nd5+ Ke6 32.Nc7+ Ke5 33.Ke3 Nc6 34.Rd1 Rd8 35.Rxd8 Nxd8
White has an edge: = (0.29) Depth: 41/105 05:28:15 616MN, tb=6098
(, 23.07.2019)

4M nodes require about 1 GB of RAM, so your search of 616M nodes should have used about 154 GB of RAM. So either your machine has a very large amount of memory, or you're paging to storage. Or I'm totally confused.
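
The estimate above can be reproduced with a few lines. This is a minimal sketch that takes the "4M nodes per GB" figure from this post as a rule of thumb (an assumption, not a measured property of Lc0) and reads the node count and elapsed time from the analysis line quoted above; the helper names are hypothetical.

```python
# Rule of thumb quoted in the thread (an assumption, not a measured constant).
NODES_PER_GB = 4_000_000


def estimated_ram_gb(nodes: int, nodes_per_gb: int = NODES_PER_GB) -> float:
    """Estimated RAM (GB) needed to hold a search tree of `nodes` nodes."""
    return nodes / nodes_per_gb


def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


nodes = 616_000_000                   # "616MN" in the analysis output
elapsed = hms_to_seconds("05:28:15")  # elapsed time in the analysis output

print(f"estimated RAM: {estimated_ram_gb(nodes):.0f} GB")
print(f"average speed: {nodes / elapsed:,.0f} nodes per second")
```

If the rule of thumb holds, the reported search would not fit in ordinary desktop RAM, which is the puzzle raised here.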

mwyoung · Post by **mwyoung** » Wed Jul 24, 2019 12:48 am

zullil wrote: ↑Tue Jul 23, 2019 7:50 pm
mwyoung wrote: ↑Tue Jul 23, 2019 7:11 pm

I have been doing a run now for 5 1/2 hours on the same position. What is supposed to happen? I still have RAM, and Lc0 has not slowed down.

New game Line
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

Analysis by Lc0 v0.21.2:

1.e4 c6 2.d3 d5 3.Nf3 dxe4 4.dxe4 Qxd1+ 5.Kxd1 Nf6 6.Nfd2 Ng4 7.Ke1 e5 8.Nc3 f6 9.a4 Nh6 10.a5 Nf7 11.h4 h5 12.f3 g6 13.g3 Be6 14.Bc4 Nd8 15.Nd1 Na6 16.c3 Bh6 17.Bxe6 Nxe6 18.Ke2 Nac5 19.b4 Bxd2 20.Bxd2 Nb3 21.Ra2 Nxd2 22.Rxd2 Ke7 23.Nb2 Rad8 24.Nd3 Rhg8 25.Ke3 Rd7 26.f4 exf4+ 27.Nxf4 Rxd2 28.Kxd2 Nd8 29.a6 b6 30.b5 cxb5 31.Nd5+ Ke6 32.Nc7+ Ke5 33.Ke3 Nc6 34.Rd1 Rd8 35.Rxd8 Nxd8
White has an edge: = (0.29) Depth: 41/105 05:28:15 616MN, tb=6098
(, 23.07.2019)
4M nodes require about 1 GB of RAM, so your search of 616M nodes should have used about 154 GB of RAM. So either your machine has a very large amount of memory, or you're paging to storage. Or I'm totally confused.

I cannot find any kind of memory leak on my system. Here is a screenshot of Lc0 playing Stockfish. Each program was given 16 GB of RAM. I see no unbounded memory usage in game play at all, and everything resets normally after each game. I am confused as to what others are seeing.

Memory Leak test.jpg

zullil · Post by **zullil** » Wed Jul 24, 2019 1:43 am

mwyoung wrote: ↑Wed Jul 24, 2019 12:48 am

I cannot find any kind of memory leak on my system. Here is a screenshot of Lc0 playing Stockfish. Each program was given 16 GB of RAM. I see no unbounded memory usage in game play at all, and everything resets normally after each game. I am confused as to what others are seeing.

What does it mean to give Lc0 16 GB of RAM? How did you do this?

Also, I think the question is what happens if you search a single position for a long time, like you did earlier. Based on the number of nodes searched, that 5.5 hour search should have required 150 GB of memory. At least, according to the post I linked to. So why didn't you run into issues? I'm puzzled.

mwyoung · Post by **mwyoung** » Wed Jul 24, 2019 3:23 am

zullil wrote: ↑Wed Jul 24, 2019 1:43 am
mwyoung wrote: ↑Wed Jul 24, 2019 12:48 am

I cannot find any kind of memory leak on my system. Here is a screenshot of Lc0 playing Stockfish. Each program was given 16 GB of RAM. I see no unbounded memory usage in game play at all, and everything resets normally after each game. I am confused as to what others are seeing.

What does it mean to give Lc0 16 GB of RAM? How did you do this?

Also, I think the question is what happens if you search a single position for a long time, like you did earlier. Based on the number of nodes searched, that 5.5 hour search should have required 150 GB of memory. At least, according to the post I linked to. So why didn't you run into issues? I'm puzzled.

I don't know either, other than that I have never had an issue come up when running Lc0. Below is how I give Lc0 16 GB of RAM, or any amount I wish to use for hash.

Look here.jpg
