TCEC Lc0 Configuration

Posted: Fri May 17, 2019 8:52 pm
by mwyoung
TCEC GPU Server
GPUs: 1 x RTX 2080 Ti + 1 x RTX 2080
CPU: Quad Core i5 2600k
RAM: 16GB DDR3-2133
SSD: Samsung 840 Pro 256GB

I am following the TCEC games with my own Lc0. The TCEC Lc0 has an extra RTX 2080 compared with my setup, so it is strange that my system always searches more plies and in some cases even reaches a higher NPS.

I am running the default settings, except that I use the fp16 backend and NN = 9900000.

Does anyone know what settings TCEC is using?

Either NN has a huge impact on the search depth, or an extra GPU is not worth much, or TCEC is running a poor configuration for Lc0.

Re: TCEC Lc0 Configuration

Posted: Fri May 17, 2019 9:13 pm
by arunsoorya1309
{
"name": "MoveOverheadMs",
"value": "2000"
},
{
"name": "WeightsFile",
"value": "xxxx/T40.T8.610.pb.gz"
},
{
"name": "Threads",
"value": "3"
},
{
"name": "NNCacheSize",
"value": "20000000"
},
{
"name": "MinibatchSize",
"value": "256"
},
{
"name": "MaxCollisionEvents",
"value": "32"
},
{
"name": "CPuct",
"value": "3.4"
},
{
"name": "CPuctBase",
"value": "10000"
},
{
"name": "MaxPrefetch",
"value": "32"
},
{
"name": "RamLimitMb",
"value": "22528"
},
{
"name": "LogFile",
"value": "Sufi.txt"
},
{
"name": "SyzygyPath",
"value": "xxx"
},
{
"name": "Ponder",
"value": "false"
},
{
"name": "CommandLineOptions",
"value": "--backend=roundrobin --backend-opts=\"(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)\""
}
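
For anyone who wants to try these values locally, here is a minimal sketch that turns the name/value pairs into standard UCI `setoption` commands. It is an illustration, not TCEC's actual tooling. Two things are assumptions: wrapping the entries in a list (the post shows only a fragment), and treating `CommandLineOptions` as arguments used when launching the engine rather than as a UCI option. The helper name `split_options` is hypothetical, and only a subset of the posted options is included for brevity.

```python
# A subset of the options posted above. Wrapping them in a list is an
# assumption, since the post shows only a fragment of a larger file.
options = [
    {"name": "MoveOverheadMs", "value": "2000"},
    {"name": "Threads", "value": "3"},
    {"name": "NNCacheSize", "value": "20000000"},
    {"name": "MinibatchSize", "value": "256"},
    {"name": "Ponder", "value": "false"},
    {
        "name": "CommandLineOptions",
        "value": '--backend=roundrobin --backend-opts="(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)"',
    },
]


def split_options(options):
    """Return (UCI setoption lines, launch arguments).

    Assumption: the CommandLineOptions entry is passed on the engine's
    command line and is not sent as a UCI option.
    """
    uci_lines = []
    launch_args = ""
    for opt in options:
        if opt["name"] == "CommandLineOptions":
            launch_args = opt["value"]
        else:
            uci_lines.append(f"setoption name {opt['name']} value {opt['value']}")
    return uci_lines, launch_args


uci_commands, cli_args = split_options(options)
for line in uci_commands:
    print(line)
print("launch arguments:", cli_args)
```

The printed `setoption` lines can be pasted into an engine console after `uci`; the launch arguments would go on the command line that starts Lc0.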

Re: TCEC Lc0 Configuration

Posted: Fri May 17, 2019 9:33 pm
by mwyoung
arunsoorya1309 wrote: Fri May 17, 2019 9:13 pm
{
"name": "MoveOverheadMs",
"value": "2000"
},
{
"name": "WeightsFile",
"value": "xxxx/T40.T8.610.pb.gz"
},
{
"name": "Threads",
"value": "3"
},
{
"name": "NNCacheSize",
"value": "20000000"
},
{
"name": "MinibatchSize",
"value": "256"
},
{
"name": "MaxCollisionEvents",
"value": "32"
},
{
"name": "CPuct",
"value": "3.4"
},
{
"name": "CPuctBase",
"value": "10000"
},
{
"name": "MaxPrefetch",
"value": "32"
},
{
"name": "RamLimitMb",
"value": "22528"
},
{
"name": "LogFile",
"value": "Sufi.txt"
},
{
"name": "SyzygyPath",
"value": "xxx"
},
{
"name": "Ponder",
"value": "false"
},
{
"name": "CommandLineOptions",
"value": "--backend=roundrobin --backend-opts=\"(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)\""
}
Thanks,

This all looks OK, except for the 3 threads. I can only conclude that the issue has to be running 2 cards and not getting anything out of the extra card.

Re: TCEC Lc0 Configuration

Posted: Fri May 17, 2019 10:56 pm
by Hai
3 threads for two RTX GPUs?
Shouldn't it be 3 threads per RTX GPU?

I think TCEC should use four RTX 2080 Ti GPUs rather than only two, to be fair compared with Stockfish's maximum number of high-end cores.
With this configuration it would show us the true difference between Lc0 and Stockfish!
As it stands, Lc0 is held back by 50 to 100 Elo.

Re: TCEC Lc0 Configuration

Posted: Fri May 17, 2019 11:28 pm
by mwyoung
Hai wrote: Fri May 17, 2019 10:56 pm 3 threads for two rtx gpus?
Shouldn't it be 3 threads per rtx gpu?


I think TCEC should use 4 rtx 2080 ti gpus and not only two, to be equal and fair play compared to Stockfishs maximum high end cores.
With this configuration it would show us the true difference between LC0 and Stockfish!
Actually LC0 is limited by 50 to 100 elo less strength.
That would be really bad if they were using 4 cards, because, as I said, my single RTX 2080 Ti is out-searching the TCEC setup, and in some positions my NPS is even higher.

Something is not right.

3 threads seems wrong when running 2 or 4 cards, unless that is per card.

Re: TCEC Lc0 Configuration

Posted: Sat May 18, 2019 12:13 am
by mwyoung
mwyoung wrote: Fri May 17, 2019 11:28 pm
Hai wrote: Fri May 17, 2019 10:56 pm 3 threads for two rtx gpus?
Shouldn't it be 3 threads per rtx gpu?


I think TCEC should use 4 rtx 2080 ti gpus and not only two, to be equal and fair play compared to Stockfishs maximum high end cores.
With this configuration it would show us the true difference between LC0 and Stockfish!
Actually LC0 is limited by 50 to 100 elo less strength.
That would be really bad if they are using 4 cards. Because like I said my one RTX 2080ti is out searching the TCEC setup. And in some positions my NPS are even higher.

Something is not right.

3 threads seems wrong if running 2 or 4 cards. Unless that is for each card.
Here is what I am seeing. One 2080 ti vs the TCEC setup running Lc0.

TCEC.jpg
one RTX 2080 ti.jpg

Re: TCEC Lc0 Configuration

Posted: Sat May 18, 2019 12:14 am
by mwyoung
mwyoung wrote: Sat May 18, 2019 12:13 am
mwyoung wrote: Fri May 17, 2019 11:28 pm
Hai wrote: Fri May 17, 2019 10:56 pm 3 threads for two rtx gpus?
Shouldn't it be 3 threads per rtx gpu?


I think TCEC should use 4 rtx 2080 ti gpus and not only two, to be equal and fair play compared to Stockfishs maximum high end cores.
With this configuration it would show us the true difference between LC0 and Stockfish!
Actually LC0 is limited by 50 to 100 elo less strength.
That would be really bad if they are using 4 cards. Because like I said my one RTX 2080ti is out searching the TCEC setup. And in some positions my NPS are even higher.

Something is not right.

3 threads seems wrong if running 2 or 4 cards. Unless that is for each card.
Here is what I am seeing. One 2080 ti vs the TCEC setup running Lc0.


TCEC.jpg

one RTX 2080 ti.jpg

Re: TCEC Lc0 Configuration

Posted: Sat May 18, 2019 2:43 am
by corres
With Backend = RoundRobin, 3 threads are enough for two cards.
"TCEC.jpg" shows the position evaluated at 2.83 pawns for White,
but "one RTX 2080 ti.jpg" shows your machine giving White only a 1.00 pawn advantage.
I think you are not using the T40.T8.610 net file that TCEC uses.
Or maybe your net file is degraded, and that is the cause of the difference in depth and in centipawns.
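
To picture what a round-robin backend does with two cards, here is a toy sketch. It is not Lc0's code, only an illustration of the scheduling idea: each successive batch goes to the next child backend in turn. The function name `dispatch` and the labels are made up for the example, and the sketch says nothing about how many threads are needed to keep both cards busy.

```python
from collections import Counter
from itertools import cycle

# Labels for the two child backends named in the posted configuration.
backends = ["gpu=0", "gpu=1"]


def dispatch(num_batches, backends):
    """Assign each batch to the next backend in turn (round-robin)."""
    turn = cycle(backends)
    return [next(turn) for _ in range(num_batches)]


assignment = dispatch(6, backends)
print(assignment)           # batches alternate between the two cards
print(Counter(assignment))  # each card receives the same number of batches
```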

Re: TCEC Lc0 Configuration

Posted: Sat May 18, 2019 5:04 am
by mwyoung
corres wrote: Sat May 18, 2019 2:43 am If Backend = RoundRobin it is enough 3 threads for two cards.
As "TCEC.jpg" shows the value of position is 2.83 pawns for White.
But as "one RTX 2080 ti.jpg" shows your machine displays only 1.00 pawn advantage for White.
I think you do not use T40.T8.610 net file as TCEC does it.
Or maybe your net file is degraded and this is the cause of the difference in depth and in centipawn.


As you can see from the picture, I am using v0.21.2-rc1. It changed only the eval formula, not the search.
And I out-searched the TCEC Lc0 by 3 plies in the same amount of time, so the eval will not be the same.

The eval is not the problem. The problem is one card out-searching 2 cards running the same NN.

v0.21.2-rc1

Make --sticky-endgames on by default (still off in training) (#844)
update download links in README (#842)
Recalibrate centipawn formula (#841)
Also make parents Terminal if any move is a win or all moves are loss or draw. (#822)
Use parent Q as a default score instead of 0 for unvisited pv. (#828)
Add stop command to selfplay interactive mode to allow for graceful exit. (#810)
Increased hard limit on batch size in opencl backend to 32 (#807)

Re: TCEC Lc0 Configuration

Posted: Sat May 18, 2019 9:42 am
by Guenther
mwyoung wrote: Sat May 18, 2019 12:14 am Here is what I am seeing. One 2080 ti vs the TCEC setup running Lc0.
Image

It seems no easy comparison is possible. While it is true that your image shows a higher depth, you overlook
that the TCEC image shows, for the same move, a speed of 65 kN/s (at the end) and more than twice as many searched nodes (24.2M)!
You really need to run the same Lc0 version, and also step through the whole game. Even then the results will include
some randomness due to multiprocessing.
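
As a rough sanity check on those numbers, here is a small sketch using only the two TCEC figures quoted above (24.2M nodes, 65 kN/s). The big caveat: 65 kN/s is the speed shown at the end of the search, not a measured average, so the resulting time is only an estimate. The function name is made up for the example.

```python
def seconds_at_rate(nodes, nodes_per_second):
    """Time needed to search `nodes` at a constant speed."""
    return nodes / nodes_per_second


tcec_nodes = 24_200_000  # 24.2M nodes from the TCEC image
tcec_nps = 65_000        # 65 kN/s, the speed shown at the end of the search

estimated_seconds = seconds_at_rate(tcec_nodes, tcec_nps)
print(f"estimated TCEC search time: {estimated_seconds:.0f} s")

# "More than twice as many nodes" implies the single-card search
# visited fewer than half of the TCEC node count.
local_nodes_upper_bound = tcec_nodes / 2
print(f"single-card nodes: fewer than {local_nodes_upper_bound:,.0f}")
```

So a deeper reported depth on the single card went along with fewer than half as many nodes, which is why depth alone is not a good yardstick here.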
