Wouldn't it be nice if C++ GPU

Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Wouldn't it be nice if C++ GPU

Post by Rein Halbersma »

Rémi Coulom wrote: Thu Apr 25, 2019 1:59 pm I developed my own home-made C++ deep-learning framework just to be able to do that. I used tensorflow for a while, but it was too painful to use from C++. What you describe can be done with tensorflow, but last time I tried, I had to use undocumented/unsupported features of the low-level C++ tensorflow library, and it was really unpleasant (having to compile the library from source with bazel, ...).

Maybe other frameworks have better C++ support.

At the moment, I am using some simple C++ classes on top of cuDNN. I don't have autodiff, but manually calculating a gradient is not such a big deal in my opinion. I am thinking about compile-time autodiff with template meta-programming.

If people here want help with using tensorflow from C++ with an NVIDIA GPU, I could explain a little how I did it. It is not very difficult to do, but it is not documented.

I applied to the Tensorflow Research Cloud, and was accepted. This gives me access to 100 TPUs for one month. After days of trying to use a TPU from C++, I gave up. I am sure it is doable, but there is no documentation at all.
LeelaChessZero uses the 3rd party tensorflow_cc wrapper library around the official Tensorflow C++ API, to avoid the Bazel build stuff. See https://github.com/LeelaChessZero/lc0/b ... sorflow.md
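
On the compile-time autodiff idea in the quoted post, here is a minimal sketch of how it could look with template metaprogramming. It assumes a single scalar variable and only addition and multiplication. The type names (X, Const, Add, Mul, D, Deriv) are made up for this illustration and are not taken from Rémi's framework or from any library; a real version would need tensors and many more operations.

```cpp
// Expression nodes. Each type can evaluate itself at a point x.
struct X {
    static constexpr double eval(double x) { return x; }
};
template <int N> struct Const {
    static constexpr double eval(double) { return N; }
};
template <class A, class B> struct Add {
    static constexpr double eval(double x) { return A::eval(x) + B::eval(x); }
};
template <class A, class B> struct Mul {
    static constexpr double eval(double x) { return A::eval(x) * B::eval(x); }
};

// Differentiation as a type-level function: D<E>::type is the
// expression type of dE/dx, built entirely by the compiler.
template <class E> struct D;
template <> struct D<X> { using type = Const<1>; };
template <int N> struct D<Const<N>> { using type = Const<0>; };
template <class A, class B> struct D<Add<A, B>> {  // sum rule
    using type = Add<typename D<A>::type, typename D<B>::type>;
};
template <class A, class B> struct D<Mul<A, B>> {  // product rule
    using type = Add<Mul<typename D<A>::type, B>,
                     Mul<A, typename D<B>::type>>;
};
template <class E> using Deriv = typename D<E>::type;

// Example: f(x) = x*x*x + 2*x, so f'(x) = 3*x*x + 2.
using F  = Add<Mul<X, Mul<X, X>>, Mul<Const<2>, X>>;
using dF = Deriv<F>;

static_assert(F::eval(3.0) == 33.0, "f(3) = 27 + 6");
static_assert(dF::eval(3.0) == 29.0, "f'(3) = 27 + 2");
```

The derivative expression is a type, so there is no graph to build or walk at run time. The sketch does no simplification, though: terms such as 0 * x stay in the derivative type unless the optimizer folds them away.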
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Post by Rémi Coulom »

Rein Halbersma wrote: Thu Apr 25, 2019 6:58 pm LeelaChessZero uses the 3rd party tensorflow_cc wrapper library around the official Tensorflow C++ API, to avoid the Bazel build stuff. See https://github.com/LeelaChessZero/lc0/b ... sorflow.md
Thanks for the link. Bazel is still necessary to build the library itself. This is in fact what I had managed to do by myself. It is still very unpleasant to do.

By the way, has anybody here tried to code a fast convolution in cuda directly? I will probably try soon. My impression is that the performance of cuDNN is very bad for small batches. Good performance with small batches is important for tree search.

Rémi
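
For anyone trying a hand-written kernel, a plain CPU reference is handy for checking its output. The sketch below is ordinary C++ rather than CUDA. It assumes a 3x3 filter with zero padding, a single position (batch size 1) and the memory layouts given in the comments. The function name conv3x3 is hypothetical, and nothing here is taken from cuDNN or any existing project.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference 3x3 "same" convolution (cross-correlation, as in most
// deep-learning libraries) with zero padding, for one position.
// Layouts are assumptions of this sketch:
//   input   [c_in][h][w]
//   weights [c_out][c_in][3][3]
//   output  [c_out][h][w]
std::vector<float> conv3x3(const std::vector<float>& input,
                           const std::vector<float>& weights,
                           int c_in, int c_out, int h, int w)
{
    assert(input.size() == static_cast<std::size_t>(c_in) * h * w);
    assert(weights.size() == static_cast<std::size_t>(c_out) * c_in * 9);

    std::vector<float> output(static_cast<std::size_t>(c_out) * h * w, 0.0f);
    for (int co = 0; co < c_out; ++co) {
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                float sum = 0.0f;
                for (int ci = 0; ci < c_in; ++ci) {
                    for (int ky = 0; ky < 3; ++ky) {
                        for (int kx = 0; kx < 3; ++kx) {
                            const int iy = y + ky - 1;
                            const int ix = x + kx - 1;
                            // Taps outside the board contribute zero.
                            if (iy < 0 || iy >= h || ix < 0 || ix >= w)
                                continue;
                            sum += input[(ci * h + iy) * w + ix] *
                                   weights[((co * c_in + ci) * 3 + ky) * 3 + kx];
                        }
                    }
                }
                output[(co * h + y) * w + x] = sum;
            }
        }
    }
    return output;
}
```

A quick sanity check is to feed an 8x8 plane of ones through a filter of ones: corners should come out as 4, edges as 6 and interior squares as 9. A GPU version can then be compared against this function element by element, with a small tolerance if it accumulates in a different order or in fp16.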
smatovic
Posts: 2662
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Post by smatovic »

Rémi Coulom wrote: Thu Apr 25, 2019 7:18 pm ...
By the way, has anybody here tried to code a fast convolution in cuda directly? I will probably try soon. My impression is that the performance of cuDNN is very bad for small batches. Good performance with small batches is important for tree search.

Rémi
https://github.com/ankan-ban/ConvTest

--
Srdja
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Post by Rein Halbersma »

Rémi Coulom wrote: Thu Apr 25, 2019 7:18 pm
Rein Halbersma wrote: Thu Apr 25, 2019 6:58 pm LeelaChessZero uses the 3rd party tensorflow_cc wrapper library around the official Tensorflow C++ API, to avoid the Bazel build stuff. See https://github.com/LeelaChessZero/lc0/b ... sorflow.md
Thanks for the link. Bazel is still necessary to build the library itself. This is in fact what I had managed to do by myself. It is still very unpleasant to do.
That's not what tensorflow_cc advertises: https://github.com/FloopCZ/tensorflow_cc
This repository makes possible the usage of the TensorFlow C++ API from the outside of the TensorFlow source code folders and without the use of the Bazel build system.
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Post by Rémi Coulom »

Rein Halbersma wrote: Thu Apr 25, 2019 8:23 pm
Rémi Coulom wrote: Thu Apr 25, 2019 7:18 pm
Rein Halbersma wrote: Thu Apr 25, 2019 6:58 pm LeelaChessZero uses the 3rd party tensorflow_cc wrapper library around the official Tensorflow C++ API, to avoid the Bazel build stuff. See https://github.com/LeelaChessZero/lc0/b ... sorflow.md
Thanks for the link. Bazel is still necessary to build the library itself. This is in fact what I had managed to do by myself. It is still very unpleasant to do.
That's not what tensorflow_cc advertises: https://github.com/FloopCZ/tensorflow_cc
This repository makes possible the usage of the TensorFlow C++ API from the outside of the TensorFlow source code folders and without the use of the Bazel build system.
This means that you don't need bazel to build your own code, but you need it to build the tensorflow library:
If you require GPU support on Ubuntu, please also install Bazel
(from https://github.com/FloopCZ/tensorflow_cc)
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Post by Rémi Coulom »

smatovic wrote: Thu Apr 25, 2019 7:38 pm https://github.com/ankan-ban/ConvTest
Very interesting, thanks. I will try to make a tensor-core version.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Post by Rein Halbersma »

Rémi Coulom wrote: Thu Apr 25, 2019 8:48 pm
If you require GPU support on Ubuntu, please also install Bazel
(from https://github.com/FloopCZ/tensorflow_cc)
Thanks for correcting me! But at least it's a one-time use and you don't need to integrate Bazel into your own project build.

I've also just found another package that is installable on Debian without having to build with Bazel: https://github.com/kecsap/tensorflow_cpp_packaging Not sure how mature it is though.
chrisw
Posts: 4319
Joined: Tue Apr 03, 2012 4:28 pm

Post by chrisw »

Rémi Coulom wrote: Thu Apr 25, 2019 8:51 pm
smatovic wrote: Thu Apr 25, 2019 7:38 pm https://github.com/ankan-ban/ConvTest
Very interesting, thanks. I will try to make a tensor-core version.
Some integrable C++ source plus the necessary headers, working on both Windows and Linux/Ubuntu, would be a great resource. It should also increase the variety of engines, since developers would no longer be locked in to particular inputs and could use preprocessed input. For the moment, supporting only CUDA on NVIDIA would be enough.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Post by Daniel Shawul »

Rein Halbersma wrote: Thu Apr 25, 2019 9:18 pm
Rémi Coulom wrote: Thu Apr 25, 2019 8:48 pm
If you require GPU support on Ubuntu, please also install Bazel
(from https://github.com/FloopCZ/tensorflow_cc)
Thanks for correcting me! But at least it's a one-time use and you don't need to integrate Bazel into your own project build.

I've also just found another package that is installable on Debian without having to build with Bazel: https://github.com/kecsap/tensorflow_cpp_packaging Not sure how mature it is though.
I have the option of building with either Bazel or tensorflow_cc, but there are some serious issues with the latter:

a) tensorflow_cc is available only on Linux.
b) There are multi-GPU problems with libtensorflow_cc. I reported the issue here: https://github.com/FloopCZ/tensorflow_cc/issues/136
but there is still no solution for it. Building with Bazel does not have this problem. This was the deal breaker for me.
c) It adds one more dependency, libtensorflow_cc.so, and maybe more.

Bazel cons

The Windows Bazel build is kind of broken when compiling with GPU support. There is a known issue, which I can't find at the moment.
The CPU build is OK though, so I use that and get one binary, egbbdll.so, without dependencies.
For GPU builds I compile directly against TensorRT. Note that you can configure TensorFlow with TensorRT, MKL, experimental OpenCL etc., so theoretically you don't have to use anything other than TensorFlow. Btw, TensorFlow does have TPU support, which I have never explored. But building directly against TensorRT is so much easier (it is just another library) than going through the pain of compiling TensorFlow (either via Bazel or tensorflow_cc, both equally painful). TensorRT gives at least a 2x speedup compared to TensorFlow compiled without TensorRT support. I am curious to know whether TensorFlow with TensorRT support can perform equally well.

@Remi Why do you want to write even a single CUDA kernel? cuDNN has lots of convolution kernels to choose from anyway, and the TensorRT performance is as good as Ankan's hand-written CUDA kernels, as I have detailed here:
viewtopic.php?f=2&t=69885&hilit=Scorpio+Leela&start=10
And then when you factor in supporting tensor cores, fp16 and maybe int8/int4 with Turing, the old fp16 units in the 1070 etc., one is inclined to conclude that this is better left to a library. I don't even build the graph explicitly like lc0 does, because I don't intend to do any manual optimization of the graph by writing convolution kernels etc.
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Post by Rémi Coulom »

Daniel Shawul wrote: Fri Apr 26, 2019 2:50 am @Remi Why do you want to write even a single CUDA kernel? cuDNN has lots of convolution kernels to choose from anyway, and the TensorRT performance is as good as Ankan's hand-written CUDA kernels, as I have detailed here:
viewtopic.php?f=2&t=69885&hilit=Scorpio+Leela&start=10
And then when you factor in supporting tensor cores, fp16 and maybe int8/int4 with Turing, the old fp16 units in the 1070 etc., one is inclined to conclude that this is better left to a library. I don't even build the graph explicitly like lc0 does, because I don't intend to do any manual optimization of the graph by writing convolution kernels etc.
Writing my own CUDA kernels is certainly not a reasonable choice, but it is fun to try, and I still believe it is possible to outperform cuDNN for small batches.