Wouldn't it be nice if C++ GPU

Rein Halbersma · Post by **Rein Halbersma** » Thu Apr 25, 2019 6:58 pm

Rémi Coulom wrote: ↑Thu Apr 25, 2019 1:59 pm I developed my own home-made C++ deep-learning framework just to be able to do that. I used tensorflow for a while, but it was too painful to use from C++. What you describe can be done with tensorflow, but last time I tried, I had to use undocumented/unsupported features of the low-level C++ tensorflow library, and it was really unpleasant (having to compile the library from source with bazel, ...).

Maybe other frameworks have better C++ support.

At the moment, I am using some simple C++ classes on top of CuDNN. I don't have autodiff, but manually calculating a gradient is not such a big deal in my opinion. I am thinking about compile-time autodiff with template meta-programming.

If people here want help using TensorFlow from C++ with an NVIDIA GPU, I could explain a little how I did it. It is not very difficult to do, but it is not documented.

I applied to the Tensorflow Research Cloud, and was accepted. This gives me access to 100 TPUs for one month. After days of trying to use a TPU from C++, I gave up trying. I am sure it is doable, but there is no documentation at all.

LeelaChessZero uses the 3rd party tensorflow_cc wrapper library around the official Tensorflow C++ API, to avoid the Bazel build stuff. See https://github.com/LeelaChessZero/lc0/b ... sorflow.md
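The compile-time autodiff idea mentioned in the quoted post can be illustrated with a small sketch. This is not Rémi's code or any existing library: the node types `Var`, `Const`, `Add`, `Mul` and the helpers `add` and `mul` are hypothetical names invented here, and the sketch handles only a single scalar variable. The expression tree lives in the type system, and `d()` builds the derivative expression as a new type, so the compiler can evaluate both inside a `static_assert`.

```cpp
// Minimal compile-time autodiff sketch (C++14 or later).
// All names here are hypothetical and chosen for illustration only.

struct Const {
    double c;
    constexpr double eval(double) const { return c; }
    constexpr Const d() const { return Const{0.0}; }   // d(c)/dx = 0
};

struct Var {  // the single input variable x
    constexpr double eval(double x) const { return x; }
    constexpr Const d() const { return Const{1.0}; }   // dx/dx = 1
};

template <class L, class R>
struct Add {
    L l; R r;
    constexpr double eval(double x) const { return l.eval(x) + r.eval(x); }
    // Sum rule: (l + r)' = l' + r'
    constexpr auto d() const {
        return Add<decltype(l.d()), decltype(r.d())>{l.d(), r.d()};
    }
};

template <class L, class R>
struct Mul {
    L l; R r;
    constexpr double eval(double x) const { return l.eval(x) * r.eval(x); }
    // Product rule: (l * r)' = l' * r + l * r'
    constexpr auto d() const {
        using DL = decltype(l.d());
        using DR = decltype(r.d());
        return Add<Mul<DL, R>, Mul<L, DR>>{{l.d(), r}, {l, r.d()}};
    }
};

// Helper functions that deduce the node types.
template <class L, class R>
constexpr Add<L, R> add(L l, R r) { return {l, r}; }
template <class L, class R>
constexpr Mul<L, R> mul(L l, R r) { return {l, r}; }

// Example: f(x) = x*x + 3*x, so f'(x) = 2*x + 3.
constexpr auto example_f  = add(mul(Var{}, Var{}), mul(Const{3.0}, Var{}));
constexpr auto example_df = example_f.d();

static_assert(example_f.eval(2.0) == 10.0, "f(2) = 4 + 6");
static_assert(example_df.eval(2.0) == 7.0, "f'(2) = 4 + 3");
```

A real framework would need tensors, many more operations, and reverse-mode accumulation for networks with many parameters. The sketch only shows that the differentiation rules themselves can be resolved by the compiler.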

Rémi Coulom · Post by **Rémi Coulom** » Thu Apr 25, 2019 7:18 pm

Rein Halbersma wrote: ↑Thu Apr 25, 2019 6:58 pm LeelaChessZero uses the 3rd party tensorflow_cc wrapper library around the official Tensorflow C++ API, to avoid the Bazel build stuff. See https://github.com/LeelaChessZero/lc0/b ... sorflow.md

Thanks for the link. Bazel is still necessary to build the library itself. This is in fact what I had managed to do by myself. It is still very unpleasant to do.

By the way, has anybody here tried to code a fast convolution in cuda directly? I will probably try soon. My impression is that the performance of cuDNN is very bad for small batches. Good performance with small batches is important for tree search.

Rémi
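For anyone trying a hand-written kernel, a plain CPU reference is handy as ground truth for checking results. The following is a minimal sketch of such a reference, not a CUDA kernel and not code from anyone in this thread. It assumes batch size 1, a 3x3 filter, stride 1, zero padding of 1, and a channel-major `[C][H][W]` layout. The function name `conv3x3_reference` is hypothetical. As is usual in deep learning, it computes a cross-correlation (the filter is not flipped).

```cpp
#include <cstddef>
#include <vector>

// Direct 3x3 convolution for a single position (batch size 1).
// Assumed layouts for this sketch:
//   in  : [C][H][W]      input planes
//   w   : [K][C][3][3]   filters
//   out : [K][H][W]      output planes
std::vector<float> conv3x3_reference(const std::vector<float>& in,
                                     const std::vector<float>& w,
                                     int C, int K, int H, int W) {
    std::vector<float> out(static_cast<std::size_t>(K) * H * W, 0.0f);
    for (int k = 0; k < K; ++k) {
        for (int y = 0; y < H; ++y) {
            for (int x = 0; x < W; ++x) {
                float acc = 0.0f;
                for (int c = 0; c < C; ++c) {
                    for (int dy = 0; dy < 3; ++dy) {
                        for (int dx = 0; dx < 3; ++dx) {
                            const int iy = y + dy - 1;
                            const int ix = x + dx - 1;
                            // Zero padding: skip taps outside the board.
                            if (iy < 0 || iy >= H || ix < 0 || ix >= W) continue;
                            acc += in[(c * H + iy) * W + ix] *
                                   w[((k * C + c) * 3 + dy) * 3 + dx];
                        }
                    }
                }
                out[(k * H + y) * W + x] = acc;
            }
        }
    }
    return out;
}
```

With an 8x8 board of ones and a single all-ones filter, the output is 4 in the corners, 6 along the edges and 9 in the interior, which makes a quick sanity check before comparing against a GPU implementation.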

smatovic · Post by **smatovic** » Thu Apr 25, 2019 7:38 pm

Rémi Coulom wrote: ↑Thu Apr 25, 2019 7:18 pm ...
By the way, has anybody here tried to code a fast convolution in cuda directly? I will probably try soon. My impression is that the performance of cuDNN is very bad for small batches. Good performance with small batches is important for tree search.

Rémi

https://github.com/ankan-ban/ConvTest

--
Srdja

Rein Halbersma · Post by **Rein Halbersma** » Thu Apr 25, 2019 8:23 pm

Rémi Coulom wrote: ↑Thu Apr 25, 2019 7:18 pm
Rein Halbersma wrote: ↑Thu Apr 25, 2019 6:58 pm LeelaChessZero uses the 3rd party tensorflow_cc wrapper library around the official Tensorflow C++ API, to avoid the Bazel build stuff. See https://github.com/LeelaChessZero/lc0/b ... sorflow.md
Thanks for the link. Bazel is still necessary to build the library itself. This is in fact what I had managed to do by myself. It is still very unpleasant to do.

That's not what tensorflow_cc advertises: https://github.com/FloopCZ/tensorflow_cc

This repository makes possible the usage of the TensorFlow C++ API from the outside of the TensorFlow source code folders and without the use of the Bazel build system.

Rémi Coulom · Post by **Rémi Coulom** » Thu Apr 25, 2019 8:48 pm

Rein Halbersma wrote: ↑Thu Apr 25, 2019 8:23 pm
Rémi Coulom wrote: ↑Thu Apr 25, 2019 7:18 pm
Rein Halbersma wrote: ↑Thu Apr 25, 2019 6:58 pm LeelaChessZero uses the 3rd party tensorflow_cc wrapper library around the official Tensorflow C++ API, to avoid the Bazel build stuff. See https://github.com/LeelaChessZero/lc0/b ... sorflow.md
Thanks for the link. Bazel is still necessary to build the library itself. This is in fact what I had managed to do by myself. It is still very unpleasant to do.

That's not what tensorflow_cc advertises: https://github.com/FloopCZ/tensorflow_cc

This repository makes possible the usage of the TensorFlow C++ API from the outside of the TensorFlow source code folders and without the use of the Bazel build system.

This means that you don't need bazel to build your own code, but you need it to build the tensorflow library:

If you require GPU support on Ubuntu, please also install Bazel

(from https://github.com/FloopCZ/tensorflow_cc)

Rémi Coulom · Post by **Rémi Coulom** » Thu Apr 25, 2019 8:51 pm

smatovic wrote: ↑Thu Apr 25, 2019 7:38 pm https://github.com/ankan-ban/ConvTest

Very interesting, thanks. I will try to make a tensor-core version.

Rein Halbersma · Post by **Rein Halbersma** » Thu Apr 25, 2019 9:18 pm

Rémi Coulom wrote: ↑Thu Apr 25, 2019 8:48 pm
If you require GPU support on Ubuntu, please also install Bazel
(from https://github.com/FloopCZ/tensorflow_cc)

Thanks for correcting me! But at least it's a one-time use and you don't need to integrate Bazel into your own project build.

I've also just found another package that is installable on Debian without having to build with Bazel: https://github.com/kecsap/tensorflow_cpp_packaging Not sure how mature it is though.

chrisw · Post by **chrisw** » Thu Apr 25, 2019 9:40 pm

Rémi Coulom wrote: ↑Thu Apr 25, 2019 8:51 pm
smatovic wrote: ↑Thu Apr 25, 2019 7:38 pm https://github.com/ankan-ban/ConvTest
Very interesting, thanks. I will try to make a tensor-core version.

Some integrable C++ source plus the necessary headers that works on both Windows and Linux/Ubuntu would be a great resource. It should also increase the variety of engines, since developers would no longer be locked in to particular inputs and could use preprocessed input. For the moment, supporting only CUDA on NVIDIA would be enough.

Daniel Shawul · Post by **Daniel Shawul** » Fri Apr 26, 2019 2:50 am

Rein Halbersma wrote: ↑Thu Apr 25, 2019 9:18 pm
Rémi Coulom wrote: ↑Thu Apr 25, 2019 8:48 pm
If you require GPU support on Ubuntu, please also install Bazel
(from https://github.com/FloopCZ/tensorflow_cc)
Thanks for correcting me! But at least it's a one-time use and you don't need to integrate Bazel into your own project build.

I've also just found another package that is installable on Debian without having to build with Bazel: https://github.com/kecsap/tensorflow_cpp_packaging Not sure how mature it is though.

I have the option of building with either Bazel or tensorflow_cc, but there are some serious issues with the latter:

a) tensorflow_cc is available only on Linux.
b) Multi-GPU problems with libtensorflow_cc. I reported the issue here: https://github.com/FloopCZ/tensorflow_cc/issues/136 but there is still no solution for it. Building with Bazel does not have this problem. This was the deal breaker for me.
c) One more dependency, libtensorflow_cc.so, and maybe more.

Bazel cons

The Windows Bazel build is kind of broken when compiling with GPU support. There is a known issue, which I can't find at the moment.
The CPU build is OK though, so I use that and get one binary, egbbdll.so, without dependencies.
For GPU builds I compile directly against TensorRT. Note that you can configure TensorFlow with TensorRT, MKL, experimental OpenCL, etc., so theoretically you don't have to use anything other than TensorFlow. Btw, TensorFlow does have TPU support, which I have never explored. But building directly with TensorRT is so much easier (it is just another library), without going through the pain of compiling TensorFlow (either via Bazel or tensorflow_cc, both equally painful). TensorRT gives at least a 2x speedup compared to TensorFlow compiled without TensorRT support. I am curious to know whether TensorFlow with TensorRT support can perform equally well.

@Remi Why do you want to write even a single CUDA kernel? cuDNN has lots of convolution kernels to choose from anyway, and the TensorRT performance is as good as Ankan's hand-written CUDA kernels, as I have detailed here:
viewtopic.php?f=2&t=69885&hilit=Scorpio+Leela&start=10
And then when you factor in supporting tensor cores, fp16 and maybe int8/int4 with Turing, the old fp16 units in the 1070, etc., one is inclined to conclude that this is better left to a library. I don't even build the graph explicitly like lc0 does, because I don't intend to do any manual optimization of the graph by writing convolution kernels, etc.

Rémi Coulom · Post by **Rémi Coulom** » Fri Apr 26, 2019 10:26 am

Daniel Shawul wrote: ↑Fri Apr 26, 2019 2:50 am @Remi Why do you want to write even a single CUDA kernel? cuDNN has lots of convolution kernels to choose from anyway, and the TensorRT performance is as good as Ankan's hand-written CUDA kernels, as I have detailed here:
viewtopic.php?f=2&t=69885&hilit=Scorpio+Leela&start=10
And then when you factor in supporting tensor cores, fp16 and maybe int8/int4 with Turing, the old fp16 units in the 1070, etc., one is inclined to conclude that this is better left to a library. I don't even build the graph explicitly like lc0 does, because I don't intend to do any manual optimization of the graph by writing convolution kernels, etc.

Writing my own CUDA kernels is certainly not a reasonable choice, but it is fun to try, and I still believe it is possible to outperform cuDNN for small batches.
