Wouldn't it be nice to have a C++ header file which supported:
model = LoadTrainedModelFromFile(filename); // model and weights, saved in some appropriate format from Python
results = model.predict(inputs); // using GPU
Wouldn't it be nice if C++ GPU
-
- Posts: 438
- Joined: Mon Apr 24, 2006 8:06 pm
Re: Wouldn't it be nice if C++ GPU
I developed my own home-made C++ deep-learning framework just to be able to do that. I used tensorflow for a while, but it was too painful to use from C++. What you describe can be done with tensorflow, but last time I tried, I had to use undocumented/unsupported features of the low-level C++ tensorflow library, and it was really unpleasant (having to compile the library from source with bazel, ...).
Maybe other frameworks have better C++ support.
At the moment, I am using some simple C++ classes on top of CuDNN. I don't have autodiff, but manually calculating a gradient is not such a big deal in my opinion. I am thinking about compile-time autodiff with template meta-programming.
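One common route to compile-time autodiff in C++ is operator overloading on a dual-number type, which constexpr then evaluates entirely at compile time. A minimal forward-mode sketch (an illustration of the idea, not Rémi's actual design):

```cpp
#include <cassert>

// Forward-mode autodiff via dual numbers: each value carries its derivative,
// and the overloaded operators propagate both through any expression.
struct Dual {
    double v;  // value
    double d;  // derivative w.r.t. the chosen input variable
};

constexpr Dual operator+(Dual a, Dual b) { return {a.v + b.v, a.d + b.d}; }
constexpr Dual operator*(Dual a, Dual b) {
    return {a.v * b.v, a.d * b.v + a.v * b.d};  // product rule
}
constexpr Dual constant(double c) { return {c, 0.0}; }
constexpr Dual variable(double x) { return {x, 1.0}; }  // d/dx x = 1

// f(x) = x*x + 3x, so f'(x) = 2x + 3
constexpr Dual f(Dual x) { return x * x + constant(3.0) * x; }

// Both the value and the gradient are available at compile time:
static_assert(f(variable(2.0)).v == 10.0, "f(2) = 10");
static_assert(f(variable(2.0)).d == 7.0, "f'(2) = 7");
```

This gives exact derivatives with no runtime tape; the limitation is that forward mode needs one pass per input variable, so for training large networks a reverse-mode (backprop) formulation is still what one would want.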
If people here want help with using tensorflow from C++ with an Nvidia GPU, I could explain a little how I did it. It is not very difficult to do, but it is not documented.
I applied to the TensorFlow Research Cloud, and was accepted. This gives me access to 100 TPUs for one month. After days of trying to use a TPU from C++, I gave up. I am sure it is doable, but there is no documentation at all.
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Wouldn't it be nice if C++ GPU
Caffe (https://github.com/BVLC/caffe) supports C++ - it is apparently the main implementation language, with Python as a binding. I don't know if it does quite what you need, though.
--Jon
-
- Posts: 4319
- Joined: Tue Apr 03, 2012 4:28 pm
Re: Wouldn't it be nice if C++ GPU
Daniel has some code here https://github.com/dshawul/egbbdll/blob ... val_nn.cpp
which includes what looks like a model load (UFF format) and what looks like a predict. It seems to need various support includes, and it is not exactly easy to work out what is going on. I would guess the CUDA support is ongoing, because it is a pretty obvious thing for many people (not just games) to want: getting the predictor into C++ for standalone apps and losing the Python requirement at runtime.
The load, UFF format:
Code: Select all
void TrtModel::LoadGraph(const string& uff_file_name, int dev_id, int dev_type) {
std::string dev_name = ((dev_type == GPU) ? "/gpu:" : "/cpu:") + std::to_string(dev_id);
printf("Loading graph on %s\n",dev_name.c_str());
fflush(stdout);
Model::id = dev_id;
cudaSetDevice(Model::id);
and so on ......
The predict:
Code: Select all
void TrtModel::predict() {
cudaSetDevice(Model::id);
context->execute(BATCH_SIZE, buffers.data());
if(nn_type == DEFAULT || nn_type == SIMPLE) {
for(int i = 0;i < n_batch;i++) {
float p = buffers_h[valuei][3*i+0] * 1.0 + buffers_h[valuei][3*i+1] * 0.5;
scores[i] = logit(p);
and so on ......
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Wouldn't it be nice if C++ GPU
egbbdll is very easy to use because it was originally designed for probing endgame bitbases.
You could essentially do probe(FEN_string) and get value and policy results.
How and where it is evaluated the user doesn't need to know, but of course it can use both CPU and GPU.
Both TensorFlow and TensorRT are supported, which can use cuDNN, so of course it can use CUDA too.
Lc0 explicitly wrote CUDA code for its backend, but I am getting equal nps using TensorRT.
Moreover, one can use INT8 and maybe INT4. So writing backend code when there is a plethora of deep-learning libraries is a futile endeavour IMHO.
This is the actual code I use for probing bitbases and the neural network. It has become a little cumbersome after I added the policy head, but it is still easy to use. You populate your pieces, feed history info (for lczero nets), and just probe. The egbbdll takes care of batching (with a multi-threaded approach) and caching as well.
Daniel
Code: Select all
/*
Probe:
Change internal Scorpio board representation to [A1 = 0 ... H8 = 63]
board representation and then probe bitbase.
*/
void SEARCHER::fill_list(int& count, int* piece, int* square) {
PLIST current;
#define ADD_PIECE(list,type) { \
current = list; \
while(current) { \
piece[count] = type; \
square[count] = SQ8864(current->sq); \
count++; \
current = current->next; \
} \
};
ADD_PIECE(plist[wking],_WKING);
ADD_PIECE(plist[bking],_BKING);
ADD_PIECE(plist[wqueen],_WQUEEN);
ADD_PIECE(plist[bqueen],_BQUEEN);
ADD_PIECE(plist[wrook],_WROOK);
ADD_PIECE(plist[brook],_BROOK);
ADD_PIECE(plist[wbishop],_WBISHOP);
ADD_PIECE(plist[bbishop],_BBISHOP);
ADD_PIECE(plist[wknight],_WKNIGHT);
ADD_PIECE(plist[bknight],_BKNIGHT);
ADD_PIECE(plist[wpawn],_WPAWN);
ADD_PIECE(plist[bpawn],_BPAWN);
piece[count] = _EMPTY;
square[count] = SQ8864(epsquare);
count++;
}
int SEARCHER::probe_bitbases(int& score) {
#ifdef EGBB
int piece[MAX_PIECES],square[MAX_PIECES],count = 0;
fill_list(count,piece,square);
score = probe_egbb(player,piece,square);
if(score != _NOTFOUND)
return true;
#endif
return false;
}
int SEARCHER::probe_neural(bool hard_probe) {
#ifdef EGBB
UBMP64 hkey = ((player == white) ? hash_key :
(hash_key ^ UINT64(0x2bc3964f82352234)));
int moves[3*MAX_MOVES];
int *s = moves;
for(int i = 0; i < pstack->count; i++) {
MOVE& m = pstack->move_st[i];
int from = m_from(m), to = m_to(m);
if(is_castle(m)) {
if(to > from) to++;
else to -= 2;
}
*s++ = SQ8864(from);
*s++ = SQ8864(to);
*s++ = m_promote(m);
}
*s++ = -1;
nnecalls++;
if(nn_type == 0) {
int piece[33],square[33],isdraw[1];
int count = 0, hist = 1;
fill_list(count,piece,square);
return probe_nn(player,castle,fifty,hist,isdraw,piece,square,moves,
(float*)pstack->score_st,pstack->count,hkey,hard_probe);
} else {
int piece[8*33],square[8*33],isdraw[8];
int count = 0, hist = 0, phply = hply;
for(int i = 0; i < 8; i++) {
isdraw[hist++] = draw();
fill_list(count,piece,square);
if(hply > 0 && hstack[hply - 1].move)
POP_MOVE();
else break;
}
count = phply - hply;
for(int i = 0; i < count; i++)
PUSH_MOVE(hstack[hply].move);
if(isdraw[0])
hkey ^= UINT64(0xc7e9153edee38dcb);
hkey ^= fifty_hkey[fifty];
return probe_nn(player,castle,fifty,hist,isdraw,piece,square,moves,
(float*)pstack->score_st,pstack->count,hkey,hard_probe);
}
#endif
return 0;
}
void PROCESSOR::set_num_searchers() {
#ifdef EGBB
if(SEARCHER::use_nn && set_num_active_searchers) {
int n_searchers = n_processors - n_idle_processors;
set_num_active_searchers(n_searchers);
}
#endif
}
-
- Posts: 741
- Joined: Tue May 22, 2007 11:13 am
Re: Wouldn't it be nice if C++ GPU
Rémi Coulom wrote: ↑Thu Apr 25, 2019 1:59 pm
What you describe can be done with tensorflow, but last time I tried, I had to use undocumented/unsupported features of the low-level C++ tensorflow library, and it was really unpleasant (having to compile the library from source with bazel, ...).
LeelaChessZero uses the 3rd-party tensorflow_cc wrapper library around the official TensorFlow C++ API, to avoid the Bazel build stuff. See https://github.com/LeelaChessZero/lc0/b ... sorflow.md
-
- Posts: 438
- Joined: Mon Apr 24, 2006 8:06 pm
Re: Wouldn't it be nice if C++ GPU
Rein Halbersma wrote: ↑Thu Apr 25, 2019 6:58 pm
LeelaChessZero uses the 3rd-party tensorflow_cc wrapper library around the official TensorFlow C++ API, to avoid the Bazel build stuff. See https://github.com/LeelaChessZero/lc0/b ... sorflow.md
Thanks for the link. Bazel is still necessary to build the library itself. This is in fact what I had managed to do by myself. It is still very unpleasant to do.
By the way, has anybody here tried to code a fast convolution in cuda directly? I will probably try soon. My impression is that the performance of cuDNN is very bad for small batches. Good performance with small batches is important for tree search.
Rémi
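For orientation, the computation such a kernel has to perform is just a small loop nest; in CUDA one would typically map each output element to one thread and keep the inner 3x3 loops as-is. A naive single-channel 3x3 "same" convolution (cross-correlation, as deep-learning frameworks compute it) in plain C++, as an illustration of the arithmetic rather than an optimized kernel:

```cpp
#include <cassert>
#include <vector>

// Naive single-channel 2-D convolution with a 3x3 kernel and zero padding
// ("same" output size). In a CUDA port, each (y, x) output element would
// map to one thread; multi-channel and batch dimensions add outer loops.
std::vector<float> conv3x3(const std::vector<float>& in, int h, int w,
                           const float k[3][3]) {
    std::vector<float> out(h * w, 0.0f);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            float s = 0.0f;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= h || xx < 0 || xx >= w)
                        continue;  // zero padding at the borders
                    s += in[yy * w + xx] * k[dy + 1][dx + 1];
                }
            }
            out[y * w + x] = s;
        }
    }
    return out;
}
```

The small-batch problem Rémi mentions shows up exactly here: with a batch of one, there is little parallel work per kernel launch, so launch overhead and poor occupancy dominate; hand-written kernels can fuse layers and avoid some of that overhead.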
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Wouldn't it be nice if C++ GPU
Rémi Coulom wrote: ↑Thu Apr 25, 2019 7:18 pm
By the way, has anybody here tried to code a fast convolution in cuda directly? I will probably try soon. My impression is that the performance of cuDNN is very bad for small batches. Good performance with small batches is important for tree search.
https://github.com/ankan-ban/ConvTest
--
Srdja
-
- Posts: 741
- Joined: Tue May 22, 2007 11:13 am
Re: Wouldn't it be nice if C++ GPU
Rémi Coulom wrote: ↑Thu Apr 25, 2019 7:18 pm
Thanks for the link. Bazel is still necessary to build the library itself. This is in fact what I had managed to do by myself. It is still very unpleasant to do.
That's not what tensorflow_cc advertises: https://github.com/FloopCZ/tensorflow_cc
This repository makes possible the usage of the TensorFlow C++ API from the outside of the TensorFlow source code folders and without the use of the Bazel build system.
-
- Posts: 438
- Joined: Mon Apr 24, 2006 8:06 pm
Re: Wouldn't it be nice if C++ GPU
Rein Halbersma wrote: ↑Thu Apr 25, 2019 8:23 pm
That's not what tensorflow_cc advertises: https://github.com/FloopCZ/tensorflow_cc
This repository makes possible the usage of the TensorFlow C++ API from the outside of the TensorFlow source code folders and without the use of the Bazel build system.
This means that you don't need bazel to build your own code, but you need it to build the tensorflow library itself:
"If you require GPU support on Ubuntu, please also install Bazel" (from https://github.com/FloopCZ/tensorflow_cc)