Henk wrote:Do you think you could write a super simple engine that beats Fairy-Max easily ?
I got a bit carried away by this question. (I know, unwise...) So I spent last week writing a new engine for orthodox Chess, as a response to this challenge (and to relieve the after-ICGA blues). I started on Sunday night.
Rank Name                    Elo    +    -  games  score  oppo.  draws
   1 Giraffe 63f71c3eb204    101   60   52     38    75%   -101     3%
   2 Fairy-Max 4.8S         -101   52   60     38    25%    101     3%
It also searches at about 1/5 the speed of Fairy-Max. However, it does have a world-class evaluation function.
Also, that means Skipper is searching at the same speed as Giraffe... which means there is something seriously wrong with Skipper performance-wise. Giraffe does a deep neural network evaluation for each position, which involves multiple large matrix-vector multiplications.
You should definitely find a profiler (several have been suggested in this thread) and see what Skipper is spending all its time on. Also, remember to turn on compiler optimizations.
If you have a slow evaluation function you cannot use it at the leaves, or maybe only at the first leaf. Perhaps searching deeper is always better.
I tried to use a profiler a year ago. The result was that my system crashed, so it did not work for my C#.NET code.
Just installed VerySleepy, but it probably only works for C++ code, for it doesn't display any useful information: it only reports statistics about threads, not code. Otherwise I don't know how to use it.
I don't want to use the free Visual Studio Community edition, for I don't want to develop open source.
Probably the bottleneck is qSearch; what else could it be?
Dann Corbit wrote:If you are doing matrix multiplications, perhaps you can benefit from a GPU card as a coprocessor.
I actually do have GPU support implemented. It helps a bit with training but not for playing.
The reason is that matrix-matrix multiplications (used in training) are compute-bound, while on modern computers matrix-vector multiplications (used in playing) are memory-bound. The matrices are also not quite big enough to cover up PCI-E latency. Most experiments showed that GPUs only beat an optimized CPU implementation for matrices larger than about 1000x1000. I am only using ~300x300.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Henk wrote:I tried to use a profiler a year ago. The result was that my system crashed, so it did not work for my C#.NET code.
Just installed VerySleepy, but it probably only works for C++ code, for it doesn't display any useful information: it only reports statistics about threads, not code. Otherwise I don't know how to use it.
I don't want to use the free Visual Studio Community edition, for I don't want to develop open source.
Probably the bottleneck is qSearch; what else could it be?
Ah yes, if you use C# your options are much more limited. Most people write high-performance applications in C/C++.
Individual license. If you are an individual working on your own applications to sell or for any other purpose, you may use the software to develop and test those applications.
It may or may not be qSearch. Programmers are very bad at guessing where bottlenecks are. There's a good possibility that it will turn out to be somewhere you never thought about.
I found a profiler which seems to work. It is called SlimTune. For instance, it says evaluation is a bottleneck: passed-pawn evaluation and checking whether a rook is on an open file. I don't know yet if I'm interpreting the results correctly.
Henk wrote:I found a profiler which seems to work. It is called SlimTune. For instance, it says evaluation is a bottleneck: passed-pawn evaluation and checking whether a rook is on an open file. I don't know yet if I'm interpreting the results correctly.
That sounds plausible. If that's the case, a pawn hash may help.
For rook on open file, this is why bitboards are great. Passed-pawn evaluation is also much faster with bitboards.
It seems that using the iterator and visitor design patterns also does performance no good. Even garbage collection slows it down. I use visitors in evaluation to visit pieces and get their values. I knew it was slow but didn't care.