mwyoung wrote: ↑Tue Dec 31, 2019 7:19 am
I was told that when running identical GPUs, it was better to use roundrobin or demux instead of multiplexing.
You might find this useful.
roundrobin
Can have multiple child backends. Alternates which backend each request is sent to. E.g. if there are 3 children, the 1st request goes to the 1st backend, the 2nd to the 2nd, then the 3rd, then the 1st again, and so on.
Somewhat similar to the multiplexing backend, but it doesn't combine/accumulate requests from different threads; instead it forwards each one verbatim immediately. It also doesn't need any locks, which makes it a bit faster.
It's important for this backend that all child backends have the same speed (e.g. the same GPU model, with none of them throttled/overheated). Otherwise all backends will be slowed down to the speed of the slowest one. If you use non-uniform child backends, it's better to use the multiplexing backend.
Options:
Takes a list of subdictionaries as options and creates one child backend per subdictionary. All subdictionary parameters are passed to those backends, and there is also one additional parameter:
backend=<string> (default: name of the subdictionary) Name of child backend to use.
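The alternation described above is easy to picture with a toy model. The sketch below is a hypothetical Python illustration (class and function names are mine, not lc0's; the real backend is C++ and uses an atomic counter rather than Python objects) showing how a lock-free cyclic dispatch works:

```python
from itertools import count

class RoundRobinBackend:
    """Toy model of round-robin dispatch: each incoming request is
    forwarded verbatim to the next child backend in turn, with no
    batching and nothing shared except a monotonically increasing
    counter (an atomic integer in a real multi-threaded backend)."""

    def __init__(self, children):
        self.children = children
        self.counter = count()  # stands in for an atomic fetch-and-add

    def compute(self, request):
        # Pick the next child cyclically: 0, 1, 2, 0, 1, 2, ...
        child = self.children[next(self.counter) % len(self.children)]
        return child(request)

# Usage: three "child backends" that just tag which child handled the request.
children = [lambda req, i=i: (i, req) for i in range(3)]
rr = RoundRobinBackend(children)
results = [rr.compute(f"req{n}") for n in range(6)]
# children are visited in the order 0, 1, 2, 0, 1, 2
```

This also makes the slow-child problem visible: a request assigned to a slow child blocks until that child finishes, since nothing rebalances work afterwards.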
demux
Does the opposite of what multiplexing does: takes the large batch which comes from the search, splits it into smaller batches and sends them to child backends to compute in parallel.
May be useful for multi-GPU configurations, and for multicore CPU configurations too.
As with the roundrobin backend, it's important that all child backends have the same performance; otherwise everyone waits for the slowest one.
Options:
minimum-split-size=<int> (default: 0) Do not split batch to subbatches smaller than that.
It also takes a list of subdictionaries as options and creates one child backend per subdictionary. All subdictionary parameters are passed to those backends, and there are also two additional parameters:
threads=<int> (default: 1) Number of eval threads allocated for this backend.
backend=<string> (default: name of the subdictionary) Name of child backend to use.
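To make the splitting concrete, here is a hypothetical sketch (the helper name and exact splitting policy are my illustration, not lc0's actual code) of dividing one search batch across children while honouring a minimum sub-batch size:

```python
def demux_split(batch, num_children, minimum_split_size=0):
    """Split one large batch into at most num_children sub-batches,
    as evenly as possible, never creating a sub-batch smaller than
    minimum_split_size. With a large enough minimum, fewer children
    than available may receive work."""
    n = len(batch)
    # Cap the number of splits so every sub-batch meets the minimum size.
    if minimum_split_size > 0:
        num_children = min(num_children, max(1, n // minimum_split_size))
    base, extra = divmod(n, num_children)
    subbatches, start = [], 0
    for i in range(num_children):
        size = base + (1 if i < extra else 0)  # spread the remainder
        subbatches.append(batch[start:start + size])
        start += size
    return subbatches

# Usage: 10 positions over 4 children.
even = demux_split(list(range(10)), 4)               # sizes 3, 3, 2, 2
capped = demux_split(list(range(10)), 4,
                     minimum_split_size=4)           # only 2 sub-batches
```

The point of minimum-split-size is visible here: tiny sub-batches underutilize a GPU, so below the threshold it is better to give some children nothing than to give every child a sliver.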
Backend configuration at competitions
Here is what we use in competitions (as far as I could find):
CCCC:
backend: demux
backend-opts: backend=cudnn-fp16,(gpu=0),(gpu=1),(gpu=2),(gpu=3)
TCEC:
backend: roundrobin (starting from the current DivP; before that it was multiplexing)
backend-opts: backend=cudnn-fp16,(gpu=0),(gpu=1)
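On the command line, the config pairs above map to lc0's --backend and --backend-opts flags. A sketch of what the TCEC-style two-GPU setup would look like as an invocation (quoting added because parentheses are special to most shells; check your lc0 build's --help for the exact flag names):

```shell
# Round-robin over two identical GPUs, cudnn-fp16 child backends:
lc0 --backend=roundrobin \
    --backend-opts='backend=cudnn-fp16,(gpu=0),(gpu=1)'
```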