Fastest Cloud Computing For ML Nearly As Fast As Fastest Supercomputer

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Dann Corbit, Harvey Williamson

towforce
Posts: 11023
Joined: Wed Mar 08, 2006 11:57 pm
Location: Birmingham UK

Fastest Cloud Computing For ML Nearly As Fast As Fastest Supercomputer

Post by towforce » Thu Jul 30, 2020 8:04 am

I'm interested in comparing the ML offerings on GCE (Google Compute Engine), AWS and Azure - the three biggest providers (and maybe IBM and a few others as well) - but looking at the GCE offering, I think I must be misunderstanding something.

They have a wide range of offerings, but their fastest one seems to be their TPU offering, which is not surprising given that most arithmetic on a TPU is half-precision, making it "not very useful" for tasks other than ML. :)

Here's their TPU pricing - link. The fastest option offered is 2048 cores of their TPU v3, although no price is listed for that configuration. Here's why I am confused:

Wikipedia appears to say that a single TPU v3 offers 90 teraflops (link). According to the pricing page above, you can rent 2048 of those, for a total of 90 * 2048 = 184,320 teraflops, which is about 184 petaflops. That is close to the speed of the world's fastest supercomputer - link.
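A quick back-of-envelope check of that arithmetic (a sketch only - it assumes the 90 teraflops per-unit figure from Wikipedia applies to each of the 2048 rented cores):

```python
# Rough sanity check of the TPU v3 pod arithmetic.
per_unit_tflops = 90   # Wikipedia figure for a single TPU v3 (BF16)
units = 2048           # largest configuration on the GCE pricing page

total_tflops = per_unit_tflops * units
total_pflops = total_tflops / 1_000

print(total_tflops)    # teraflops
print(total_pflops)    # petaflops
```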

I've done some checks to see whether this could possibly be correct, and it could well be: a direct quote from this page - link: "Cloud TPU v3 Pod 100+ petaflops"

So on GCE you can apparently rent nearly as many flops as the world's fastest supercomputer will offer you. The world's fastest supercomputer cost a billion dollars (link).
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".

smatovic
Posts: 1845
Joined: Wed Mar 10, 2010 9:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic
Contact:

Re: Fastest Cloud Computing For ML Nearly As Fast As Fastest Supercomputer

Post by smatovic » Thu Jul 30, 2020 9:55 am

Apples and oranges.

First of all, the Google TPU gen 1 is INT8-based and used for inference; TPU gen 2
and gen 3 use BF16, brain float 16, and are used for training.

Second, you cannot compare a) general-purpose CPU ALUs with b) vector/SIMD
units with c) MMUs, matrix-multiplication units, and the like; these all offer
different kinds of FLOPS or OPS.

For example, the Summit supercomputer, with ~200 petaFLOPS of FP64 via Nvidia
V100 GPUs, offers > 1 exaOPS in mixed-precision mode.

You really have to look at what kind of computation task you have and then choose
the appropriate benchmark; there are various ML tasks with corresponding
benchmarks out there, and vendors usually publish their performance on these...

*** edit ***
not to mention the different kinds of memory architectures and bandwidths...

--
Srdja

towforce
Posts: 11023
Joined: Wed Mar 08, 2006 11:57 pm
Location: Birmingham UK

Re: Fastest Cloud Computing For ML Nearly As Fast As Fastest Supercomputer

Post by towforce » Thu Jul 30, 2020 12:54 pm

smatovic wrote:
Thu Jul 30, 2020 9:55 am
Apples and oranges.

First of all, the Google TPU gen 1 is INT8-based and used for inference; TPU gen 2 and gen 3 use BF16, brain float 16, and are used for training.

Second, you cannot compare a) general-purpose CPU ALUs with b) vector/SIMD units with c) MMUs, matrix-multiplication units, and the like; these all offer different kinds of FLOPS or OPS.

For example, the Summit supercomputer, with ~200 petaFLOPS of FP64 via Nvidia V100 GPUs, offers > 1 exaOPS in mixed-precision mode.

You really have to look at what kind of computation task you have and then choose the appropriate benchmark; there are various ML tasks with corresponding benchmarks out there, and vendors usually publish their performance on these...

*** edit ***
not to mention the different kinds of memory architectures and bandwidths...

I wasn't aware of the bfloat ("brain float") data type - link.
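For anyone else unfamiliar with it: bfloat16 is essentially just the top 16 bits of an IEEE float32 - the same 8-bit exponent (so the same dynamic range), but only 7 mantissa bits. A minimal sketch of the conversion (plain truncation; real hardware usually rounds to nearest):

```python
import struct

def f32_to_bf16(x: float) -> int:
    """Keep only the top 16 bits of the float32 encoding of x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16

def bf16_to_f32(bf16: int) -> float:
    """Zero-pad the missing 16 mantissa bits and reinterpret as float32."""
    return struct.unpack("<f", struct.pack("<I", bf16 << 16))[0]

# Same exponent field as float32, so very large values survive...
print(bf16_to_f32(f32_to_bf16(3.0e38)))      # still roughly 3e38
# ...but only ~2-3 significant decimal digits of precision remain.
print(bf16_to_f32(f32_to_bf16(3.14159265)))  # 3.140625
```

That loss of mantissa precision is fine for neural-net training, but it is exactly why TPU teraflops aren't comparable with the FP64 flops a Top500 machine reports.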

The Top500 has long had issues with comparing like with like: their long-standing measurement, which enables comparisons with machines going right back to the first computers, is flops - and in terms of flops, the GCE TPU v3 offering gives you a GINORMOUS number. Unfortunately, the Top500 measures flops by running Linpack - and I'm guessing you cannot run Linpack on a TPU, because it probably requires "normal" precision arithmetic.

You can also rent GPUs on GCE, but their flops numbers, impressive as they are, aren't as impressive as what the TPU processors deliver. For anyone training a chess NN who doesn't want to buy (and fit) a top-end graphics card, which might be outdated in a few years, renting TPUs in a cloud like GCE would certainly be an option.

I downloaded an ML benchmark for my phone (link), and it got a score of 86,406. I don't think the Galaxy 8 has anything like a TPU (newer phones might), but it certainly has a GPU on its SoC.
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
