To TPU or not to TPU...

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

to TPU or not to TPU?

Poll ended at Wed Jan 17, 2018 10:20 am

be patient and use TPUs via Frameworks: 3 votes (18%)
optimize now for current Hardware: 14 votes (82%)

Total votes: 17

pilgrimdan
Posts: 405
Joined: Sat Jul 02, 2011 10:49 pm

Re: To TPU or not to TPU...

Post by pilgrimdan »

phhnguyen wrote:
pilgrimdan wrote: 5,000 processing units ... okay ... how much does one of these cost ...
25 million USD for the whole system of 5,000 TPUs, according to the AlphaGo Zero article.

Google does not sell them. But I think a guy who has over 25m USD and is willing to spend it could still buy one ;)
so ... let me see if I get this right ...

Stockfish ... free, open-source software ... gets its butt kicked by a machine that cost 25 million dollars ...

is there something I'm missing here?

and if you were to reply, well ... that's not the point ... it's the way it beat Stockfish ...

those words just seem to fall on deaf ears ...
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: To TPU or not to TPU...

Post by hgm »

pilgrimdan wrote: is there something I'm missing here?
Better to ask: is there something you are not missing here?

For one, AlphaZero was using 4 TPUs when playing against Stockfish, not 5,000. So at k$5/TPU that would be a k$20 machine. Special-purpose equipment is always more expensive than mass-produced consumer goods; if the TPUs were produced and sold in the same numbers as x86 or ARM CPUs, they would probably just cost $150/TPU, instead of $5,000.

The 5,000 TPUs were just used for training AlphaZero. They should be compared to the tens of thousands of people donating CPU time on their k$2 computers to run fishtest for tuning Stockfish. And 10,000 times k$2 is M$20. So Stockfish has been using M$20 worth of hardware too.

This still does not factor in time; AlphaZero needed the training equipment only for 9 hours, while fishtest has been running for years. If the AlphaZero team had spread the training out over a period similar to Stockfish's development (say a year), they could have done it on the k$20 machine with the 4 TPUs. So it is basically Stockfish that has been using 1000x more expensive hardware than AlphaZero.
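As a back-of-the-envelope check on that claim, here is a quick sketch in Python. All figures are the rough assumptions from this thread (~$5,000 per TPU, ~$2,000 per donated fishtest machine, 9 hours of training versus roughly a year of fishtest), not measured numbers:

```python
# Rough hardware-cost-times-time comparison, using the figures assumed
# in this thread (prices, machine counts and durations are guesses).

HOURS_PER_YEAR = 365 * 24

# AlphaZero training: 5,000 TPUs at an assumed $5,000 each, for ~9 hours.
alphazero = 5_000 * 5_000 * 9                  # dollar-hours

# Fishtest: assume 10,000 donated machines at ~$2,000 each, running for a year.
fishtest = 10_000 * 2_000 * HOURS_PER_YEAR     # dollar-hours

print(f"AlphaZero training: {alphazero:.3e} dollar-hours")
print(f"Fishtest (1 year):  {fishtest:.3e} dollar-hours")
print(f"ratio: {fishtest / alphazero:.0f}x")
```

With those assumptions the gap comes out at roughly 800x, i.e. the same order of magnitude as the 1000x figure above.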
mar
Posts: 2555
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: To TPU or not to TPU...

Post by mar »

hgm wrote:The 5,000 TPUs were just used for training AlphaZero. They should be compared to the tens of thousands of people donating CPU time on their k$2 computers to run fishtest for tuning Stockfish. And 10,000 times k$2 is M$20. So Stockfish has been using M$20 worth of hardware too.
Last time I checked, I saw 136 people donating; even if some donate 30 cores (so certainly above $2k per donor), I don't see $20M worth of hardware.
http://tests.stockfishchess.org/tests/v ... 0ccbb8bfad
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: To TPU or not to TPU...

Post by hgm »

Oh, that is much less than I expected. Still, if you figure in the time factor, it seems that Stockfish development consumed orders of magnitude more expensive resources (cores x price/core x time) than AlphaZero.
pilgrimdan
Posts: 405
Joined: Sat Jul 02, 2011 10:49 pm

Re: To TPU or not to TPU...

Post by pilgrimdan »

hgm wrote:Oh, that is much less than I expected. Still, if you figure in the time factor, it seems that Stockfish development consumed orders of magnitude more expensive resources (cores x price/core x time) than AlphaZero.
I must admit ... it is impressive what DeepMind did ... to go from basically nothing to something comparable to Stockfish within a short time ... NN + MCTS is a pretty vicious 1-2 punch ... it would be interesting to know if there are any 'holes' in AlphaZero ...
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: To TPU or not to TPU...

Post by smatovic »

smatovic wrote: Sat Dec 16, 2017 10:20 am I guess I am not the only one who ponders chess NN implementations,

so what do you think: should programmers use frameworks/libraries like TensorFlow
to make use of the emerging TPUs
(Nvidia Volta, Google Cloud TPU, special chips in smartphones, ASICs),
or should we write our own NN implementations, optimized for current consumer hardware
(CPU, AVX, GPU)?

--
Srdja
Just reflecting a bit....

Meanwhile Lc0 uses the TensorFlow framework and Stockfish the PyTorch framework for training neural networks. Lc0 has multiple optimized CPU/GPU backends, like BLAS, OpenCL, CUDA, cuDNN, DX12 and Metal, and Stockfish has optimized NNUE inference code for different CPU/VPU architectures. So we see both: frameworks for training, and libraries plus self-written code for inference. I am still not aware of any chess engine (besides the original A0) that uses the dedicated TPUs (aka neural engines) present in mobile SoCs; Intel's AMX in the Xeon line might be a candidate in the future?
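To make that training/inference split concrete, here is a minimal, hypothetical sketch: a tiny evaluation net is trained with PyTorch, then its weights are exported and run with plain NumPy, standing in for the hand-optimized AVX/GPU inference paths mentioned above. Layer sizes and the random data are placeholders, not Lc0's or Stockfish's actual architecture.

```python
# Framework for training, self-written code for inference (toy example).
import numpy as np
import torch
import torch.nn as nn

# --- training side: use the framework ---
model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(1024, 768)        # fake feature planes
y = torch.randn(1024, 1)          # fake evaluation targets

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# --- inference side: export raw weights and run them without the framework ---
w1 = model[0].weight.detach().numpy()
b1 = model[0].bias.detach().numpy()
w2 = model[2].weight.detach().numpy()
b2 = model[2].bias.detach().numpy()

def evaluate(features: np.ndarray) -> float:
    """Forward pass in plain NumPy; a real engine would vectorize this."""
    h = np.maximum(features @ w1.T + b1, 0.0)   # ReLU
    return float((h @ w2.T + b2)[0])

print(evaluate(np.random.randn(768).astype(np.float32)))
```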

--
Srdja
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: To TPU or not to TPU...

Post by Ras »

smatovic wrote: Fri Jan 06, 2023 1:23 pm I am still not aware of any chess engine (besides the original A0) that uses the dedicated TPUs (aka neural engines) present in mobile SoCs
From what I read about the Apple M1, its neural engine is badly documented and not very flexible in what network architectures it accepts. You also don't get any error indication and have to trial-and-error your way around, watching whether the CPU load increases, which would suggest that the network is running on the CPU cores instead. It seems that this is more of a marketing gimmick, maybe useful for some Apple tools like picture sorting, but not for general use.
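For what it's worth, that "watch the CPU load" test can be scripted in a generic way: compare process CPU time against wall-clock time around whatever inference call you are probing. In this sketch run_inference() is a placeholder for the API under test, and multi-threaded CPU inference can push the ratio above 1, so it is only a rough heuristic:

```python
# Crude offload check: if the process's CPU time is close to the wall-clock
# time of the inference loop, the network is probably running on the CPU
# cores; if CPU time stays far below wall time, the work is likely happening
# on an accelerator (or the loop is I/O-bound).
import time

def measure(run_inference, iterations: int = 200) -> None:
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(iterations):
        run_inference()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    print(f"wall: {wall:.3f}s  cpu: {cpu:.3f}s  ratio: {cpu / wall:.2f}")
    if cpu / wall > 0.8:
        print("mostly CPU-bound: probably no accelerator offload")
    else:
        print("CPU mostly idle: work likely offloaded")
```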
Rasmus Althoff
https://www.ct800.net
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: To TPU or not to TPU...

Post by smatovic »

Ras wrote: Fri Jan 06, 2023 9:08 pm
smatovic wrote: Fri Jan 06, 2023 1:23 pm I am still not aware of any chess engine (besides the original A0) that uses the dedicated TPUs (aka neural engines) present in mobile SoCs
From what I read about the Apple M1, its neural engine is badly documented and not very flexible in what network architectures it accepts. You also don't get any error indication and have to trial-and-error your way around, watching whether the CPU load increases, which would suggest that the network is running on the CPU cores instead. It seems that this is more of a marketing gimmick, maybe useful for some Apple tools like picture sorting, but not for general use.
Well, Apple has its Neural Engine in the M-series, Intel puts AMX in the Xeon line, and AMD follows with AMD/Xilinx "XDNA" in the Ryzen 7xxx mobile series:


Moving forward, the AI engine and FPGA fabric from Xilinx will be known as "adaptive architecture" building blocks under the name XDNA.

The AI engine is built with a so-called "dataflow architecture" that makes it well-suited for AI and signal processing applications that need a mix of high performance and energy efficiency.

The FPGA fabric, on the other hand, serves as an adaptive interconnect that comes with FPGA logic and local memory.

After teasing plans in May, AMD said it plans to use the AI engine in future Ryzen processors, which includes two future generations of laptop CPUs coming over the next few years. The company also teased that it will use the AI engine in future Epyc CPUs.

To help it capitalize on larger ambitions in AI computing, AMD promised that it will unify previously disparate software stacks for CPUs, GPUs and adaptive chips from Xilinx into one, with the goal of giving developers a single interface to program across different kinds of chips.

The effort will be called the Unified AI Stack, and the first version will bring together AMD's ROCm software for GPU programming, its CPU software and Xilinx's Vitis AI software.

AMD said this will give developers access to optimized inference models and the ability to use popular AI frameworks like PyTorch and TensorFlow across its wider portfolio of chips.
https://www.notebookcheck.net/AMD-detai ... 093.0.html

https://www.tomshardware.com/news/amd-b ... nix-arrive

I am not into the details of what these are effectively able to compute, but if the vendors put dedicated mat-mul engines on silicon, there must be some kind of gain. It seems programmers will have to rely on frameworks/APIs/libraries from specific vendors to be able to utilize these.
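One example of such a framework layer that spans vendors is ONNX Runtime's execution-provider mechanism; the sketch below just picks the first available backend from a preference list. The model file name, input shape and which providers are actually present in a given build are assumptions for illustration, not things verified in this thread:

```python
# Backend selection through a framework abstraction (ONNX Runtime example).
import numpy as np
import onnxruntime as ort

print("available providers:", ort.get_available_providers())

# Preference order: vendor NPU/accelerator backends first, CPU as fallback.
preferred = [
    "VitisAIExecutionProvider",   # AMD/Xilinx XDNA-style accelerators
    "CoreMLExecutionProvider",    # Apple Neural Engine / GPU via Core ML
    "CUDAExecutionProvider",      # NVIDIA GPUs
    "CPUExecutionProvider",       # always present
]
providers = [p for p in preferred if p in ort.get_available_providers()]

session = ort.InferenceSession("eval_net.onnx", providers=providers)  # placeholder model
input_name = session.get_inputs()[0].name
features = np.random.randn(1, 768).astype(np.float32)  # placeholder input shape
(output,) = session.run(None, {input_name: features})
print("providers in use:", session.get_providers(), "output:", output)
```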

--
Srdja
Henk
Posts: 7218
Joined: Mon May 27, 2013 10:31 am

Re: To TPU or not to TPU...

Post by Henk »

I don't like neural networks at all. If you have no clue, you use neural networks, but that is similar to resignation.
Neural networks will never be 100% accurate, being statistical machines and black boxes.
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: To TPU or not to TPU...

Post by Ras »

smatovic wrote: Sat Jan 07, 2023 9:31 am The effort will be called the Unified AI Stack, and the first version will bring together AMD's ROCm software for GPU programming, its CPU software and Xilinx's Vitis AI software.
Given that AMD still struggles with basic driver problems, I wouldn't bet a dime on their software stack. Also, AMD has missed the future for so long that their only hope of getting anywhere is at least source-level / API compatibility with CUDA/ML.
Rasmus Althoff
https://www.ct800.net