assembler for locking at AMD magny cours



Re: assembler for locking at AMD magny cours

Post by diep

Vinvin wrote: Sat Nov 07, 2020 2:51 am
diep wrote: Fri Nov 06, 2020 1:50 am So i built for fun an old 48 core box with old magny cours cpu's for a fraction of the price the great threadrippers go for.
...
Hi !
I suppose CPU's are "Opteron 6174" as shown here : https://www.anandtech.com/show/2978/amd ... ore-xeon/2
and here : https://www.cpubenchmark.net/cpu.php?cp ... cpuCount=4

What are the speed and Watts consumption when running Stockfish 12 (4 GB hash) and Stockfish 11 full speed on this box ?

Thanks for this information,
Vincent
Very old question, I see.

Regrettably, both motherboards I had bought were broken. One was a total write-off and on the other only 3 sockets work, so it's a 36 core box. It only runs Linux and has weird habits. Under full load it draws a bit over 300 watts with 4 sockets. Yet that doesn't include the watercooled GPU I have in it, a Titan-Z that I used to program some code on, quite recently actually: I tried to get prime numbers sieved on it.

GPU programming seems not so useful anymore though. Some critical things simply cannot be done quickly, like sorting small arrays (read that as: it's totally impossible; I tried everything I could and cannot find a single way to achieve this, let alone some trick that loses 'only' a factor 5 or more).

This box has some nasty habits. Also, its onboard fans make it a loud box (that would be solvable though).

I therefore built a new box, a 2 socket Intel box with 44 cores in total. I have win2019 installed on it and under full load it runs at 2.0GHz. Both sockets are watercooled. It is a V4 Intel CPU, though an engineering sample: E5-2699v4 ES. Very cheap on eBay. They have since raised the price of that chip by 50-100 bucks, last time I checked on AliExpress. The normal clock is supposed to be 2.1GHz, yet under full load it is stable at 2.0GHz. Hyperthreading turned off of course.

A more interesting platform to test, I'd say. The 36 core box will get decommissioned. I might use it only for some future tests, if those ever happen.

This Intel V4 CPU is a Broadwell core, which delivers a theoretical 32 flops fp64 per clock. The old machine I am typing this on is a Xeon L5420 (obviously running Linux, as Windows would instantly get hacked so badly that it couldn't even boot anymore if I put it on the internet). Just like the bulldozer, it delivers 4 flops fp64 per clock in theory.

What I do notice on both machines is that today's kernels are still totally stupid, Linux as well as Windows. If you manually schedule programs so that specific processes and their threads run on specific cores, things go much better.

Now that might not be a major issue for a chess program, actually. For prime number software it really is.
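
To make the manual scheduling idea concrete, here is a minimal sketch of pinning on Linux. It uses Python's `os.sched_setaffinity`, which only exists on Linux. The helper names `cores_for_socket` and `pin_to_cores` are made up for illustration, and the assumption that core ids are numbered socket by socket may not hold on a given board, so check the real topology (for example with `lscpu`) first.

```python
import os

def cores_for_socket(socket_index, cores_per_socket):
    """Core ids of one socket, ASSUMING cores are numbered socket by socket.

    That numbering is an assumption for illustration only; verify the
    real topology before relying on it.
    """
    first = socket_index * cores_per_socket
    return set(range(first, first + cores_per_socket))

def pin_to_cores(wanted):
    """Restrict the caller to the `wanted` cores and return the resulting set.

    Only cores the caller is already allowed to use are kept. If none are
    left, the affinity is not changed. Returns an empty set on platforms
    without sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return set()                      # not available on this platform
    allowed = os.sched_getaffinity(0)     # pid 0 means the caller itself
    usable = set(wanted) & allowed
    if usable:
        os.sched_setaffinity(0, usable)
    return os.sched_getaffinity(0)
```

On a box with 12 cores per socket, `pin_to_cores(cores_for_socket(1, 12))` would keep the caller on the second socket, if the numbering assumption holds.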

How they have not managed to fix the scheduling in the kernels - I'm pretty horrified by it - though I know how clumsy the Linux SMP kernel team is there.
Most of their work is just babysitting things - well, that's what you get if the work is unpaid of course. And Windows 2019 Pro - it is 2021 here now.

Some 18 years after AMD launched their Opterons with NUMA architecture, the schedulers are still total beginner's work.

So any chess program would obviously suffer more on such a bulldozer architecture, because latencies between the sockets are much, much higher than on a 2 socket Intel LGA2011-3 box.

I can still boot that 36 core Linux box, if you develop an engine and have an executable whose speed you are interested in knowing.

Testing some beancounter from other dudes is not so interesting. You know it'll suck under Linux with gcc, with such slow cache coherency between the sockets.

I do not know whether the SMP code there has been changed to something else - whether they do what GCP programmed for a series of commercial engines that most of the planet uses now. That would scale pretty well - but the latencies would still hurt.

Initially, at smaller search depths, you would want to split only with processes/threads that are certain to run on the same socket.

So create a hybrid kind of approach for splitting. Like this, which is similar to how my 2-hours-of-sleep hacks at the world champs 2003 worked
(on 2 hours of sleep I fixed Diep in 2003 to run better on the 512 processor supercomputer).

Have 4 different splitlists, and for the last X plies each core can only access threads/processes that run on the same socket. Then at bigger depthlefts you give all cores potential access to all 4 splitlists.

More than 32 cores in a single splitlist is asking for trouble, let alone with slower latencies.
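
The rule above can be sketched roughly as follows. This is not Diep's actual code, just an illustration of the idea. The function name `visible_splitlists`, the threshold `local_only_below` (standing in for the "last X plies") and its value 4 are placeholders, and trying the own socket first at bigger depths is my own assumption.

```python
def visible_splitlists(my_socket, depth_left, local_only_below=4, num_sockets=4):
    """Return the socket ids whose splitlists a core may look into.

    local_only_below stands in for the "last X plies"; 4 is a placeholder
    value, not a tuned number. One splitlist per socket is assumed.
    """
    if depth_left < local_only_below:
        # Near the leaves: only join splits owned by the same socket.
        return [my_socket]
    # Bigger depth left: all splitlists are open. Listing the own socket
    # first is an assumption, to prefer the lower latency work.
    others = [s for s in range(num_sockets) if s != my_socket]
    return [my_socket] + others
```

For example, a core on socket 2 with 3 plies left only sees `[2]`, while with 10 plies left it sees `[2, 0, 1, 3]`.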

So you do not need to test this, knowing that all engines will suck at it.

Which is why I built the box in the first place - I only had to pay for the motherboards, as I got the rest of the hardware for free, enough for 2 computers in fact. I could easily build another bulldozer box with 4 CPUs.

Regrettably both boards I received sucked.
In total I have about 4 CPUs of 2.2GHz left now, and the working box has 2.3GHz CPUs.

The problem is the motherboards. Some dudes in the USA collect all these boards and have no freaking clue how to store them. They wrap them tight in plastic, causing the board to bend as it is so huge.

That makes the boards unstable after some years of storage.

A quick test just before they get wrapped might pass - yet after some years wrapped they all fail.
In itself it was not a bad idea, I'd say, to try to build a 48 core box. Regrettably the motherboards are the problem.

The current box was not cheap. Of course I had a PSU lying around, and with a 3D printer I could print some plastic parts so that very dirt cheap GPU waterblocks serve as the blocks to watercool the LGA2011-3 sockets. Again the motherboard is the problem. First I tried some dual socket boards from China. The first one was never shipped. The usual cryptic Chinese sayings already indicated something like that. It took a long time to get the cash back.

Then I ordered a second board. Even without a CPU it already gives a freaking dead code.
I was very busy that weekend and for a few days after. Within those few days AliExpress suddenly decided
that the salesman would get his cash, and my request for a refund for yet another broken X99 board got a negative answer.

Filing a protest, you must sometimes wait months - yet such a salesman, if he is Chinese, gets his money for a broken board within a few days, even though I of course gave evidence.

In short, with those multiple socket machines the motherboard is time and again the problem.

I then bought a board within the Netherlands. That also took weeks of emailing shops and negotiating, as no salesman wanted to guarantee that a Supermicro motherboard was built after 2016 (those have a BIOS new enough to support V4 CPUs, and no shop has any sort of CPU left, let alone knows how to flash a BIOS anymore to upgrade it).

Those multiple socket boards are a special science - and I have been building them since the 1990s.

We must ask ourselves, however, for how long we'll keep seeing multiple socket motherboards.
Look at the killer plan AMD carried out with their Threadripper CPUs: produce 8 core CCDs, which are low power and only eat like 30 watts or something similar, then plug 8 of them into a single package with a chip in the middle that is the bridge between those CCDs.

So each 64 core Threadripper chip is in fact an 8 socket system.

Taking over HPC, and whatever you and I do, by storm of course.

Building multiple socket machines from that is something I have yet to see.

Sure, they plug 2 Epycs onto a single motherboard - but look at the price of that motherboard. The 32 core Threadripper/Epyc chips go for peanuts now on eBay/AliExpress, yet the motherboards are $$$$.

Not sure how long we will keep seeing quad socket motherboards.

Intel will probably try some, as their business case is to ask $20k+ for CPUs that can work in 8 socket machines.

Yet it's like trying to squeeze money out of a concept that AMD walks all over, as if they are just toasters generating heat.