Is Houdini only TOP3 engine with NUMA support in TCEC?

Posted: Mon Dec 11, 2017 2:38 pm
by Jouni
I remember reading that neither SF nor Komodo has real NUMA support. According to the Houdini documentation, it gets a +50% boost in node speed with dual Xeon and NUMA!

Re: Is Houdini only TOP3 engine with NUMA support in TCEC?

Posted: Mon Dec 11, 2017 3:29 pm
by Milos
Jouni wrote: I remember reading that neither SF nor Komodo has real NUMA support. According to the Houdini documentation, it gets a +50% boost in node speed with dual Xeon and NUMA!
Brainfish works pretty well with NUMA (and Cfish too). I have no clue what "real NUMA support" is, but when going dual Xeon with Brainfish using, for example, 8 cores from each CPU, performance is almost identical to using 16 cores of a single CPU.

Re: Is Houdini only TOP3 engine with NUMA support in TCEC?

Posted: Mon Dec 11, 2017 9:12 pm
by Houdini
Jouni wrote: I remember reading that neither SF nor Komodo has real NUMA support. According to the Houdini documentation, it gets a +50% boost in node speed with dual Xeon and NUMA!
Where did you get that number?

People continue to be confused about 1) support for more than 64 logical processors in Windows, and 2) true NUMA support for multiple CPU sockets.
Point 1) is really important - in Windows you need to make a special call to a system function to be able to use more than 64 threads.
Point 2) matters much less. With lazy-like SMP, in which all the threads are running independently, the added value of "NUMA" support is much smaller than it was with YBWC. It probably delivers at most 5% on Intel hardware.

The TCEC computer is running without hyperthreading and has only 44 logical processors - that's why point 1) does not apply and only point 2) remains. The handicap for the engines without any NUMA support will be very small. Using Large Pages for the hash would be a much bigger deal for TCEC.

Cheers,
Robert
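
To make point 1) concrete: Windows splits logical processors into processor groups of at most 64, and a thread only leaves its default group if the program asks for that explicitly (SetThreadGroupAffinity is one such system call). Here is a minimal sketch of the arithmetic only, not of any engine's actual code. The helper names are made up for illustration, the 88-processor machine is a hypothetical example (44 cores with hyperthreading switched on), and the round-robin placement is an assumption; real Windows group layouts usually follow the NUMA nodes rather than filling each group to 64.

```python
import math

GROUP_SIZE = 64  # a Windows processor group holds at most 64 logical processors


def groups_needed(logical_processors, group_size=GROUP_SIZE):
    """Number of processor groups needed to cover all logical processors."""
    return math.ceil(logical_processors / group_size)


def needs_group_call(logical_processors, group_size=GROUP_SIZE):
    """True when threads must be assigned to groups explicitly to reach every processor."""
    return groups_needed(logical_processors, group_size) > 1


def group_for_thread(thread_index, logical_processors, group_size=GROUP_SIZE):
    """Hypothetical round-robin placement of search threads over the groups."""
    return thread_index % groups_needed(logical_processors, group_size)


# 44 = the TCEC machine as described above; 88 = hypothetical, hyperthreading on.
for machine in (44, 88):
    print(machine, groups_needed(machine), needs_group_call(machine))
```

With 44 logical processors everything fits in a single group, so no special call is needed, which is why only point 2) is left for TCEC.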

Re: Is Houdini only TOP3 engine with NUMA support in TCEC?

Posted: Wed Dec 13, 2017 10:26 pm
by Jouni

Code:

Configuration                  1 thread     6 threads     20 threads    40 threads
Standard or Pro without NUMA   2,400 kN/s   13,600 kN/s   47,600 kN/s   58,700 kN/s
Pro with NUMA                  2,400 kN/s   13,500 kN/s   47,800 kN/s   91,200 kN/s
Pro with Large Pages           2,550 kN/s   14,500 kN/s   49,400 kN/s   67,300 kN/s
Pro with NUMA and Large Pages  2,550 kN/s   14,600 kN/s   49,500 kN/s   96,400 kN/s
So does the manual have an error?

Re: Is Houdini only TOP3 engine with NUMA support in TCEC?

Posted: Wed Dec 13, 2017 10:41 pm
by CheckersGuy
Houdini wrote:
Jouni wrote: I remember reading that neither SF nor Komodo has real NUMA support. According to the Houdini documentation, it gets a +50% boost in node speed with dual Xeon and NUMA!
Where did you get that number?

People continue to be confused about 1) support for more than 64 logical processors in Windows, and 2) true NUMA support for multiple CPU sockets.
Point 1) is really important - in Windows you need to make a special call to a system function to be able to use more than 64 threads.
Point 2) matters much less. With lazy-like SMP, in which all the threads are running independently, the added value of "NUMA" support is much smaller than it was with YBWC. It probably delivers at most 5% on Intel hardware.

The TCEC computer is running without hyperthreading and has only 44 logical processors - that's why point 1) does not apply and only point 2) remains. The handicap for the engines without any NUMA support will be very small. Using Large Pages for the hash would be a much bigger deal for TCEC.

Cheers,
Robert
May I ask what the difference was when Houdini used YBWC?

Re: Is Houdini only TOP3 engine with NUMA support in TCEC?

Posted: Wed Dec 13, 2017 10:52 pm
by Houdini
Jouni wrote:

Code:

Configuration                  1 thread     6 threads     20 threads    40 threads
Standard or Pro without NUMA   2,400 kN/s   13,600 kN/s   47,600 kN/s   58,700 kN/s
Pro with NUMA                  2,400 kN/s   13,500 kN/s   47,800 kN/s   91,200 kN/s
Pro with Large Pages           2,550 kN/s   14,500 kN/s   49,400 kN/s   67,300 kN/s
Pro with NUMA and Large Pages  2,550 kN/s   14,600 kN/s   49,500 kN/s   96,400 kN/s
So does the manual have an error?
No, the table demonstrates exactly what I've explained above.
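
For anyone who wants to check the percentages, here is a small sketch that computes the gain of each configuration over the "without NUMA" row. The figures are copied from the table quoted above; the variable and function names are just for illustration, and the script only does the arithmetic, it says nothing about why the gains appear where they do.

```python
# Node speeds in kN/s, copied from the table quoted in this thread.
THREADS = (1, 6, 20, 40)
BASELINE = "Standard or Pro without NUMA"
KNPS = {
    "Standard or Pro without NUMA": (2400, 13600, 47600, 58700),
    "Pro with NUMA": (2400, 13500, 47800, 91200),
    "Pro with Large Pages": (2550, 14500, 49400, 67300),
    "Pro with NUMA and Large Pages": (2550, 14600, 49500, 96400),
}


def gain_percent(config, threads):
    """Percentage gain in node speed over the baseline row at a given thread count."""
    col = THREADS.index(threads)
    return 100.0 * (KNPS[config][col] / KNPS[BASELINE][col] - 1.0)


# Print one row of gains per non-baseline configuration.
for config in KNPS:
    if config == BASELINE:
        continue
    row = "  ".join(f"{gain_percent(config, t):+6.1f}%" for t in THREADS)
    print(f"{config:<30} {row}")
```

In these figures the NUMA row gains roughly 55% at 40 threads, but stays within 1% of the baseline at 1, 6 and 20 threads.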