mcostalba wrote: syzygy wrote: What Marco is really saying is that NUMA is a non-issue.
I say that NUMA awareness (with fewer than 64 logical processors on Windows) has yet to be proven appreciably stronger than ignoring NUMA.
syzygy wrote: That ignores the numbers reported by multiple chess engine programmers.
Can you please post links to these multiple test reports?
Sure.
Some Real Performance Data
Configuration               Best Split Depth   Average Node Speed   Speed Gain
Standard                    14                 13600 kN/s
With Large Pages            14                 14900 kN/s           +10%
With NUMA and Large Pages   12                 16200 kN/s           +20%
In Peter's case we are talking about a speed-up of about 10% for his engine Texel. I don't think this can be blindly generalized to all engines,
Seems to me the one who was blindly generalising is you.
for instance, one of the latest patches in SF (which you know well) removed a scalability issue, and that alone is able to gain that 10% with many cores (32 cores were used for testing).
Which just serves to prove that hardware architecture very much is an issue that needs to be taken into account if performance is considered important.
That's what I know (I mean numbers and tests, not words). If you have something else, I would be happy to read it.
Yes, you do not believe in rational arguments. That's OK, because this is a thread for everybody who is interested in the topic.
On a NUMA system, accessing memory on the local node is faster than accessing memory on another node. This is not hand waving; it is the definition of NUMA.
So it follows directly from the technical definition of a NUMA system that on such a system the memory accessed by a search thread is ideally present on the node on which that search thread runs. Obviously this is not possible in the case of the transposition table, which needs to be shared by all search threads, but it is possible for many other tables and data structures. Doing so on a machine that is dedicated to running a chess engine (i.e. one that does not have all kinds of other CPU- and/or memory-intensive jobs running at the same time) cannot harm and should be expected to improve performance at least to some extent. How much performance will improve will of course depend on lots of factors.
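To make the "many other tables" point concrete, here is a minimal sketch, not taken from any particular engine. It assumes an OS with a local-allocation ("first touch") default policy, as Linux has, where a page is normally placed on the node of the thread that first writes to it. The names ThreadTables, initTables and buildPerThreadTables are hypothetical. The sketch also does not pin threads to nodes; a real engine would have to do that (e.g. through OS affinity calls or a NUMA library) for the placement to stay local.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-thread search data; the names are illustrative only.
struct ThreadTables {
    std::vector<std::int32_t> history;    // e.g. history heuristic counters
    std::vector<std::uint64_t> pawnHash;  // e.g. a private pawn hash table
};

// Runs on the search thread itself. Allocating and zero-filling the buffers
// here means the pages are first written by this thread, so under a
// first-touch policy they are expected to land on this thread's node.
static void initTables(ThreadTables& t, std::size_t historySize, std::size_t pawnSize) {
    // The buffers must not have been allocated elsewhere (e.g. by the main thread).
    assert(t.history.empty() && t.pawnHash.empty());
    t.history.assign(historySize, 0);
    t.pawnHash.assign(pawnSize, 0);
}

// Starts n threads; each builds its own tables. Only the small ThreadTables
// headers are allocated by the calling thread, not the table contents.
std::vector<ThreadTables> buildPerThreadTables(std::size_t n,
                                               std::size_t historySize,
                                               std::size_t pawnSize) {
    std::vector<ThreadTables> tables(n);
    std::vector<std::thread> workers;
    workers.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        workers.emplace_back(initTables, std::ref(tables[i]), historySize, pawnSize);
    for (auto& w : workers)
        w.join();
    return tables;
}
```

In a real engine the search thread would keep running after initialising its tables instead of being joined. The shared transposition table is deliberately left out, since it cannot be local to every node.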