hyper threading and move generation

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

tej

hyper threading and move generation

Post by tej »

Hi, I have an idea I would like comments about. I'm totally new to this forum, so just correct me if I'm wrong on something.

Anyways, I'm thinking about using a "slave thread" for move generation, while a "master thread" on the HT core is accessing a hash table, and waiting for RAM. I don't know how to determine a good hash table size, but with this framework cpu cache usage while using the hash table would not be an issue, so there wouldn't really be a limit on the number of entries in the table.

Has anyone tried something similar? / Or has a quick idea why this would not work?
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: hyper threading and move generation

Post by Daniel Shawul »

tej wrote:Hi, I have an idea I would like comments about. I'm totally new to this forum, so just correct me if I'm wrong on something.

Anyways, I'm thinking about using a "slave thread" for move generation, while a "master thread" on the HT core is accessing a hash table, and waiting for RAM. I don't know how to determine a good hash table size, but with this framework cpu cache usage while using the hash table would not be an issue, so there wouldn't really be a limit on the number of entries in the table.

Has anyone tried something similar? / Or has a quick idea why this would not work?
I think you got this wrong. Hash table size (number of entries) does not affect how much time is spent on accessing the hash table, since it is always done in constant time O(1).
HT technology uses idle superscalar processors by starting a new thread so you may as well start two search threads on one physical core. For chess programs this was not very much of a win because whenever you add a thread you add also search overhead. Unless the speed up from the additional hyper thread is enough to offset the overhead , the search would be slower than using single thread. You can find some threads in this forum about tests with/without HT.

Note that if there was any instruction level parallelism left the compiler probably would have taken care of it. So it is better to start a new thread for searching.
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: hyper threading and move generation

Post by Desperado »

tej wrote:Hi, I have an idea I would like comments about. I'm totally new to this forum, so just correct me if I'm wrong on something.

Anyways, I'm thinking about using a "slave thread" for move generation,
while a "master thread" on the HT core is accessing a hash table, and waiting for RAM. I don't know how to determine a good hash table size, but with this framework cpu cache usage while using the hash table would not be an issue, so there wouldn't really be a limit on the number of entries in the table.

Has anyone tried something similar? / Or has a quick idea why this would not work?
Hello Gabor,

first welcome to the forum :) !

if i understand correctly, you want to use a "move-generation" thread while another thread is just picking the generated moves from a hashtable then. Correct ?

If so, i wonder how do you handle the alphaBeta framework.
Will you generate millions of nodes which will never be accessed later ?
Or do you have a nice solution to avoid massive overhead ?

Parallel computing power should be used for the search imo.

you can find more information here

Michael
tej

Re: hyper threading and move generation

Post by tej »

Quick responses : )

Anyway, I wasn't interested in paralell search, let me try to explain it better:

A thread is doing the nice alpha-beta work, and stores stuff in the transposition table, and sometimes reads from the transposition table. Also, Probably not doing mush work while waiting for the RAM, which can be a long time if it doesn't hit L2 or L3 cache. So the other thread would wake up, and help out with some move generation in the L1 cache of the same core. When the very useful bits arrive from RAM, the main thread takes back control.
User avatar
hgm
Posts: 27795
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: hyper threading and move generation

Post by hgm »

In principle this could work, provided you have an efficient way to start and stop the move-generation thread, and can prevent it from wasting too much CPU resoures when it has nothing to do. And you should be able to force the threads to run on the same physical core, so they share L1, or it would be a disaster.

One way would be to use a long-latency instruction like multiply in a spin-lock. Like:

Code: Select all

value = lock;
while(value) value *= lock;
This would fall through when the other thread sets lock=0. And it would not consume too many resoures, since the critical path of the loop contains a multiply, which takes many clocks, and the execution of the test-and-branch will have to wait for the result.
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: hyper threading and move generation

Post by syzygy »

tej wrote:A thread is doing the nice alpha-beta work, and stores stuff in the transposition table, and sometimes reads from the transposition table. Also, Probably not doing mush work while waiting for the RAM, which can be a long time if it doesn't hit L2 or L3 cache. So the other thread would wake up, and help out with some move generation in the L1 cache of the same core. When the very useful bits arrive from RAM, the main thread takes back control.
An easier way to achieve more or less the same effect is to issue a prefetch instruction to load the TT entry into cache while the cpu continues with some useful work.

In this paper, Ulrich Drepper proposes to use a hyperthread to do the prefetching (see par. 6.3.4 starting on page 63), which seems to get quite close to what you are thinking of. I don't think it will be easy to get synchronisation overhead low enough to make it work, though.
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: hyper threading and move generation

Post by syzygy »

hgm wrote:In principle this could work, provided you have an efficient way to start and stop the move-generation thread, and can prevent it from wasting too much CPU resoures when it has nothing to do. And you should be able to force the threads to run on the same physical core, so they share L1, or it would be a disaster.

One way would be to use a long-latency instruction like multiply in a spin-lock. Like:

Code: Select all

value = lock;
while(value) value *= lock;
This would fall through when the other thread sets lock=0. And it would not consume too many resoures, since the critical path of the loop contains a multiply, which takes many clocks, and the execution of the test-and-branch will have to wait for the result.
The monitor/mwait instructions were designed for efficiently implementing spinlocks when using hyperthreads. See e.g. this presentation.
tej

Re: hyper threading and move generation

Post by tej »

syzygy wrote: An easier way to achieve more or less the same effect is to issue a prefetch instruction to load the TT entry into cache while the cpu continues with some useful work.
Yes, prefetch is much more simple then what I proposed, with no overhead I think.
syzygy wrote: In this paper, Ulrich Drepper proposes to use a hyperthread to do the prefetching (see par. 6.3.4 starting on page 63), which seems to get quite close to what you are thinking of. I don't think it will be easy to get synchronisation overhead low enough to make it work, though.
So it wasn't a brand new idea after all : )