Hi, I have an idea I would like comments about. I'm totally new to this forum, so just correct me if I'm wrong on something.
Anyways, I'm thinking about using a "slave thread" for move generation, while a "master thread" on the HT core is accessing a hash table, and waiting for RAM. I don't know how to determine a good hash table size, but with this framework cpu cache usage while using the hash table would not be an issue, so there wouldn't really be a limit on the number of entries in the table.
Has anyone tried something similar? / Or has a quick idea why this would not work?
hyper threading and move generation
Moderators: hgm, Rebel, chrisw
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: hyper threading and move generation
I think you got this wrong. Hash table size (number of entries) does not affect how much time is spent on accessing the hash table, since it is always done in constant time O(1).tej wrote:Hi, I have an idea I would like comments about. I'm totally new to this forum, so just correct me if I'm wrong on something.
Anyways, I'm thinking about using a "slave thread" for move generation, while a "master thread" on the HT core is accessing a hash table, and waiting for RAM. I don't know how to determine a good hash table size, but with this framework cpu cache usage while using the hash table would not be an issue, so there wouldn't really be a limit on the number of entries in the table.
Has anyone tried something similar? / Or has a quick idea why this would not work?
HT technology uses idle superscalar processors by starting a new thread so you may as well start two search threads on one physical core. For chess programs this was not very much of a win because whenever you add a thread you add also search overhead. Unless the speed up from the additional hyper thread is enough to offset the overhead , the search would be slower than using single thread. You can find some threads in this forum about tests with/without HT.
Note that if there was any instruction level parallelism left the compiler probably would have taken care of it. So it is better to start a new thread for searching.
-
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: hyper threading and move generation
Hello Gabor,tej wrote:Hi, I have an idea I would like comments about. I'm totally new to this forum, so just correct me if I'm wrong on something.
Anyways, I'm thinking about using a "slave thread" for move generation,
while a "master thread" on the HT core is accessing a hash table, and waiting for RAM. I don't know how to determine a good hash table size, but with this framework cpu cache usage while using the hash table would not be an issue, so there wouldn't really be a limit on the number of entries in the table.
Has anyone tried something similar? / Or has a quick idea why this would not work?
first welcome to the forum !
if i understand correctly, you want to use a "move-generation" thread while another thread is just picking the generated moves from a hashtable then. Correct ?
If so, i wonder how do you handle the alphaBeta framework.
Will you generate millions of nodes which will never be accessed later ?
Or do you have a nice solution to avoid massive overhead ?
Parallel computing power should be used for the search imo.
you can find more information here
Michael
Re: hyper threading and move generation
Quick responses : )
Anyway, I wasn't interested in paralell search, let me try to explain it better:
A thread is doing the nice alpha-beta work, and stores stuff in the transposition table, and sometimes reads from the transposition table. Also, Probably not doing mush work while waiting for the RAM, which can be a long time if it doesn't hit L2 or L3 cache. So the other thread would wake up, and help out with some move generation in the L1 cache of the same core. When the very useful bits arrive from RAM, the main thread takes back control.
Anyway, I wasn't interested in paralell search, let me try to explain it better:
A thread is doing the nice alpha-beta work, and stores stuff in the transposition table, and sometimes reads from the transposition table. Also, Probably not doing mush work while waiting for the RAM, which can be a long time if it doesn't hit L2 or L3 cache. So the other thread would wake up, and help out with some move generation in the L1 cache of the same core. When the very useful bits arrive from RAM, the main thread takes back control.
-
- Posts: 27837
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: hyper threading and move generation
In principle this could work, provided you have an efficient way to start and stop the move-generation thread, and can prevent it from wasting too much CPU resoures when it has nothing to do. And you should be able to force the threads to run on the same physical core, so they share L1, or it would be a disaster.
One way would be to use a long-latency instruction like multiply in a spin-lock. Like:
This would fall through when the other thread sets lock=0. And it would not consume too many resoures, since the critical path of the loop contains a multiply, which takes many clocks, and the execution of the test-and-branch will have to wait for the result.
One way would be to use a long-latency instruction like multiply in a spin-lock. Like:
Code: Select all
value = lock;
while(value) value *= lock;
-
- Posts: 5569
- Joined: Tue Feb 28, 2012 11:56 pm
Re: hyper threading and move generation
An easier way to achieve more or less the same effect is to issue a prefetch instruction to load the TT entry into cache while the cpu continues with some useful work.tej wrote:A thread is doing the nice alpha-beta work, and stores stuff in the transposition table, and sometimes reads from the transposition table. Also, Probably not doing mush work while waiting for the RAM, which can be a long time if it doesn't hit L2 or L3 cache. So the other thread would wake up, and help out with some move generation in the L1 cache of the same core. When the very useful bits arrive from RAM, the main thread takes back control.
In this paper, Ulrich Drepper proposes to use a hyperthread to do the prefetching (see par. 6.3.4 starting on page 63), which seems to get quite close to what you are thinking of. I don't think it will be easy to get synchronisation overhead low enough to make it work, though.
-
- Posts: 5569
- Joined: Tue Feb 28, 2012 11:56 pm
Re: hyper threading and move generation
The monitor/mwait instructions were designed for efficiently implementing spinlocks when using hyperthreads. See e.g. this presentation.hgm wrote:In principle this could work, provided you have an efficient way to start and stop the move-generation thread, and can prevent it from wasting too much CPU resoures when it has nothing to do. And you should be able to force the threads to run on the same physical core, so they share L1, or it would be a disaster.
One way would be to use a long-latency instruction like multiply in a spin-lock. Like:
This would fall through when the other thread sets lock=0. And it would not consume too many resoures, since the critical path of the loop contains a multiply, which takes many clocks, and the execution of the test-and-branch will have to wait for the result.Code: Select all
value = lock; while(value) value *= lock;
Re: hyper threading and move generation
Yes, prefetch is much more simple then what I proposed, with no overhead I think.syzygy wrote: An easier way to achieve more or less the same effect is to issue a prefetch instruction to load the TT entry into cache while the cpu continues with some useful work.
So it wasn't a brand new idea after all : )syzygy wrote: In this paper, Ulrich Drepper proposes to use a hyperthread to do the prefetching (see par. 6.3.4 starting on page 63), which seems to get quite close to what you are thinking of. I don't think it will be easy to get synchronisation overhead low enough to make it work, though.