question about speedup when starting a test position
-
- Posts: 27
- Joined: Fri Dec 11, 2009 10:23 pm
Re: question about speedup when starting a test position
Besides the caches and the hash table, the branch predictor also needs to warm up. This is a problem with processor simulation.
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: question about speedup when starting a test position
kgburcham wrote:
Its weird how the cpu can be at 100% yet the thread is at idle.
So it seems the answer is that the demand is not there for 12 threads when the tree is small.
So it also seems since it takes several seconds for 12 threads to fully load then some kns is lost in fast games.
kgburcham
What setting do you use for minimum split depth? I seem to remember you choosing 14. If so, then it seems that no parallelization could even begin until the search reaches that iteration, if I understand correctly what that parameter means. As an experiment, try setting that parameter to something like 7.
-
- Posts: 2016
- Joined: Sun Feb 17, 2008 4:19 pm
Re: question about speedup when starting a test position
kgburcham wrote:
Its weird how the cpu can be at 100% yet the thread is at idle.
So it seems the answer is that the demand is not there for 12 threads when the tree is small.
So it also seems since it takes several seconds for 12 threads to fully load then some kns is lost in fast games.
kgburcham
zullil wrote: What setting do you use for minimum split depth? I seem to remember you choosing 14. If so, then it seems that no parallelization could even begin until the search reaches that iteration, if I understand correctly what that parameter means. As an experiment, try setting that parameter to something like 7.
Several people with 12-thread systems worked with Robert before the first release. One thing that was worked on was the optimum minimum split depth setting, using a bench test Robert came up with. It was determined that 12 was the best setting. For fast games this could perhaps be lower; I am not sure. I usually set the split depth to 12. Thanks for the reply, Louis, but I am not having any issues with the setting; I am just curious why the speedup increases with time. There is good info in some of these posts.
kgburcham
-
- Posts: 1471
- Joined: Tue Mar 16, 2010 12:00 am
Re: question about speedup when starting a test position
kgburcham wrote: Its weird how the cpu can be at 100% yet the thread is at idle.
The "idle" threads are running a small loop, continuously checking whether other threads have submitted a position to analyze. They are 100% busy doing nothing useful.
Robert
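The busy-waiting described above can be illustrated with a minimal Python sketch. It is not Houdini's code: the `SpinWorker` class, its fields, and the `demo` function are hypothetical names made up for this illustration, and a real engine would do this in C or C++ with proper atomics. The point is only that a thread which polls for work in a tight loop keeps its core fully loaded while searching nothing.

```python
import threading
import time


class SpinWorker:
    """Hypothetical helper thread that busy-waits for work instead of sleeping."""

    def __init__(self):
        self.job = None          # callable submitted by another thread, or None
        self.result = None
        self.idle_loops = 0      # iterations spent finding nothing to do
        self.stop = False
        self.done = threading.Event()

    def run(self):
        while not self.stop:
            job = self.job
            if job is None:
                # Nothing submitted yet: spin again. The core stays
                # "100% busy" even though no nodes are being searched.
                self.idle_loops += 1
                continue
            self.job = None
            self.result = job()
            self.done.set()


def demo():
    worker = SpinWorker()
    thread = threading.Thread(target=worker.run)
    thread.start()
    time.sleep(0.05)             # the worker spins while no work exists
    worker.job = lambda: 42      # another thread "submits a position"
    worker.done.wait(timeout=5)
    worker.stop = True
    thread.join()
    return worker.result, worker.idle_loops
```

Calling `demo()` returns the job's result together with the number of idle iterations, which is the kind of counter an "idle loops" statistic could be based on.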
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: question about speedup when starting a test position
kgburcham wrote:
Usually I set this to 12 split depth. Thanks for the reply Louis but I am not having any issues with the setting, just curious why the speedup increases with time.
kgburcham
I'm trying to suggest that the setting itself may partly explain the speedup. No splitting would occur at all before the depth-12 iteration, and then the amount of splitting would grow with each subsequent iteration (or so I think; perhaps some expert could explain this correctly).
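A rough way to picture this argument, under a simplifying assumption: suppose a node may be split only when its remaining depth is at least the minimum split depth, and ignore extensions and reductions. Then in iteration d a node at ply p has remaining depth d - p, so only plies 0 through d - min_split_depth qualify. The function names below are hypothetical, and whether the engine's parameter means exactly this is itself an assumption.

```python
def splittable_plies(iteration_depth, min_split_depth):
    """Count plies from the root whose remaining depth is >= min_split_depth.

    Assumes remaining depth = iteration_depth - ply (no extensions or
    reductions), which real engines do not strictly obey.
    """
    return max(0, iteration_depth - min_split_depth + 1)


def report(min_split_depth, max_iteration):
    """Map each iteration depth to the number of plies where a split is allowed."""
    return {d: splittable_plies(d, min_split_depth)
            for d in range(1, max_iteration + 1)}
```

With a minimum split depth of 12, `splittable_plies(11, 12)` is 0 and `splittable_plies(15, 12)` is 4, while a setting of 7 already allows splits at 6 plies in iteration 12. Under this model the opportunity to split appears late and then grows by one ply per iteration.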
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: question about speedup when starting a test position
kgburcham wrote:
when analyzing a test position why do the kns speedup with time?
what is going on in the program to cause this?
There are dozens of issues.
(1) The cache fills up over time.
(2) The hash table starts off empty and therefore useless, yet once it fills it provides critical data (Crafty tries the hash move before generating moves; if that produces a cutoff, it is much faster than generating moves and searching them).
(3) The hash table has to be "faulted in" on the first access to each page. That takes time until every page has been touched at least once.
(4) Parallel search works better as the search goes deeper, which makes the NPS climb.
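Point (2) can be sketched on a toy tree. This is not Crafty's code: `negamax`, `generate_moves`, `count_generations`, the path-based table key, and the leaf scores are all hypothetical, chosen only to show a stored move being tried before any move generation takes place.

```python
INF = 10**9


def generate_moves(node, stats):
    """Stand-in for a real move generator; counts how often it is called."""
    stats["gen_calls"] += 1
    return range(len(node))


def negamax(node, alpha, beta, tt, stats, key=()):
    if isinstance(node, int):        # leaf: score for the side to move
        return node
    best, best_move = -INF, None
    hash_move = tt.get(key)
    if hash_move is not None:
        # Try the stored move first, before any move generation.
        best = -negamax(node[hash_move], -beta, -alpha, tt, stats,
                        key + (hash_move,))
        best_move = hash_move
        alpha = max(alpha, best)
        if alpha >= beta:
            return best              # cutoff: generate_moves was never called
    for move in generate_moves(node, stats):
        if move == hash_move:
            continue                 # already searched above
        score = -negamax(node[move], -beta, -alpha, tt, stats, key + (move,))
        if score > best:
            best, best_move = score, move
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    tt[key] = best_move              # remember the best move for next time
    return best


def count_generations(tree, tt):
    stats = {"gen_calls": 0}
    value = negamax(tree, -INF, INF, tt, stats)
    return value, stats["gen_calls"]


TREE = [[3, 5], [-2, 9], [0, 1]]     # toy two-ply tree, hypothetical scores
tt = {}
cold = count_generations(TREE, tt)   # first search: table is empty
warm = count_generations(TREE, tt)   # same search again: table is filled
```

Both searches return the same value, but the second one calls the move generator less often because the cutoff nodes are resolved by the stored move alone. In a real engine the table is keyed by a position hash rather than by the path from the root.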
-
- Posts: 1471
- Joined: Tue Mar 16, 2010 12:00 am
Re: question about speedup when starting a test position
Houdini wrote:
The main reason is that the alpha-beta algorithm is in essence a serial algorithm, which our SMP implementations try to transform into a parallel operation. At low search depths not all threads have something useful to do. The CPU is at 100% but the threads are actually idling and waiting for other threads to submit positions to be analyzed. The more threads you have, the more pronounced the effect. With 2 threads the full speed is nearly instantly there, with 8 threads you need to wait several seconds before most threads actually do something useful.
The Houdini 2.0 autotune command shows you the number of "idle" loops that threads have executed, waiting for something useful to do.
By the way, you will see a direct relation between the amount of "idle" and the measured node speed.
If one makes a small table of the above-mentioned results for each elapsed second:
 Time  Nodes  Idle
 msec   kN/s     M
=======================
 1000   4898   267
 1999   6324   100
 2999   7088    37
 3998   7277    15
 4998   6940    36
 5997   7310    12
 7000   7184    35
 8003   7245    50
 9004   7254    38
10003   6892    75
15005   7491    19
20010   7699     0
25013   7992     0
29014   7918     0
=======================
This is a strong indication that this purely algorithmic effect is the dominant cause of the reduced node speed.
Robert
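One way to check the relation in the table above is to compute a correlation coefficient between the idle column and the node speed. The sketch below copies the figures from the table. The `pearson` helper and the variable names are hypothetical, and reading the "M" column as millions of idle loops is an assumption.

```python
import math

# (elapsed msec, node speed in kN/s, idle loops), copied from the table above
ROWS = [
    (1000, 4898, 267), (1999, 6324, 100), (2999, 7088, 37),
    (3998, 7277, 15), (4998, 6940, 36), (5997, 7310, 12),
    (7000, 7184, 35), (8003, 7245, 50), (9004, 7254, 38),
    (10003, 6892, 75), (15005, 7491, 19), (20010, 7699, 0),
    (25013, 7992, 0), (29014, 7918, 0),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


speed = [row[1] for row in ROWS]
idle = [row[2] for row in ROWS]
r = pearson(idle, speed)
print(f"correlation between idle loops and kN/s: {r:.2f}")
```

A clearly negative coefficient means that seconds with many idle loops are the seconds with low node speed, which is what the table suggests by eye. Fourteen samples from a single run are of course only an indication, not a proof.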