mjlef wrote: As others have discovered, the methods used in Komodo lead to a strength gain at a given depth, on top of the shorter time to completion of that depth.
For example, one of the ideas I have, and have attempted superficially, is to analyze more carefully which moves to reduce more and which to reduce less. The effect would be to spend more time on each depth but to gain strength.
Some fun with Komodo 8
-
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Some fun with Komodo 8
Daniel José - http://www.andscacs.com
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Some fun with Komodo 8
mjlef wrote: Unfortunately, I will have to keep what Komodo does as a kind of "trade secret" for now, since it seems to give us an advantage over other programs. As others have discovered, the methods used in Komodo lead to a strength gain at a given depth, on top of the shorter time to completion of that depth. We changed part of it for Komodo 8 over the scheme Don came up with, and these changes seem to have improved efficiency and scaling.
I think you would agree that traditional parallel schemes scale more and more poorly as the number of processors increases. At some point, doubling the processors only gives a few more Elo. We want something better than that.
I make no claim that what we do is optimal, and we hope to make further improvements in the future. Better MP use is definitely something we need to work on.
While I agree that more and more CPUs add less and less Elo, I don't think that 1-2 or 2-4 reaches that point unless one uses the shared transposition table (TT) approach to parallel search, which is certainly not very good. That suggests going from 1 to 2, 4, 8, and even 16 CPUs should be pretty efficient in the traditional sense, rather than saying "OK, I can't improve the efficiency, so I will use some of the horsepower to try to make the search a bit more accurate with less pruning or reducing." At the low end, the speedups are typically so good that this is not very effective. Or if it is, doing the same thing with just one processor would also be a gain (i.e. scaling back the aggressiveness of pruning or reducing).
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Some fun with Komodo 8
cdani wrote: For example, one of the ideas I have, and have attempted superficially, is to analyze more carefully which moves to reduce more and which to reduce less. The effect would be to spend more time on each depth but to gain strength.
You can do the SAME thing with just one CPU. That's a key point. The speedup for 1-2 and 2-4 for most programs is pretty good, or pretty easy to make good for a new program. If your "less selectivity" idea is good for 2-4 CPUs, it should also work for just one.
-
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Some fun with Komodo 8
bob wrote: You can do the SAME thing with just one CPU. That's a key point. The speedup for 1-2 and 2-4 for most programs is pretty good, or pretty easy to make good for a new program. If your "less selectivity" idea is good for 2-4 CPUs, it should also work for just one.
Yes, I understood this. But maybe you can use some time on another CPU to prepare things that make the main search more efficient, without interfering with the main search. Of course, you could again do all of this with a single thread, but probably with many more interferences or context switches than with a separate thread. It's just an idea.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Some fun with Komodo 8
bob wrote: You can do the SAME thing with just one CPU. That's a key point. The speedup for 1-2 and 2-4 for most programs is pretty good, or pretty easy to make good for a new program. If your "less selectivity" idea is good for 2-4 CPUs, it should also work for just one.
Things like LMR (late move reductions) can hurt fixed-depth strength badly without changing fixed-time strength much. The question then becomes: what is "depth" with such aggressive LMR use?
-
- Posts: 1610
- Joined: Fri Mar 01, 2013 5:28 pm
- Location: USA
Re: Some fun with Komodo 8
bob wrote: You can do the SAME thing with just one CPU. That's a key point. The speedup for 1-2 and 2-4 for most programs is pretty good, or pretty easy to make good for a new program. If your "less selectivity" idea is good for 2-4 CPUs, it should also work for just one.
Bob,
What are you getting for NPS off your 12-core on ICC? 30M+ NPS? I suppose Linux squeezes more out of it too.
"Without change, something sleeps inside us, and seldom awakens. The sleeper must awaken." (Dune - 1984)
Lonnie
-
- Posts: 1494
- Joined: Thu Mar 30, 2006 2:08 pm
Re: Some fun with Komodo 8
Bob,
Have you found the Elo gain for 1-2-4-8-16 processors for Crafty?
I think that since CPU speed in GHz has not been increasing much recently, processor makers have turned to getting more out of each CPU cycle and putting more cores on each chip. So it would be nice to see the Elo gain for each increase in the number of processors. There seems to be a pretty big difference between programs here, with some posters suggesting that adding another processor at some point actually hurts Elo, although I do not know the specifics. With, say, a limited hash table size for storing best moves and cutoffs, and multiple cores trying to access the same shared memory, perhaps at some point the slowdown due to external memory access makes extra cores unproductive. Also, it seems to me that memory speeds have more or less plateaued, with newer machines not having faster external memory. At least the on-chip caches are getting bigger.
Mark
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Some fun with Komodo 8
reflectionofpower wrote: Bob,
What are you getting for NPS off your 12-core on ICC? 30M+ NPS? I suppose Linux squeezes more out of it too.
Typical number is 40-44M NPS. It does drop some in endgames. And this is a pretty old box as well. The processor is an ES5650 at 2.67 GHz, dual chip with 6 cores per chip. My newer iMac with a 4-core chip runs past 20M easily.
-
- Posts: 1610
- Joined: Fri Mar 01, 2013 5:28 pm
- Location: USA
Re: Some fun with Komodo 8
bob wrote: Typical number is 40-44M NPS. It does drop some in endgames. And this is a pretty old box as well. The processor is an ES5650 at 2.67 GHz, dual chip with 6 cores per chip. My newer iMac with a 4-core chip runs past 20M easily.
Nice.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Some fun with Komodo 8
mjlef wrote: Bob,
Have you found the Elo gain for 1-2-4-8-16 processors for Crafty?
I think that since CPU speed in GHz has not been increasing much recently, processor makers have turned to getting more out of each CPU cycle and putting more cores on each chip. So it would be nice to see the Elo gain for each increase in the number of processors. There seems to be a pretty big difference between programs here, with some posters suggesting that adding another processor at some point actually hurts Elo, although I do not know the specifics. With, say, a limited hash table size for storing best moves and cutoffs, and multiple cores trying to access the same shared memory, perhaps at some point the slowdown due to external memory access makes extra cores unproductive. Also, it seems to me that memory speeds have more or less plateaued, with newer machines not having faster external memory. At least the on-chip caches are getting bigger.
Mark
No disagreement with any of that. I have not seen any negative effects for my search through 16 cores. I've run on up to 64, but not recently, and never saw any case where 64 was actually worse than 32, as long as you don't look at an individual position, where anything can happen.
I have done some cluster testing in the past, not specifically to measure Elo improvement but more commonly just to stress-test the parallel search. I've been puzzling over imperfect NPS scaling on this box for months, with no luck yet. I can run 12 copies of Crafty at 5M NPS per copy, but running one copy with 12 threads hits only around 48M. That leaves 12M missing (20% of the 60M total processing power). I'm going to find it. 99% of the time this is a cache issue, but so far nothing has helped.