Peculiarity of Komodo 5.1MP

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Peculiarity of Komodo 5.1MP

Post by Don »

Bob (and others),

How you approach a problem is strongly related to how you DEFINE a problem or task. The goal of MP is to make the program play as well as possible using more cores, to take advantage of the hardware as fully as possible and the only measure that really matters (at least to us) is playing strength. We ALL wish that our program played as if it they were 4x faster when playing on 4 cores - but nobody gets that kind of efficiency.

Some have implied that we search wider because we don't get quite the same increase in depth and that this is a horrible thing. The intuition is that if that if it works you could do it on 1 thread too. However I do not think that follows. The problem is that 4 threads is just not 4x better than 1 thread no matter how good your MP implementation, so it always comes down to how to best utilize what you have. Anything goes in my opinion if it gets more out of extra cores in terms of ELO.

It is well known that programs do not scale well beyond a few processors. Going from 128 to 256 processors gives you very little. One of the things Larry and I want to do is to explore how to better exploit more and more cores and hopefully what we learn will translate to better performance on few cores. It is unlikely to be minor refinements to what we already do.

I'm not breaking new ground as this has been studied for decades and lots of ideas have been tried - I'm not pretending otherwise. But we will look at new original ideas we come up with and revisit old-fashioned ideas that have been tried in the past. A lot of the old ideas which have been rejected have later come back to life, so there is no idea that is off-limits.

As far as whether searching "wider" is a bad thing and isn't real parallelism, I disagree with that. If 4 cores were 4 times better than one I probably would agree but since it isn't, we have to determine where the effort is best spent. If you have 11 people in the room and 10 are doctors and the one who isn't has a heart attack, what is the best use of the doctors time? Not all 10 doctors can give CPR even though normally that would be very best use of their talent, but surely one of them will be calling for an ambulance while another will be doing CPR and the rest will be doing whatever they can to assist, even in little ways.

So if you are getting very little benefit out of additional processors you need to figure out if there is something else they can do to be more useful. Perhaps one thing is to deal with the lines that are severely reduced in the single core program?
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Peculiarity of Komodo 5.1MP

Post by lkaufman »

Uri Blass wrote:
Daniel Shawul wrote:This is misleading because komodo probably compensates for poor parallel implementation by searching wider, otherwise it shouldn't be getting any elos for fixed depth test. You should do the test with time and see how much each gain.
"poor parallel implementation?"

The target of parallel implementation is not to get bigger depth but to play better.

If the implementation is good in helping komodo to play better than I think that it is wrong to call it poor.

Maybe it is the opposite and komodo compensates for poor pruning of lines that it should not prune by parallel implementation that prevent it to prune good moves in the relevant lines.
I think Uri got this exactly right. We believe (based on the results and the fact that our MP is really new and still not refined) that it is an efficient use of multi cores to patch up the pruning done on one core. The comments by some that this makes no sense are only valid for a traditional full-width engine, not for one like ours. So time to depth is not relevant for Komodo unless you add the elo equivalent of it to the elo gained at fixed depth. But it is simpler and more accurate just to play timed games, which so far indicate that Komodo's MP is not inferior to other top engines.
In response to another question, I always test Houdini, Stockfish, etc. with default settings for things like split point. Is this wrong? I know theoretically you can try to improve on that for your given machine by certain tests, but I'm not interested in spending time on this, and I doubt that most users or testers do this.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Peculiarity of Komodo 5.1MP

Post by Daniel Shawul »

Some have implied that we search wider because we don't get quite the same increase in depth and that this is a horrible thing. The intuition is that if that if it works you could do it on 1 thread too. However I do not think that follows. The problem is that 4 threads is just not 4x better than 1 thread no matter how good your MP implementation, so it always comes down to how to best utilize what you have. Anything goes in my opinion if it gets more out of extra cores in terms of ELO.
It doesn't matter if 4 threads is not as efficient as as 4x time search. The following statement is provable.
"If your sequential search is as good as it can be, then your parallel implementation that searches wider than sequential search is poorly done".
I will give it a shot as follows:
Assume sequential search is at its best. Give it 3.6x more time (say same as 4 threads search) but change the algorithm so that it can widen its search exactly the same way as the parallel search does. If indeed searching wider is better for parallel search as claimed, then the wide sequential search should show similar improvement over its deeper search counterpart. So we improved the sequential search that we assumed un-improvable, contradiction! The question of difficulty of implementation is irrelevant.
Modern Times
Posts: 3546
Joined: Thu Jun 07, 2012 11:02 pm

Re: Peculiarity of Komodo 5.1MP

Post by Modern Times »

lkaufman wrote: In response to another question, I always test Houdini, Stockfish, etc. with default settings for things like split point. Is this wrong? I know theoretically you can try to improve on that for your given machine by certain tests, but I'm not interested in spending time on this, and I doubt that most users or testers do this.
Many testers are limited to 4 cores, and for that the defaults are probably optimal. But people who use more cores probably should experiment with different values. When running on 6 cores in HG's monthly tournaments, I use a different value now for Critter for example. Maybe it helps, maybe not. Real benefits from changing the defaults probably aren't seen until you hit 8 or more cores is my guess.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Peculiarity of Komodo 5.1MP

Post by lkaufman »

Daniel Shawul wrote:
Some have implied that we search wider because we don't get quite the same increase in depth and that this is a horrible thing. The intuition is that if that if it works you could do it on 1 thread too. However I do not think that follows. The problem is that 4 threads is just not 4x better than 1 thread no matter how good your MP implementation, so it always comes down to how to best utilize what you have. Anything goes in my opinion if it gets more out of extra cores in terms of ELO.
It doesn't matter if 4 threads is not as efficient as as 4x time search. The following statement is provable.
"If your sequential search is as good as it can be, then your parallel implementation that searches wider than sequential search is poorly done".
I will give it a shot as follows:
Assume sequential search is at its best. Give it 3.6x more time (say same as 4 threads search) but change the algorithm so that it can widen its search exactly the same way as the parallel search does. If indeed searching wider is better for parallel search as claimed, then the wide sequential search should show similar improvement over its deeper search counterpart. So we improved the sequential search that we assumed un-improvable, contradiction! The question of difficulty of implementation is irrelevant.
I think this is wrong, though it's difficult to prove either way. I think the point is that widening the search in parallel may allow for better use of time that would go to waste in a non-widened parallel search. In other words, maybe the speedup is greater for parallelizing a wider search. Why should you expect the parallel speedup to be independent of the degree of pruning?
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Peculiarity of Komodo 5.1MP

Post by lkaufman »

Modern Times wrote:
lkaufman wrote: In response to another question, I always test Houdini, Stockfish, etc. with default settings for things like split point. Is this wrong? I know theoretically you can try to improve on that for your given machine by certain tests, but I'm not interested in spending time on this, and I doubt that most users or testers do this.
Many testers are limited to 4 cores, and for that the defaults are probably optimal. But people who use more cores probably should experiment with different values. When running on 6 cores in HG's monthly tournaments, I use a different value now for Critter for example. Maybe it helps, maybe not. Real benefits from changing the defaults probably aren't seen until you hit 8 or more cores is my guess.
Do any engines have defaults that are automatically changed based on the number of cores specified? If not, are there any official recommendations by the authors to change the default based on a higher number of cores? It is possible that my 12 core tests weren't quite fair if the defaults were inappropriate for 12 cores, but unless the authors have so stated I think I did the right thing, as if I changed these values on my own I would be criticized for that.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Peculiarity of Komodo 5.1MP

Post by syzygy »

Daniel Shawul wrote:Assume sequential search is at its best. Give it 3.6x more time (say same as 4 threads search) but change the algorithm so that it can widen its search exactly the same way as the parallel search does.
Prove that you can "change the algorithm" without loss of speed...
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Peculiarity of Komodo 5.1MP

Post by Daniel Shawul »

syzygy wrote:
Daniel Shawul wrote:Assume sequential search is at its best. Give it 3.6x more time (say same as 4 threads search) but change the algorithm so that it can widen its search exactly the same way as the parallel search does.
Prove that you can "change the algorithm" without loss of speed...
Like I said in my post, "The question of difficulty of implementation is irrelevant."
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Peculiarity of Komodo 5.1MP

Post by syzygy »

Daniel Shawul wrote:
syzygy wrote:
Daniel Shawul wrote:Assume sequential search is at its best. Give it 3.6x more time (say same as 4 threads search) but change the algorithm so that it can widen its search exactly the same way as the parallel search does.
Prove that you can "change the algorithm" without loss of speed...
Like I said in my post, "The question of difficulty of implementation is irrelevant."
I said "prove that you can". This should be understood as "prove that it is possible". This is not evident, so there is a gap in your proof.

Btw, even if you could prove it, it would only show that Komodo's single-core search is suboptimal. It does not prove that Komodo's SMP implementation is suboptimal.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Peculiarity of Komodo 5.1MP

Post by Daniel Shawul »

syzygy wrote:
Daniel Shawul wrote:
syzygy wrote:
Daniel Shawul wrote:Assume sequential search is at its best. Give it 3.6x more time (say same as 4 threads search) but change the algorithm so that it can widen its search exactly the same way as the parallel search does.
Prove that you can "change the algorithm" without loss of speed...
Like I said in my post, "The question of difficulty of implementation is irrelevant."
I said "prove that you can". This should be understood as "prove that it is possible".
I don't need to because my point is you can't have an optimal sequential search and a wide and optimal parallel search at the same time! You choose to go wider when you can't correctly implement going deeper with parallel search. You can't construct same argument for the deeper parallel search, try it.