performance of copy-make

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: performance of copy-make

Post by bob »

Don wrote:
Rein Halbersma wrote:
bob wrote:I don't see how it simplifies the split point code at all. All it does is get rid of Unmake() for a gain, but adds the copy during the make process for a loss. But affecting the split point code? I don't see it. You certainly have to replicate everything for each thread...
This remark was meant in the context of a Cilk type of parallel search where the scheduling is done behind the scenes. Having a copy-make approach simplifies things because you don't have to write out any split-point code at all (because there is no shared state). For programs having an explicit scheduler with pthreads and split points, or what have you, make/unmake might not make a difference in code simplicity compared to copy-make. But make/undo is much harder to adapt to Cilk type of parallelism.
I would think either method would be relatively simple whether using Cilk or not.

Instead of just talking about it, I'm going to bite the bullet and make a clone of Komodo (hope I don't get into any trouble for this) that uses make/unmake. I will report the results - probably tomorrow.

Of course I cannot comment on MP performance yet but that is coming to Komodo pretty soon.
I don't think it will be that easy unless you don't try to go "fully functional". I found lots of places where I could copy the state, and mangle it badly without caring. Whereas with make/unmake, you have to be much more careful.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: performance of copy-make

Post by Don »

bob wrote:
Don wrote:
Rein Halbersma wrote:
bob wrote:I don't see how it simplifies the split point code at all. All it does is get rid of Unmake() for a gain, but adds the copy during the make process for a loss. But affecting the split point code? I don't see it. You certainly have to replicate everything for each thread...
This remark was meant in the context of a Cilk type of parallel search where the scheduling is done behind the scenes. Having a copy-make approach simplifies things because you don't have to write out any split-point code at all (because there is no shared state). For programs having an explicit scheduler with pthreads and split points, or what have you, make/unmake might not make a difference in code simplicity compared to copy-make. But make/undo is much harder to adapt to Cilk type of parallelism.
I would think either method would be relatively simple whether using Cilk or not.

Instead of just talking about it, I'm going to bite the bullet and make a clone of Komodo (hope I don't get into any trouble for this) that uses make/unmake. I will report the results - probably tomorrow.

Of course I cannot comment on MP performance yet but that is coming to Komodo pretty soon.
I don't think it will be that easy unless you don't try to go "fully functional". I found lots of places where I could copy the state, and mangle it badly without caring. Whereas with make/unmake, you have to be much more careful.
I think you are right. I started the effort and I now realize it may not be worth it. make does become more complicated, though not unduly so. Unmake becomes as expensive as make (neither of which is all that expensive) and there are some other issues too, such as the fact that Komodo needs to look backward for more than just Zobrist keys. The only issue of concern to me is the cache issue, I don't believe there is appreciable overhead for the copy itself. I'm not sure I want to give up this much simplification.
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: performance of copy-make

Post by Zach Wegner »

bob wrote:
Zach Wegner wrote:Copy/make can be beneficial in parallel search if you do DTS...
I designed DTS. :) And I didn't use copy/make in Cray Blitz. If you look at the source available on the net, it is clearly make/unmake, just like Crafty of today. How, exactly, does it help? Each thread has to have its own "state". Whether that state is copied from a local state that is updated with make/unmake, or copied from a state that is updated via copy/make, I don't see how it matters at all...

Now an iterated search is certainly needed, otherwise DTS will become a nightmare of details to deal with. But that's outside copy/make vs make/unmake.
But you're not copying from "local state", you're copying from some historical value of the local state. So you either have to unmake moves back to the place you're splitting from, or copy the move history from the root to that point (which is what ZCT and Zappa/Rondo do). The only problem is the move history for rep detection. If you had some linked list type scheme where threads could share move history, splitting would be a very cheap operation, involving no making/unmaking of moves. Of course, such a history scheme would probably be too slow to be worth it.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: performance of copy-make

Post by Don »

Zach Wegner wrote:
bob wrote:
Zach Wegner wrote:Copy/make can be beneficial in parallel search if you do DTS...
I designed DTS. :) And I didn't use copy/make in Cray Blitz. If you look at the source available on the net, it is clearly make/unmake, just like Crafty of today. How, exactly, does it help? Each thread has to have its own "state". Whether that state is copied from a local state that is updated with make/unmake, or copied from a state that is updated via copy/make, I don't see how it matters at all...

Now an iterated search is certainly needed, otherwise DTS will become a nightmare of details to deal with. But that's outside copy/make vs make/unmake.
But you're not copying from "local state", you're copying from some historical value of the local state. So you either have to unmake moves back to the place you're splitting from, or copy the move history from the root to that point (which is what ZCT and Zappa/Rondo do). The only problem is the move history for rep detection. If you had some linked list type scheme where threads could share move history, splitting would be a very cheap operation, involving no making/unmaking of moves. Of course, such a history scheme would probably be too slow to be worth it.
Komodo does not have to deal with any of this, since there is a pointer to the previous state and no lists of keys or moves are kept other than this linked list of positions.

If I ever want to make a version of Komodo that works across machines with MPI, this might not be ideal - but presumably any program has to deal with pushing history over.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: performance of copy-make

Post by rbarreira »

bob wrote:
Zach Wegner wrote:Copy/make can be beneficial in parallel search if you do DTS...
I designed DTS. :) And I didn't use copy/make in Cray Blitz. If you look at the source available on the net, it is clearly make/unmake, just like Crafty of today. How, exactly, does it help? Each thread has to have its own "state". Whether that state is copied from a local state that is updated with make/unmake, or copied from a state that is updated via copy/make, I don't see how it matters at all...

Now an iterated search is certainly needed, otherwise DTS will become a nightmare of details to deal with. But that's outside copy/make vs make/unmake.
So when did you use copy/make if not on Cray Blitz? Did you have Crafty running on the Cray? I thought it was a PC program right from the start.
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: performance of copy-make

Post by Zach Wegner »

Don wrote:Komodo does not have to deal with any of this, since there is a pointer to the previous state and no lists of keys or moves are kept other than this linked list of positions.
Yes, that's just what I mean. Looping back over old positions for rep detection is quite elegant, but it would be a bit slower, since you have a big chain of dependent memory accesses. BTW, you should try putting the hash key and the parent pointer in the same cache line, it might give you a 0.1% speedup (if you're not already doing that). Ah, microoptimization... in my work, making the software 0.1% faster is actually a pretty decent accomplishment.

But anyways, you should try DTS, you will have nice cheap splits. It might drive you insane though.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: performance of copy-make

Post by bob »

Zach Wegner wrote:
bob wrote:
Zach Wegner wrote:Copy/make can be beneficial in parallel search if you do DTS...
I designed DTS. :) And I didn't use copy/make in Cray Blitz. If you look at the source available on the net, it is clearly make/unmake, just like Crafty of today. How, exactly, does it help? Each thread has to have its own "state". Whether that state is copied from a local state that is updated with make/unmake, or copied from a state that is updated via copy/make, I don't see how it matters at all...

Now an iterated search is certainly needed, otherwise DTS will become a nightmare of details to deal with. But that's outside copy/make vs make/unmake.
But you're not copying from "local state", you're copying from some historical value of the local state. So you either have to unmake moves back to the place you're splitting from, or copy the move history from the root to that point (which is what ZCT and Zappa/Rondo do). The only problem is the move history for rep detection. If you had some linked list type scheme where threads could share move history, splitting would be a very cheap operation, involving no making/unmaking of moves. Of course, such a history scheme would probably be too slow to be worth it.
A series of unmakes to back up to a prior desirable split point is not very expensive when you count the number of splits done in a 3 minute move, particularly if you split at the root also...

But in any case, whether you use copy/make or make/unmake, I can't imagine anyone not copying the repetition list. But in any case, having done it both ways, nothing stands out to me as a plus for either over the other from the point of splitting the tree
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: performance of copy-make

Post by bob »

rbarreira wrote:
bob wrote:
Zach Wegner wrote:Copy/make can be beneficial in parallel search if you do DTS...
I designed DTS. :) And I didn't use copy/make in Cray Blitz. If you look at the source available on the net, it is clearly make/unmake, just like Crafty of today. How, exactly, does it help? Each thread has to have its own "state". Whether that state is copied from a local state that is updated with make/unmake, or copied from a state that is updated via copy/make, I don't see how it matters at all...

Now an iterated search is certainly needed, otherwise DTS will become a nightmare of details to deal with. But that's outside copy/make vs make/unmake.
So when did you use copy/make if not on Cray Blitz? Did you have Crafty running on the Cray? I thought it was a PC program right from the start.
Early versions of Crafty were copy / make. Version 9.16 was the first to switch to make/unmake. From main.c:

* 9.16 Significant changes to the way MakeMove() operates. Rather than *
* copying the large position[ply] structure around, the piece *
* location bitmaps and the array of which piece is on which square *
* are now simple global variables. This means that there is now an *
* UnmakeMove() function that must be used to restore these after a *
* a move has been made. The benefit is that we avoid copying 10 *
* bitmaps for piece locations when typically only one is changed, *
* Ditto for the which piece is on which square array which is 64 *
* bytes long. The result is much less memory traffic, with the *
* probability that the bitmaps now stay in cache all the time.

That was a _long_ time ago, probably in the first year of development. I think version 13.something played in the 1995 WMCCC, but I am not certain. Versions came pretty fast for the first group. Version 15.0 was the first parallel search version and was somewhere around 1997.

For your last question, it was actually designed to run on a Cray. Which was the reason the bits were numbered left to right MSB=0, LSB=63, to match the way they were numbered on the Cray. Still runs just fine on a Cray, but I haven't fooled with those boxes in years.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: performance of copy-make

Post by bob »

Don wrote:
bob wrote:
Don wrote:
Rein Halbersma wrote:
bob wrote:I don't see how it simplifies the split point code at all. All it does is get rid of Unmake() for a gain, but adds the copy during the make process for a loss. But affecting the split point code? I don't see it. You certainly have to replicate everything for each thread...
This remark was meant in the context of a Cilk type of parallel search where the scheduling is done behind the scenes. Having a copy-make approach simplifies things because you don't have to write out any split-point code at all (because there is no shared state). For programs having an explicit scheduler with pthreads and split points, or what have you, make/unmake might not make a difference in code simplicity compared to copy-make. But make/undo is much harder to adapt to Cilk type of parallelism.
I would think either method would be relatively simple whether using Cilk or not.

Instead of just talking about it, I'm going to bite the bullet and make a clone of Komodo (hope I don't get into any trouble for this) that uses make/unmake. I will report the results - probably tomorrow.

Of course I cannot comment on MP performance yet but that is coming to Komodo pretty soon.
I don't think it will be that easy unless you don't try to go "fully functional". I found lots of places where I could copy the state, and mangle it badly without caring. Whereas with make/unmake, you have to be much more careful.
I think you are right. I started the effort and I now realize it may not be worth it. make does become more complicated, though not unduly so. Unmake becomes as expensive as make (neither of which is all that expensive) and there are some other issues too, such as the fact that Komodo needs to look backward for more than just Zobrist keys. The only issue of concern to me is the cache issue, I don't believe there is appreciable overhead for the copy itself. I'm not sure I want to give up this much simplification.
Going from copy/make to make/unmake took a good while. It was amazing how many places I elected to just make a move, then throw everything away. Say when reading in PGN, or creating the book, or doing threat tests, and such. Now _every_ Make() has to be paired with a corresponding Unmake() or things crash...