I don't think it will be that easy unless you don't try to go "fully functional". I found lots of places where I could copy the state, and mangle it badly without caring. Whereas with make/unmake, you have to be much more careful.Don wrote:I would think either method would be relatively simple whether using Cilk or not.Rein Halbersma wrote:This remark was meant in the context of a Cilk type of parallel search where the scheduling is done behind the scenes. Having a copy-make approach simplifies things because you don't have to write out any split-point code at all (because there is no shared state). For programs having an explicit scheduler with pthreads and split points, or what have you, make/unmake might not make a difference in code simplicity compared to copy-make. But make/undo is much harder to adapt to Cilk type of parallelism.bob wrote:I don't see how it simplifies the split point code at all. All it does is get rid of Unmake() for a gain, but adds the copy during the make process for a loss. But affecting the split point code? I don't see it. You certainly have to replicate everything for each thread...
Instead of just talking about it, I'm going to bite the bullet and make a clone of Komodo (hope I don't get into any trouble for this) that uses make/unmake. I will report the results - probably tomorrow.
Of course I cannot comment on MP performance yet but that is coming to Komodo pretty soon.
performance of copy-make
Moderators: hgm, Rebel, chrisw
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: performance of copy-make
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: performance of copy-make
I think you are right. I started the effort and I now realize it may not be worth it. make does become more complicated, though not unduly so. Unmake becomes as expensive as make (neither of which is all that expensive) and there are some other issues too, such as the fact that Komodo needs to look backward for more than just Zobrist keys. The only issue of concern to me is the cache issue, I don't believe there is appreciable overhead for the copy itself. I'm not sure I want to give up this much simplification.bob wrote:I don't think it will be that easy unless you don't try to go "fully functional". I found lots of places where I could copy the state, and mangle it badly without caring. Whereas with make/unmake, you have to be much more careful.Don wrote:I would think either method would be relatively simple whether using Cilk or not.Rein Halbersma wrote:This remark was meant in the context of a Cilk type of parallel search where the scheduling is done behind the scenes. Having a copy-make approach simplifies things because you don't have to write out any split-point code at all (because there is no shared state). For programs having an explicit scheduler with pthreads and split points, or what have you, make/unmake might not make a difference in code simplicity compared to copy-make. But make/undo is much harder to adapt to Cilk type of parallelism.bob wrote:I don't see how it simplifies the split point code at all. All it does is get rid of Unmake() for a gain, but adds the copy during the make process for a loss. But affecting the split point code? I don't see it. You certainly have to replicate everything for each thread...
Instead of just talking about it, I'm going to bite the bullet and make a clone of Komodo (hope I don't get into any trouble for this) that uses make/unmake. I will report the results - probably tomorrow.
Of course I cannot comment on MP performance yet but that is coming to Komodo pretty soon.
-
- Posts: 1922
- Joined: Thu Mar 09, 2006 12:51 am
- Location: Earth
Re: performance of copy-make
But you're not copying from "local state", you're copying from some historical value of the local state. So you either have to unmake moves back to the place you're splitting from, or copy the move history from the root to that point (which is what ZCT and Zappa/Rondo do). The only problem is the move history for rep detection. If you had some linked list type scheme where threads could share move history, splitting would be a very cheap operation, involving no making/unmaking of moves. Of course, such a history scheme would probably be too slow to be worth it.bob wrote:I designed DTS. And I didn't use copy/make in Cray Blitz. If you look at the source available on the net, it is clearly make/unmake, just like Crafty of today. How, exactly, does it help? Each thread has to have its own "state". Whether that state is copied from a local state that is updated with make/unmake, or copied from a state that is updated via copy/make, I don't see how it matters at all...Zach Wegner wrote:Copy/make can be beneficial in parallel search if you do DTS...
Now an iterated search is certainly needed, otherwise DTS will become a nightmare of details to deal with. But that's outside copy/make vs make/unmake.
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: performance of copy-make
Komodo does not have to deal with any of this, since there is a pointer to the previous state and no lists of keys or moves are kept other than this linked list of positions.Zach Wegner wrote:But you're not copying from "local state", you're copying from some historical value of the local state. So you either have to unmake moves back to the place you're splitting from, or copy the move history from the root to that point (which is what ZCT and Zappa/Rondo do). The only problem is the move history for rep detection. If you had some linked list type scheme where threads could share move history, splitting would be a very cheap operation, involving no making/unmaking of moves. Of course, such a history scheme would probably be too slow to be worth it.bob wrote:I designed DTS. And I didn't use copy/make in Cray Blitz. If you look at the source available on the net, it is clearly make/unmake, just like Crafty of today. How, exactly, does it help? Each thread has to have its own "state". Whether that state is copied from a local state that is updated with make/unmake, or copied from a state that is updated via copy/make, I don't see how it matters at all...Zach Wegner wrote:Copy/make can be beneficial in parallel search if you do DTS...
Now an iterated search is certainly needed, otherwise DTS will become a nightmare of details to deal with. But that's outside copy/make vs make/unmake.
If I ever want to make a version of Komodo that works across machines with MPI, this might not be ideal - but presumably any program has to deal with pushing history over.
-
- Posts: 900
- Joined: Tue Apr 27, 2010 3:48 pm
Re: performance of copy-make
So when did you use copy/make if not on Cray Blitz? Did you have Crafty running on the Cray? I thought it was a PC program right from the start.bob wrote:I designed DTS. And I didn't use copy/make in Cray Blitz. If you look at the source available on the net, it is clearly make/unmake, just like Crafty of today. How, exactly, does it help? Each thread has to have its own "state". Whether that state is copied from a local state that is updated with make/unmake, or copied from a state that is updated via copy/make, I don't see how it matters at all...Zach Wegner wrote:Copy/make can be beneficial in parallel search if you do DTS...
Now an iterated search is certainly needed, otherwise DTS will become a nightmare of details to deal with. But that's outside copy/make vs make/unmake.
-
- Posts: 1922
- Joined: Thu Mar 09, 2006 12:51 am
- Location: Earth
Re: performance of copy-make
Yes, that's just what I mean. Looping back over old positions for rep detection is quite elegant, but it would be a bit slower, since you have a big chain of dependent memory accesses. BTW, you should try putting the hash key and the parent pointer in the same cache line, it might give you a 0.1% speedup (if you're not already doing that). Ah, microoptimization... in my work, making the software 0.1% faster is actually a pretty decent accomplishment.Don wrote:Komodo does not have to deal with any of this, since there is a pointer to the previous state and no lists of keys or moves are kept other than this linked list of positions.
But anyways, you should try DTS, you will have nice cheap splits. It might drive you insane though.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: performance of copy-make
A series of unmakes to back up to a prior desirable split point is not very expensive when you count the number of splits done in a 3 minute move, particularly if you split at the root also...Zach Wegner wrote:But you're not copying from "local state", you're copying from some historical value of the local state. So you either have to unmake moves back to the place you're splitting from, or copy the move history from the root to that point (which is what ZCT and Zappa/Rondo do). The only problem is the move history for rep detection. If you had some linked list type scheme where threads could share move history, splitting would be a very cheap operation, involving no making/unmaking of moves. Of course, such a history scheme would probably be too slow to be worth it.bob wrote:I designed DTS. And I didn't use copy/make in Cray Blitz. If you look at the source available on the net, it is clearly make/unmake, just like Crafty of today. How, exactly, does it help? Each thread has to have its own "state". Whether that state is copied from a local state that is updated with make/unmake, or copied from a state that is updated via copy/make, I don't see how it matters at all...Zach Wegner wrote:Copy/make can be beneficial in parallel search if you do DTS...
Now an iterated search is certainly needed, otherwise DTS will become a nightmare of details to deal with. But that's outside copy/make vs make/unmake.
But in any case, whether you use copy/make or make/unmake, I can't imagine anyone not copying the repetition list. But in any case, having done it both ways, nothing stands out to me as a plus for either over the other from the point of splitting the tree
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: performance of copy-make
Early versions of Crafty were copy / make. Version 9.16 was the first to switch to make/unmake. From main.c:rbarreira wrote:So when did you use copy/make if not on Cray Blitz? Did you have Crafty running on the Cray? I thought it was a PC program right from the start.bob wrote:I designed DTS. And I didn't use copy/make in Cray Blitz. If you look at the source available on the net, it is clearly make/unmake, just like Crafty of today. How, exactly, does it help? Each thread has to have its own "state". Whether that state is copied from a local state that is updated with make/unmake, or copied from a state that is updated via copy/make, I don't see how it matters at all...Zach Wegner wrote:Copy/make can be beneficial in parallel search if you do DTS...
Now an iterated search is certainly needed, otherwise DTS will become a nightmare of details to deal with. But that's outside copy/make vs make/unmake.
* 9.16 Significant changes to the way MakeMove() operates. Rather than *
* copying the large position[ply] structure around, the piece *
* location bitmaps and the array of which piece is on which square *
* are now simple global variables. This means that there is now an *
* UnmakeMove() function that must be used to restore these after a *
* a move has been made. The benefit is that we avoid copying 10 *
* bitmaps for piece locations when typically only one is changed, *
* Ditto for the which piece is on which square array which is 64 *
* bytes long. The result is much less memory traffic, with the *
* probability that the bitmaps now stay in cache all the time.
That was a _long_ time ago, probably in the first year of development. I think version 13.something played in the 1995 WMCCC, but I am not certain. Versions came pretty fast for the first group. Version 15.0 was the first parallel search version and was somewhere around 1997.
For your last question, it was actually designed to run on a Cray. Which was the reason the bits were numbered left to right MSB=0, LSB=63, to match the way they were numbered on the Cray. Still runs just fine on a Cray, but I haven't fooled with those boxes in years.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: performance of copy-make
Going from copy/make to make/unmake took a good while. It was amazing how many places I elected to just make a move, then throw everything away. Say when reading in PGN, or creating the book, or doing threat tests, and such. Now _every_ Make() has to be paired with a corresponding Unmake() or things crash...Don wrote:I think you are right. I started the effort and I now realize it may not be worth it. make does become more complicated, though not unduly so. Unmake becomes as expensive as make (neither of which is all that expensive) and there are some other issues too, such as the fact that Komodo needs to look backward for more than just Zobrist keys. The only issue of concern to me is the cache issue, I don't believe there is appreciable overhead for the copy itself. I'm not sure I want to give up this much simplification.bob wrote:I don't think it will be that easy unless you don't try to go "fully functional". I found lots of places where I could copy the state, and mangle it badly without caring. Whereas with make/unmake, you have to be much more careful.Don wrote:I would think either method would be relatively simple whether using Cilk or not.Rein Halbersma wrote:This remark was meant in the context of a Cilk type of parallel search where the scheduling is done behind the scenes. Having a copy-make approach simplifies things because you don't have to write out any split-point code at all (because there is no shared state). For programs having an explicit scheduler with pthreads and split points, or what have you, make/unmake might not make a difference in code simplicity compared to copy-make. But make/undo is much harder to adapt to Cilk type of parallelism.bob wrote:I don't see how it simplifies the split point code at all. All it does is get rid of Unmake() for a gain, but adds the copy during the make process for a loss. But affecting the split point code? I don't see it. You certainly have to replicate everything for each thread...
Instead of just talking about it, I'm going to bite the bullet and make a clone of Komodo (hope I don't get into any trouble for this) that uses make/unmake. I will report the results - probably tomorrow.
Of course I cannot comment on MP performance yet but that is coming to Komodo pretty soon.