undo move vs. Position Cloning

bob · Post by **bob** » Thu Sep 17, 2009 8:27 pm

Harald wrote:As I understand this dicussion there are basically 2 methods:

("copy-make") We have an array or stack of positions.
make_move(): copy the position infos from ply to ply+1 and change the copy. ply++
undo_move(): ply--

("make-undo") We have only one position.
make_move(): store some parts in a stack. change the position. ply++
undo_move(): ply--; change the position. retrieve some parts.

You are using classical definitions, which matches what I have been describing exactly. Unfortunately, the stockfish case is not a copy/make example, contrary to what was originally stated, because the undo_move() in stockfish actually undoes the move on the board, rather than just backing up to the previous state.

what a pain to have discussions when the terms are not used with accurate definitions.

In my engine Elephant I use copy-make but that will probably change in the next version.

There is a question related to make-undo:
Does it make sense to do some extra work in make_move() and store
all changes in an undo stack in a simplified form
list< pair<address, old_value> > ?
The undo_move() function does not have to look at the move and
find out what to change. It simply does something like this:
while (list not empty) { *address = old_value; }
(You dont have to use std::list. It is just pseudo code.)

Harald

Gerd Isenberg · Post by **Gerd Isenberg** » Thu Sep 17, 2009 11:18 pm

bob wrote:
You are using classical definitions, which matches what I have been describing exactly. Unfortunately, the stockfish case is not a copy/make example, contrary to what was originally stated, because the undo_move() in stockfish actually undoes the move on the board, rather than just backing up to the previous state.

what a pain to have discussions when the terms are not used with accurate definitions.

So lets define it.

One may actually distinguish three approaches.

There are reversible and (small) irreversible (ep, castle-right, move50, zobkey, eventually captured piece if not member of the move) aspects of a positions. The reversible aspects may be incrementally updated by make and inverse operation in unmake and therefor kept as singleton. The irreversible aspects require memory.

(1) global position, a stack for irreversible parts

Code: Select all

MAKE:   push irreversible parts
        update reversible parts
        update irreversible parts
UNMAKE: pop irreversible parts
        update reversible parts

which copies irreversible aspects of the positions back and forth.

Marco's point is to safe the pop and to distribute the irreversible and reversible parts of a position.
This can be done by an explict ply-array, or semantically equal, as linked list, where f.i. the each element of the list resides on the current processor stack frame of a recursive search.

(2) ply-array or linked list for the irreversible parts, global reversible parts

Code: Select all

MAKE:   copy irreversible parts from ply to ply+1 
        update reversible parts
        update irreversible parts[ply+1]
UNMAKE: update reversible parts

The initial proposal by Terry was to have an array or linked list for the whole position. Guess he means a pre-allocated array of 0x88 int (why not char?) boards, rather allocating a new Position of > 4.5KByte object each time.

(3) ply-array for the whole position

Code: Select all

MAKE:   copy position from ply to ply+1 
        update [ply+1]
UNMAKE: -

We define 3 as copy/make and 1 as make/unmake.
How would you call 2? Distributed make/unmake?

bob · Post by **bob** » Fri Sep 18, 2009 12:20 am

Gerd Isenberg wrote:
bob wrote:
You are using classical definitions, which matches what I have been describing exactly. Unfortunately, the stockfish case is not a copy/make example, contrary to what was originally stated, because the undo_move() in stockfish actually undoes the move on the board, rather than just backing up to the previous state.

what a pain to have discussions when the terms are not used with accurate definitions.
So lets define it.

One may actually distinguish three approaches.

There are reversible and (small) irreversible (ep, castle-right, move50, zobkey, eventually captured piece if not member of the move) aspects of a positions. The reversible aspects may be incrementally updated by make and inverse operation in unmake and therefor kept as singleton. The irreversible aspects require memory.

(1) global position, a stack for irreversible parts
Code: Select all
MAKE:   push irreversible parts
        update reversible parts
        update irreversible parts
UNMAKE: pop irreversible parts
        update reversible parts
which copies irreversible aspects of the positions back and forth.

Marco's point is to safe the pop and to distribute the irreversible and reversible parts of a position.
This can be done by an explict ply-array, or semantically equal, as linked list, where f.i. the each element of the list resides on the current processor stack frame of a recursive search.

(2) ply-array or linked list for the irreversible parts, global reversible parts
Code: Select all
MAKE:   copy irreversible parts from ply to ply+1 
        update reversible parts
        update irreversible parts[ply+1]
UNMAKE: update reversible parts
The initial proposal by Terry was to have an array or linked list for the whole position. Guess he means a pre-allocated array of 0x88 int (why not char?) boards, rather allocating a new Position of > 4.5KByte object each time.

(3) ply-array for the whole position
Code: Select all
MAKE:   copy position from ply to ply+1 
        update [ply+1]
UNMAKE: -
We define 3 as copy/make and 1 as make/unmake.
How would you call 2? Distributed make/unmake?

I only recognize two approaches.

(1) you copy the current state to a new state and update that. No unmake of any kind is required. This has always been called "copy/make"

(2) you have a Make and Unmake function. It doesn't matter if you save/restore or whatever, so long as you do anything in the unmake function, this is a make/unmake approach.

The copy/make is quite specific, and quite unique, because it has absolutely no need for any unmake operation of any kind. Whether you implement it as an array of states, indexed by ply, or whether you pass the address of an old and a new state to MakeMove() is really an implementation detail. But if you have any sort of UnmakeMove()/undo_move() function, it is not a copy/make design.

Granted, you might do a more efficient copy/make if you only copy exactly what is modified, but that becomes problematic IMHO because the new state should be the one use use below this node in the tree, and it has to be complete. If, on the other hand, you use make/unmake but try to be more efficient in how you unmake, by actually reversing some operations and copying/restoring others, that is still a make/unmake architecture, not a copy/make.

adieguez · Post by **adieguez** » Fri Sep 18, 2009 12:30 am

Well, cloning will always be polemic

bob wrote:I think the discussion is worse than even you suggested.
...
So without using a standard definition of terms, the discussion degrades into nothing but misunderstandings.

Don · Post by **Don** » Mon Sep 21, 2009 5:33 am

bob wrote:
SImple first approximation to the answer: What is your NPS compared to Crafty's on the same hardware? If you want, I can run your code on our 8-core box, you you can download Crafty's source and run it on your box. Crafty is not "tricky". It spends fully 50% of the total execution time in Evaluate() and its derivatives. So it is not a particularly "fast and simple" program at all. But it is a "fast program".

I don't think it makes sense to try to compare different programs. But I saw your report on trying make/unmake. So maybe I can get most of the 5% by switching to make/unmake in the quies search or something like that.

Here is some data on my program if you are interested - it does have mobility, king safety and some stuff that slows it down but I try to be fast although I don't obsess like I used to. I think my program is reasonably fast but not as fast as many.

I'm running an early core 2 duo machine. cat /proc/cpuinfo give me this in part:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6700 @ 2.66GHz
stepping : 6
cpu MHz : 1600.000
cache size : 4096 KB

If you run from the position after e2e4 e7e5 I get this from my program with a single cpu:

Code: Select all

Doch Version 09.634
position startpos moves e2e4 e7e5
info string fen = rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPP1PPP/RNBQKBNR w - -
go depth 15
info depth 1 time 9 nodes 150 score cp 45 nps 16666 pv b1c3 
info depth 2 time 13 nodes 1732 score cp 9 nps 133230 pv b1c3 b8c6 
info depth 3 time 13 nodes 2210 score cp 43 nps 169999 pv b1c3 b8c6 g1f3 
info depth 4 time 13 nodes 3494 score cp 9 nps 268769 pv b1c3 b8c6 g1f3 g8f6 
info depth 5 time 16 nodes 8124 score cp 16 nps 477882 pv b1c3 b8c6 g1f3 g8f6 f1c4 
info depth 6 time 20 nodes 12541 score cp 9 nps 597190 pv b1c3 b8c6 g1f3 g8f6 f1c4 f8c5 
info depth 7 time 32 nodes 26691 score cp 45 nps 808818 pv b1c3 g8f6 g1f3 f8d6 f1d3 e8g8 e1g1 
info depth 8 time 48 nodes 54316 score cp 18 nps 1108489 pv b1c3 g8f6 g1f3 f8d6 f1c4 b8c6 e1g1 e8g8 
info depth 9 time 88 nodes 105249 score cp 23 nps 1182573 pv b1c3 g8f6 g1f3 f8d6 f1c4 b8c6 e1g1 e8g8 d2d4 
info depth 10 time 232 nodes 315733 score cp 24 nps 1355077 pv b1c3 f8d6 g1f3 g8e7 f1d3 e8g8 e1g1 b8c6 h2h3 c6d4 f3d4 e5d4 
info depth 11 time 381 nodes 638986 score cp 27 nps 1677128 pv b1c3 f8d6 g1f3 b8c6 f1c4 g8f6 e1g1 e8g8 d2d4 f8e8 c1g5 
info depth 12 time 1145 nodes 2251210 score cp 9 nps 1966122 pv b1c3 b8c6 g1f3 g8f6 f1b5 f8b4 e1g1 e8g8 d2d3 d7d6 c1e3 c8e6 
info depth 13 time 2073 nodes 4233393 score cp 8 nps 2042157 pv b1c3 b8c6 g1f3 g8f6 f1c4 f8c5 e1g1 e8g8 d2d3 d7d6 c1e3 c5e3 f2e3 c8e6 c4e6 f7e6 
info depth 14 time 4197 nodes 8920132 score cp 8 nps 2125359 pv b1c3 b8c6 g1f3 g8f6 f1c4 f8c5 e1g1 e8g8 d2d3 d7d6 c1e3 c5e3 f2e3 c8e6 c4e6 f7e6 
info depth 15 time 11341 nodes 24116117 score cp 20 nps 2126454 pv g1f3 b8c6 f1c4 f8d6 b1c3 g8f6 e1g1 e8g8 f1e1 d6c5 h2h3 d8e7 d2d3 d7d6 c1e3 
bestmove g1f3 ponder b8c6

bob · Post by **bob** » Mon Sep 21, 2009 8:00 pm

Don wrote:
bob wrote:
SImple first approximation to the answer: What is your NPS compared to Crafty's on the same hardware? If you want, I can run your code on our 8-core box, you you can download Crafty's source and run it on your box. Crafty is not "tricky". It spends fully 50% of the total execution time in Evaluate() and its derivatives. So it is not a particularly "fast and simple" program at all. But it is a "fast program".
I don't think it makes sense to try to compare different programs. But I saw your report on trying make/unmake. So maybe I can get most of the 5% by switching to make/unmake in the quies search or something like that.

Don't forget, also, that the loss was much greater with 8 threads, on a dual-processor quad-core box. Each chip has the usual L1 + 4M L2 that is shared among the cores on that chip, so the box has 8 L1 caches, and 2 L2 caches, and the copying ate a ton of memory bandwidth that was really not there.

Here is some data on my program if you are interested - it does have mobility, king safety and some stuff that slows it down but I try to be fast although I don't obsess like I used to. I think my program is reasonably fast but not as fast as many.

I'm running an early core 2 duo machine. cat /proc/cpuinfo give me this in part:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6700 @ 2.66GHz
stepping : 6
cpu MHz : 1600.000
cache size : 4096 KB

If you run from the position after e2e4 e7e5 I get this from my program with a single cpu:

Code: Select all

Doch Version 09.634
position startpos moves e2e4 e7e5
info string fen = rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPP1PPP/RNBQKBNR w - -
go depth 15
info depth 1 time 9 nodes 150 score cp 45 nps 16666 pv b1c3 
info depth 2 time 13 nodes 1732 score cp 9 nps 133230 pv b1c3 b8c6 
info depth 3 time 13 nodes 2210 score cp 43 nps 169999 pv b1c3 b8c6 g1f3 
info depth 4 time 13 nodes 3494 score cp 9 nps 268769 pv b1c3 b8c6 g1f3 g8f6 
info depth 5 time 16 nodes 8124 score cp 16 nps 477882 pv b1c3 b8c6 g1f3 g8f6 f1c4 
info depth 6 time 20 nodes 12541 score cp 9 nps 597190 pv b1c3 b8c6 g1f3 g8f6 f1c4 f8c5 
info depth 7 time 32 nodes 26691 score cp 45 nps 808818 pv b1c3 g8f6 g1f3 f8d6 f1d3 e8g8 e1g1 
info depth 8 time 48 nodes 54316 score cp 18 nps 1108489 pv b1c3 g8f6 g1f3 f8d6 f1c4 b8c6 e1g1 e8g8 
info depth 9 time 88 nodes 105249 score cp 23 nps 1182573 pv b1c3 g8f6 g1f3 f8d6 f1c4 b8c6 e1g1 e8g8 d2d4 
info depth 10 time 232 nodes 315733 score cp 24 nps 1355077 pv b1c3 f8d6 g1f3 g8e7 f1d3 e8g8 e1g1 b8c6 h2h3 c6d4 f3d4 e5d4 
info depth 11 time 381 nodes 638986 score cp 27 nps 1677128 pv b1c3 f8d6 g1f3 b8c6 f1c4 g8f6 e1g1 e8g8 d2d4 f8e8 c1g5 
info depth 12 time 1145 nodes 2251210 score cp 9 nps 1966122 pv b1c3 b8c6 g1f3 g8f6 f1b5 f8b4 e1g1 e8g8 d2d3 d7d6 c1e3 c8e6 
info depth 13 time 2073 nodes 4233393 score cp 8 nps 2042157 pv b1c3 b8c6 g1f3 g8f6 f1c4 f8c5 e1g1 e8g8 d2d3 d7d6 c1e3 c5e3 f2e3 c8e6 c4e6 f7e6 
info depth 14 time 4197 nodes 8920132 score cp 8 nps 2125359 pv b1c3 b8c6 g1f3 g8f6 f1c4 f8c5 e1g1 e8g8 d2d3 d7d6 c1e3 c5e3 f2e3 c8e6 c4e6 f7e6 
info depth 15 time 11341 nodes 24116117 score cp 20 nps 2126454 pv g1f3 b8c6 f1c4 f8d6 b1c3 g8f6 e1g1 e8g8 f1e1 d6c5 h2h3 d8e7 d2d3 d7d6 c1e3 
bestmove g1f3 ponder b8c6

stegemma · Post by **stegemma** » Wed Sep 23, 2009 1:20 am

I think that make/unmake is faster than copy/make if and only if unmake function is faster than copy. This seems to be obvious but it is the point to start from. If anybody find that copying some hundred bytes is faster than unmake a move... then this "anybody" should have done a bad unmake function... or the whole make/unmake process is very inefficient. If you program in C, it is possible that your copying is done with memcopy function that are written in assembly, for performance, directly by the men who wrote the compiler itself (and its libraries). Your make/unmake function is written in C and if you don't think about performances it's easy that you have written "bad code". The memcopy function is instead in assembly code and it is very efficient... against your inefficient unmake function in C. This is from an assembly programmer point of view. Watching at my unmake function, i find that it is 14 assembly istructions, for regular moves (some more istructions are needed for castling, en-passant etc). I think that there's no way to do a copy of my chessboard structures in less than 14 istructions (about 20 clock cycles or a little more).

So, if your program is faster with copy instead of unmake... then you better have to consider how you wrote your make/unmake functions...

(IMHO, obviously, and i'm saying this with no flames or offensive purpose, just i haven't find a better way to say that in english)

Don · Post by **Don** » Wed Sep 23, 2009 3:14 am

stegemma wrote:I think that make/unmake is faster than copy/make if and only if unmake function is faster than copy. This seems to be obvious but it is the point to start from. If anybody find that copying some hundred bytes is faster than unmake a move... then this "anybody" should have done a bad unmake function... or the whole make/unmake process is very inefficient. If you program in C, it is possible that your copying is done with memcopy function that are written in assembly, for performance, directly by the men who wrote the compiler itself (and its libraries). Your make/unmake function is written in C and if you don't think about performances it's easy that you have written "bad code". The memcopy function is instead in assembly code and it is very efficient... against your inefficient unmake function in C. This is from an assembly programmer point of view. Watching at my unmake function, i find that it is 14 assembly istructions, for regular moves (some more istructions are needed for castling, en-passant etc). I think that there's no way to do a copy of my chessboard structures in less than 14 istructions (about 20 clock cycles or a little more).

So, if your program is faster with copy instead of unmake... then you better have to consider how you wrote your make/unmake functions...

(IMHO, obviously, and i'm saying this with no flames or offensive purpose, just i haven't find a better way to say that in english)

I think it's pretty clear that speed is a minor issue. Speed was never my reason for using copy/make since my program spends less time doing a copy/make that Crafty saves by NOT doing it.

There are other benefits to copy make such as having less code, a simpler program and a much simpler multi-processing program.

There is one disadvantage when it comes to program development to copy/make. It can hide bugs in the move generator because anything wrong with make goes away with copy/make. However if you use peft and do good debugging it's a non-issue.

I get the feeling that everybody who does make/unmake was offended when I suggested that it could be faster to do copy/make. I never CLAIMED that was the case, I just said it wouldn't surprise me. It's probably silly to even argue about this because I am sure it will depend on each program. A program that has a complex make and keeps incrementally updated state (that is difficult to undo) may benefit from copy make but a Fritz-like program designed to be ridiculously fast may be worse.

Dann Corbit · Post by **Dann Corbit** » Wed Sep 23, 2009 5:38 am

Don wrote:
stegemma wrote:I think that make/unmake is faster than copy/make if and only if unmake function is faster than copy. This seems to be obvious but it is the point to start from. If anybody find that copying some hundred bytes is faster than unmake a move... then this "anybody" should have done a bad unmake function... or the whole make/unmake process is very inefficient. If you program in C, it is possible that your copying is done with memcopy function that are written in assembly, for performance, directly by the men who wrote the compiler itself (and its libraries). Your make/unmake function is written in C and if you don't think about performances it's easy that you have written "bad code". The memcopy function is instead in assembly code and it is very efficient... against your inefficient unmake function in C. This is from an assembly programmer point of view. Watching at my unmake function, i find that it is 14 assembly istructions, for regular moves (some more istructions are needed for castling, en-passant etc). I think that there's no way to do a copy of my chessboard structures in less than 14 istructions (about 20 clock cycles or a little more).

So, if your program is faster with copy instead of unmake... then you better have to consider how you wrote your make/unmake functions...

(IMHO, obviously, and i'm saying this with no flames or offensive purpose, just i haven't find a better way to say that in english)
I think it's pretty clear that speed is a minor issue. Speed was never my reason for using copy/make since my program spends less time doing a copy/make that Crafty saves by NOT doing it.

There are other benefits to copy make such as having less code, a simpler program and a much simpler multi-processing program.

There is one disadvantage when it comes to program development to copy/make. It can hide bugs in the move generator because anything wrong with make goes away with copy/make. However if you use peft and do good debugging it's a non-issue.

I get the feeling that everybody who does make/unmake was offended when I suggested that it could be faster to do copy/make. I never CLAIMED that was the case, I just said it wouldn't surprise me. It's probably silly to even argue about this because I am sure it will depend on each program. A program that has a complex make and keeps incrementally updated state (that is difficult to undo) may benefit from copy make but a Fritz-like program designed to be ridiculously fast may be worse.

It seems like everything in chess programming comes down to "try it".

For instance, null move verification works well in some programs and is a waste of time in others.

For instance, if you already have superb move ordering, then IID won't add much. If you have terrible move ordering, IID will give great improvement.

Some people don't see a benefit from LMR when they try it. Maybe it's a bad implementation or maybe they are already doing something similar with reductions which makes LMR redundant and so not a win.

Most of the time, to find out if something will work for a particular engine, the only way to find out is to give it a go (and sometimes it takes several tries, if it is a complex idea)

stegemma · Post by **stegemma** » Wed Sep 23, 2009 9:56 am

Dann Corbit wrote:
...It seems like everything in chess programming comes down to "try it"....
Yes, that's true but i would add that you can foresee what could happen if you look at the assembly code. Despite from cache and pipeline issue, just seeing 60 assembly istructions (or a loop of 3 istructions repeated 20 times) instead of 20 istructions could give you an advise that a function "should be" wrong.

I just suggest a simple quantitative method to know if sonething could be good or not: just count the assembly istructions!!! (of course i know about clock cycles involved in istructions, pipeline, caching, stall and so on... but i don't ask so much from a pure C programmer).

undo move vs. Position Cloning

Re: undo move vs. Position Cloning

Re: undo move vs. Position Cloning

Re: undo move vs. Position Cloning

Re: undo move vs. Position Cloning

Re: undo move vs. Position Cloning

Re: undo move vs. Position Cloning

Re: undo move vs. Position Cloning

Re: undo move vs. Position Cloning

Re: undo move vs. Position Cloning

Re: undo move vs. Position Cloning