The game can last at most 8 moves (15 ply) here, if the white stone makes the maximum detour, but there probably isn't any reason toever do that. Although the nominal depth of the iterative deepening is not easy to interpret, (it is not a fixed-depth search) I am pretty sure that it searches to the end of the game here.
Ok, I figured out what the problem is with the "full width" search I have: it just calculates the value of the node for a random mover; of course it doesn't actually play as one: if there is a move that wins the game, it plays that 100% of the time, but even so it will score the node as if it plays the winning move 33% of the time.
I need to give this a moment of thought. Rather than a score or an evaluation, I propagate win/loss counts through the tree. This is fine if the first move I try is the only win (and I take a cut-off afterwards), but if the first move I try loses and the second move wins, the current node needs to be win 100% rather than win 50%/lose 50%.
What isn't immediately obvious to me is how MC search handles this situation. I suppose it's not a problem if it doesn't report the current node as 100% won, as long as it converges to 100% won given sufficient (ie, infinite) samples (and picks the correct move at the root). I may try a proper MC search, just for trying something different.
I actually don't have any sort of evaluation at the moment. There are some obvious terms (distance to winning; value of the different stones; you'd rather lose stone 3 than stone 6 because the former gets you more move options when you roll a 3 while the latter just makes you more likely to move stone 5) but I have no clue about weights. I might try to keep it evaluation light.
My program currently doesn't speak CECP either, but that should be easy to change.
I never did a real MC search. But I would be inclined to only propagate the stats of a move that obviously is best, in situations where you can freely choose between moves. Stats of non-best moves are only relevant for proving they are non-best. From the stats of each move you could calculate the probability that it is best, and you could weight them accordingly.