lkaufman wrote: bob wrote: lkaufman wrote:
Even identical engines won't always play the same move in timed play, or even in fixed-depth play if multiprocessor (MP) search is involved. Of course two strong engines will play the same move more often than not. Let's say that two unrelated strong engines (for example SF and Rybka) play the same move in timed play 70% of the time (I don't know the real figure). If another engine comes along and plays the same move as one of these programs 97% of the time, I would call it a clone or at least a derivative. If another engine plays the same move maybe 80% of the time, but this percentage increases to 97% if we reduce the allowed time, then I would consider it an improved derivative. There has to be some percentage short of 100% that would convince you that one program is derived from another, especially with regard to the eval. Some day there may be agreement on the proper eval for a chess program, but we are far from that point now. Just compare the evals of SF or Crafty with Houdini, for example.
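One rough way to put a number on "some percentage short of 100%" (an assumption for illustration, not something proposed in the thread) is to treat each test position as an independent trial and ask how likely the observed number of matching moves would be if the suspect engine agreed with the original only at the baseline rate of unrelated engines. The 70% baseline is the hypothetical figure from the paragraph above. Positions taken from the same game are correlated, so this sketch overstates the significance.

```python
import math


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are summed in log space so that large n does not overflow.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    peak = max(log_terms)
    return math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)


# Hypothetical test: 1000 positions, the suspect matches the original on 970,
# while unrelated strong engines are assumed to agree 70% of the time.
baseline = 0.70
print(binomial_tail(970, 1000, baseline))
```

A vanishingly small tail probability only says the agreement is not chance under the independence assumption; it does not by itself separate copied code from shared ideas or similar tuning.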
Sure, but let's first define the parameters.
1. Just play the same move N% of the time. N has to be quite high to be fairly certain the two are related.
2. Show the same PV N% of the time. N can be lower, because there are more opportunities for differences to show up.
3. Show the same PV with the same score N% of the time. Now N can be even lower, because seeing the same score and PV, particularly at the same depth, is far more similarity than independent programs should produce (although in the case of Rybka this can be problematic, since the program may falsely report its depth or even its scores).
But matching the move alone is, to me, not really convincing unless the percentage is quite high. Add in the PV or score or both, and the required percentage drops right on down...
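The three tests in the list above can be computed side by side once each engine's output has been collected for the same set of positions. This is a minimal sketch: the `SearchResult` record and its fields are hypothetical (for example, parsed from an engine's reported best move, PV, score and depth), and the fourth tally covers the "particularly at the same depth" remark. Scores from different engines may need rescaling before an exact comparison means anything.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SearchResult:
    """Hypothetical record of one engine's output on one test position."""
    best_move: str          # move in coordinate notation, e.g. "e2e4"
    pv: tuple[str, ...]     # principal variation, best move first
    score_cp: int           # reported score in centipawns
    depth: int              # reported search depth


def similarity(a: Sequence[SearchResult], b: Sequence[SearchResult]) -> dict[str, float]:
    """Fraction of positions on which two engines agree, by increasingly strict tests.

    a[i] and b[i] must be the two engines' results for the same position.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty result lists of equal length")
    pairs = list(zip(a, b))
    n = len(pairs)
    same_move = sum(x.best_move == y.best_move for x, y in pairs)
    same_pv = sum(x.pv == y.pv for x, y in pairs)
    same_pv_score = sum(x.pv == y.pv and x.score_cp == y.score_cp for x, y in pairs)
    same_at_depth = sum(
        x.pv == y.pv and x.score_cp == y.score_cp and x.depth == y.depth
        for x, y in pairs
    )
    return {
        "move": same_move / n,
        "pv": same_pv / n,
        "pv_and_score": same_pv_score / n,
        "pv_score_same_depth": same_at_depth / n,
    }


# Hypothetical results for two positions.
engine_a = [SearchResult("e2e4", ("e2e4", "e7e5"), 25, 20),
            SearchResult("d2d4", ("d2d4", "d7d5"), 10, 20)]
engine_b = [SearchResult("e2e4", ("e2e4", "c7c5"), 25, 20),
            SearchResult("d2d4", ("d2d4", "d7d5"), 10, 20)]
print(similarity(engine_a, engine_b))
```

Each stricter tally can only be equal to or lower than the looser one, which is why a given percentage on the stricter tests is stronger evidence.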
Then I would say we are in pretty good agreement. But note that score comparisons are tricky, because not all engines define 1.00 as the average value of an extra pawn, and obviously just multiplying or dividing the score by a constant is a trivial change. Furthermore, some programs modify the score in a non-linear way; both Rybka 2.xx and Houdini do this (Houdini for the stated reason of having a given score correspond to a given win percentage). So I would put a lot more weight on the PV than on the reported score. But even this is tricky, as even a 1-centipawn randomizer will very often produce quite different PVs. The best approach is to undo any "fudging" of the reported score, if you are able to, and then compare the scores with those of the program it is suspected of cloning. In the case of Houdini, someone even made a version that undid this score transformation, making its similarity to Ivanhoe much more obvious.
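When the exact transformation is not known, one option (swapped in here, not taken from the post) is to compare scores by rank instead of by value. Spearman's rank correlation is unchanged by any strictly increasing transformation of either engine's scores, so a constant scale factor or a win-percentage style mapping drops out, assuming that mapping is strictly increasing. It cannot undo non-monotone fudging, and unrelated strong engines will also rank positions similarly, so this complements undoing the transformation rather than replacing it. The squashing function below is a hypothetical stand-in, not Houdini's or Rybka's actual mapping.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must not all be identical")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Rank correlation of two engines' scores over the same positions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least 2 entries")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical centipawn scores on five positions, and the same scores passed
# through an assumed monotone "win-percentage" style squash.
raw = [-300, -50, 0, 120, 400]
squashed = [round(100 * math.tanh(s / 200)) for s in raw]
print(spearman(raw, squashed))
```

The same invariance covers different pawn units: multiplying every score by a positive constant leaves the rank correlation unchanged.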
For me it is obvious when one program more or less clones the eval of another. I analyze all the major openings with any strong new program in the IDeA tree of Aquarium. If two programs share a very similar eval (I mean much closer than could be attributed to independent work), the two trees will look extremely similar. So basically I can recognize a clone of the eval I did for Rybka 3 quite easily. Probably I could have done the same with Fruit and Rybka 1 had I been interested in them.
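Judging tree similarity by eye is the method described above. A crude numeric proxy, offered here as an assumption and not as an Aquarium feature, is the overlap between the sets of positions each engine's analysis chose to expand from the same starting openings. The tree representation is hypothetical: nodes are keyed by move sequences, so transpositions count as separate nodes; a FEN or hash key would merge them.

```python
from typing import Iterable


def tree_overlap(nodes_a: Iterable[str], nodes_b: Iterable[str]) -> float:
    """Jaccard similarity of two sets of expanded positions (1.0 = identical trees)."""
    a, b = set(nodes_a), set(nodes_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical node keys (move sequences from the start position) that two
# engines' opening analyses chose to expand.
tree_a = {"e4", "e4 c5", "e4 c5 Nf3", "e4 e5", "d4"}
tree_b = {"e4", "e4 c5", "e4 c5 Nf3", "e4 e6", "d4"}
print(tree_overlap(tree_a, tree_b))
```

A single overlap figure means little on its own; it would need a baseline from pairs of engines known to be independent before "much closer than could be attributed to independent work" can be read off it.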
I've mentioned that problem before. I have seen pawn values anywhere from 100 to 256 (I think Deep Blue used 256 for a pawn, and perhaps another contemporary program does or did as well). The IDeA comparison seems reasonable to me as an automated way of detecting this problem, although it still needs a human's "eye" to see the similarities.
Here are a couple of examples of "sharing ideas but not code."
Back in the mid-90s, Bruce mentioned an idea he was playing around with. In move ordering, once you have tried a few moves based on the history heuristic, we all stop using the history heuristic after some N moves. He said that at the point where he turned the history-counter code off, he was also trying to reduce the depth, or prune outright at the last few plies, and that this made his program very fast. But it was also dangerous, for obvious reasons.
I experimented with that idea quite a bit, but at the time our depths were in the 8-10-12 ply range on a Pentium Pro, and it was beyond risky: it hurt. But he didn't suggest code, just an idea. I did some implementations and tested, then changed things and tested again, until I decided that this was perhaps an idea for future study, as it wasn't paying off. Nobody would have said that our two approaches were similar at all. The underlying idea, that one might expend less effort on what we now call "late moves in the move list," is a reasonable one, yet our implementations shared nothing but that vague idea, and no code.
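The idea in its modern form, usually called late move reductions (LMR), can be sketched on a synthetic game tree. This is not the code from either experiment described above: the toy tree, the move-index threshold `late_after`, the one-ply `reduction`, and the full-depth re-search when the reduced probe beats alpha are all assumptions chosen for illustration.

```python
from typing import Optional

BRANCHING = 6      # toy tree: every node has this many moves
INF = 10**9


def children(pos: int) -> list[int]:
    """Deterministic synthetic game tree; moves are assumed already ordered best-first."""
    return [pos * 31 + i + 1 for i in range(BRANCHING)]


def evaluate(pos: int) -> int:
    """Pseudo-random static score from the side to move's point of view."""
    return (pos * 2654435761) % 201 - 100


def search(pos: int, depth: int, alpha: int, beta: int, stats: dict,
           late_after: Optional[int] = None, reduction: int = 1) -> int:
    """Fail-soft negamax alpha-beta.

    If late_after is set, moves at index >= late_after (the "late" moves) are first
    searched `reduction` plies shallower with a null window; only if that probe
    beats alpha is the move re-searched at full depth with the full window.
    """
    stats["nodes"] = stats.get("nodes", 0) + 1
    if depth == 0:
        return evaluate(pos)
    best = -INF
    for i, child in enumerate(children(pos)):
        late = late_after is not None and i >= late_after and depth > reduction
        if late:
            score = -search(child, depth - 1 - reduction, -alpha - 1, -alpha,
                            stats, late_after, reduction)
            if score > alpha:  # reduced probe looked good: verify at full depth
                score = -search(child, depth - 1, -beta, -alpha,
                                stats, late_after, reduction)
        else:
            score = -search(child, depth - 1, -beta, -alpha, stats, late_after, reduction)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff
    return best


plain_stats: dict = {}
full_value = search(1, 5, -INF, INF, plain_stats)
lmr_stats: dict = {}
lmr_value = search(1, 5, -INF, INF, lmr_stats, late_after=3)
print(full_value, plain_stats["nodes"], lmr_value, lmr_stats["nodes"])
```

With `late_after=None` the function is plain alpha-beta and returns the exact minimax value. With reductions enabled it is meant to visit fewer nodes, at the price that it can return a different value, which is the risk described above at shallow depths.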
Another example: singular extensions. Hsu and Campbell wrote a paper that described a hellishly complex implementation of this. Bruce and I discussed it over quite a while, and we came up with an idea: "What about searching all moves, at the front of the search, with a reduced depth and a lowered window, to see if just one will fail high? If it does, flag it as singular, and then do the normal search. If you have a flagged singular move in that normal search, extend it." Bruce produced some code that seemed to help, although he did not test thoroughly enough to quantify any real Elo gain. But it did seem stronger. I tried a couple of approaches, and none passed my testing as better. We compared code, and we had done things terribly differently, yet with the same basic idea. Yet it worked for him (apparently) and did not for me. Same idea, drastically different implementation and code, and different results.
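The reduced-depth, lowered-window test described above can be sketched on a toy tree. Everything here is an illustrative assumption rather than either implementation (or Hsu and Campbell's): the synthetic tree, `MARGIN`, `REDUCTION`, the per-path `extensions_left` budget that stops repeated extensions from running forever, and the choice to skip the test while alpha is still unbounded.

```python
from typing import Optional

BRANCHING = 5
INF = 10**9
MARGIN = 50       # how far below alpha the lowered window sits (assumed)
REDUCTION = 2     # extra plies removed for the singularity probe (assumed)


def children(pos: int) -> list[int]:
    """Deterministic synthetic game tree."""
    return [pos * 31 + i + 1 for i in range(BRANCHING)]


def evaluate(pos: int) -> int:
    """Pseudo-random static score from the side to move's point of view."""
    return (pos * 2654435761) % 201 - 100


def alphabeta(pos: int, depth: int, alpha: int, beta: int) -> int:
    """Plain fail-soft negamax alpha-beta, used for the reduced-depth probes."""
    if depth == 0:
        return evaluate(pos)
    best = -INF
    for child in children(pos):
        score = -alphabeta(child, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return best


def find_singular(pos: int, depth: int, alpha: int) -> Optional[int]:
    """Index of the only move that fails high against alpha - MARGIN at reduced depth."""
    if depth <= REDUCTION or alpha <= -INF:
        return None
    bound = alpha - MARGIN
    singular = None
    for i, child in enumerate(children(pos)):
        # Null-window probe: does this move score above the lowered bound?
        if -alphabeta(child, depth - 1 - REDUCTION, -bound - 1, -bound) > bound:
            if singular is not None:
                return None  # a second move also fails high: not singular
            singular = i
    return singular


def search(pos: int, depth: int, alpha: int, beta: int, extensions_left: int) -> int:
    """Normal search that extends a flagged singular move by one ply."""
    if depth == 0:
        return evaluate(pos)
    flagged = find_singular(pos, depth, alpha) if extensions_left > 0 else None
    best = -INF
    for i, child in enumerate(children(pos)):
        extend = i == flagged
        score = -search(child, depth - 1 + int(extend), -beta, -alpha,
                        extensions_left - int(extend))
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return best


print(search(1, 4, -INF, INF, 0), search(1, 4, -INF, INF, 2))
```

Setting `extensions_left=0` reduces `search` to ordinary alpha-beta, which makes it easy to measure what the extension changes, in the spirit of the testing described above.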
That's why I say "copying ideas" is fine. But by "ideas" I do not mean "look at someone's code, and write your own code that does things exactly as they do them, in the same order, in the same way, with the same bonuses and penalties, etc." That's copying more than an idea; that's copying an implementation.