Perhaps the parser used and the relevant equations can be given, and we could parse the source files ourselves (I guess that the information is simply derived from head-to-head matches, with ponder moves pulled from the engine PVs).

Shaun wrote:
See above - perhaps we need to remove the ponder hit stats until the scripts can be updated to provide reliable stats again...

Dann Corbit wrote:
The number of moves ranges from 306 to 2716. It will include all phases of the game, since the moves are counted from games played between the opponents. If we ignore all of the pairs for which the number of moves counted is less than 1000, I don't think it changes anything.

Uri Blass wrote:
I think that the number of moves is clearly too small, and they come from games, so they are not independent.

Dann Corbit wrote:
I have decided that the entire idea is a very bad one.

swami wrote:
Yes, you have raised an interesting point. I've usually sent about 160-200 positions to Dann, of which 100 get selected because they have the best moves as well as partial-credit ones. What about the rejected ones? They are rejected because there's no objective best solution.

BubbaTough wrote:
I thought the point of STS was that there was an objectively best move (as well as possibly a 2nd or 3rd best for partial credit). If this is the case, then the better the programs are, the more they would look like each other in terms of STS results. If anything, the positions you have rejected are more likely to be good test candidates, because presumably you rejected them as not having a clear best move. Or better yet, the positions you did not even consider using, because it is completely unclear what the best move might be.

swami wrote:
I hope STS can be used for the clone detection test, as it also comes with the partial-credit moves, which maximizes the probing so that the choices of engines can be more easily and comprehensively assessed.
Next version will be released probably by the end of this month, and we will have 1000 positions.

-Sam

So, yes, rejected positions might be good test candidates, because they test an engine's choices in positions where there's no clear best move. Perhaps Dann has saved a list of all the rejected positions for each of the tests?
The idea is really just like the ponder hit of CCRL (IOW, engines that very frequently have the same PV nodes). Consider this query:

http://www.computerchess.org.uk/ccrl/40 ... es+only%29

Are Rybka and Naum clones of each other?

Code: Select all

 #  Pair                                             Ponder hit  Moves counted
 1  Rybka 2.3.2a 64-bit 2CPU – Naum 3 64-bit 4CPU          84.1            981
 2  Pro Deo 1.1 Silver – Booot 4.11.1                      83.8            439
 3  Booot 4.11.1 – Zeus 1.28                               83.3            654
 4  Rybka 2.3.2a 64-bit – Naum 3 64-bit 4CPU               83.3           1386
 5  Sloppy 0.1.1 – Booot 4.11.1                            83.3            460
 6  GreKo 5.5 – Deuterium 06.08.25.04                      83.0            383
 7  Rybka 3 64-bit – Naum 3 64-bit                         82.7            394
 8  Naum 3 64-bit – Deep Sjeng 3.0 64-bit 1CPU             82.7            306
 9  BBChess 1.3a – Cyrano 0.2f                             82.6            397
10  Dragon 4.6 – Tytan 9.3                                 82.3            368
11  Matacz 1.1 – Homer 2.0                                 82.2            573
12  Naum 2.0 32-bit – Delfi 5.4                            82.2           1379
13  Delfi 5.2 – Hamsters 0.6                               81.6           2312
14  Uralochka 1.1b – AliChess 4.08                         81.5            536
15  Stockfish 1.4 32-bit – Booot 4.15.0                    81.5           1700
16  Chessmaster 11 – Delfi 5.2                             81.4           1996
17  Ufim 8.02 – Rotor 0.4                                  81.4           2716
18  Naum 3 64-bit – Glaurung 2.1 64-bit                    81.3            578
19  Hiarcs Paderborn 2007 – Chess Tiger 2007.1             81.3           1761
20  Alf 1.09 – Matheus 2.3                                 81.2            351
21  Naum 4 32-bit – TheMadPrune 1.1.25                     81.1           1259
22  Ufim 7.01 – Homer 2.0                                  81.0            596
23  Booot 4.11.1 – Tytan 9.3                               81.0            405
24  Toga II 1.4 beta5c – Naum 2.2 64-bit                   80.9           1327
25  Movei 00.8.438 (10 10 10) – Dragon 4.6                 80.9            482
26  Rybka 2.3.2a 64-bit 2CPU – Naum 3 64-bit 2CPU          80.8           2108
27  Chess Tiger 2007.1 – Chessmaster 11                    80.8           1973
28  Tornado 2.2 – Pupsi2 0.07                              80.7            990
29  Naum 2.2 64-bit – Hiarcs 11.2                          80.7            378
30  Arasan 10.1 – LittleThought 1.00 32-bit                80.7            409
How about Booot and ProDeo?
Chessmaster and Delfi?
This is an incredibly thorough analysis of engines that "think alike", and yet what it shows me is that no genealogy can be inferred from it.
It is possible that a program was involved in some tactical games with many forced moves.
Taking fixed positions is clearly better than playing games and calculating ponder hits.
Uri

Code: Select all
 #  Pair                                             Ponder hit  Moves counted
 4  Rybka 2.3.2a 64-bit – Naum 3 64-bit 4CPU               83.3           1386
12  Naum 2.0 32-bit – Delfi 5.4                            82.2           1379
13  Delfi 5.2 – Hamsters 0.6                               81.6           2312
15  Stockfish 1.4 32-bit – Booot 4.15.0                    81.5           1700
16  Chessmaster 11 – Delfi 5.2                             81.4           1996
17  Ufim 8.02 – Rotor 0.4                                  81.4           2716
19  Hiarcs Paderborn 2007 – Chess Tiger 2007.1             81.3           1761
21  Naum 4 32-bit – TheMadPrune 1.1.25                     81.1           1259
24  Toga II 1.4 beta5c – Naum 2.2 64-bit                   80.9           1327
26  Rybka 2.3.2a 64-bit 2CPU – Naum 3 64-bit 2CPU          80.8           2108
27  Chess Tiger 2007.1 – Chessmaster 11                    80.8           1973
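(Uri's table is simply the full list above with Dann's suggested cutoff of 1000 counted moves applied. A trivial sketch of that filter - the rows here are typed in by hand purely for illustration:)

Code: Select all

# Keep only the pairs with at least 1000 moves counted.
rows = [
    (1, "Rybka 2.3.2a 64-bit 2CPU – Naum 3 64-bit 4CPU", 84.1, 981),
    (4, "Rybka 2.3.2a 64-bit – Naum 3 64-bit 4CPU", 83.3, 1386),
    # ... remaining rows of the full table above ...
]
MIN_MOVES = 1000
for rank, pair, hit, moves in rows:
    if moves >= MIN_MOVES:
        print(f"{rank:>2}  {pair:<48} {hit:5.1f} {moves:6d}")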
Clone detection test
Moderator: Ras
-
- Posts: 12792
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Clone detection test
-
- Posts: 323
- Joined: Wed Mar 08, 2006 9:55 pm
- Location: Brighton - UK
Re: Clone detection test
You are correct, the ponder hits are calculated based on the reported ponder information in the PGN. I have not looked at Kirill's scripts in this area, so I would rather not comment, in case I give the wrong information. However, our games can be downloaded, and the PGN with comments should include all the ponder information that the GUI / engine combination provided.

Dann Corbit wrote:
Perhaps the parser used and the relevant equations can be given, and we could parse the source files ourselves (I guess that the information is simply derived from head-to-head matches, with ponder moves pulled from the engine PVs).
Shaun
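(For anyone who wants to parse the downloaded PGNs themselves, a minimal sketch of the calculation follows. It assumes the python-chess library and a hypothetical comment format of "score/depth" followed by the PV in SAN, e.g. "+0.32/15 Nf3 d5 Bg2"; the real CCRL comment layout may differ, so extract_ponder is purely illustrative:)

Code: Select all

# Sketch: estimate a ponder-hit rate from an annotated PGN. A "hit" means
# the second move of one side's PV matches the move the opponent actually
# plays next. Assumes a comment format like "+0.32/15 Nf3 d5 Bg2".
import chess.pgn

def extract_ponder(comment: str) -> str | None:
    """Second PV move (the ponder move) under the assumed comment format."""
    parts = comment.split()
    return parts[2] if len(parts) > 2 else None

def ponder_hits(pgn_path: str) -> tuple[int, int]:
    hits = counted = 0
    with open(pgn_path) as f:
        while (game := chess.pgn.read_game(f)) is not None:
            nodes = list(game.mainline())
            for prev, nxt in zip(nodes, nodes[1:]):
                ponder = extract_ponder(prev.comment)
                if ponder is None:
                    continue
                counted += 1
                # A hit: the predicted reply equals the move actually played.
                if prev.board().san(nxt.move) == ponder:
                    hits += 1
    return hits, counted

hits, counted = ponder_hits("games.pgn")
if counted:
    print(f"Ponder hit: {100.0 * hits / counted:.1f}% ({counted} moves counted)")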
-
- Posts: 454
- Joined: Sat Apr 04, 2009 6:44 pm
- Location: Bulgaria
Re: Clone detection test
Hi guys!
Wouldn't it be more correct if the test included similarities in the PV as well?

A single move can mislead, since the reason for picking it could even be a move-ordering bug. I know that a huge number of positions can suppress this kind of possibility, but some risk still remains. I think the test makes sense, but in a perfect world it would be a more convincing proof if it were combined with PV similarity and the evaluation value.
-
- Posts: 12792
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Clone detection test
Kiril's data contains both the predicted move and the score.

xcomponent wrote:
Hi guys!
Wouldn't it be more correct if the test included similarities in the PV as well?
A single move can mislead, since the reason for picking it could even be a move-ordering bug. I know that a huge number of positions can suppress this kind of possibility, but some risk still remains. I think the test makes sense, but in a perfect world it would be a more convincing proof if it were combined with PV similarity and the evaluation value.

However, it is quite easy to fudge the score (for example, multiply every score by 1.5 or by 0.75 and the engine will play the same).
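(The fudge is undetectable from play because move choice depends only on the ordering of the scores, and any positive scaling preserves that ordering. A two-line illustration with made-up numbers:)

Code: Select all

scores = {"Nf3": 32, "d4": 25, "e4": 18}           # centipawns, made up
fudged = {m: s * 1.5 for m, s in scores.items()}   # scaled reports
assert max(scores, key=scores.get) == max(fudged, key=fudged.get)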
I think that other interesting things might fall out of this sort of classification. For instance, we might discover families of engines that like to lock the position and play a slow, closed game. We might discover families of engines that like fireworks and pirates storming over the wall.
We might discover engines that can build a fortress or engines that can dismantle a fortress (and conversely those that can't).
I am beginning to think more and more that it is perhaps not a great idea to accuse someone of something very bad because his engine plays similarly to another engine. But it is also possible that some magic formula will occur that is foolproof. In any case, the idea makes me very nervous.
-
- Posts: 454
- Joined: Sat Apr 04, 2009 6:44 pm
- Location: Bulgaria
Re: Clone detection test
I agree. Probably the purpose of tests like this one is more valuable for analysis and style similarity than for plagiarism detection. But there is still one direction in which it may be very usable: if a new closed-source engine arrives and the test shows, let's say, about 99% similarity to an already-known engine, maybe this could be a serious argument in a cloning case.

Dann Corbit wrote:
Kiril's data contains both the predicted move and the score.

xcomponent wrote:
Hi guys!
Wouldn't it be more correct if the test included similarities in the PV as well?
A single move can mislead, since the reason for picking it could even be a move-ordering bug. I know that a huge number of positions can suppress this kind of possibility, but some risk still remains. I think the test makes sense, but in a perfect world it would be a more convincing proof if it were combined with PV similarity and the evaluation value.

However, it is quite easy to fudge the score (for example, multiply every score by 1.5 or by 0.75 and the engine will play the same).

I think that other interesting things might fall out of this sort of classification. For instance, we might discover families of engines that like to lock the position and play a slow, closed game. We might discover families of engines that like fireworks and pirates storming over the wall.

We might discover engines that can build a fortress or engines that can dismantle a fortress (and conversely those that can't).

I am beginning to think more and more that it is perhaps not a great idea to accuse someone of something very bad because his engine plays similarly to another engine. But it is also possible that some magic formula will occur that is foolproof. In any case, the idea makes me very nervous.
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Clone detection test
I believe the only reliable measure is the actual move played, because everything else can be faked. But you cannot fake the move.

xcomponent wrote:
Hi guys!
Wouldn't it be more correct if the test included similarities in the PV as well?
A single move can mislead, since the reason for picking it could even be a move-ordering bug. I know that a huge number of positions can suppress this kind of possibility, but some risk still remains. I think the test makes sense, but in a perfect world it would be a more convincing proof if it were combined with PV similarity and the evaluation value.

But even if it were not faked, I'm not sure the PV is very reliable. In my own program the PV changes frequently, even when the first move does not. My guess is that each successive move is increasingly unreliable as a measure of similarity. I admit that I'm guessing here, but my sense is that even if using the PV were an improvement, it would be only a very minor one. And as I mentioned, it can still be faked.

But there is no need to guess - try the experiment yourself; it is very easy to construct - and see if you can produce a better measure using more moves of the PV.
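(The move-only version of that experiment can be set up in a few lines. A rough sketch: the engine paths, search depth, and FEN file are placeholders, and the UCI handshake is simplified - a robust harness would wait for "uciok"/"readyok" before sending further commands:)

Code: Select all

# Rough sketch: how often do two UCI engines pick the same best move
# over a list of FEN positions? Restarts each engine per position for
# simplicity; a real harness would keep the processes alive.
import subprocess

def best_move(engine_path: str, fen: str, depth: int = 12) -> str:
    p = subprocess.Popen([engine_path], stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, text=True)
    p.stdin.write(f"uci\nisready\nposition fen {fen}\ngo depth {depth}\n")
    p.stdin.flush()
    move = ""
    for line in p.stdout:
        if line.startswith("bestmove"):
            move = line.split()[1]
            break
    p.stdin.write("quit\n")
    p.stdin.flush()
    p.wait()
    return move

with open("positions.fen") as f:
    fens = [line.strip() for line in f if line.strip()]

same = sum(best_move("./engineA", fen) == best_move("./engineB", fen)
           for fen in fens)
print(f"Move match: {100.0 * same / len(fens):.1f}% over {len(fens)} positions")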
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Clone detection test
I think you are being overly concerned. Over the years I have seen many cheaters exposed (or, let's just say, misunderstandings cleared up) simply because the author of a program noticed that a program in a tournament was playing like his.

Dann Corbit wrote:
I am beginning to think more and more that it is perhaps not a great idea to accuse someone of something very bad because his engine plays similarly to another engine. But it is also possible that some magic formula will occur that is foolproof. In any case, the idea makes me very nervous.

For instance, several years ago I am playing at the Dutch Computer Chess Championship and get an email from John Stanback - who is watching the games from the States - asking me to check into a problem: he has noticed that one of the programs is playing just like Zarkov.

In another tournament, Richard Lang is sitting across from a clone of his own program (in this case it's a perfect clone), and he notices that a program is just too similar to his even though it's disguised in a different housing (it's one of those hardware chess computers).

In yet another tournament, Bob Hyatt notices remotely that one of the contestants is playing exactly the same moves as Crafty.

As a result of these observations, the problems in each case were investigated and brought to some kind of resolution. Just noticing the similarity was not itself considered proof, but it was good enough to start asking questions.

So just relax - if we build this tool, it will be with the understanding that it's imperfect - it's just a crude measurement. Like almost any tool, it's not a bad thing in itself; someone could grab a screwdriver and use it as a weapon to hurt someone, but that doesn't mean we should not have screwdrivers.

So I believe this is a powerful test, but like you I agree that it should not be used as a weapon to hit someone over the head with.

I could build a simple UCI test harness to run my test and produce a result file - anyone interested in doing some kind of blind test?
Don
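(The result file Don mentions could be as simple as one "fen;bestmove" line per position, which would make a blind comparison trivial. A sketch, with made-up file names:)

Code: Select all

# Compare two result files blind: each line is "fen;bestmove".
def load(path: str) -> dict[str, str]:
    with open(path) as f:
        return dict(line.strip().split(";", 1) for line in f if ";" in line)

a, b = load("resultA.txt"), load("resultB.txt")
shared = a.keys() & b.keys()           # positions both engines answered
same = sum(a[fen] == b[fen] for fen in shared)
print(f"Similarity: {100.0 * same / len(shared):.1f}% on {len(shared)} positions")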
-
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: Clone detection test
I didn't follow the thread, but what's all this idiocy, man?

Don wrote:
I think you are being overly concerned. Over the years I have seen many cheaters exposed (or, let's just say, misunderstandings cleared up) simply because the author of a program noticed that a program in a tournament was playing like his.

Dann Corbit wrote:
I am beginning to think more and more that it is perhaps not a great idea to accuse someone of something very bad because his engine plays similarly to another engine. But it is also possible that some magic formula will occur that is foolproof. In any case, the idea makes me very nervous.

For instance, several years ago I am playing at the Dutch Computer Chess Championship and get an email from John Stanback - who is watching the games from the States - asking me to check into a problem: he has noticed that one of the programs is playing just like Zarkov.

In another tournament, Richard Lang is sitting across from a clone of his own program (in this case it's a perfect clone), and he notices that a program is just too similar to his even though it's disguised in a different housing (it's one of those hardware chess computers).

In yet another tournament, Bob Hyatt notices remotely that one of the contestants is playing exactly the same moves as Crafty.

As a result of these observations, the problems in each case were investigated and brought to some kind of resolution. Just noticing the similarity was not itself considered proof, but it was good enough to start asking questions.

So just relax - if we build this tool, it will be with the understanding that it's imperfect - it's just a crude measurement. Like almost any tool, it's not a bad thing in itself; someone could grab a screwdriver and use it as a weapon to hurt someone, but that doesn't mean we should not have screwdrivers.

So I believe this is a powerful test, but like you I agree that it should not be used as a weapon to hit someone over the head with.

I could build a simple UCI test harness to run my test and produce a result file - anyone interested in doing some kind of blind test?

Don

Just install IDA Pro and look at the assembler code of an engine, and you know instantly whether it's a clone.

And nothing else can prove anything, unless the guy who clones is a major idiot; note that the majority are major idiots who copy things 100%. With source code available now, it's easy to make some modifications that change the behaviour.

So just take a look at the assembler code of the engine and you know it all. Simple as that.
Thanks,
Vincent
-
- Posts: 270
- Joined: Thu Jan 15, 2009 12:52 pm
Re: Clone detection test
How about we take a step back and review what has been said:
We have a tool which looks at the move made, and from that, we can determine similarities and differences in play style.
Let's leave it at that.
We could go on forever arguing about whether the full PV or just the ponder move should be analyzed, etc. Let's leave that to another tool and keep this simple.
Let's not aspire to do a clone detection test. Let's make a "Play Style Proximity Detector". We already have the specification for it, and the analysis done so far has yielded a lot of interesting information.
Then, if someone wants to use the results to claim that A may be a clone of B, they can do so, and the discussion can start for that particular case in a particular thread.
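(As a sketch of what such a "Play Style Proximity Detector" might report, assuming each engine's chosen move per test position has already been collected - the data below is made up purely for illustration:)

Code: Select all

# Pairwise "play style proximity" from each engine's chosen moves
# over a shared set of positions.
from itertools import combinations

choices = {
    "EngineA": ["e2e4", "g1f3", "d2d4"],   # one chosen move per position
    "EngineB": ["e2e4", "g1f3", "c2c4"],
    "EngineC": ["d2d4", "b1c3", "c2c4"],
}

def proximity(a: list[str], b: list[str]) -> float:
    """Fraction of positions where both engines chose the same move."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

for e1, e2 in combinations(choices, 2):
    print(f"{e1} vs {e2}: {100 * proximity(choices[e1], choices[e2]):.0f}%")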
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Clone detection test
Disassembled code looks different with different compilers, and reading it requires an expert. It's easy for you and me, but not for everyone. A lot of good C programmers do not know assembler.

diep wrote:
I didn't follow the thread, but what's all this idiocy, man?

Don wrote:
I think you are being overly concerned. Over the years I have seen many cheaters exposed (or, let's just say, misunderstandings cleared up) simply because the author of a program noticed that a program in a tournament was playing like his.

Dann Corbit wrote:
I am beginning to think more and more that it is perhaps not a great idea to accuse someone of something very bad because his engine plays similarly to another engine. But it is also possible that some magic formula will occur that is foolproof. In any case, the idea makes me very nervous.

For instance, several years ago I am playing at the Dutch Computer Chess Championship and get an email from John Stanback - who is watching the games from the States - asking me to check into a problem: he has noticed that one of the programs is playing just like Zarkov.

In another tournament, Richard Lang is sitting across from a clone of his own program (in this case it's a perfect clone), and he notices that a program is just too similar to his even though it's disguised in a different housing (it's one of those hardware chess computers).

In yet another tournament, Bob Hyatt notices remotely that one of the contestants is playing exactly the same moves as Crafty.

As a result of these observations, the problems in each case were investigated and brought to some kind of resolution. Just noticing the similarity was not itself considered proof, but it was good enough to start asking questions.

So just relax - if we build this tool, it will be with the understanding that it's imperfect - it's just a crude measurement. Like almost any tool, it's not a bad thing in itself; someone could grab a screwdriver and use it as a weapon to hurt someone, but that doesn't mean we should not have screwdrivers.

So I believe this is a powerful test, but like you I agree that it should not be used as a weapon to hit someone over the head with.

I could build a simple UCI test harness to run my test and produce a result file - anyone interested in doing some kind of blind test?

Don

Just install IDA Pro and look at the assembler code of an engine, and you know instantly whether it's a clone.

And nothing else can prove anything, unless the guy who clones is a major idiot; note that the majority are major idiots who copy things 100%. With source code available now, it's easy to make some modifications that change the behaviour.

So just take a look at the assembler code of the engine and you know it all. Simple as that.

Thanks,
Vincent

The similarity tester would be a tool and nothing more. It would be used in conjunction with other things, such as the disassembler.
Don