oh, even more than 70%
RobboLito 0.085d1 - Houdini 1.00 70.37%
When will we see HOUDINI in official tournaments?
-
Engin
- Posts: 1001
- Joined: Mon Jan 05, 2009 7:40 pm
- Location: Germany
- Full name: Engin Üstün
-
Albert Silver
- Posts: 3026
- Joined: Wed Mar 08, 2006 9:57 pm
- Location: Rio de Janeiro, Brazil
Re: When will we see HOUDINI in official tournaments?
Engin wrote:
oh, even more than 70%
RobboLito 0.085d1 - Houdini 1.00 70.37%
I don't see why it would be at all hard for an author to sabotage this test.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
-
Don
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: When will we see HOUDINI in official tournaments?
Albert Silver wrote:
Engin wrote:
oh, even more than 70%
RobboLito 0.085d1 - Houdini 1.00 70.37%
I don't see why it would be at all hard for an author to sabotage this test.
I think it's difficult to sabotage the test without weakening the target program, but I'm sure that with some hard work it can be done. But I'll bet it's much harder than you think.
Komodo has gained hundreds of ELO over a few years, and yet the similarity to the ancient versions is striking, so it's not a trivial problem to "fool" the test. So I think it would be difficult to take Ivanhoe (for example) and make enough non-weakening changes to it to make most of the similarity go away. The test measures the evaluation function more than anything else, and it's very difficult to make substantial changes to the playing style of a well-engineered and well-developed program without killing the ELO.
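For readers wondering how a figure like "RobboLito 0.085d1 - Houdini 1.00 70.37%" arises, here is a minimal sketch, assuming the tester runs both engines on the same fixed set of positions and reports the share of positions on which they choose the same move. The function name and the move lists are hypothetical; the real tool may differ in details such as the position set and the time allowed per position.

```python
from typing import Sequence


def move_match_percentage(moves_a: Sequence[str], moves_b: Sequence[str]) -> float:
    """Percentage of positions on which two engines chose the same move.

    moves_a[i] and moves_b[i] are the moves (e.g. in UCI notation) that
    engine A and engine B picked for position i of a shared test set.
    """
    if len(moves_a) != len(moves_b):
        raise ValueError("both engines must be run on the same positions")
    if not moves_a:
        raise ValueError("no positions to compare")
    matches = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * matches / len(moves_a)


# Hypothetical moves chosen by two engines on four test positions.
engine_a = ["e2e4", "g1f3", "d7d5", "c2c4"]
engine_b = ["e2e4", "g1f3", "e7e5", "c2c4"]
print(f"{move_match_percentage(engine_a, engine_b):.2f}%")  # 75.00%
```

Because a short search picks whichever move leads to the position its evaluation likes best, a move-matching score of this kind plausibly tracks the evaluation more than the search, which is consistent with Don's remark that the test measures the evaluation function more than anything else.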
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
-
Mike S.
- Posts: 1480
- Joined: Thu Mar 09, 2006 5:33 am
Re: When will we see HOUDINI in official tournaments?
Don wrote:
The test measures the evaluation function more than anything else (...)
That is exactly what I pointed to earlier. While I do not doubt that your similarity testing tool is a very good achievement and a step forward, I DO doubt that excluding an engine on the basis of a similarity percentage should rest (almost) only on evaluation similarity. What about differences in the search code?
Not being a programmer, I still think that the set of useful evaluation terms is limited and basically "predefined" (e.g. as described in the chess literature), and that differences in implementation are more or less small details. But I can imagine that for the search code(s), a "highly technical" thing as seen from my limited understanding of these things, there are endless opportunities and variations in how to do it.
In other words, I criticize the fact that an engine may be called "65% similar" just because its eval is perhaps 61% similar, even though its search is very different. We are on the wrong path.
Regards, Mike
-
Don
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: When will we see HOUDINI in official tournaments?
Mike S. wrote:
Don wrote:
The test measures the evaluation function more than anything else (...)
That is exactly what I pointed to earlier. While I do not doubt that your similarity testing tool is a very good achievement and a step forward, I DO doubt that excluding an engine on the basis of a similarity percentage should rest (almost) only on evaluation similarity. What about differences in the search code?
If someone heavily modifies the search, the similarity detector will NOT detect that the program has changed substantially.
However, that is one of those "bugs" that I consider a feature. It's easy to make substantial search changes; even small changes can have a big impact on the search. I can make Komodo search a ply deeper, for example, without hurting the ELO much.
By the way, I am not in favor of the test being used to exclude programs from tournaments. I am, however, in favor of using it to clear a program or as an investigative tool, but it should not be taken too seriously until it's better understood.
Still, if it were used like that, you are obviously bothered by the fact that a programmer could modify the search significantly and get away with it, since the test does not catch search changes. I would not worry too much about that. The real grunt work is in the evaluation function, and cheaters are not usually going to put a lot of time into evaluation: they took someone else's code so they would not have to write an evaluation function from scratch, and they are lazy.
The good news is that a plagiarist is likely to make substantial search modifications, and even improve the program significantly, yet will still get caught by the test, because writing an evaluation from scratch is almost the same as writing a new chess program from scratch. I can build a reasonably good search function in a day or two, but it takes months to write a reasonably good evaluation function from scratch.
Mike S. wrote:
Not being a programmer, I still think that the set of useful evaluation terms is limited and basically "predefined" (e.g. as described in the chess literature), and that differences in implementation are more or less small details. But I can imagine that for the search code(s), a "highly technical" thing as seen from my limited understanding of these things, there are endless opportunities and variations in how to do it.
Wow! You have this exactly backwards. It's true the search can be highly technical, but as I just said, you can build a reasonable search very quickly, while you cannot build a reasonable evaluation function without taking a few months.
Mike S. wrote:
In other words, I criticize the fact that an engine may be called "65% similar" just because its eval is perhaps 61% similar, even though its search is very different. We are on the wrong path.
I give the evaluation function at least 80% of the weight, and the search is almost an insignificant detail in comparison. Now please understand that I'm not saying the search is not critical; it's extremely important and a huge contributor to the strength of a program. But from the perspective of the originality of the program, the evaluation is much more important. Remember, we are talking about a CHESS program, and the domain-specific part of the program is the evaluation function, not the search (except for the move generator itself, which is the most mundane part of a program).
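Don's point that the search is largely game-independent, while the evaluation carries the chess knowledge, can be illustrated with a minimal sketch: a textbook fail-soft negamax alpha-beta search that contains no game knowledge at all. Everything game-specific enters through three callbacks (`moves`, `play`, `evaluate`), wired here to a hypothetical toy take-away game (players alternately remove 1 to 3 stones; whoever takes the last stone wins) instead of chess. Real engines add much more to the search (move ordering, transposition tables, pruning, extensions), so this only illustrates the separation of concerns, not how any engine in this thread is built.

```python
import math
from typing import Callable, List, TypeVar

State = TypeVar("State")
Move = TypeVar("Move")


def alphabeta(state: State, depth: int, alpha: float, beta: float,
              moves: Callable[[State], List[Move]],
              play: Callable[[State, Move], State],
              evaluate: Callable[[State], float]) -> float:
    """Fail-soft negamax alpha-beta search with no built-in game knowledge.

    Scores are always from the point of view of the side to move.
    """
    legal = moves(state)
    if depth == 0 or not legal:
        # Leaf or terminal position: game knowledge enters via evaluate().
        return evaluate(state)
    best = -math.inf
    for move in legal:
        score = -alphabeta(play(state, move), depth - 1, -beta, -alpha,
                           moves, play, evaluate)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff: the opponent will avoid this line
    return best


# Hypothetical toy game: a state is the number of stones left; a move removes 1-3.
def toy_moves(stones: int) -> List[int]:
    return [k for k in (1, 2, 3) if k <= stones]


def toy_play(stones: int, take: int) -> int:
    return stones - take


def toy_evaluate(stones: int) -> float:
    # No stones left: the opponent took the last one, so the side to move lost.
    # Otherwise this toy "evaluation" knows nothing and returns a neutral score.
    return -1.0 if stones == 0 else 0.0


print(alphabeta(5, 10, -math.inf, math.inf, toy_moves, toy_play, toy_evaluate))  # 1.0
```

Swapping in a different game or a different evaluation requires no change to `alphabeta` itself, which mirrors Don's remark that, apart from the move generator, the domain-specific part of the program is the evaluation.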
-
Mike S.
- Posts: 1480
- Joined: Thu Mar 09, 2006 5:33 am
Re: When will we see HOUDINI in official tournaments?
Thanks for this enlightening reply.
Simplifying, I conclude: the eval is the fingerprint.
Regards, Mike
-
Don
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: When will we see HOUDINI in official tournaments?
Mike S. wrote:
Thanks for this enlightening reply.
Simplifying, I conclude: the eval is the fingerprint.
Yes, that is a reasonable analogy.
-
LudiBuda
- Posts: 76
- Joined: Sat Mar 03, 2012 7:53 pm
Re: When will we see HOUDINI in official tournaments?
I couldn't agree less with you on this.
The evaluation of an engine is of almost no importance for its ELO strength. Just try modifying Ivanhoe by putting random weights on the evaluation terms: you will still have a super-strong engine.
Evaluation has a great influence on the playing style, but let's not kid ourselves. Most people care about ELO and ELO only.
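To make the experiment LudiBuda proposes concrete, here is a minimal sketch, assuming a simple linear evaluation in which each evaluation term contributes its feature value times a weight. The term names, weights, and the choice to scale each weight by a random factor are all hypothetical; real engines such as Ivanhoe use many more terms and are not necessarily structured this way.

```python
import random
from typing import Dict, Optional

# Hypothetical evaluation terms and weights (centipawns per unit of feature).
DEFAULT_WEIGHTS: Dict[str, float] = {
    "material": 1.0,
    "mobility": 4.0,
    "king_safety": 10.0,
    "passed_pawns": 20.0,
}


def linear_eval(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Score one position as the weighted sum of its feature values."""
    return sum(weights[name] * value for name, value in features.items())


def randomize_weights(weights: Dict[str, float], low: float, high: float,
                      seed: Optional[int] = None) -> Dict[str, float]:
    """Scale every weight by an independent random factor in [low, high]."""
    rng = random.Random(seed)  # seeded so an experiment can be reproduced
    return {name: w * rng.uniform(low, high) for name, w in weights.items()}


# Hypothetical feature values for a single position.
example = {"material": 100, "mobility": 5, "king_safety": -2, "passed_pawns": 1}
print(linear_eval(example, DEFAULT_WEIGHTS))  # 120.0
noisy = randomize_weights(DEFAULT_WEIGHTS, 0.5, 1.5, seed=42)
print(linear_eval(example, noisy))
```

Testing the claim would mean playing an engine using the randomized weights against the original over many games; this sketch only shows what "random weights for the evaluation terms" could mean and makes no prediction about the outcome.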
-
Don
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: When will we see HOUDINI in official tournaments?
LudiBuda wrote:
I couldn't agree less with you on this.
The evaluation of an engine is of almost no importance for its ELO strength. Just try modifying Ivanhoe by putting random weights on the evaluation terms: you will still have a super-strong engine.
Evaluation has a great influence on the playing style, but let's not kid ourselves. Most people care about ELO and ELO only.
I could hardly fail to disagree with you less.
-
Graham Banks
- Posts: 45855
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: When will we see HOUDINI in official tournaments?
Don wrote:
LudiBuda wrote:
I couldn't agree less with you on this.
The evaluation of an engine is of almost no importance for its ELO strength. Just try modifying Ivanhoe by putting random weights on the evaluation terms: you will still have a super-strong engine.
Evaluation has a great influence on the playing style, but let's not kid ourselves. Most people care about ELO and ELO only.
I could hardly fail to disagree with you less.
I wonder which engine Alex is the author of.
gbanksnz at gmail.com