When will we see HOUDINI in official tournaments?

Moderator: Ras

Engin
Posts: 1001
Joined: Mon Jan 05, 2009 7:40 pm
Location: Germany
Full name: Engin Üstün

Re: When will we see HOUDINI in official tournaments?

Post by Engin »

Oh, even more than 70%

RobboLito 0.085d1 - Houdini 1.00 70.37%
Albert Silver
Posts: 3026
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: When will we see HOUDINI in official tournaments?

Post by Albert Silver »

Engin wrote: Oh, even more than 70%

RobboLito 0.085d1 - Houdini 1.00 70.37%
I don't see why it would be at all hard for an author to sabotage this test.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: When will we see HOUDINI in official tournaments?

Post by Don »

Albert Silver wrote:
Engin wrote: Oh, even more than 70%

RobboLito 0.085d1 - Houdini 1.00 70.37%
I don't see why it would be at all hard for an author to sabotage this test.
I think it's difficult to sabotage the test without weakening the target program but I'm sure with some hard work it can be done. But I'll bet it's much harder than you think.

Komodo has gained hundreds of ELO over a few years and yet the similarity to the ancient versions is striking so it's not a trivial problem to "fool" the test. So I think it would be difficult to take Ivanhoe (for example) and make enough non-weakening changes to it to make most of the similarity go away. The test measures the evaluation function more than anything else and it's very difficult to make substantial changes to the playing style of a well engineered and developed program that does not kill the ELO.
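The thread does not spell out how the similarity percentage is computed. As a rough illustration only, assuming the figure is the share of test positions on which two engines pick the same move, a minimal sketch could look like this. The function name `move_agreement` and the move lists are made up for the example and are not taken from the actual tool.

```python
from typing import Sequence


def move_agreement(moves_a: Sequence[str], moves_b: Sequence[str]) -> float:
    """Return the percentage of positions on which both engines chose the same move.

    Assumption: moves_a[i] and moves_b[i] are the moves each engine picked
    for the same i-th test position.
    """
    if len(moves_a) != len(moves_b):
        raise ValueError("both engines must be run on the same positions")
    if not moves_a:
        raise ValueError("need at least one position")
    matches = sum(a == b for a, b in zip(moves_a, moves_b))
    return 100.0 * matches / len(moves_a)


# Hypothetical move choices on five test positions (made-up data).
engine_x = ["e2e4", "g1f3", "d2d4", "c2c4", "f1c4"]
engine_y = ["e2e4", "g1f3", "d2d4", "b1c3", "f1b5"]

score = move_agreement(engine_x, engine_y)
print(f"{score:.2f}%")
```

With short, fixed thinking times the chosen move depends heavily on the static evaluation, which would be consistent with the remark that the test measures the evaluation function more than anything else.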
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Mike S.
Posts: 1480
Joined: Thu Mar 09, 2006 5:33 am

Re: When will we see HOUDINI in official tournaments?

Post by Mike S. »

The test measures the evaluation function more than anything else (...)
That is exactly what I pointed out earlier. While I do not doubt that your similarity testing tool is a very good achievement and a step forward, I DO doubt that the exclusion of any engine, based on similarity percentage, should rest (almost) only on evaluation similarity. What about differences in the search code?

Not being a programmer, I still think that the set of useful evaluation terms is limited and basically "predefined" (e.g. as described in chess literature); differences of implementation are more or less small details. But I can imagine that for the search code(s) - a "highly technical" thing as viewed from my limited horizon of these things - there are endless opportunities and variations of how to do it.

In other words, I criticize that it may be said an engine is "65% similar" just because the eval is maybe 61% similar, even though the search is much different. We are on the wrong path.
Regards, Mike
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: When will we see HOUDINI in official tournaments?

Post by Don »

Mike S. wrote:
The test measures the evaluation function more than anything else (...)
That is exactly what I pointed out earlier. While I do not doubt that your similarity testing tool is a very good achievement and a step forward, I DO doubt that the exclusion of any engine, based on similarity percentage, should rest (almost) only on evaluation similarity. What about differences in the search code?
If someone heavily modifies the search, the similarity detector will NOT detect that the program has changed substantially.

However that is one of those "bugs" that I consider a feature. It's easy to make substantial search changes - even small changes that can have a big impact on the search. I can make Komodo search a ply deeper for example without it hurting the ELO much.

By the way, I am not in favor of the test being used to exclude programs from tournaments. However I am in favor of it being used to clear a program or as an investigative tool but not to be taken too seriously until it's better understood.

However if it were used like that you are obviously bothered by the fact that a programmer could modify the evaluation function significantly and get away with it since it does not catch search changes. I would not worry too much about that - the real grunt work is in the evaluation function and the cheaters are not usually going to put a lot of time into evaluation - they took someone else's code in order to not have to deal with writing an evaluation function from scratch and they are lazy.

The good news is that a plagiarist is likely to make substantial search modifications and even improve the program significantly - and yet will still get caught by the test because writing an evaluation from scratch is almost the same as writing a new chess program from scratch. I can build a search function in a day or two that is reasonably good, but it takes months to write a reasonably good evaluation function from scratch.

Not being a programmer, I still think that the set of useful evaluation terms is limited and basically "predefined" (e.g. as described in chess literature); differences of implementation are more or less small details. But I can imagine that for the search code(s) - a "highly technical" thing as viewed from my limited horizon of these things - there are endless opportunities and variations of how to do it.
Wow! You have this exactly backwards. It's true the search can be highly technical but as I just said you can build a reasonable search very quickly, but you cannot build a reasonable evaluation function without taking a few months.

In other words, I criticize that it may be said an engine is "65% similar" just because the eval is maybe 61% similar, even though the search is much different. We are on the wrong path.
I give the evaluation function at least 80% of the weight and the search is almost an insignificant detail in comparison. Now please understand that I'm not saying the search is not critical - it's extremely important and a huge contributor to the strength of a program. But from the perspective of the originality of the program the evaluation is much more important. Remember, we are talking about a CHESS program and the domain specific part of the program is the evaluation function - not the search (except for the move generator itself which is the most mundane part of a program.)
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Mike S.
Posts: 1480
Joined: Thu Mar 09, 2006 5:33 am

Re: When will we see HOUDINI in official tournaments?

Post by Mike S. »

Thanks for this enlightening reply. :mrgreen: Simplifying, I conclude: The eval is the fingerprint.
Regards, Mike
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: When will we see HOUDINI in official tournaments?

Post by Don »

Mike S. wrote: Thanks for this enlightening reply. :mrgreen: Simplifying, I conclude: The eval is the fingerprint.
Yes, that is a reasonable analogy.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
LudiBuda
Posts: 76
Joined: Sat Mar 03, 2012 7:53 pm

Re: When will we see HOUDINI in official tournaments?

Post by LudiBuda »

I couldn't agree less with you on this.
Evaluation of the engine is of almost no importance for the ELO strength. Just try to modify Ivanhoe by putting random weights for the evaluation terms. You will still have a super strong engine.
Evaluation has great influence on the playing style, but let's not kid ourselves. Most people care about ELO and ELO only.
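For readers who want to picture the experiment proposed here, the following is a minimal sketch, assuming an evaluation that is a plain weighted sum of terms. The term names, feature values, and weights are all hypothetical and are not taken from Ivanhoe or any other engine; the sketch only shows what "random weights for the evaluation terms" would mean mechanically and says nothing about the resulting playing strength.

```python
import random

# Hypothetical evaluation terms for one position, from White's point of view.
features = {"material": 1.0, "mobility": 0.4, "king_safety": -0.2, "pawn_structure": 0.1}

# Hypothetical tuned weights (centipawns per unit of each term).
tuned = {"material": 100.0, "mobility": 10.0, "king_safety": 30.0, "pawn_structure": 15.0}


def evaluate(features, weights):
    """Static evaluation as a weighted sum of the terms."""
    return sum(weights[name] * value for name, value in features.items())


def randomize(weights, rng):
    """Replace each weight with a random value between 0 and twice the tuned one."""
    return {name: rng.uniform(0.0, 2.0 * w) for name, w in weights.items()}


rng = random.Random(0)  # fixed seed so the run is repeatable
random_weights = randomize(tuned, rng)

tuned_score = evaluate(features, tuned)
random_score = evaluate(features, random_weights)
print("tuned:", tuned_score, "random:", random_score)
```

Whether an engine with such randomized weights stays strong is exactly the point under dispute in this thread; settling it would take actual games, not this sketch.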
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: When will we see HOUDINI in official tournaments?

Post by Don »

LudiBuda wrote:I couldn't agree less with you on this.
Evaluation of the engine is of almost no importance for the ELO strength. Just try to modify Ivanhoe by putting random weights for the evaluation terms. You will still have a super strong engine.
Evaluation has great influence on the playing style, but let's not kid ourselves. Most people care about ELO and ELO only.
I could hardly fail to disagree with you less.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Graham Banks
Posts: 45855
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: When will we see HOUDINI in official tournaments?

Post by Graham Banks »

Don wrote:
LudiBuda wrote:I couldn't agree less with you on this.
Evaluation of the engine is of almost no importance for the ELO strength. Just try to modify Ivanhoe by putting random weights for the evaluation terms. You will still have a super strong engine.
Evaluation has great influence on the playing style, but let's not kid ourselves. Most people care about ELO and ELO only.
I could hardly fail to disagree with you less.
I wonder which engine Alex is the author of.
gbanksnz at gmail.com