When will we see HOUDINI in official tournaments?

Engin · Post by **Engin** » Thu May 10, 2012 11:15 pm

Oh, even more than 70%:

RobboLito 0.085d1 - Houdini 1.00 70.37%

Albert Silver · Post by **Albert Silver** » Thu May 10, 2012 11:38 pm

Engin wrote:Oh, even more than 70%:

RobboLito 0.085d1 - Houdini 1.00 70.37%

I don't see why it would be at all hard for an author to sabotage this test.

Don · Post by **Don** » Thu May 10, 2012 11:48 pm

Albert Silver wrote:
Engin wrote:Oh, even more than 70%:

RobboLito 0.085d1 - Houdini 1.00 70.37%
I don't see why it would be at all hard for an author to sabotage this test.

I think it's difficult to sabotage the test without weakening the target program, but I'm sure it can be done with some hard work. Still, I'll bet it's much harder than you think.

Komodo has gained hundreds of Elo points over a few years, and yet its similarity to the ancient versions is striking, so it's not a trivial problem to "fool" the test. So I think it would be difficult to take Ivanhoe (for example) and make enough non-weakening changes to it to make most of the similarity go away. The test measures the evaluation function more than anything else, and it's very difficult to make substantial changes to the playing style of a well-engineered and developed program that do not kill the Elo.
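To make a figure like "70.37%" concrete, here is a minimal sketch. It assumes the similarity score is simply the percentage of test positions on which two engines pick the same move; the thread does not spell out the tool's mechanics, so treat that as an assumption. The function name `similarity_percent` and the move lists are hypothetical and are not taken from the actual tool or from real engine output.

```python
def similarity_percent(moves_a, moves_b):
    """Percentage of positions on which both engines chose the same move.

    Assumption: each list holds one chosen move per test position,
    in the same position order for both engines.
    """
    if len(moves_a) != len(moves_b):
        raise ValueError("both engines must be run on the same positions")
    if not moves_a:
        raise ValueError("need at least one position")
    matches = sum(a == b for a, b in zip(moves_a, moves_b))
    return 100.0 * matches / len(moves_a)


# Made-up data: 27 positions, the two engines agree on 19 of them.
engine_a = ["e2e4"] * 27
engine_b = ["e2e4"] * 19 + ["d2d4"] * 8
print(f"{similarity_percent(engine_a, engine_b):.2f}%")  # 70.37%
```

Under this assumption the score depends only on which move each engine ends up preferring, not on how it searched to reach that choice.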

Mike S. · Post by **Mike S.** » Fri May 11, 2012 5:35 am

Don wrote:The test measures the evaluation function more than anything else (...)

That is exactly what I pointed out earlier. While I do not doubt that your similarity testing tool is a very good achievement and a step forward, I DO doubt that the exclusion of any engine, based on similarity percentage, should rest on evaluation similarity (almost) alone. What about differences in the search code?

Not being a programmer, I still think that the set of useful evaluation terms is limited and basically "predefined" (e.g. as described in chess literature); differences of implementation are more or less small details. But I can imagine that for the search code(s) - a "highly technical" thing as viewed from my limited horizon of these things - there are endless opportunities and variations of how to do it.

In other words, I criticize that it may be said an engine is "65% similar" just because the eval is maybe 61% similar, even though the search is much different. We are on the wrong path.

Don · Post by **Don** » Fri May 11, 2012 6:11 am

Mike S. wrote:
The test measures the evaluation function more than anything else (...)
That is exactly what I pointed out earlier. While I do not doubt that your similarity testing tool is a very good achievement and a step forward, I DO doubt that the exclusion of any engine, based on similarity percentage, should rest on evaluation similarity (almost) alone. What about differences in the search code?

If someone heavily modifies the search, the similarity detector will NOT detect that the program has changed substantially.

However, that is one of those "bugs" that I consider a feature. It's easy to make substantial search changes - even small changes can have a big impact on the search. I can make Komodo search a ply deeper, for example, without hurting the Elo much.

By the way, I am not in favor of the test being used to exclude programs from tournaments. However, I am in favor of it being used to clear a program or as an investigative tool, though not to be taken too seriously until it's better understood.

However, if it were used like that, you are obviously bothered by the fact that a programmer could modify the evaluation function significantly and get away with it, since it does not catch search changes. I would not worry too much about that - the real grunt work is in the evaluation function, and cheaters are not usually going to put a lot of time into evaluation. They took someone else's code in order to avoid writing an evaluation function from scratch, and they are lazy.

The good news is that a plagiarist is likely to make substantial search modifications and even improve the program significantly - and yet will still get caught by the test because writing an evaluation from scratch is almost the same as writing a new chess program from scratch. I can build a search function in a day or two that is reasonably good, but it takes months to write a reasonably good evaluation function from scratch.

Mike S. wrote:Not being a programmer, I still think that the set of useful evaluation terms is limited and basically "predefined" (e.g. as described in chess literature); differences of implementation are more or less small details. But I can imagine that for the search code(s) - a "highly technical" thing as viewed from my limited horizon of these things - there are endless opportunities and variations of how to do it.

Wow! You have this exactly backwards. It's true the search can be highly technical, but as I just said, you can build a reasonable search very quickly, whereas you cannot build a reasonable evaluation function without taking a few months.

Mike S. wrote:In other words, I criticize that it may be said an engine is "65% similar" just because the eval is maybe 61% similar, even though the search is much different. We are on the wrong path.

I give the evaluation function at least 80% of the weight, and the search is almost an insignificant detail in comparison. Now please understand that I'm not saying the search is not critical - it's extremely important and a huge contributor to the strength of a program. But from the perspective of the originality of the program, the evaluation is much more important. Remember, we are talking about a CHESS program, and the domain-specific part of the program is the evaluation function - not the search (except for the move generator itself, which is the most mundane part of a program).

Mike S. · Post by **Mike S.** » Fri May 11, 2012 6:23 am

Thanks for this enlightening reply.

Simplifying, I conclude: The eval is the fingerprint.

Don · Post by **Don** » Fri May 11, 2012 6:27 am

Mike S. wrote:Thanks for this enlightening reply. Simplifying, I conclude: The eval is the fingerprint.

Yes, that is a reasonable analogy.

LudiBuda · Post by **LudiBuda** » Fri May 11, 2012 7:11 pm

I couldn't agree less with you on this.
The engine's evaluation is of almost no importance for Elo strength. Just try modifying Ivanhoe by putting random weights on the evaluation terms. You will still have a super strong engine.
Evaluation has great influence on the playing style, but let's not kid ourselves. Most people care about Elo and Elo only.

Don · Post by **Don** » Fri May 11, 2012 8:25 pm

LudiBuda wrote:I couldn't agree less with you on this.
The engine's evaluation is of almost no importance for Elo strength. Just try modifying Ivanhoe by putting random weights on the evaluation terms. You will still have a super strong engine.
Evaluation has great influence on the playing style, but let's not kid ourselves. Most people care about Elo and Elo only.

I could hardly fail to disagree with you less.

Graham Banks · Post by **Graham Banks** » Fri May 11, 2012 8:40 pm

Don wrote:
LudiBuda wrote:I couldn't agree less with you on this.
The engine's evaluation is of almost no importance for Elo strength. Just try modifying Ivanhoe by putting random weights on the evaluation terms. You will still have a super strong engine.
Evaluation has great influence on the playing style, but let's not kid ourselves. Most people care about Elo and Elo only.
I could hardly fail to disagree with you less.

I wonder which engine Alex is the author of.
