World Computer Chess Championship ?

rvida
Posts: 481
Joined: Thu Apr 16, 2009 12:00 pm
Location: Slovakia, EU

Re: World Computer Chess Championship ?

Post by rvida »

Adam Hair wrote: ... This belief is based on statistics, the totality of my data,
= good
Adam Hair wrote: and on Mark Watkins' RE efforts.


= bad

This opens a possibility of [an implicit] hidden fitting based on subjective data input.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: World Computer Chess Championship ?

Post by Rebel »

Adam Hair wrote:
bob wrote:
Rebel wrote: In addition, I can report the following: a programmer (who wants to remain anonymous) has done this experiment:

1. Take the Fruit 2.1 source and set each of Fruit's EVAL values equal to Rybka 1.0's.

2. The similarity detector reported an increase of only 4%.

This is BAD NEWS for cloners who think they can take existing source code, modify all the eval values (some even use multiplication), and get away with it. Playing style is hard to remove from an engine.
Try drastically changing null-move, or LMR threshold in Fruit, or a few very SIMPLE things like those. Those things greatly influence move selection.
I tried this last November:
Adam Hair wrote:For what it is worth, here are the results of the test Miguel referred to. I included a pass with history pruning and null move turned off, at 100ms and 1s.


sim version 3

  Key:

  1) Fruit 2.1 (time: 100 ms  scale: 1.0)
  2) Fruit 2.1 (time: 1000 ms  scale: 1.0)
  3) Fruit 2.1_history_pruning_off (time: 100 ms  scale: 1.0)
  4) Fruit 2.1_history_pruning_off (time: 1000 ms  scale: 1.0)
  5) Fruit 2.1_historypruning_nullmove_off (time: 100 ms  scale: 1.0)
  6) Fruit 2.1_historypruning_nullmove_off (time: 1000 ms  scale: 1.0)
  7) Fruit 2.1_nullmove_off (time: 100 ms  scale: 1.0)
  8) Fruit 2.1_nullmove_off (time: 1000 ms  scale: 1.0)

         1     2     3     4     5     6     7     8
  1.  ----- 63.94 89.55 64.45 72.42 68.40 76.37 67.41
  2.  63.94 ----- 61.97 81.66 57.42 69.65 58.85 72.78
  3.  89.55 61.97 ----- 63.44 74.11 67.12 77.17 65.62
  4.  64.45 81.66 63.44 ----- 58.47 71.35 59.86 73.89
  5.  72.42 57.42 74.11 58.47 ----- 64.08 86.37 62.90
  6.  68.40 69.65 67.12 71.35 64.08 ----- 64.86 81.51
  7.  76.37 58.85 77.17 59.86 86.37 64.86 ----- 65.61
  8.  67.41 72.78 65.62 73.89 62.90 81.51 65.61 -----
A 10x search time has the biggest effect on move selection. As you can see, with history pruning and null move off, Fruit at both 100ms and 1s is still very similar (based on earlier observations) to normal Fruit at 100ms, and the largest difference is between both off at 100ms and normal Fruit at 1s.

None of this is extraordinary, if I have understood everything I have read on these search techniques. Turning both off has approximately the same effect on Elo as halving the search time (according to other reports and some testing I've done). So it is not surprising that a 10x increase in search time has a larger effect. What might be surprising (or maybe not :) ) is that move selection and Elo are not strongly coupled, at least in terms of intra-engine comparison. If turning off both history pruning and null move has the same effect on Fruit as it does on Crafty, then it is a 100+ Elo reduction. With a 1 second search per position, normal Fruit and both-off Fruit share ~70% of the same moves (using the positions in the sim test). In the context of all the similarity testing I have done, this does not indicate a large difference.
By that I mean that ~70% indicates we are looking at related engines. So, even turning off null move and LMR in Fruit 2.1 does not remove the move selection characteristics (when judged in this manner).
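As a quick check of the reading above, the quoted Fruit 2.1 table can be loaded and scanned for its extremes. This is only a minimal sketch: the dictionary layout and the helper name `extremes` are illustrative and are not part of the sim tool.

```python
# Upper triangle of the Fruit 2.1 table quoted above (percent of matching moves).
# Keys are the row/column numbers from the table's key.
SIM = {
    (1, 2): 63.94, (1, 3): 89.55, (1, 4): 64.45, (1, 5): 72.42,
    (1, 6): 68.40, (1, 7): 76.37, (1, 8): 67.41,
    (2, 3): 61.97, (2, 4): 81.66, (2, 5): 57.42, (2, 6): 69.65,
    (2, 7): 58.85, (2, 8): 72.78,
    (3, 4): 63.44, (3, 5): 74.11, (3, 6): 67.12, (3, 7): 77.17, (3, 8): 65.62,
    (4, 5): 58.47, (4, 6): 71.35, (4, 7): 59.86, (4, 8): 73.89,
    (5, 6): 64.08, (5, 7): 86.37, (5, 8): 62.90,
    (6, 7): 64.86, (6, 8): 81.51,
    (7, 8): 65.61,
}


def extremes(sim):
    """Return (most similar pair, least similar pair). Hypothetical helper."""
    return max(sim, key=sim.get), min(sim, key=sim.get)


most, least = extremes(SIM)
print("most similar :", most, SIM[most])
print("least similar:", least, SIM[least])
```

On the quoted data the closest pair is 1 and 3 (normal vs. history pruning off, both at 100 ms) and the most distant is 2 and 5 (normal at 1 s vs. both off at 100 ms), which matches the description above.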
Bob, with his enormous experience, ought to know that search changes may affect move choices but are never able to bring the percentage below 60; EVAL is just too dominant for that. Out of curiosity I ran Bob's suggestions with SIM.


sim version 3

  Key:

  1) ProDeo 1.74 (default)      (time: 500 ms  scale: 1.0)
  2) ProDeo 1.74 (lmr=off)      (time: 500 ms  scale: 1.0)
  3) ProDeo 1.74 (nullmove=off) (time: 500 ms  scale: 1.0)

         1     2     3
  1.  ----- 74.28 75.84
  2.  74.28 ----- 70.92
  3.  75.84 70.92 -----
Considerable loss in strength, but the programs are obviously still related.

Used parameters


[Search Technique = NORMAL]	// turns off null-move
[HISTORY redu = OFF]       	// turns off LMR
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: World Computer Chess Championship ?

Post by Adam Hair »

rvida wrote:
Adam Hair wrote: ... This belief is based on statistics, the totality of my data,
= good
Adam Hair wrote: and on Mark Watkins' RE efforts.


= bad

This opens a possibility of [an implicit] hidden fitting based on subjective data input.
You are right. The one thing I am lacking is data on how changes in evaluation terms, be they changes in the parameters or additions/deletions of components of the eval function, affect move selection. At the moment, I have a little bit of my own data, what you have reported about the effect of PSTs with Critter, the comparisons of Rybka to Fruit and to Ippolit, and some anecdotal evidence from other authors not directly related to the similarity comparison. I need to accumulate more data so that the subjective input is not needed as a crutch (by me, a non-programmer).
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: World Computer Chess Championship ?

Post by bob »

Laskos wrote:
bob wrote:
Ivanhoe is a robolito derivative. So it IS a "clone". Ditto for Firebird, Houdini, and all the other "kin". Houdini 1.5 or so was a PERFECT match with one of the robo's. You can find the specifics on open chess. And it was NOT "3 lines of code"...

I've not accused Stockfish of anything. If you think they chose the "magic right threshold" for CSVN, glad you are happy. I consider their "number" to be arbitrary and without any particular justification other than "it feels right."
What do you know about what Robbolito, IvanHoe, and Ippolit are? Nobody knows for sure. Suddenly, after repeatedly stating that you did not study them, you are the best specialist in Ippos, and moreover you collectively call them clones. I will post a thread "Crafty is a proven clone of Fruit and Belle", fine? About StockFish, you yourself stated that it needs to be investigated. Does Komodo have to be investigated? Could you explain why not, besides that Don, in your silly appreciation, is a good old guy? Does HGM's Joker have to be investigated, or is it too crappy even for you? Look guy, close your useless sources of the clone called Crafty, and leave us for good. The CSVN "magic" 60% is MUCH better than your sick appreciation of who is a good guy and who is not. Understood?

Kai
Read my lips. Ippolit came first. Ivanhoe, firebird, and houdini are acknowledged derivatives. So it doesn't MATTER what ippolit is/was. What matters is the above programs are NOT original because they were derived from ippolit...

Simple enough???

As far as the rest of your nonsense, I am not sure it even deserves a comment. If someone protests Don's program, it will be investigated just like I was, like Vas was, etc. Ditto for HGM's programs. You've not seen "ME" say anything about "who is good and who is bad" except for the known/proven authors of unoriginal programs. You need a clue, not me...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: World Computer Chess Championship ?

Post by bob »

michiguel wrote:
bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
You seem to be very scrupulous with regard to this test, but you yourself (and some others) are accusing, directly or not, a lot of authors unscrupulously. Generally speaking, this test is, say, >95% correct in positive detections, while your accusations are pretty random. Besides that, the engines will be labeled only as suspicious, and the inspection of the sources will establish the copying.

Kai
Who have I accused? Vas? Lots of supporting evidence. Houdart? Ditto. Beyond that, the only ones _I_ have accused have all been proven. El Chinito, Le Petite. etc... So please feel free to show me my "random accusations" that seem to be more imagination on your part than anything else. As a control experiment, I wonder what would happen on (say) the ponder hit data if human games were used? Should NOT produce suspicious behaviour, would you agree?
You have accused engines like IvanHoe or even StockFish of possible cloning (or said they need investigation), besides Houdini (which, if it took something, took it from the open domain, and nothing about that open-domain source was ever proven). Do you have something better than hearsay on OpenChess, three lines of code, and the mob lynching which is so dear to you? Even for copyright violation, significant chunks of code must be found to be copied.
How would ponder hit work for human games? It would be great to use something analogous to sim on humans; you would see that it's much more reliable on correct positives than you imply.
After reading you and others in the same vein, and knowing that you have influence in the ICGA, the CSVN 60% tourney rule seems much more adequate than your views.

Kai
Ivanhoe is a robolito derivative. So it IS a "clone". Ditto for Firebird, Houdini, and all the other "kin". Houdini 1.5 or so was a PERFECT match with one of the robo's. You can find the specifics on open chess. And it was NOT "3 lines of code"...

I've not accused Stockfish of anything. If you think they chose the "magic right threshold" for CSVN, glad you are happy. I consider their "number" to be arbitrary and without any particular justification other than "it feels right."
After you asked something in this thread, Adam was gracious enough to provide real data, not guesswork. He has done a thorough analysis with a three-digit number of engines (not 6 or 7). You should read it. You are coming very late to the discussion, because many of the questions you ask have been debated, investigated, and answered with real experiments. Before you criticize, you should familiarize yourself with the topic and the data. The 60% (with the set he used) represents 5 standard deviations.

Miguel
All well and good. IF you believe the positions and programs used are representative of the set of all programs...

It is STILL arbitrary, when you think about it...
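
The disagreement here is about what the sample represents, not about the arithmetic. How a fixed cutoff translates into a number of standard deviations can be sketched as below. The scores are invented placeholders, not Adam's data, and the function name is hypothetical.

```python
from statistics import mean, stdev


def sigmas_above_mean(threshold, scores):
    """How many sample standard deviations `threshold` lies above the mean of `scores`."""
    return (threshold - mean(scores)) / stdev(scores)


# Made-up pairwise similarity percentages, for illustration only (NOT Adam's data).
example_scores = [40.0, 44.0, 46.0, 48.0, 50.0, 52.0, 56.0]

print(round(sigmas_above_mean(60.0, example_scores), 2))
```

The same 60% cutoff gives a different number of standard deviations for a different set of engines and positions, which is why the choice of sample matters to both sides of this argument.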
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: World Computer Chess Championship ?

Post by bob »

mar wrote:
Rebel wrote: The ponder hits were first introduced by Kai and then used in Soren Riis's Chessbase article. I did not pay (much) attention back then because I felt that search had too much influence, as the data was extracted from CCRL games. The similarity detector is different: with short time controls like 1/10 of a second you limit the influence of search enormously, and even more so if you run the test at low fixed depths; the lower you go, the purer the eval you get. And yet (much to my surprise) ponder hits and the similarity detector tell the same story: the same suspects are listed.

I think no one can yet claim expert status in this new and unexplored area, and more research is needed, but the results are too good to be ignored. I am hoping something very good may come out of it once we better understand all the ins and outs.
I agree, Ed. What bothers me about ponder hits is sequences of forced moves. I would trust the ponder hit ratio even less than similarity tests. But that's only theory, so I may be wrong.

EDIT: one of the older versions of my engine had a >50% ponder hit ratio with 15 other engines (according to CCRL), while the latest version only has 11-17%, though it is still the same breed :) So I wonder what informative value ponder hits really have... Or perhaps I'm missing something?
What do you draw from these numbers? (Crafty vs a couple of Houdini clones on ICC, 15m + some seconds of increment.)

48/78 (48 predicted correctly in a 78 move game)
65/82
68/122
36/50
31/48
32/55
47/74
44/63

I've seen numbers like those since the 80's and don't draw any conclusions from them, other than my opponent is fairly strong...
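
For reference, the game counts above convert to hit rates as follows. This is a small sketch; the names `GAMES`, `hit_rates`, and `pooled_rate` are made up for illustration.

```python
# (moves predicted correctly, moves in the game), copied from the list above.
GAMES = [(48, 78), (65, 82), (68, 122), (36, 50),
         (31, 48), (32, 55), (47, 74), (44, 63)]


def hit_rates(games):
    """Per-game ponder hit rate as a fraction between 0 and 1."""
    return [hits / moves for hits, moves in games]


def pooled_rate(games):
    """Ponder hit rate over all games combined."""
    total_hits = sum(hits for hits, _ in games)
    total_moves = sum(moves for _, moves in games)
    return total_hits / total_moves


for (hits, moves), rate in zip(GAMES, hit_rates(GAMES)):
    print(f"{hits}/{moves} = {rate:.1%}")
print(f"pooled = {pooled_rate(GAMES):.1%}")
```

The per-game rates run from roughly 56% to 79%, with about 65% of moves predicted overall.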
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: World Computer Chess Championship ?

Post by michiguel »

bob wrote:
michiguel wrote:
bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
You seem to be very scrupulous with regard to this test, but you yourself (and some others) are accusing, directly or not, a lot of authors unscrupulously. Generally speaking, this test is, say, >95% correct in positive detections, while your accusations are pretty random. Besides that, the engines will be labeled only as suspicious, and the inspection of the sources will establish the copying.

Kai
Who have I accused? Vas? Lots of supporting evidence. Houdart? Ditto. Beyond that, the only ones _I_ have accused have all been proven. El Chinito, Le Petite. etc... So please feel free to show me my "random accusations" that seem to be more imagination on your part than anything else. As a control experiment, I wonder what would happen on (say) the ponder hit data if human games were used? Should NOT produce suspicious behaviour, would you agree?
You have accused engines like IvanHoe or even StockFish of possible cloning (or said they need investigation), besides Houdini (which, if it took something, took it from the open domain, and nothing about that open-domain source was ever proven). Do you have something better than hearsay on OpenChess, three lines of code, and the mob lynching which is so dear to you? Even for copyright violation, significant chunks of code must be found to be copied.
How would ponder hit work for human games? It would be great to use something analogous to sim on humans; you would see that it's much more reliable on correct positives than you imply.
After reading you and others in the same vein, and knowing that you have influence in the ICGA, the CSVN 60% tourney rule seems much more adequate than your views.

Kai
Ivanhoe is a robolito derivative. So it IS a "clone". Ditto for Firebird, Houdini, and all the other "kin". Houdini 1.5 or so was a PERFECT match with one of the robo's. You can find the specifics on open chess. And it was NOT "3 lines of code"...

I've not accused Stockfish of anything. If you think they chose the "magic right threshold" for CSVN, glad you are happy. I consider their "number" to be arbitrary and without any particular justification other than "it feels right."
After you asked something in this thread, Adam was gracious enough to provide real data, not guesswork. He has done a thorough analysis with a three-digit number of engines (not 6 or 7). You should read it. You are coming very late to the discussion, because many of the questions you ask have been debated, investigated, and answered with real experiments. Before you criticize, you should familiarize yourself with the topic and the data. The 60% (with the set he used) represents 5 standard deviations.

Miguel
All well and good. IF you believe the positions and programs used are representative of the set of all programs...
Please read Adam's report. Your opinion carries weight with other people because of your experience, but in this case it is completely uninformed.

Miguel

It is STILL arbitrary, when you think about it...
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: World Computer Chess Championship ?

Post by Rebel »

bob wrote:
michiguel wrote:
bob wrote:
Laskos wrote:
bob wrote:
Laskos wrote:
You seem to be very scrupulous with regard to this test, but you yourself (and some others) are accusing, directly or not, a lot of authors unscrupulously. Generally speaking, this test is, say, >95% correct in positive detections, while your accusations are pretty random. Besides that, the engines will be labeled only as suspicious, and the inspection of the sources will establish the copying.

Kai
Who have I accused? Vas? Lots of supporting evidence. Houdart? Ditto. Beyond that, the only ones _I_ have accused have all been proven. El Chinito, Le Petite. etc... So please feel free to show me my "random accusations" that seem to be more imagination on your part than anything else. As a control experiment, I wonder what would happen on (say) the ponder hit data if human games were used? Should NOT produce suspicious behaviour, would you agree?
You have accused engines like IvanHoe or even StockFish of possible cloning (or said they need investigation), besides Houdini (which, if it took something, took it from the open domain, and nothing about that open-domain source was ever proven). Do you have something better than hearsay on OpenChess, three lines of code, and the mob lynching which is so dear to you? Even for copyright violation, significant chunks of code must be found to be copied.
How would ponder hit work for human games? It would be great to use something analogous to sim on humans; you would see that it's much more reliable on correct positives than you imply.
After reading you and others in the same vein, and knowing that you have influence in the ICGA, the CSVN 60% tourney rule seems much more adequate than your views.

Kai
Ivanhoe is a robolito derivative. So it IS a "clone". Ditto for Firebird, Houdini, and all the other "kin". Houdini 1.5 or so was a PERFECT match with one of the robo's. You can find the specifics on open chess. And it was NOT "3 lines of code"...

I've not accused Stockfish of anything. If you think they chose the "magic right threshold" for CSVN, glad you are happy. I consider their "number" to be arbitrary and without any particular justification other than "it feels right."
After you asked something in this thread, Adam was gracious enough to provide real data, not guesswork. He has done a thorough analysis with a three-digit number of engines (not 6 or 7). You should read it. You are coming very late to the discussion, because many of the questions you ask have been debated, investigated, and answered with real experiments. Before you criticize, you should familiarize yourself with the topic and the data. The 60% (with the set he used) represents 5 standard deviations.

Miguel
All well and good. IF you believe the positions and programs used are representative of the set of all programs...

It is STILL arbitrary, when you think about it...
Would you say the same thing if Rybka 1.0 topped Adam's list?

It's a mean question, I know, and not directed only at you. The course of this discussion, and the value and acceptance of the similarity detector, are closely tied to the ICGA verdict. IMO that is a major obstacle that is unconsciously ignored, so it's good to be open about it.