My failed attempt to change TCEC NN clone rules

crem · Post by **crem** » Thu Sep 19, 2019 12:25 pm

Back to the idea of source code similarity. I had another thought, unrelated to what I wrote above (also from the doc!).

When there is a question whether engine X borrowed something essential from Y, one should ask Y’s authors for their opinion in any case. It’s weird that this didn’t happen (neither in the DeusX case nor in the Allie case)!

One popular example is "tablebase code that everyone uses anyway", or movegen.
My opinion is that even in this case, if the author of Y says that this part of the engine is essential to it (e.g. if this particular well-optimized implementation is an important advantage of their chess engine), then it’s not right to dismiss that by saying “no, we think your code is actually not that clever or important, so we’ll take it, but our engine is still 100% original and unique”.

So I think the protocol (if X borrows code from Y) should be the following:

1. You ask Y’s author whether the borrowed code is essential for the engine’s strength (not whether they allow the code to be used, that’s already covered by the GPL; and not whether they allow the engine to participate in TCEC, that’s covered by TCEC rules, so “I think it’s a clone but I don’t object” and “I think it’s not a clone, but I don’t allow it” are not allowed answers).
If the author of Y thinks the code is not essential, it’s all fine; otherwise go to step 2.

2. The authors of X and Y try to discuss and find where the disagreement comes from. Maybe they run some tests. I expect most arguments will stop at this point and there will be agreement and friendship, and no conflict at all! But if not, go to step 3.

3. The TCEC “rule committee” decides by itself. But I suggest that unless it’s a clear case that the author of Y just doesn’t want engine X to participate no matter what (e.g. by claiming that something like command-line flag parsing code is essential for chess strength), it’s fair to accept the side of the “source” engine Y.

dkappe · Post by **dkappe** » Thu Sep 19, 2019 1:01 pm

Crem,

If you look at the ScorpioNN graphs, they are also quite similar to the Leela kibitzer. I would suggest this has more to do with the nature of mcts.

crem · Post by **crem** » Thu Sep 19, 2019 1:23 pm

dkappe wrote: ↑Thu Sep 19, 2019 1:01 pm Crem,

If you look at the ScorpioNN graphs, they are also quite similar to the Leela kibitzer. I would suggest this has more to do with the nature of mcts.

I'm not sure, to me ScorpioNN graphs look much more "jumpy" and it's not easy to confuse it:

Blue-Lc0, Black-Allie, White-ScorpioNN

There are some games where ScorpioNN is much smoother, maybe there it's easier to confuse, and also maybe knowing that Allie's eval is inflated helped to get that "28 of 30" score (I tried to "ignore" it, but obviously that's hard and it makes it less of a blind guess).

Actually, to me the evals of Stoofvlees generally looked much more similar to Lc0 than ScorpioNN's did, but still very far from Allie, which usually mirrors the shape in much detail. And it's not so much about similarity of smoothness and general shape, but rather that usually both Allie and Lc0 see something that none of the other engines see (the other engines see it e.g. a few moves later), and vice versa.
To me it seemed that ScorpioNN and Stoofvlees pick the moments when to change their eval more independently.

But I agree that it would be better to have some proper stats rather than me, with confirmation bias in my head, trying to see something with the naked eye.
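A minimal sketch of what such "proper stats" could look like, assuming we already have per-move evals from two engines for the same game, taken from the same side's point of view. The function names and the sample numbers are made up for illustration; nothing here is provided by TCEC or by any engine. It correlates the move-to-move eval changes, so it measures whether two engines shift their eval at the same moments rather than whether the absolute values agree:

```python
from math import sqrt


def deltas(evals):
    """Move-to-move changes of an eval series."""
    return [later - earlier for earlier, later in zip(evals, evals[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no information about timing
    return cov / sqrt(var_x * var_y)


def eval_change_correlation(evals_a, evals_b):
    """Correlation of eval *changes* between two engines on one game."""
    return pearson(deltas(evals_a), deltas(evals_b))


# Hypothetical evals (in pawns) for the same five moves of one game.
engine_a = [0.1, 0.2, 0.2, 0.9, 1.5]
engine_b = [0.2, 0.4, 0.4, 1.8, 3.0]   # same shape, inflated by 2x
engine_c = [0.1, 0.6, 0.5, 0.6, 1.4]   # reacts at different moments

print(eval_change_correlation(engine_a, engine_b))
print(eval_change_correlation(engine_a, engine_c))
```

A linear rescaling of one engine's evals leaves the result unchanged, but a nonlinear conversion (such as a different win%-to-centipawn formula) does not, so a rank-based correlation might be a better fit in practice.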

crem · Post by **crem** » Thu Sep 19, 2019 1:29 pm

dkappe wrote: ↑Thu Sep 19, 2019 1:01 pm

dkappe, what uniqueness rule would you suggest to TCEC?

Like I said already, I think it's best to remove all mentions of disallowing clones/uniqueness completely from the rules, and just state that the TD (aka "rule committee") decides what's interesting to watch. Basically like CCCC.
That way all those DeusXes can participate, and TCEC fences itself off from the clone question completely.

crem · Post by **crem** » Thu Sep 19, 2019 1:58 pm

dkappe wrote: ↑Thu Sep 19, 2019 1:01 pm I would suggest this has more to do with the nature of mcts.

Ah, and also when I asked Allie's author that question (why are the eval graphs so similar), he suggested that he's the wrong person to talk to about that and that I should talk to jjosh (author of the "Stein" part of Allie+Stein), implying that it's not because of the algorithm (which is claimed to be completely different, not just MCTS), but because of the weights file.

gonzochess75 · Post by **gonzochess75** » Thu Sep 19, 2019 4:15 pm

crem wrote: ↑Thu Sep 19, 2019 12:25 pm Back to the idea of source code similarity. I had another thought, unrelated to what I wrote above (also from the doc!).

When there is a question whether engine X borrowed something essential from Y, one should ask Y’s authors for their opinion in any case. It’s weird that this didn’t happen (neither in the DeusX case nor in the Allie case)!

In the case of Allie, yes it did happen. I came into the various discord channels of Lc0 and asked about the backend code and said I was working on an engine that uses the backend. You even helped to answer one question.

Since Allie has been playing in the tournaments I have asked Ankan and he has said - most recently yesterday - that he is happy that Allie is using his cudnn backend code and wished Allie success in TCEC.

Most of what I would say on this thread has already been answered by others including author of Winter engine so I'll just leave it to them.

As for an engine similarity test based on move picking or evaluation, I don't think it is a good idea and find it pretty silly. Apart from the ability to game it, I just don't think it is all that compelling a criterion. I am not sure who it is supposed to satisfy or what problem it is attempting to solve. Maybe you could change my mind - and others' - if you actually tried to implement it in a robust way and used actual objective data analysis rather than anecdotal graphs which you eyeball and declare similar, which I don't think shows much of anything.

The reason I said you should go and talk to JJosh is because to whatever extent that AllieStein move similarity does match Lc0 I would assume - maybe wrongly - that it is largely a feature of the network and the fact that stein uses many t30 games in training. Since I'm not responsible for the creation or development of stein nor know much about neural net training I don't have much original to add. Plus, there has been a lot of anger directed my way about it lately and I'm getting pretty fatigued.

In the end, TCEC rules are up to the admins there and I like their tournament and think they do a good job and I'm happy that they allow a forum to showcase the time and effort I have spent developing Allie for others to enjoy and appreciate. I've done my very best to give credit to the Lc0 project and have from the start. I really don't understand what the problem is other than Allie is competing against Lc0 and having some success lately and that is making some more upset than when Lc0 was beating AS handily.

Robert Pope · Post by **Robert Pope** » Thu Sep 19, 2019 5:10 pm

gonzochess75 wrote: ↑Thu Sep 19, 2019 4:15 pm In the end, TCEC rules are up to the admins there and I like their tournament and think they do a good job and I'm happy that they allow a forum to showcase the time and effort I have spent developing Allie for others to enjoy and appreciate. I've done my very best to give credit to the Lc0 project and have from the start. I really don't understand what the problem is other than Allie is competing against Lc0 and having some success lately and that is making some more upset than when Lc0 was beating AS handily.

From my perspective, in order to have a fair tournament, you don't want the same entity entered multiple times. That goes for any engine. All the other engines get put at a disadvantage and are less likely to be able to win/promote, not because they are worse engines, but because they get fewer attempts. That just isn't right.

If you agree with the above, then you need to have something in place to make sure engines are "different". It's not as big a deal when you just have a couple duplicates like with Lc0 and DeusX, and I don't have an issue with different groups that are trying different approaches, but when you can take GPL code and 20 people can churn out 20 flavors of the same engine, you need something in place to limit them, or you end up with just a ranking tournament of clones.

And it would be nice if those rules were clear and up front. I'm hoping to finally be able to get into this next spring, and my first attempt will probably be just to train my own Lc0 before going my own way. But eventually I would want to be able to participate in TCEC and other events, so I want to understand what criteria I need to meet as a potential participant.

MikeB · Post by **MikeB** » Thu Sep 19, 2019 5:49 pm

crem wrote: ↑Thu Sep 19, 2019 12:19 pm One of the ideas in the document that I wrote to TCEC was not to compare source code at all, and only look at some kind of “style” or “move” or “eval” similarity (I know all the arguments that those metrics are possible to game, but let’s imagine for a moment we found a way to make it robust).

I.e. if someone changed 1 line in Stockfish, and it made its play completely different (but still strong), let it into TCEC as not a clone.

On the other hand, if there are 10 fully independent AlphaZero clones which make the same moves (i.e. if it happens that they all converge to the same thing), or, hypothetically, 10 different implementations of perfect chess (“32-men tablebase”), there’s no point in having them all.

I don’t think the popular bestmove-based similarity test is robust enough. Instead we could, for example, make engines eval 100 positions of different kinds/imbalances (e.g. picked from game archives where engines don’t agree in eval), sort them by eval value, and then look at the permutation similarity between different engines.
Unless there’s an explicit scramble of the eval, it’s pretty hard to game. And when there is a scramble, it will be noticeable in the eval graph.
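The permutation comparison above can be sketched roughly as follows. This is only an illustration under assumptions: Kendall's tau is one possible choice for "permutation similarity" (the text doesn't name a metric), the function name and the sample evals are hypothetical, and a real test would still need the 100 positions and a pass/fail threshold to be chosen somehow.

```python
from itertools import combinations


def kendall_tau(evals_a, evals_b):
    """Kendall tau between two engines' evals of the same positions.

    +1.0 means both engines sort the positions in the same order,
    -1.0 means the orders are exactly reversed. Pairs that are tied
    for either engine count as neither agreement nor disagreement.
    """
    if len(evals_a) != len(evals_b):
        raise ValueError("both engines must evaluate the same positions")
    concordant = discordant = 0
    for i, j in combinations(range(len(evals_a)), 2):
        product = (evals_a[i] - evals_a[j]) * (evals_b[i] - evals_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(evals_a) * (len(evals_a) - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0


# Hypothetical evals of the same five positions by three engines.
engine_a = [0.10, 0.55, -0.30, 1.20, 0.00]
engine_b = [0.25, 1.40, -0.80, 3.10, 0.05]   # inflated, but same ordering
engine_c = [0.40, -0.10, 0.30, 0.20, 1.00]   # orders the positions differently

print(kendall_tau(engine_a, engine_b))
print(kendall_tau(engine_a, engine_c))
```

Because only the ordering matters, any monotonic rescaling of the evals (for example a different win%-to-centipawn conversion) gives the same score, which is why inflated evals would not hide a similarity here.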

Back in March I thought that this idea would resolve the Allie case by declaring it unique despite its use of Lc0 code, but now I think it would actually not pass such a test:

In TCEC during all games there are kibitz engines, “blue leela” and “red stockfish” (named after the colors of the lines in graphs). That makes it possible to see eval of Leela for all games of the DivP.

The shape of Allie’s eval graph is very similar to Lc0’s; that’s why I now think it wouldn’t pass that difference test. There is no such similarity between other engines, including ones which use NN (scorpioNN, Stoofvlees) or MCTS (KomodoMCTS). You can go to https://www.tcec-chess.com/archive.html, blindly click on random games on the right, and then play the “guess the Allie” game by looking at the graph (for me, out of 30 attempts, I failed twice: I misclassified Stoofvlees as Allie here, and didn’t recognize Allie here).

Examples (some are cropped because, when scaled up to +/-10, most of the details collapse to zero; also note that Allie still uses the old Lc0 win%-to-centipawn conversion formula, so its evals are inflated away from zero, but the shape is very similar):

Allie is white (blue is Leela the kibitzer):

This game is Allie vs LCZero, here Allie is white, blue is Leela the kibitzer, and black is Leela the player:

Allie is black (blue is Leela the kibitzer):

You certainly have made a very strong argument that Allie is a clone, regardless of what anyone points to as being different, and that the NN (scorpioNN, Stoofvlees) or MCTS (KomodoMCTS) engines are not clones.

Not sure if it will change anything under the current infatuation state with anything neural.

gonzochess75 · Post by **gonzochess75** » Thu Sep 19, 2019 5:55 pm

Robert Pope wrote: ↑Thu Sep 19, 2019 5:10 pm If you agree with the above, then you need to have something in place to make sure engines are "different". It's not as big a deal when you just have a couple duplicates like with Lc0 and DeusX, and I don't have an issue with different groups that are trying different approaches, but when you can take GPL code and 20 people can churn out 20 flavors of the same engine, you need something in place to limit them, or you end up with just a ranking tournament of clones.

Allie is not a clone of Lc0. By far the most important part of the code that Allie uses from the Lc0 project is the cudnn backend that Ankan wrote. Here is a direct quote from Ankan when I asked him if he had any concerns about Allie using this code and playing in TCEC:

"Hi. No, i don't have any issues. In fact I am happy that my code is used in multiple engines

Btw, congratulations on great performance by Allie at TCEC"

That code is great, very important and all praise to Ankan for writing it, but that code all by itself is a far cry from a chess engine. So you see Allie is not a clone or a copy of Lc0. And other developers are encouraged to use that cudnn backend all they want to make actual NN chess engines. Feel free and enjoy.

That really should put this issue to bed IMO.

dkappe · Post by **dkappe** » Thu Sep 19, 2019 6:13 pm

MikeB wrote: ↑Thu Sep 19, 2019 5:49 pm
You certainly have made a very strong argument that Allie is a clone, regardless of what anyone points to as being different, and that the NN (scorpioNN, Stoofvlees) or MCTS (KomodoMCTS) engines are not clones.

Not sure if it will change anything under the current infatuation state with anything neural.

If you run SIMEX on neural network engines (other than Stoofvlees), you get similarity of over 60%. That’s because they all derive from the AlphaZero pseudocode, which dictates certain neural network structures. They’re all “conceptual” clones of AlphaZero, not of lc0, which is itself a clone (though here “clone” is not really a useful term). Even a from-scratch 100-line Python implementation of MCTS will produce the same moves on the same net once multithreading is disabled. It’s in the nature of deterministic algorithms.

If you find this sort of hand waving argument from pictures compelling, I have a bridge to sell you.
