So Alpha Zero was a hoax?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

jhellis3
Posts: 546
Joined: Sat Aug 17, 2013 12:36 am

Re: So Alpha Zero was a hoax?

Post by jhellis3 »

Well, people currently ignore or work around all the defects of present engines because they are still the best we currently have.... why would an NN-based engine be any different?
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: So Alpha Zero was a hoax?

Post by noobpwnftw »

jhellis3 wrote:Well, people currently ignore or work around all the defects of present engines because they are still the best we currently have.... why would an NN-based engine be any different?
But nobody else who does something new (convincing or not) gets this many fanboys ignoring reality, do they?
jhellis3
Posts: 546
Joined: Sat Aug 17, 2013 12:36 am

Re: So Alpha Zero was a hoax?

Post by jhellis3 »

Has there been another engine recently that played a 100-game match vs SF and did not drop a single game? Never mind the wins...

I will also say that if you think properly trained NN evals have holes, human-programmed evals likely have at least an order of magnitude more... but w/e...
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: So Alpha Zero was a hoax?

Post by noobpwnftw »

jhellis3 wrote:Has there been another engine recently that played a 100 game match vs SF and not dropped a single game?

I will also say if you think NNs have holes, human-programmed evals likely have at least an order of magnitude more... but w/e...
There is no doubt that NNs worked very well in Go, but that aura does not carry over to everything else. So how about this: play "something" against a bookless SF, trying different time controls and running matches until fate is on your side and you get a satisfying set of 100 games. Those games would be real, and you would not be forging them. But if you really did this, could you get the same level of public acceptance that your "something" is better than SF?

I'm neither for nor against NNs; I'm just saying exaggerated papers shouldn't exist.
jhellis3
Posts: 546
Joined: Sat Aug 17, 2013 12:36 am

Re: So Alpha Zero was a hoax?

Post by jhellis3 »

I can't help but notice you did not answer my question...

I would also note they did not allow A0 to train vs SF.... Imagine the result if they had....
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: So Alpha Zero was a hoax?

Post by noobpwnftw »

Answer: there is none. But that is not related to the topic of whether A0 was a hoax, and my previous post answered why it can never be.

If A0 had been trained using SF as an oracle, it would technically be weaker than it is now, because reinforcement learning can teach NNs more than just imitating what the oracle would say.
It would also need longer training time, because SF cannot quickly produce the zillions of quality samples it needs to train without overfitting.

The thing with any zero approach is that you need to stick to it. Normally you can't "kickstart" the learning process: either the net eventually throws out what you fed it, taking extra time to converge, or you intervene so much that it starts merely imitating the samples.
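The difference between the two kinds of training target can be sketched with a toy example. Everything below (the net's policy output, the move counts, the loss function) is invented for illustration; this is not A0's actual code, only the shape of oracle imitation versus zero-style self-play targets:

```python
import math

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and the net's policy output."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

# Toy position with 3 legal moves; the net's current policy output.
net_policy = [0.5, 0.3, 0.2]

# Oracle imitation ("kickstarting"): the target is a one-hot vector for the
# oracle's single preferred move, so the net is pushed to copy the sample.
oracle_target = [1.0, 0.0, 0.0]

# Zero-style self-play: the target is the MCTS visit distribution produced
# by the net's own search (plus the game outcome for the value head).
visit_counts = [600, 300, 100]
total = sum(visit_counts)
selfplay_target = [n / total for n in visit_counts]

loss_imitate = cross_entropy(oracle_target, net_policy)
loss_selfplay = cross_entropy(selfplay_target, net_policy)
```

The self-play target is a full distribution rather than a single label, which is part of why self-play generates rich training signal at a rate an external oracle engine could not match.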
Last edited by noobpwnftw on Sat Mar 17, 2018 3:56 am, edited 1 time in total.
jhellis3
Posts: 546
Joined: Sat Aug 17, 2013 12:36 am

Re: So Alpha Zero was a hoax?

Post by jhellis3 »

and my previous post answered you why it can never be.
I am sorry, but I do not understand what you mean by this sentence. Perhaps you can reword it for clarity.
If A0 had been trained using SF as an oracle, it would technically be weaker than it is now, because reinforcement learning can teach NNs more than just imitating what the oracle would say.
No. You missed the point. Also, for that to be the case, A0 would have to be stronger than SF prior to training against it ;) (any time A0 plays against a stronger entity, it will learn and improve). Training exclusively vs SF would indeed result in an inferior A0 if it were already superior to SF. Regardless, it would quickly learn all of SF's numerous search and eval holes and exploit them mercilessly in a match, quickly reducing SF's score to near 0. At that point you could revert to conventional training, correcting all of the overfitting while retaining the exploitation tactics which do not cost Elo. The result would be devastating, while also being the objectively strongest chess-playing entity on the planet. And this can be done for any/all non-AI engines (although I expect the vast majority of present-day engines share 90%+ of SF's holes).
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: So Alpha Zero was a hoax?

Post by noobpwnftw »

jhellis3 wrote:I am sorry but I do not understand want you mean by this sentence. Perhaps you can reword it for clarity.
You can play anything against SF and run as many matches as you want until fate is on your side and you get a satisfying set of 100 games; those games are still real, and you are not forging them.
jhellis3 wrote:Also, for that to be the case would mean A0 must be stronger than SF prior to training against it
This is not necessarily true, because my understanding of being "strong" means A0 playing well against any opponent, not just performing better against SF. For the former, I think training against SF would probably make it weaker.

jhellis3 wrote:Regardless, it would quickly learn all of SF's numerous search and eval holes and exploit them mercilessly.
With SF playing without a book, this can also be done directly. My point is, once you have collected that amount of data, why bother training NNs on it? Just use it directly as a book.

It is unlikely that NNs can find something systematically wrong that SF can't improve on. If something is systematically wrong, then you already have a generalization of what's wrong, which is exactly what the developers need in order to write code to fix it.

Unfortunately, it is more likely to be like Go: you don't really know what's wrong, you don't know why the NNs did better, and you can't translate the NNs into code or vice versa.
jhellis3 wrote:At which point you can revert back to conventional training, correcting all of the over-fitting while retaining the exploitation tactics which do not cost Elo.
You can read my edit on why this is unlikely to happen.
There is no "cure" for overfitting, and sometimes we even need it to solve particular problems, because we don't have that many samples.

Again, I can see your point, and you probably think I'm against innovation; in fact, I have done more "garage experiments" than forum trolling to reach these conclusions.
If the A0 team shares something interesting, I'm more than happy to learn from it and try it, but so far the information provided has been weak in a scientific sense.
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: So Alpha Zero was a hoax?

Post by Dann Corbit »

The Alpha Zero team has explained their approach.
It makes sense to me.

They have mind blowing hardware at their disposal.

Their TPUs deliver a sort of horsepower that we are not used to.
It is like having a boatload of high-end Epyc chips with only a small SMP loss.

If you throw enough energy at a difficult problem, the problem may crumble.

Now, it is possible that something was faked. Faked things do happen in science. But I guess that this is not one of them.

In order to assume that something has been faked or doctored, I would normally want real evidence that shows this. Otherwise, it is simpler to assume that everyone is telling the truth.

But I admit, I can be a little naive.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
David Xu
Posts: 47
Joined: Mon Oct 31, 2016 9:45 pm

Re: So Alpha Zero was a hoax?

Post by David Xu »

noobpwnftw wrote:
David Xu wrote:The issue with that interpretation is that DeepMind is not a PR company. How, concretely, does publishing a misleading preprint benefit them? They are a research group, and their funding is reliant on shareholder approval, not public opinion--do you think that making claims they can't back up is a sustainable long-term strategy?

As far as the efficacy of machine learning techniques is concerned, no one knows for sure which tasks they work well on and which they don't, which is precisely why experimentation is necessary. It's not clear to me why you (Michael), Bojun, and so many others seem to spurn said experimentation, to the point of postulating what essentially amounts to a conspiracy theory.

At this point it isn't even about AlphaZero. I'm honestly curious: is it that inconceivable to you that a reinforcement learning based approach could outstrip the decades-old approach of conventional chess engines? I'm honestly not seeing where you and Bojun are pulling all of this confidence from; it seems entirely unfounded to me.

EDIT: I see that Bojun mentioned something about FineArt and overfitting in the LCZero thread; I'm replying to that here in order to condense things. Overfitting is a known issue in machine learning of all types, not just this specific case, and is generally addressable by tuning the training hyperparameters until the net no longer overfits. I'm not sure why Bojun is touting this as some kind of evidence against the effectiveness of neural networks.
I fail to see why I would care if this particular preprint from the research group is more of a PR move to please their shareholders in the first place.
Because the shareholders are going to (eventually) want some kind of concrete demonstration of DeepMind's claims. Blowing hot air does them no good, since the opinion of the general public has essentially zero effect on how much funding they receive.
And I don't see any conspiracy theory here. If they want to do the experiments, then do the experiments; based on what the preprint said, I find it hard to be convinced that the comparison between their experiment's outcome and SF was properly measured.
I refer you to the title of this thread. "So Alpha Zero was a hoax?" sounds pretty conspiracy theory-esque to me. If this is not your view, then I apologize for lumping you in with everyone else, but frankly, if you don't want the label "conspiracy theorist", you're going to need to explain why you called AlphaZero an "attention-seeking attempt" in the LCZero thread earlier.
To properly prove that NNs perform better than SF in general, which is the fundamental point of the experiments, more matches should be played. Not all details need to be published; the stats alone are enough. And a book needs to be used on the SF side to introduce diversity. For such a research group, I find it disturbing to see such claims made so unscientifically.
I fail to see what is unscientific about the preprint. They described the architecture of the network itself, they described the training procedure, and they described the results against a specific setup for Stockfish--results which are in principle replicable by a third party, given the information they provided. That's what it takes for a result to be "scientific". More varied experiments would have been nice, certainly, but that fact hardly invalidates the results of the experiment they did perform.
I have kept saying that NNs have potential in certain parts of chess programs in general, even before A-whatsoever; you can search my posts here if you want. But how do you define "conventional"? As far as I can tell, automatic parameter tuning is more or less training on a fixed model.
Certainly if you are not against NNs in principle I'm willing to give you the benefit of the doubt, but then one must ask: why are you so reluctant to believe that DeepMind accomplished what they said they accomplished? If you agree that reinforcement learning is capable of superseding conventional alpha-beta approaches in principle, then why the difficulty in believing that it was accomplished in a single specific case? More directly: what is so unbelievable to you about this result that you would sooner postulate deception on the part of Google DeepMind than take them at their word?
Anyone who knows anything about programming probably wouldn't draw such a line and label them "decades old". If you are just picking on the PVS search algorithm, care to tell me the age of the MCTS used in your new reinforcement-learning-based approach?
Monte-Carlo tree search is certainly a well-known technique that has existed for years, but the technique of using it as a policy improvement operator is, as far as I'm aware, a novel one. This technique is, of course, the critical aspect of DeepMind's approach, since it's what allowed them to generate such high-quality training data.
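The policy-improvement idea can be sketched in a few lines. The priors, "true" move values, and PUCT constant below are invented for illustration; the point is only that the search's visit distribution ends up sharper and better informed than the raw network prior it started from:

```python
import math

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """AlphaZero-style selection score: exploitation (q) plus a prior-weighted
    exploration bonus that decays as the child gets visited."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

# Toy root with 3 moves: the raw network prior, and (made-up) true move values.
prior = [0.4, 0.35, 0.25]
true_value = [0.1, 0.6, 0.2]   # move 1 is actually best, despite a lower prior

visits = [0, 0, 0]
value_sum = [0.0, 0.0, 0.0]
for sim in range(1, 401):
    # Select the move with the best PUCT score.
    scores = [
        puct_score(value_sum[i] / visits[i] if visits[i] else 0.0,
                   prior[i], sim, visits[i])
        for i in range(3)
    ]
    best = scores.index(max(scores))
    # Stand-in for evaluating the resulting position: return its true value.
    visits[best] += 1
    value_sum[best] += true_value[best]

# The visit distribution is the "improved" policy used as a training target.
improved_policy = [v / sum(visits) for v in visits]
```

After 400 simulations the search concentrates its visits on move 1, so the training target it produces is strictly better informed than the prior the net supplied, which is the sense in which the search "improves" the policy.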
I brought up overfitting because it is a common issue with NNs when they don't get the zillions of samples they need, and I named a particular case with evidence, while you wave it away with "it can be solved in general". If it is that easy to solve, why has it become a "known issue in machine learning of all types"?

Also, are you suggesting that we should ignore all the problems NNs may have just because they are more "effective"?
What I am saying is that the issue of overfitting is well known, so bringing it up as a specific argument against neural networks in chess is not a particularly strong objection. In particular, there are several straightforward ways to address overfitting: (1) decreasing the learning rate, (2) performing value-preserving transformations on the data in order to artificially increase the number of training examples, or (3) simply acquiring more data. These approaches are not always viable, of course, which is why overfitting is still a problem in general, but considering that all three of them are possible in games such as Go and chess, bringing up overfitting as if it's a massive issue is disingenuous at best.
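Option (2) is especially cheap in Go, where every position has eight equivalent rotations and reflections (chess offers fewer usable symmetries because of castling and pawn direction). A minimal sketch, assuming a square board stored as a list of rows:

```python
def symmetries(board):
    """Return the 8 dihedral transforms (rotations and reflections) of a
    square board -- in Go, each one is an equally valid training example."""
    def rot90(b):
        # Rotate 90 degrees clockwise: reverse the rows, then transpose.
        return [list(row) for row in zip(*b[::-1])]

    out = []
    cur = [list(row) for row in board]
    for _ in range(4):
        out.append(cur)                          # this rotation
        out.append([row[::-1] for row in cur])   # its mirror image
        cur = rot90(cur)
    return out

# A tiny 3x3 example: one position yields 8 training samples.
board = [[1, 0, 2],
         [0, 0, 0],
         [0, 0, 0]]
augmented = symmetries(board)
```

For an asymmetric position like the one above, all eight transforms are distinct, so the effective dataset grows eightfold at essentially no cost.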
Last edited by David Xu on Sat Mar 17, 2018 4:46 am, edited 2 times in total.