catastrophic forgetting

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Daniel Shawul
Posts: 3758
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

catastrophic forgetting

Post by Daniel Shawul » Thu May 09, 2019 4:16 pm

I am trying to train neural networks for my chess variant playing program Nebiyu. It supports more than 10 chess variants and can also play Go/Hex/Reversi/Amazons/Checkers. Focusing only on the chess variants: training a separate network for each variant would be cumbersome, so I am thinking of training one network for all of them. However, there is a well-known issue: neural networks tend to forget old data
while learning new information, the so-called "catastrophic forgetting". That is, if I train the network first on standard chess and then on suicide chess, it will essentially retain only the weights needed to play suicide chess well. One way to solve this is to play training games for all variants and train the network on all of them simultaneously. I could feed in the variant type via an input plane (which I am planning to do). DeepMind has already investigated this problem with their Atari work and came up with a different solution that lets you train on games in sequence while still remembering past data. I don't fully understand the paper, but here it is: https://arxiv.org/pdf/1612.00796.pdf .
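As far as I can tell, the penalty in that paper (they call it elastic weight consolidation) is just a quadratic pull toward the old weights, scaled by how important each weight was for the old task. A toy sketch in NumPy; the function names and numbers here are mine, not the paper's:

```python
import numpy as np

def ewc_loss(task_loss, weights, old_weights, fisher, lam=0.4):
    # Weights that were important for the old task (large Fisher value)
    # are pulled back toward their old values; unimportant ones move freely.
    penalty = 0.5 * lam * np.sum(fisher * (weights - old_weights) ** 2)
    return task_loss + penalty

# toy numbers: weight 0 mattered for the old variant, weight 1 did not
w_old  = np.array([1.0, -2.0])
fisher = np.array([10.0, 0.01])   # diagonal Fisher information estimate
w_new  = np.array([1.5,  0.0])    # weight 1 drifted a lot, but cheaply

total = ewc_loss(0.25, w_new, w_old, fisher)  # 0.25 + 0.508 penalty
```

So "important" weights get an effectively stiffer spring back to their old values, which is why learning on them slows down.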
I think this is the correct path towards general AI i.e. one brain for multiple tasks, instead of a specialized brain (neural network) for each task.
Any thoughts ?

Michael Sherwin
Posts: 3041
Joined: Fri May 26, 2006 1:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: catastrophic forgetting

Post by Michael Sherwin » Thu May 09, 2019 5:30 pm

First off, pretend it is not me making this post. :D Heuristics, if I am correct, seem to be the basis for the evaluation of a NN. Heuristics are based on macro statistics, i.e. results. But underneath results are lower-level statistics like check, checkmate, and a slew of other countable things that have meaning. It is my opinion that NNs would be more successful if those lower-level statistics were the basis of their decisions. They might then be more generalized and able to work in more cases. But that is just a guess on my part, as I know next to nothing about NNs.
I hate if statements. Pawns demand if statements. Therefore I hate pawns.

chrisw
Posts: 2107
Joined: Tue Apr 03, 2012 2:28 pm

Re: catastrophic forgetting

Post by chrisw » Thu May 09, 2019 5:36 pm

We could implement this idea as generalized transfer learning with a full-on weight-freeze policy (also a bit easier to understand and implement than the paper's complicated weight-change reductions).
Calling upon the other Google work, the massively trained object-recognition tower, which identifies something like 1000 different objects: I think somebody won a cats 'n dogs recognition competition by using that frozen tower as a base (arguing it already held plenty of usable, transferable pattern-detection knowledge) and stacking some more trainable layers on top.

We could try this for chess or chess like games even. A massively trained tower, and then individual game layers. Different programmers could play with the knowledge base.
Similar idea?

Actually, I did wonder whether the Google pre-trained tower might already contain chess-useful patterns, but that's probably wild dreaming, and the pixel basis of the Google tower is likely too far out of scale anyway.
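Something like this, roughly. A minimal NumPy sketch of the freeze idea; the shapes and names are made up for illustration, not from any actual Google model:

```python
import numpy as np

rng = np.random.default_rng(0)

# pretend this is the massively pre-trained "tower" -- its weights are frozen
W_tower = rng.standard_normal((64, 32))

def tower_features(x):
    # Frozen feature extractor: we only ever read W_tower, never update it.
    return np.maximum(x @ W_tower, 0.0)   # ReLU

# small trainable head stacked on top (one such head per game/task)
W_head = np.zeros((32, 1))

def train_head_step(x, target, lr=0.01):
    # One least-squares gradient step on the head only; the tower stays fixed.
    global W_head
    feats = tower_features(x)             # (batch, 32)
    pred = feats @ W_head                 # (batch, 1)
    grad = feats.T @ (pred - target) / len(x)
    W_head -= lr * grad

# demo: train the head a bit; the tower should come out unchanged
tower_before = W_tower.copy()
x = rng.standard_normal((8, 64))
target = rng.standard_normal((8, 1))
train_head_step(x, target)
```

In a real framework you would freeze the tower by disabling gradients on its parameters rather than by construction like this, but the effect is the same.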

User avatar
Evert
Posts: 2923
Joined: Fri Jan 21, 2011 11:42 pm
Location: NL
Contact:

Re: catastrophic forgetting

Post by Evert » Thu May 09, 2019 9:08 pm

I think this is a really interesting project, with a lot of potential. Also sadly one that I lack the time and resources to look into myself.
I’ll be very interested to hear about your results.

Daniel Shawul
Posts: 3758
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

Re: catastrophic forgetting

Post by Daniel Shawul » Fri May 10, 2019 1:57 pm

A less complex problem is to train one neural network for the same game but on different board sizes. This has been done for Go which is played on board sizes of 9x9, 13x13, and 19x19. This paper https://arxiv.org/abs/1902.10565 uses an input plane mask to indicate on-board locations. They go further than that and apply the mask after each convolution step and similar operations. I am gonna first try and see if providing the mask suffices.
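Roughly what I have in mind for the mask. A toy NumPy sketch (the names are mine, not the paper's code): build a binary on-board plane for the smaller board inside the full-size input, then zero out off-board activations after each convolution-like step.

```python
import numpy as np

def board_mask(max_size, n):
    # 1.0 on the n x n playable area of a max_size x max_size input, else 0.0
    m = np.zeros((max_size, max_size))
    m[:n, :n] = 1.0
    return m

def masked_activation(activations, mask):
    # Zero out off-board locations after a convolution step, so off-board
    # garbage never leaks back into on-board features.
    return activations * mask

mask9 = board_mask(19, 9)          # a 9x9 board embedded in a 19x19 input
act = np.ones((19, 19))            # pretend these are conv outputs
out = masked_activation(act, mask9)
```

Feeding the mask only as an input plane skips the second function; re-applying it after every layer is the paper's stronger version.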

@chris The subject of transfer learning is very interesting because it generalizes learning to domains similar to the one the network was originally trained on.
I think the gist of the DeepMind paper is to figure out which weights are important for playing a game and slow down the learning rate for those, so that they are not forgotten quickly.

AlvaroBegue
Posts: 920
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: catastrophic forgetting

Post by AlvaroBegue » Fri May 10, 2019 2:44 pm

Is training for all the variants at the same time an option? I mean, have each minibatch contain examples from each of the variants.
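Something like this is what I mean. A toy sketch, with placeholder variant names; each minibatch draws an equal share of examples from every variant so no single variant's gradients dominate a step:

```python
import random

def mixed_minibatch(datasets, batch_size):
    # Draw batch_size // n_variants examples from each variant's dataset,
    # then shuffle so variants are interleaved within the batch.
    per_variant = batch_size // len(datasets)
    batch = []
    for name, examples in datasets.items():
        batch += [(name, ex) for ex in random.sample(examples, per_variant)]
    random.shuffle(batch)
    return batch

# placeholder data: three variants, 100 training examples each
datasets = {
    "standard":   list(range(100)),
    "suicide":    list(range(100)),
    "crazyhouse": list(range(100)),
}
batch = mixed_minibatch(datasets, 32)  # 10 examples per variant here
```

Sampling proportionally to dataset size instead of equally is the other obvious choice, if some variants have far more games than others.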
