The paper reports that they couldn't get learning to work with 5 layers. There are a couple of easy tricks that make training 5 layers trivial: careful weight initialization and skip connections (see the ResNet paper). Batch normalization and weight normalization are also useful.
I have trained CNNs for go with 30 layers using just the first two tricks.
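A minimal NumPy sketch (not my actual go code) of how those first two tricks combine: He/Kaiming initialization keeps ReLU activations on a sensible scale, and zero-initializing the second matrix of each residual block makes the whole deep stack start out as the identity map, so depth costs nothing at the start of training. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_layer(x, W1, W2):
    # Block computes x + F(x); the "+ x" is the skip connection.
    h = np.maximum(x @ W1, 0.0)  # ReLU
    return x + h @ W2

d = 64
x = rng.normal(size=(8, d))

out = x
for _ in range(30):  # a 30-layer stack, as in the go nets mentioned above
    # He init for the first matrix (variance 2/fan_in suits ReLU);
    # zero init for the second, so each block starts out as the identity.
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, d))
    W2 = np.zeros((d, d))
    out = residual_layer(out, W1, W2)

# At initialization the entire 30-layer stack is just the identity map,
# so gradients flow straight through and training starts from a sane point.
assert np.allclose(out, x)
```

In training, the `W2` matrices move away from zero and each block gradually learns a nonzero residual, which is why depth stops being an obstacle.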
ConvChess CNN
- Full name: Álvaro Begué (RuyDos)