Learning time growing exponentially with number of training examples

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Learning time growing exponentially with number of training examples

Post by Henk »

After three days of tuning, using 100000 examples. Still playing badly.

Code: Select all

0.152658045592678  0.146846164052304
0.152672921563584  0.146796660198495
0.152628787384212  0.146750211537819
0.152533272772998  0.146706348167116
[pgn] 1. Ng1-f3 { 2.005 0.13 } { } 1. f7-f6 { 2.008 -0.19 } { } 2. c2-c4 { 1.975 0.12 } { } 2. d7-d5 { 1.968 -0.17 } { } 3. Qd1-c2 { 1.934 0.13 } { } 3. Bc8-f5 { 1.945 -0.11 } { } 4. e2-e4 { 1.902 0.08 } { } 4. Bf5-e4 { 1.905 -0.06 } { } 5. d2-d3 { 1.871 -0.09 } { } 5. Be4-f3 { 1.871 -0.06 } { } 6. g2-f3 { 1.844 0.04 } { } 6. Nb8-c6 { 1.847 -0.09 } { } 7. c4-d5 { 1.808 0.10 } { } 7. Nc6-b4 { 1.864 -0.03 } { } 8. Qc2-c5 { 1.780 -0.02 } { } 8. e7-e5 { 1.807 -0.10 } { } 9. Qc5-b5 { 1.750 -0.08 } { } 9. c7-c6 { 1.753 0.00 } { } 10. d5-c6 { 1.722 0.01 } { } 10. b7-c6 { 1.727 -0.10 } { } 11. Qb5-a4 { 1.693 0.00 } { } 11. Qd8-e7 { 1.692 -0.12 } { } 12. Ke1-d2 { 1.663 0.10 } { } 12. e5-e4 { 1.663 -0.16 } { } 13. f3-e4 { 1.636 0.17 } { } 13. Ra8-b8 { 1.642 -0.23 } { } 14. d3-d4 { 1.607 0.24 } { } 14. g7-g5 { 1.612 -0.21 } { } 15. d4-d5 { 1.584 0.21 } { } 15. Qe7-e4 { 1.586 -0.20 } { } 16. d5-c6 { 1.560 0.12 } { } 16. Qe4-f3 { 1.554 -0.21 } { } 17. c6-c7 { 1.534 0.46 } { } 17. Ke8-f7 { 1.527 -0.97 } { } 18. c7-b8Q { 1.512 1.11 } { } 18. a7-a6 { 1.511 -1.11 } { } 19. Qb8-b5 { 1.488 0.72 } { } 19. a6-b5 { 1.481 -0.74 } { } 20. Qa4-b5 { 1.457 0.53 } { } 20. Kf7-g7 { 1.459 -0.63 } { } 21. Bf1-e2 { 1.429 0.57 } { } 21. Qf3-h1 { 1.428 -0.47 } { } 22. f2-f4 { 1.409 0.31 } { } 22. Qh1-e4 { 1.403 -0.37 } { } 23. f4-g5 { 1.384 0.31 } { } 23. Bf8-e7 { 1.384 -0.35 } { } 24. Be2-h5 { 1.363 0.29 } { } 24. f6-g5 { 1.365 -0.25 } { } 25. Nb1-c3 { 1.335 0.28 } { } 25. Qe4-f4 { 1.342 -0.50 } { } 26. Kd2-d1 { 1.319 0.47 } { } 26. Qf4-f8 { 1.332 -0.45 } { } 27. Bc1-g5 { 1.296 0.23 } { } 27. Be7-g5 { 1.296 -0.51 } { } 28. Qb5-g5 { 1.270 0.98 } { } [/pgn]
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Learning time growing exponentially with number of training examples

Post by Henk »

Joost Buijs wrote: Tue Aug 28, 2018 4:12 pm
Henk wrote: Tue Aug 28, 2018 2:58 pm For the loss function I still use mean squared error. Maybe I should change that. For the activation function I use SELU. I read that if you use SELU you don't need batch normalization, because it is self-normalizing. But computing Exp(x) is one of the slowest operations during training.

Code: Select all

        // SELU(x) = scale * x                for x >= 0
        //         = scale * a * (e^x - 1)    for x < 0, with scale = 1.0507, a = 1.67326
        static public double SELU(double x)
        {
            return 1.0507 * (x >= 0 ? x : 1.67326 * (Math.Exp(x) - 1));
        }
I don't think the accuracy of the Exp() function plays a big role for SELU; maybe you can use an approximation with a Taylor series or something similar. It won't make an order-of-magnitude difference, but every bit of speed you can gain helps, of course.
I used this code to approximate Math.Exp(x), but it only made things far slower.

Code: Select all

// Taylor series for e^x around 0: 1 + x + x^2/2! + ... + x^9/9!
// (method wrapper added here to make the fragment compile)
static double ExpTaylor(double x)
{
    double sum = 1;
    double term = 1;
    for (int i = 1; i <= 9; i++)
    {
        term = term * x / i;   // term is now x^i / i!
        sum += term;
    }
    return sum;
}

But this helped a bit:

Code: Select all

        // Fifth-order Taylor polynomial for Math.Exp(x): only accurate
        // close to zero, hence the range check; assumes x < 0.
        static double ExpNearZero(double x)
        {
            if (x > -0.1)
            {
                double sqX = x * x;
                // 1 + x + x^2/2 + x^3/6 + x^4/24 + x^5/120
                double sum = 1 + x + sqX * (0.5 + x / 6 + sqX * (0.041666666 + x / 120));
                Debug.Assert(Math.Abs(sum - Math.Exp(x)) <= 0.0001);
                return sum;
            }
            return Math.Exp(x); // fallback for larger magnitudes (not in the original fragment)
        }
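As an illustrative next step, sketched here rather than taken from the thread: the same polynomial can cover any negative input with range reduction, using the identity exp(x) = exp(x/2)^2. The name FastExpNegative is made up for this sketch.

Code: Select all

// Illustrative sketch (not from the thread): extend the polynomial above to
// any x < 0 by halving x until it is near zero, evaluating the polynomial,
// then squaring the result back, since exp(x) = exp(x/2)^2.
static double FastExpNegative(double x)
{
    int halvings = 0;
    while (x <= -0.1)          // polynomial is only accurate near zero
    {
        x *= 0.5;
        halvings++;
    }
    double sqX = x * x;
    double result = 1 + x + sqX * (0.5 + x / 6 + sqX * (0.041666666 + x / 120));
    for (int i = 0; i < halvings; i++)
        result *= result;      // undo each halving by squaring
    return result;
}

Each squaring roughly doubles the relative error, but the polynomial is accurate enough near zero that this should stay within the 0.0001 tolerance asserted above for the range of inputs SELU sees.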
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Learning time growing exponentially with number of training examples

Post by Joost Buijs »

Why don't you switch to ReLU as the activation function? At least it is fast, because it doesn't need exp().
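For comparison, a minimal ReLU sketch in the same C# style as the SELU code above (the method names are illustrative, not from the thread):

Code: Select all

// ReLU: max(0, x); a single comparison, no Exp() call
static double ReLU(double x)
{
    return x > 0 ? x : 0;
}

// Its derivative, used during backpropagation
static double ReLUGradient(double x)
{
    return x > 0 ? 1 : 0;
}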
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Learning time growing exponentially with number of training examples

Post by Henk »

Joost Buijs wrote: Mon Sep 17, 2018 1:12 pm Why don't you switch to ReLU as the activation function? At least it is fast, because it doesn't need exp().
ReLU is not self-normalizing. If I use ReLU I have to implement batch normalization as well. Adding batch normalization might make it slower; I don't know yet.
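For context, batch normalization standardizes each layer's activations over a batch and then applies a learned scale and shift. A rough sketch, assuming a plain-array layer output; the method name and parameters are illustrative, not Henk's code:

Code: Select all

// Normalize activations to zero mean / unit variance over the batch,
// then apply the learned scale (gamma) and shift (beta).
static void BatchNormalize(double[] activations, double gamma, double beta)
{
    const double epsilon = 1e-8;   // guards against division by zero
    double mean = 0;
    foreach (double a in activations) mean += a;
    mean /= activations.Length;

    double variance = 0;
    foreach (double a in activations) variance += (a - mean) * (a - mean);
    variance /= activations.Length;

    double invStd = 1.0 / Math.Sqrt(variance + epsilon);
    for (int i = 0; i < activations.Length; i++)
        activations[i] = gamma * (activations[i] - mean) * invStd + beta;
}

The extra passes over every layer's activations are the per-step cost that might make training slower.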
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Learning time growing exponentially with number of training examples

Post by Joost Buijs »

Henk wrote: Mon Sep 17, 2018 1:16 pm
Joost Buijs wrote: Mon Sep 17, 2018 1:12 pm Why don't you switch to ReLU as the activation function? At least it is fast, because it doesn't need exp().
ReLU is not self-normalizing. If I use ReLU I have to implement batch normalization as well. Adding batch normalization might make it slower; I don't know yet.
Like others already said, purchasing a decent GPU would help a lot. Of course, you then have to dive into CUDA programming, or use Python with TensorFlow like all these script kiddies do.