This code still forms a computational bottleneck. Maybe specialized code that checks for sparsity would help; I don't know. Perhaps I even have to use sparse matrices. Don't know yet. It only gets more complicated, which might introduce bad bugs. I already encountered a bug a few days ago.
One epoch already costs me about twelve minutes, and at least 200 epochs are needed.
And then you find out you need more filters/layers, making it even slower, and then it starts all over again.
Sparse matrices are jagged dictionaries, so accessing a matrix element gets O(log n) slower.
So I doubt it will get any faster. But the only way to know for sure is to try it (master of pain).
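To make the trade-off concrete, here is a minimal sketch (in Python, my own illustration, not the author's actual code) of a "jagged dictionary" sparse matrix: a dict of rows, each row a dict mapping column to value. Note that with a hash-based dict a lookup is amortized O(1); the O(log n) cost described above applies when the rows are tree-based sorted dictionaries. Either way, the win from sparsity comes from iterating only the stored entries:

```python
# Hypothetical "jagged dictionary" sparse matrix: dict of rows,
# each row a dict of {column index: value}.
class SparseMatrix:
    def __init__(self):
        self.rows = {}  # row index -> {col index: value}

    def set(self, r, c, v):
        self.rows.setdefault(r, {})[c] = v

    def get(self, r, c):
        # Missing entries are implicit zeros.
        return self.rows.get(r, {}).get(c, 0.0)

    def row_items(self, r):
        # Iterating only the stored entries is where sparsity pays off.
        return self.rows.get(r, {}).items()

def sparse_dot(m, r, dense_vec):
    # Dot product of sparse row r with a dense vector:
    # cost is proportional to the nonzeros, not the full row width.
    return sum(v * dense_vec[c] for c, v in m.row_items(r))

m = SparseMatrix()
m.set(0, 2, 3.0)
m.set(0, 7, 1.5)
print(m.get(0, 2))                  # 3.0
print(m.get(0, 3))                  # 0.0 (implicit zero)
print(sparse_dot(m, 0, [1.0] * 8))  # 3.0 + 1.5 = 4.5
```

Whether this beats a dense array depends entirely on how sparse the matrices really are: each element access now pays dictionary overhead, so dense inner loops can easily end up slower.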
Henk wrote: ↑Mon Nov 05, 2018 11:10 am
Changed the interface of the matrix class. What a disaster. I have to repair a great many lines. And when I make mistakes, bugs creep in, costing too much time.
Only because I also wanted to support non-jagged sparse matrices. The Idiot keeps going on.
Best is to minimize your code, check all classes that are heavily used, and check whether their interfaces are generic enough.
But that is only theory.
Main problem is that I am starting to forget how backpropagation works, seeing the network as a black box.
But it's the same problem with PVS. Last week I had to recall what PVS is.
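As a refresher on what backpropagation actually computes, here is a minimal sketch (my own illustration, not code from this thread): a one-hidden-layer network with sigmoid activations and squared error, where the weight gradients are just the chain rule applied layer by layer:

```python
import numpy as np

# Minimal one-hidden-layer network with sigmoid units and squared error.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W1, W2, x):
    h = sigmoid(W1 @ x)   # hidden activations
    y = sigmoid(W2 @ h)   # output activations
    return h, y

def backward(W1, W2, x, t):
    # Gradients of E = 0.5 * ||y - t||^2 with respect to W1 and W2,
    # obtained by pushing the error signal back through each layer.
    h, y = forward(W1, W2, x)
    delta2 = (y - t) * y * (1.0 - y)          # error signal at the output
    dW2 = np.outer(delta2, h)
    delta1 = (W2.T @ delta2) * h * (1.0 - h)  # error propagated to hidden layer
    dW1 = np.outer(delta1, x)
    return dW1, dW2

# One gradient-descent step on a toy example.
rng = np.random.default_rng(0)
x = rng.standard_normal(3)
t = np.array([1.0])
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((1, 4))
dW1, dW2 = backward(W1, W2, x, t)
W1 = W1 - 0.1 * dW1
W2 = W2 - 0.1 * dW2
```

A finite-difference check against these gradients is a cheap way to catch exactly the kind of backprop bug that is hard to see when the network has become a black box.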
Daniel Shawul wrote: ↑Thu Oct 11, 2018 8:20 pm
I don't disagree on the need to understand the inner workings, but you will have a hard time beating vendor-supplied optimized libraries such as Intel MKL, cuDNN, TensorRT, etc.
Lczero already tried the former approach first and eventually switched to cuDNN, MKL BLAS, etc.
I am sure GCP put a lot of effort into coding Winograd etc., but these AI libraries are used by a lot of industry, so NVIDIA/Intel have a lot to gain from offering highly optimized libraries.
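For reference, the Winograd trick mentioned above reduces the number of multiplications in small convolutions. A minimal 1-D illustration (my own sketch using the F(2,3) transform matrices from Lavin & Gray's paper, not code from any of the engines discussed): two outputs of a 3-tap filter computed with 4 multiplies instead of 6:

```python
import numpy as np

# Winograd F(2,3) transform matrices (Lavin & Gray).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    # d: 4 inputs, g: 3 filter taps -> 2 outputs.
    # The elementwise product is the 4 multiplies.
    return AT @ ((G @ g) * (BT @ d))

def direct(d, g):
    # Reference: plain sliding-window correlation (6 multiplies).
    return np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                     d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, -1.0, 2.0])
print(winograd_f23(d, g))  # matches direct(d, g)
```

In 2-D the same transforms are applied along both axes, and amortizing the filter transform over many tiles is what makes it pay off in convolution layers, in hand-written kernels and in vendor libraries alike.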
FWIW, Leela Zero and lc0 in OpenCL mode still use my code (though Henrik Forsten co-wrote large parts of the current implementation and he should get credit). When we benchmarked it against cuDNN in Leela Zero, it was faster. It seems we need much more aggressive batching for cuDNN to outperform it (for chess, things are very different). This may have changed with RTX cards and tensor cores, which is why I was asking about this in the other thread. People are working on more aggressive batching for Leela Zero as well, but that should remind you that these days you cannot separate the DCNN implementation from the search specifics and tuning.
The main reason to write my own implementation was in any case to not have to depend on the whims of the vendors' licensing, and to not have split versions for both card vendors.
Note that lc0's cuDNN backend was written by an NVIDIA driver engineer, who also dealt with getting the redistribution permission. I'm sure the implementation is state of the art.