To use int8, an additional calibration step is required to minimize the loss of information, measured by the Kullback-Leibler divergence between the fp32 and int8 models. I have now added the calibration step and measured performance on a 12x128 net, using 10 batches of 1024 positions for calibration. Here are the results on Volta, which supports fp16 and int8 in addition to fp32:
Tensorflow pb : 24849 nodes/s
TensorRT FP32 : 20790 nodes/s
TensorRT FP16 : 53405 nodes/s
TensorRT INT8 : 44355 nodes/s
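To make the comparison easier to read, the ratios can be worked out directly from the four figures above. This is only arithmetic on the posted numbers; the dictionary and variable names are mine.

```python
# Throughput figures copied from the results above (nodes/s on Volta).
pps = {
    "Tensorflow pb": 24849,
    "TensorRT FP32": 20790,
    "TensorRT FP16": 53405,
    "TensorRT INT8": 44355,
}

# Speedup of every backend relative to TensorRT FP32.
baseline = pps["TensorRT FP32"]
speedup = {name: rate / baseline for name, rate in pps.items()}
for name, ratio in speedup.items():
    print(f"{name:<14} {ratio:.2f}x vs TensorRT FP32")

# INT8 throughput as a fraction of FP16 throughput.
int8_vs_fp16 = pps["TensorRT INT8"] / pps["TensorRT FP16"]
print(f"INT8 runs at {int8_vs_fp16:.0%} of the FP16 rate")
```

With these figures INT8 comes out at roughly 2.1x the FP32 rate and FP16 at roughly 2.6x, so INT8 reaches about 83% of the FP16 throughput.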
Though INT8 is about twice as fast as FP32, it is slower than FP16 on Volta. I don't know why that is, but on standard image recognition samples INT8 was slightly faster than FP16, so I need to investigate why mine is slower.
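As background on the calibration step mentioned at the top, here is a rough pure-Python sketch of how an entropy-style calibrator can pick an int8 clipping threshold by minimizing a Kullback-Leibler divergence. It is a simplified illustration and not TensorRT's actual implementation: all function names are hypothetical, the bin counts are arbitrary, and synthetic Gaussian values stand in for the activations that real calibration batches would produce.

```python
import math
import random

def abs_histogram(values, nbins, max_abs):
    """Histogram of |value| over [0, max_abs] with nbins equal-width bins."""
    hist = [0] * nbins
    for v in values:
        idx = min(int(abs(v) / max_abs * nbins), nbins - 1)
        hist[idx] += 1
    return hist

def kl_divergence(p, q):
    """KL(P || Q) in nats; both histograms are normalized first."""
    ps, qs = sum(p), sum(q)
    if ps == 0 or qs == 0:
        return float("inf")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:  # Q cannot represent mass that P has
            return float("inf")
        total += (pi / ps) * math.log((pi / ps) / (qi / qs))
    return total

def quantize_histogram(ref, levels):
    """Merge bins into `levels` groups, then spread each group's count
    evenly over the bins of that group that were non-empty."""
    n = len(ref)
    out = [0.0] * n
    for g in range(levels):
        start, stop = g * n // levels, (g + 1) * n // levels
        group = ref[start:stop]
        nonzero = sum(1 for c in group if c > 0)
        if nonzero == 0:
            continue
        share = sum(group) / nonzero
        for i in range(start, stop):
            if ref[i] > 0:
                out[i] = share
    return out

def choose_threshold(hist, bin_width, levels=128):
    """Try every clipping point and keep the one with the lowest KL."""
    best_kl, best_bins = float("inf"), len(hist)
    for nbins in range(levels, len(hist) + 1):
        clipped = hist[:nbins]
        p = list(clipped)
        p[-1] += sum(hist[nbins:])  # clipped outliers saturate into the last bin
        q = quantize_histogram(clipped, levels)
        kl = kl_divergence(p, q)
        if kl < best_kl:
            best_kl, best_bins = kl, nbins
    return best_bins * bin_width, best_kl

# Synthetic stand-in for one layer's activations (assumption, not real data).
random.seed(0)
acts = [random.gauss(0.0, 1.0) for _ in range(20000)]
max_abs = max(abs(a) for a in acts)
hist = abs_histogram(acts, 512, max_abs)
threshold, kl = choose_threshold(hist, max_abs / 512)
print(f"clip at |x| <= {threshold:.3f} (max seen {max_abs:.3f}), KL = {kl:.5f}")
```

The idea is that clipping a few rare outliers can leave more of the 128 positive int8 levels for the values that actually occur, and the KL divergence is one way to score that trade-off.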
Comparing the policy values of the FP16 and INT8 models, there does not seem to be much loss of information.
FP16
# Move Value=(V,P,V+P) Policy Visits PV
#----------------------------------------------------------------------------------
# 1 (0.523,0.512,0.521) 26.45 62233 e2-e4 c7-c5 Ng1-f3 e7-e6 d2-d4 c5xd4 Nf3xd4 a7-a6 Bf1-d3 Qd8-c7 Ke1-g1 Ng8-f6 Qd1-e2 d7-d6 c2-c4 Nb8-d7 Nb1-c3 Bf8-e7 f2-f4 Ke8-g8
# 2 (0.530,0.530,0.530) 22.62 425256 d2-d4 d7-d5 Ng1-f3 c7-c5 c2-c4 c5xd4 Qd1xd4 Ng8-f6 c4xd5 Qd8xd5 Nb1-c3 Qd5xd4 Nf3xd4 a7-a6 e2-e4 e7-e5 Nd4-c2 Bf8-c5 Bc1-e3 Bc5xe3 Nc2xe3 Nb8-c6 Ne3-d5 Nf6xd5 Nc3xd5
# 3 (0.519,0.530,0.522) 16.61 47308 Ng1-f3 d7-d5 d2-d4 c7-c5 c2-c4 c5xd4 Qd1xd4 Ng8-f6 c4xd5 Qd8xd5 Nb1-c3 Qd5xd4 Nf3xd4 a7-a6 e2-e4 e7-e5 Nd4-c2 Bf8-c5 Bc1-e3 Bc5xe3 Nc2xe3
# 4 (0.520,0.494,0.515) 11.42 18343 c2-c4 e7-e5 Nb1-c3 Ng8-f6 Ng1-f3 Nb8-c6 g2-g3 d7-d5 c4xd5 Nf6xd5 Bf1-g2 Nd5-b6 Ke1-g1 Bf8-e7 d2-d3 Ke8-g8 a2-a3 Bc8-e6 b2-b4
# 5 (0.509,0.474,0.502) 5.85 6069 g2-g3 d7-d5 Bf1-g2 e7-e5 d2-d3 Nb8-c6 Ng1-f3 Ng8-f6 Ke1-g1 Bf8-e7 e2-e4 Ke8-g8 Nb1-c3 d5xe4 d3xe4 Qd8xd1
# 6 (0.482,0.446,0.475) 4.18 2448 f2-f4 d7-d5 Ng1-f3 Bc8-g4 e2-e3 Nb8-d7 Bf1-e2 Ng8-f6 Ke1-g1 e7-e6 b2-b3 Bf8-d6 Bc1-b2
# 7 (0.508,0.509,0.508) 2.30 2839 Nb1-c3 c7-c5 e2-e4 Nb8-c6 g2-g3 g7-g6 Bf1-g2 Bf8-g7 d2-d3 d7-d6 f2-f4 e7-e6 Ng1-f3 Ng8-e7
# 8 (0.486,0.456,0.480) 1.56 1001 b2-b3 e7-e5 Bc1-b2 Nb8-c6 e2-e3 Ng8-f6 Bf1-b5 Bf8-d6 Nb1-a3 Ke8-g8 Na3-c4 Rf8-e8 Nc4xd6 c7xd6 Ng1-e2
# 9 (0.487,0.430,0.476) 1.51 902 b2-b4 e7-e5 Bc1-b2 Bf8xb4 Bb2xe5 Ng8-f6 Ng1-f3 Ke8-g8 e2-e3 Nb8-c6 Be5-b2 d7-d5 Bf1-e2
# 10 (0.476,0.466,0.474) 1.49 868 c2-c3 d7-d5 d2-d4 Ng8-f6 Bc1-f4 c7-c5 e2-e3 Nb8-c6 Ng1-f3 Bc8-g4 h2-h3 Bg4xf3
# 11 (0.482,0.504,0.486) 1.34 966 d2-d3 d7-d5 e2-e4 d5xe4 d3xe4 Qd8xd1 Ke1xd1 Ng8-f6 Bf1-d3 Nb8-c6 Nb1-c3 e7-e5 Bc1-g5
# 12 (0.481,0.530,0.491) 1.25 990 e2-e3 Ng8-f6 Ng1-f3 g7-g6 c2-c4 Bf8-g7 Nb1-c3 Ke8-g8 Bf1-e2 d7-d6 Ke1-g1 e7-e5
# 13 (0.472,0.416,0.461) 0.55 265 Nb1-a3 e7-e5 c2-c4 Ng8-f6 Na3-c2 d7-d5 c4xd5 Nf6xd5 Ng1-f3 Nb8-c6 d2-d3 Bf8-e7
# 14 (0.470,0.506,0.477) 0.54 328 a2-a3 d7-d5 Ng1-f3 c7-c5 g2-g3 Nb8-c6 d2-d4 c5xd4 Nf3xd4 e7-e5
# 15 (0.469,0.453,0.466) 0.51 263 f2-f3 e7-e5 c2-c4 Ng8-f6 Nb1-c3 d7-d5 c4xd5 Nf6xd5 e2-e4 Nd5-b4 d2-d3 Nb8-c6
# 16 (0.470,0.415,0.459) 0.44 206 Ng1-h3 d7-d5 d2-d4 c7-c5 c2-c3 Nb8-c6 Nh3-f4 c5xd4 c3xd4
# 17 (0.462,0.471,0.464) 0.43 214 h2-h3 e7-e5 c2-c4 Ng8-f6 e2-e3 d7-d5 c4xd5 Nf6xd5 a2-a3
# 18 (0.456,0.437,0.452) 0.34 147 a2-a4 e7-e5 e2-e4 Ng8-f6 Nb1-c3 Bf8-b4 Ng1-f3 Ke8-g8 Nf3xe5 d7-d5
# 19 (0.452,0.383,0.439) 0.25 93 g2-g4 d7-d5 Bf1-g2 Bc8xg4 c2-c4 c7-c6 c4xd5 c6xd5 Qd1-b3 e7-e6
# 20 (0.460,0.437,0.456) 0.25 113 h2-h4 d7-d5 Ng1-f3 c7-c5 g2-g3 Nb8-c6 Bf1-g2 e7-e5
# nodes = 14581824 <34% qnodes> time = 10689ms nps = 1364189 eps = 753686 nneps = 50488
# Tree: nodes = 17740947 depth = 28 pps = 53405 visits = 570853
# qsearch_calls = 387 search_calls = 0
INT8
# Move Value=(V,P,V+P) Policy Visits PV
#----------------------------------------------------------------------------------
# 1 (0.507,0.526,0.511) 25.07 216737 e2-e4 d7-d6 d2-d4 Ng8-f6 Nb1-c3 g7-g6 Bc1-g5 Bf8-g7 Qd1-d2 h7-h6 Bg5xf6 Bg7xf6 f2-f4 Nb8-c6 Ng1-f3
# 2 (0.494,0.526,0.500) 21.00 90126 d2-d4 Ng8-f6 Ng1-f3 g7-g6 c2-c4 Bf8-g7 Nb1-c3 d7-d6 e2-e4 e7-e5 d4xe5 d6xe5 Qd1xd8 Ke8xd8 Nf3xe5
# 3 (0.495,0.526,0.501) 16.75 83362 Ng1-f3 Ng8-f6 c2-c4 e7-e6 g2-g3 d7-d5 b2-b3 Bf8-e7 Bf1-g2 Ke8-g8 Bc1-b2 Nb8-d7 d2-d3 c7-c6 Nb1-d2 b7-b6 Qd1-c2
# 4 (0.497,0.491,0.496) 13.04 34174 c2-c4 Ng8-f6 Nb1-c3 g7-g6 g2-g3 Bf8-g7 Bf1-g2 d7-d6 d2-d4 Nb8-d7 Ng1-f3 Ke8-g8 Ke1-g1 e7-e5
# 5 (0.490,0.458,0.483) 6.13 8009 g2-g3 e7-e5 Bf1-g2 d7-d5 d2-d3 Nb8-c6 Ng1-f3 Ng8-f6 c2-c3 Bf8-e7 b2-b4 Ke8-g8 b4-b5 e5-e4 Nf3-d4 Nc6xd4
# 6 (0.500,0.458,0.491) 4.13 8010 f2-f4 d7-d5 Ng1-f3 Bc8-g4 Nf3-e5 Bg4-f5 e2-e3 Nb8-d7 Ne5xd7 Qd8xd7 Bf1-e2 Ng8-f6
# 7 (0.491,0.440,0.481) 2.01 2283 b2-b3 e7-e5 e2-e3 d7-d5 Bc1-b2 Nb8-d7 g2-g3 Ng8-f6 Bf1-g2 Bf8-d6 Ng1-e2
# 8 (0.494,0.526,0.500) 1.99 8209 Nb1-c3 e7-e5 Ng1-f3 Nb8-c6 g2-g3 d7-d5 d2-d3 Ng8-f6 Bf1-g2 Bf8-e7 Bc1-g5 d5-d4 Bg5xf6 Be7xf6
# 9 (0.498,0.447,0.488) 1.94 2988 c2-c3 Ng8-f6 d2-d4 g7-g6 Bc1-g5 Bf8-g7 Nb1-d2 d7-d6 Ng1-f3 Nb8-d7 e2-e4 h7-h6 Bg5-h4
# 10 (0.498,0.496,0.498) 1.53 4701 d2-d3 d7-d5 Ng1-f3 Bc8-g4 Nb1-d2 Nb8-c6 e2-e4 d5xe4 d3xe4 e7-e6 Bf1-e2
# 11 (0.499,0.430,0.485) 1.42 1916 b2-b4 e7-e5 a2-a3 d7-d5 e2-e3 Bf8-d6 Bc1-b2 Ng8-f6 c2-c4 c7-c6 Ng1-f3 e5-e4
# 12 (0.495,0.526,0.501) 1.36 8027 e2-e3 e7-e5 c2-c4 Ng8-f6 Nb1-c3 Bf8-b4 Ng1-e2 Bb4xc3 Ne2xc3 Ke8-g8 Bf1-e2
# 13 (0.493,0.500,0.494) 0.65 1460 a2-a3 e7-e5 e2-e4 Ng8-f6 Nb1-c3 Nb8-c6 Ng1-f3 Bf8-c5 Bf1-e2 d7-d6 d2-d3 a7-a6 Bc1-g5 h7-h6
# 14 (0.490,0.420,0.476) 0.52 493 Nb1-a3 e7-e5 c2-c4 Ng8-f6 Na3-c2 d7-d5 c4xd5 Nf6xd5 Ng1-f3 Nb8-c6 d2-d3
# 15 (0.498,0.491,0.497) 0.49 1355 h2-h3 e7-e5 e2-e4 Ng8-f6 Nb1-c3 Nb8-c6 Ng1-f3 d7-d5 e4xd5 Nf6xd5 Bf1-b5 Nd5xc3 b2xc3
# 16 (0.487,0.436,0.477) 0.46 445 f2-f3 d7-d5 d2-d4 c7-c5 c2-c3 Nb8-c6 d4xc5 e7-e5 b2-b4
# 17 (0.493,0.415,0.477) 0.44 433 Ng1-h3 e7-e5 e2-e3 d7-d5 d2-d4 Nb8-c6 d4xe5 Nc6xe5 Nb1-d2 Ng8-f6 Nh3-f4 Bf8-d6
# 18 (0.492,0.437,0.481) 0.38 426 a2-a4 e7-e5 e2-e4 Ng8-f6 Nb1-c3 Nb8-c6 Ng1-f3 d7-d5 e4xd5 Nf6xd5 Bf1-b5 Nd5xc3
# 19 (0.495,0.383,0.473) 0.31 269 g2-g4 e7-e5 Bf1-g2 Ng8-f6 g4-g5 Nf6-h5 d2-d3 d7-d5 Ng1-f3 Nb8-c6
# 20 (0.461,0.437,0.456) 0.28 163 h2-h4 e7-e5 c2-c4 Ng8-f6 Nb1-c3 d7-d5 c4xd5 Nf6xd5 Ng1-f3
# nodes = 12742111 <40% qnodes> time = 10677ms nps = 1193416 eps = 793153 nneps = 41199
# Tree: nodes = 13932845 depth = 21 pps = 44355 visits = 473587
# qsearch_calls = 396 search_calls = 0
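One way to put a number on "not much loss of information" is to compare the Policy columns of the two outputs above move by move. The sketch below does that for the root position only, assuming the Policy column is the network prior in percent. It compares FP16 against INT8 (not fp32 against int8, which is what calibration targets), and the printed values are rounded to two decimals, so the result is only indicative.

```python
import math

# Policy column (percent) copied from the FP16 output above, keyed by root move.
fp16 = {
    "e2-e4": 26.45, "d2-d4": 22.62, "Ng1-f3": 16.61, "c2-c4": 11.42,
    "g2-g3": 5.85, "f2-f4": 4.18, "Nb1-c3": 2.30, "b2-b3": 1.56,
    "b2-b4": 1.51, "c2-c3": 1.49, "d2-d3": 1.34, "e2-e3": 1.25,
    "Nb1-a3": 0.55, "a2-a3": 0.54, "f2-f3": 0.51, "Ng1-h3": 0.44,
    "h2-h3": 0.43, "a2-a4": 0.34, "g2-g4": 0.25, "h2-h4": 0.25,
}

# Policy column (percent) copied from the INT8 output above.
int8 = {
    "e2-e4": 25.07, "d2-d4": 21.00, "Ng1-f3": 16.75, "c2-c4": 13.04,
    "g2-g3": 6.13, "f2-f4": 4.13, "b2-b3": 2.01, "Nb1-c3": 1.99,
    "c2-c3": 1.94, "d2-d3": 1.53, "b2-b4": 1.42, "e2-e3": 1.36,
    "a2-a3": 0.65, "Nb1-a3": 0.52, "h2-h3": 0.49, "f2-f3": 0.46,
    "Ng1-h3": 0.44, "a2-a4": 0.38, "g2-g4": 0.31, "h2-h4": 0.28,
}

def kl_divergence(p, q):
    """KL(P || Q) in nats over the moves of p, after normalizing both."""
    ps, qs = sum(p.values()), sum(q.values())
    return sum((p[m] / ps) * math.log((p[m] / ps) / (q[m] / qs)) for m in p)

# Largest per-move gap in percentage points, and the overall divergence.
max_diff = max(abs(fp16[m] - int8[m]) for m in fp16)
kl = kl_divergence(fp16, int8)
print(f"largest policy gap: {max_diff:.2f} percentage points")
print(f"KL(FP16 || INT8) at the root: {kl:.5f} nats")
```

The largest gap is 1.62 percentage points (on both d2-d4 and c2-c4), and the ordering of the top six moves is the same in both outputs.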
Daniel