The net is input -> hidden -> relu -> output, which is about as simple as a network can get. Even so, for this net in particular, the outputs of the quantized version sometimes differ drastically from the outputs of the unquantized version.
The pseudocode for my quantization process is something like this:
Code:
MAX_FEATURES = 32          # at most 32 input features are active at once
MAX_INT16 = 2**15 - 1
MAX_INT32 = 2**31 - 1

# Quantize the hidden layer to int16.
hidden_weights = torch.round(HIDDEN_QUANTIZE_SCALE * hidden_weights)
hidden_biases = torch.round(HIDDEN_QUANTIZE_SCALE * hidden_biases)

# Worst-case hidden accumulator: every active feature hits the largest weight.
max_hidden_weight = torch.max(torch.abs(hidden_weights)).item()
max_hidden_bias = torch.max(torch.abs(hidden_biases)).item()
max_hidden_output = MAX_FEATURES * max_hidden_weight + max_hidden_bias
assert max_hidden_weight < MAX_INT16
assert max_hidden_bias < MAX_INT16
assert max_hidden_output < MAX_INT16

# Quantize the output layer. The output bias carries both scales so that it
# lives at the same scale as (hidden activation * output weight).
output_weights = torch.round(OUTPUT_QUANTIZE_SCALE * output_weights)
output_biases = torch.round(HIDDEN_QUANTIZE_SCALE * OUTPUT_QUANTIZE_SCALE * output_biases)

# Worst-case output accumulator across the whole hidden layer.
max_output_weight = torch.max(torch.abs(output_weights)).item()
max_output_bias = torch.max(torch.abs(output_biases)).item()
max_output = HIDDEN_SIZE * max_hidden_output * max_output_weight + max_output_bias
assert max_output_weight < MAX_INT16
assert max_output_bias < MAX_INT32
assert max_output < MAX_INT32
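For reference, here is a minimal sketch (plain Python, single neuron per layer) of how the quantized forward pass is supposed to line up with the float one under this scheme. The scale constants and weight values are illustrative, not the ones from my net:

```python
HIDDEN_QUANTIZE_SCALE = 64.0   # illustrative scale, not my actual constant
OUTPUT_QUANTIZE_SCALE = 16.0   # illustrative scale

def float_forward(x, hw, hb, ow, ob):
    # Unquantized reference: input -> hidden -> relu -> output.
    hidden = max(0.0, hw * x + hb)
    return ow * hidden + ob

def quantized_forward(x, hw, hb, ow, ob):
    # Quantize parameters the same way as the pseudocode above.
    qhw = round(HIDDEN_QUANTIZE_SCALE * hw)
    qhb = round(HIDDEN_QUANTIZE_SCALE * hb)
    qow = round(OUTPUT_QUANTIZE_SCALE * ow)
    qob = round(HIDDEN_QUANTIZE_SCALE * OUTPUT_QUANTIZE_SCALE * ob)
    # Integer inference: hidden activation is at scale HIDDEN_QUANTIZE_SCALE,
    # the output accumulator at scale HIDDEN * OUTPUT.
    hidden = max(0, qhw * x + qhb)
    out = qow * hidden + qob
    # Dequantize back to the float domain for comparison.
    return out / (HIDDEN_QUANTIZE_SCALE * OUTPUT_QUANTIZE_SCALE)

f = float_forward(1, 0.3, -0.1, 0.7, 0.05)      # 0.19
q = quantized_forward(1, 0.3, -0.1, 0.7, 0.05)  # ~0.1895
```

If both paths agree, the error should be on the order of the rounding granularity; a large gap between `f` and `q` on a real input is the symptom I'm seeing.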