Introduction
Quantization is a key technique in machine learning that is used to reduce the model size and speed up inference, especially when deploying models on hardware with resource constraints. Nevertheless, achieving a good quantization setup means balancing the model performance against the computational efficiency required by the deployment environment.