Deep Learning Model Compression Techniques - Pruning and Knowledge Distillation
The aim of this project is to find an efficient COVID face mask detection model for deployment. In deep neural networks, the computational cost of inference is high and scales with the number of users/queries. When these deep models are deployed on the cloud, edge devices, mobile phones, etc. for various applications, low latency and low memory consumption are the key requirements for inference, since they reduce the computational load on the hardware. To reduce the compute demand, we can either optimize the hardware and software stack or compress the model itself by reducing its number of parameters. Since the latter is more feasible than optimizing the hardware/software stack, we explore different model compression techniques in this project.
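As a concrete illustration of compressing a model by reducing its effective parameter count, below is a minimal sketch of magnitude-based pruning using PyTorch's built-in `torch.nn.utils.prune` API. The toy network, layer shapes, and the 30% sparsity level are assumptions chosen for illustration, not the project's actual mask detection model.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Illustrative stand-in network (assumes 3x32x32 inputs, 2 output
# classes such as mask / no-mask); not the project's real detector.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 2),
)

# Zero out the 30% smallest-magnitude weights in each conv/linear layer.
for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# Report the resulting fraction of zeroed parameters.
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"sparsity: {zeros / total:.1%}")
```

Note that unstructured pruning like this only zeroes weights; the memory and latency savings are realized when the sparse model is stored in a compressed format or run on hardware/runtimes that exploit sparsity.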