InDistill
An information flow-preserving knowledge distillation methodology for model compression.
- A method, termed InDistill, that applies channel pruning to the teacher's intermediate layers before distillation, enabling, for the first time, direct feature map transfer and, consequently, preservation of the information flow paths (see the sketch following this list).
- A curriculum learning-based training strategy that accounts for both the increasing distillation difficulty of successive layers and the critical learning periods of a neural network.
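The core mechanism can be illustrated with a short PyTorch sketch. The function names (`prune_channels`, `direct_feature_map_loss`) and the L1-norm channel-selection criterion are illustrative assumptions rather than the official implementation: a teacher feature map is pruned down to the student's channel width and then matched directly with an MSE loss, with no intermediate encoding stage.

```python
# Hypothetical sketch of channel pruning + direct feature map distillation.
import torch
import torch.nn.functional as F


def prune_channels(teacher_fmap: torch.Tensor, num_student_channels: int) -> torch.Tensor:
    """Keep the teacher channels with the largest L1 norm so the pruned
    feature map matches the student's width (the selection criterion is
    an assumption for illustration)."""
    # teacher_fmap: (batch, C_teacher, H, W)
    importance = teacher_fmap.abs().sum(dim=(0, 2, 3))                 # per-channel L1 norm
    keep = importance.topk(num_student_channels).indices.sort().values # most important channels, original order
    return teacher_fmap[:, keep, :, :]


def direct_feature_map_loss(student_fmap: torch.Tensor, teacher_fmap: torch.Tensor) -> torch.Tensor:
    """Distill the pruned teacher feature map directly into the student's,
    with no encoding stage in between."""
    pruned_teacher = prune_channels(teacher_fmap, student_fmap.shape[1]).detach()
    # Spatial sizes are assumed to match; otherwise pool the teacher map first.
    if pruned_teacher.shape[-2:] != student_fmap.shape[-2:]:
        pruned_teacher = F.adaptive_avg_pool2d(pruned_teacher, student_fmap.shape[-2:])
    return F.mse_loss(student_fmap, pruned_teacher)
```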
InDistill is a model compression approach that combines knowledge distillation and channel pruning in a unified framework to transfer the critical information flow paths from a heavyweight teacher to a lightweight student. In previous methods, this information is typically collapsed by an encoding stage applied prior to distillation. By contrast, InDistill applies a pruning operation to the teacher's intermediate layers, reducing their width to match that of the corresponding student layers. This enforces architectural alignment, so the intermediate layers can be distilled directly, without the need for an encoding stage. Additionally, a curriculum learning-based training scheme is adopted that accounts for the distillation difficulty of each layer and the critical learning periods in which the information flow paths are created.
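The curriculum-based schedule can likewise be sketched under simple assumptions (the stage lengths, loss combination, and helper names below are hypothetical, not the paper's exact recipe): intermediate layers are distilled one at a time, from shallow to deep, during the early epochs, after which training continues with the task loss alone. The second function reuses `direct_feature_map_loss` from the sketch above.

```python
# Hypothetical sketch of a curriculum schedule over the distilled layers.
def active_distillation_layer(epoch: int, num_layers: int, epochs_per_layer: int = 5):
    """Index of the intermediate layer distilled at this epoch, or None once
    every intermediate layer has had its turn."""
    stage = epoch // epochs_per_layer
    return stage if stage < num_layers else None


def total_loss(epoch, student_fmaps, teacher_fmaps, task_loss):
    """Distill only the currently scheduled layer, then add the task loss."""
    layer = active_distillation_layer(epoch, num_layers=len(student_fmaps))
    if layer is None:
        return task_loss
    return direct_feature_map_loss(student_fmaps[layer], teacher_fmaps[layer]) + task_loss
```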