PyTorch implementation of various Knowledge Distillation (KD) methods.
| Name | Method | Paper Link | Code Link |
|---|---|---|---|
| Baseline | basic model with softmax loss | — | code |
| ST | soft target | paper | code |
| AT | attention transfer | paper | code |
| Fitnet | hints for thin deep nets | paper | code |
| NST | neuron selectivity transfer | paper | code |
| FT | factor transfer | paper | code |
| RKD | relational knowledge distillation | paper | code |
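As a quick reference, here is a minimal sketch of the classic soft target (ST) loss; the function name and temperature value are illustrative, and the repo's own implementation in `./kd_losses` may differ in details such as loss weighting.

```python
import torch.nn.functional as F

def soft_target_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened student and teacher distributions."""
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    # Multiply by T^2 so the gradient magnitude stays comparable to the hard-label loss.
    return F.kl_div(log_p_s, p_t, reduction='batchmean') * (T * T)
```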
- Note, there are some differences between this repository and the original papers (the AT and NST choices are sketched in code after this list):
  - For AT: I use the sum of absolute values with power p=2 as the attention.
  - For Fitnet: The training procedure is one stage, without the hint layer.
  - For NST: I employ a polynomial kernel with d=2 and c=0.
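The sketch below illustrates those two choices: an attention map built from the sum of absolute values with p=2 (AT) and a polynomial kernel with d=2 and c=0 (NST). Function names and tensor layouts are assumptions for illustration, not copies of the code in `./kd_losses`.

```python
import torch
import torch.nn.functional as F

def attention_map(feat, p=2):
    """Collapse a BxCxHxW feature map into an L2-normalized BxHW attention vector."""
    am = feat.abs().pow(p).sum(dim=1)      # sum of |A|^p over the channel dimension
    am = am.view(am.size(0), -1)
    return F.normalize(am, dim=1)

def poly_kernel(x, y, d=2, c=0.0):
    """Polynomial kernel (x . y + c)^d between batched sets of flattened activations."""
    # x, y: B x N x C tensors; the result is a B x N x N Gram matrix.
    return (torch.bmm(x, y.transpose(1, 2)) + c).pow(d)
```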
Datasets:
- CIFAR10
- CIFAR100

Networks:
- Resnet-20
- Resnet-110
- Create the `./dataset` directory and download CIFAR10/CIFAR100 into it.
- You can simply specify the hyper-parameters listed in `train_xxx.py` or manually change them.
- Use `train_base.py` to train the teacher model in KD, and then save the model.
- Before training, choose the method you need in the `./kd_losses` directory, and run `train_kd.py` to train the student model (a rough sketch of this step follows the list).
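For a rough picture of what a soft-target training step in `train_kd.py` looks like, here is a self-contained sketch; the helper name `kd_step`, the hyper-parameters `T` and `lambda_kd`, and the data-loading settings are illustrative assumptions rather than the repo's actual defaults.

```python
import torch
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

def kd_step(student, teacher, images, labels, optimizer, T=4.0, lambda_kd=0.9):
    """One optimization step for the student with soft-target distillation."""
    teacher.eval()
    with torch.no_grad():                 # the teacher is frozen
        t_logits = teacher(images)
    s_logits = student(images)
    ce = F.cross_entropy(s_logits, labels)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction='batchmean') * (T * T)
    loss = (1.0 - lambda_kd) * ce + lambda_kd * kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# CIFAR10 stored under ./dataset, as described in the steps above.
train_set = torchvision.datasets.CIFAR10(
    root='./dataset', train=True, download=True,
    transform=transforms.Compose([transforms.RandomCrop(32, padding=4),
                                  transforms.RandomHorizontalFlip(),
                                  transforms.ToTensor()]))
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
```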
Requirements:
- python 3.7
- pytorch 1.3.1
- torchvision 0.4.2
This repo is partly based on the following repos; many thanks to their authors.
- HobbitLong/RepDistiller
- bhheo/BSS_distillation
- clovaai/overhaul-distillation
- passalis/probabilistic_kt
- lenscloth/RKD
If you employ the listed KD methods in your research, please cite the corresponding papers.