BiLD: Bi-directional Logits Difference Loss for Large Language Model Distillation
Patient Knowledge Distillation for BERT-based Natural Language Processing Models | Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu
Distilling the Knowledge in a Neural Network | Geoffrey Hinton, Oriol Vinyals, Jeff Dean
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models | Liang Li
Chain-of-Thought-Based Knowledge Distillation for Large Language Models | 李枯涵
A Survey of Quantization Methods for Deep Neural Network Models | 杨秋
Pre-training Distillation for Large Language Models: A Design Space Exploration | Hao Peng
Greener yet Powerful: Taming Large Code Generation Models with Quantization | Xiaokai Wei