Quantifying Knowledge Distillation Using Partial Information Decomposition
Proposed information-theoretic metrics, based on Partial Information Decomposition (PID), that quantify and explain how knowledge transfers from teacher to student in knowledge distillation. These metrics motivated the Redundant Information Distillation (RID) framework, which filters out task-irrelevant information and improves distillation under nuisance teachers.
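
As a rough illustration of what a PID-based metric measures, the sketch below splits the information that a teacher representation T and a student representation S carry about a label Y into redundant, unique, and synergistic parts. It is a toy under stated assumptions, not the method of this work: the function names (`mutual_information`, `pid_mmi`) are hypothetical, the variables are small discrete tables, and redundancy is taken to be the minimum of I(Y;T) and I(Y;S), a simple stand-in that may differ from the redundancy measure actually used. One reading of the output is that unique teacher information is knowledge still available to distill, while redundant information is knowledge the student already shares with the teacher.

```python
import numpy as np


def mutual_information(p_xy):
    """Mutual information I(X;Y) in bits from a 2-D joint probability table."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of X, shape (n, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of Y, shape (1, m)
    mask = p_xy > 0                        # skip zero cells (0 * log 0 = 0)
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))


def pid_mmi(p_yts):
    """PID atoms of I(Y; T, S) for a joint table indexed as [y, t, s].

    Assumption: redundancy is min(I(Y;T), I(Y;S)), a simple stand-in measure.
    """
    p = np.asarray(p_yts, dtype=float)
    i_t = mutual_information(p.sum(axis=2))               # I(Y;T)
    i_s = mutual_information(p.sum(axis=1))               # I(Y;S)
    i_ts = mutual_information(p.reshape(p.shape[0], -1))  # I(Y;T,S)
    red = min(i_t, i_s)
    uni_t = i_t - red                  # in the teacher only
    uni_s = i_s - red                  # in the student only
    syn = i_ts - red - uni_t - uni_s   # available only from T and S jointly
    return {
        "redundant": red,
        "unique_teacher": uni_t,
        "unique_student": uni_s,
        "synergistic": syn,
    }


# Toy case 1: the teacher copies the label, the student is an independent coin flip.
p_untrained = np.zeros((2, 2, 2))
for y in range(2):
    for s in range(2):
        p_untrained[y, y, s] = 0.25
atoms_untrained = pid_mmi(p_untrained)

# Toy case 2: the student has fully matched the teacher (S = T = Y).
p_matched = np.zeros((2, 2, 2))
for y in range(2):
    p_matched[y, y, y] = 0.5
atoms_matched = pid_mmi(p_matched)
```

In the first toy case all label information sits in the unique-teacher atom; in the second it moves entirely into the redundant atom, which is the kind of shift a distillation metric built on PID is meant to track.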