Knowledge Distillation for PSE

Paper

Sunwoo Kim and Minje Kim, “Test-Time Adaptation Toward Personalized Speech Enhancement: Zero-Shot Learning With Knowledge Distillation,” in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, Oct. 17-20, 2021 [pdf].

Source code

https://github.com/kimsunwiub/PSE_ZeroShot_KD
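
In this setup, a large teacher model enhances the user's noisy test-time recordings, and its outputs serve as pseudo-targets for fine-tuning the compact student, so no clean speech from the test environment is required. Below is a minimal PyTorch sketch of that loop; the function and variable names are illustrative assumptions, not the repository's actual API.

```python
# Minimal sketch of zero-shot KD test-time adaptation, assuming time-domain
# models. Names are illustrative; see the repository for the real code.
import itertools
import torch
import torch.nn.functional as F

def adapt_student(student, teacher, noisy_batches, lr=1e-4, steps=100):
    """Fine-tune the small student on the user's noisy recordings, using the
    frozen teacher's enhanced outputs as pseudo-targets (no clean speech)."""
    teacher.eval()
    student.train()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for noisy in itertools.islice(itertools.cycle(noisy_batches), steps):
        with torch.no_grad():
            pseudo_clean = teacher(noisy)  # teacher estimate = pseudo-label
        loss = F.mse_loss(student(noisy), pseudo_clean)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student
```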

Video Presentation

Audio samples

The number in parentheses after each sample is its signal quality in dB (higher is better).

The small GRU student model (2×32, i.e., two GRU layers with 32 hidden units each)

Test mixture 1 (5.02 dB)
Pre-trained model (8.70 dB)
Fine-tuned by the GRU teacher (9.95 dB)
Fine-tuned by the CTN (Conv-TasNet) teacher (10.39 dB)
3×1024 GRU teacher (10.89 dB)
CTN teacher (13.92 dB)

Test mixture 2 (0.03 dB)
Pre-trained model (5.19 dB)
Fine-tuned by the GRU teacher (8.03 dB)
Fine-tuned by the CTN teacher (8.57 dB)
3×1024 GRU teacher (7.87 dB)
CTN teacher (11.41 dB)

The large GRU student model (2×256, i.e., two GRU layers with 256 hidden units each)

Test mixture (9.99 dB)
Pre-trained model (13.79 dB)
Fine-tuned by the GRU teacher (14.25 dB)
Fine-tuned by the CTN teacher (14.45 dB)
3×1024 GRU teacher (14.60 dB)
CTN teacher (15.60 dB)
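
The dB figures above are scores from the SDR family of signal-quality metrics. Assuming the scale-invariant variant (SI-SDR), a common choice for this task and an assumption here rather than something the page states, the score can be computed as:

```python
import torch

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR in dB between an estimate and a clean reference."""
    est, ref = est - est.mean(), ref - ref.mean()
    # Project the estimate onto the reference to split it into a target
    # component (scaled reference) and a residual noise component.
    alpha = (est * ref).sum() / (ref.pow(2).sum() + eps)
    target = alpha * ref
    noise = est - target
    return 10 * torch.log10(target.pow(2).sum() / (noise.pow(2).sum() + eps))
```

Higher values indicate an estimate closer to the clean reference, which is why each fine-tuned student scores above its corresponding test mixture.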

※ The material discussed here is partly based upon work supported by the National Science Foundation under Award #: 2046963. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.