Knowledge Distillation for Personalized Speech Enhancement

This page contains the demo and source code for the paper, “Test-Time Adaptation Toward Personalized Speech Enhancement: Zero-Shot Learning with Knowledge Distillation,” submitted to WASPAA 2021.

Source code

https://github.com/kimsunwiub/PSE_ZeroShot_KD
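The core idea behind the samples below can be sketched in a few lines: a large pre-trained teacher model enhances the noisy test mixture, and its estimate serves as a pseudo-clean target for fine-tuning a small student model at test time, without any clean reference. The sketch below is a minimal toy illustration of that loop, not the paper's implementation: both "models" are stand-in linear mask estimators on toy magnitude frames (the actual models are GRU and Conv-TasNet networks), and all dimensions, learning rates, and step counts are illustrative.

```python
import numpy as np

# Toy sketch of zero-shot knowledge distillation for test-time adaptation.
# Assumption: the teacher is frozen and its output on the noisy test
# mixture is treated as the pseudo-clean training target for the student.

rng = np.random.default_rng(0)
n_freq = 8                                  # toy feature dimension
mixture = rng.random((100, n_freq))         # noisy test "spectrogram" frames

W_teacher = rng.random((n_freq, n_freq))    # stands in for a large pre-trained teacher
pseudo_target = mixture @ W_teacher         # teacher's estimate = pseudo-clean target

W_student = rng.random((n_freq, n_freq))    # small student, adapted at test time
init_err = np.mean((mixture @ W_student - pseudo_target) ** 2)

lr = 1e-3
for _ in range(200):                        # test-time fine-tuning steps
    pred = mixture @ W_student
    grad = 2.0 * mixture.T @ (pred - pseudo_target) / len(mixture)  # MSE gradient
    W_student -= lr * grad

final_err = np.mean((mixture @ W_student - pseudo_target) ** 2)
print(final_err < init_err)                 # student moved toward the teacher's estimate
```

In the paper's setting the distillation loss is computed on the student's enhanced signal against the teacher's enhanced signal, so larger and stronger teachers (e.g. the CTN teacher in the samples below) yield better-adapted students.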

Audio samples

The small GRU student model (2×32)

Test mixture (5.02 dB)
Pre-trained model (8.70 dB)
Fine-tuned by the GRU teacher (9.95 dB)
Fine-tuned by the CTN teacher (10.39 dB)
3×1024 GRU teacher (10.89 dB)
CTN teacher (13.92 dB)

Test mixture (0.03 dB)
Pre-trained model (5.19 dB)
Fine-tuned by the GRU teacher (8.03 dB)
Fine-tuned by the CTN teacher (8.57 dB)
3×1024 GRU teacher (7.87 dB)
CTN teacher (11.41 dB)

The large GRU student model (2×256)

Test mixture (9.99 dB)
Pre-trained model (13.79 dB)
Fine-tuned by the GRU teacher (14.25 dB)
Fine-tuned by the CTN teacher (14.45 dB)
3×1024 GRU teacher (14.60 dB)
CTN teacher (15.60 dB)