Psychoacoustic Loss Functions for Neural Audio Coding

This is the demo page for the paper submitted to IEEE Signal Processing Letters for review:
Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, and Minje Kim, “Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding,” IEEE Signal Processing Letters (under review).

Source Code

https://github.com/cocosci/pam-nac

Decoded Samples

Uncompressed CD-quality stereo audio has a bitrate of 1.411 Mbps. For the mono setup used in this work, the uncompressed bitrate is half that, 705.6 kbps.
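
The figures above follow from the usual PCM bitrate arithmetic. The sketch below reproduces them, assuming 16-bit samples at 44.1 kHz (the CD format); the bit depth is not stated on this page, and the helper names `pcm_bitrate_bps` and `compression_ratio` are hypothetical, introduced only for illustration.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(uncompressed_bps: int, coded_bps: int) -> float:
    """How many times smaller the coded stream is than the uncompressed one."""
    return uncompressed_bps / coded_bps


# Assumption: 16-bit samples at 44.1 kHz, as on a CD.
stereo_bps = pcm_bitrate_bps(44_100, 16, channels=2)
mono_bps = pcm_bitrate_bps(44_100, 16, channels=1)

print(f"stereo: {stereo_bps / 1e6:.3f} Mbps")  # 1.411 Mbps
print(f"mono:   {mono_bps / 1e3:.1f} kbps")    # 705.6 kbps

# Ratio for a 112 kbps mono stream, one of the high-bitrate settings listed below.
print(f"ratio at 112 kbps: {compression_ratio(mono_bps, 112_000):.2f}x")
```

Under the same assumptions, a 96 kbps stream would be about 7.35 times smaller than the uncompressed mono signal.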

Low Bitrates, 32 kHz

  • Model-A (NAC with MSE loss), 64 kbps, 0.45M parameters
  • Model-A (NAC with MSE loss), 79 kbps, 0.9M parameters
  • Model-B (Model-A loss + mel-scale), 79 kbps, 0.9M parameters
  • Model-C (Model-B loss + priority weighting), 64 kbps, 0.45M parameters
  • Model-D (Model-C loss + noise modulation), 64 kbps, 0.45M parameters
  • MP3, 64 kbps

Low bitrates example #1
Reference
Model-A 64kbps; 1AE
Model-A 79kbps; 2AE
Model-B 79kbps; 2AE
Model-C 64kbps; 1AE
Model-D 64kbps; 1AE
MP3 64kbps
Low bitrates example #2
Reference
Model-A 64kbps; 1AE
Model-A 79kbps; 2AE
Model-B 79kbps; 2AE
Model-C 64kbps; 1AE
Model-D 64kbps; 1AE
MP3 64kbps
Low bitrates example #3
Reference
Model-A 64kbps; 1AE
Model-A 79kbps; 2AE
Model-B 79kbps; 2AE
Model-C 64kbps; 1AE
Model-D 64kbps; 1AE
MP3 64kbps

High Bitrates, 44.1 kHz

  • Model-A (NAC with MSE loss), 168 kbps, 0.9M parameters
  • Model-C (Model-A loss + mel-scale and priority weighting), 96 kbps, 0.45M parameters
  • Model-C (Model-A loss + mel-scale and priority weighting), 112 kbps, 0.9M parameters
  • Model-D (Model-C loss + noise modulation), 96 kbps, 0.45M parameters
  • Model-D (Model-C loss + noise modulation), 112 kbps, 0.9M parameters
  • MP3, 112 kbps

High bitrates example #1
Reference
Model-A 168kbps; 2AE
Model-C 96kbps; 1AE
Model-C 112kbps; 2AE
Model-D 96kbps; 1AE
Model-D 112kbps; 2AE
MP3 112kbps
High bitrates example #2
Reference
Model-A 168kbps; 2AE
Model-C 96kbps; 1AE
Model-C 112kbps; 2AE
Model-D 96kbps; 1AE
Model-D 112kbps; 2AE
MP3 112kbps
High bitrates example #3
Reference
Model-A 168kbps; 2AE
Model-C 96kbps; 1AE
Model-C 112kbps; 2AE
Model-D 96kbps; 1AE
Model-D 112kbps; 2AE
MP3 112kbps