Neural Upmixing via Style Transfer

Title

Upmixing via Style Transfer: a Variational Autoencoder for Disentangling Spatial Images and Musical Content

Authors

Haici Yang, Sanna Wager, Spencer Russell, Mike Luo, Minje Kim, and Wontak Kim

Abstract

In the stereo-to-multichannel upmixing problem for music, one of the main tasks is to set the directionality of the instrument sources in the multichannel rendering. In this paper, we propose a modified variational autoencoder model that learns a latent space describing the spatial images in multichannel music. We seek to disentangle the spatial images from the music content, so that the learned latent variables are invariant to the music. At test time, we use the latent variables to control the panning of sources. We propose two upmixing use cases: transferring the spatial images from one song to another, and blind panning based on the generative model. We report objective and subjective evaluation results to empirically show that our model captures spatial images separately from music content and achieves transfer-based interactive panning.
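The transfer use case can be illustrated in a deliberately simplified, non-learned form: extract a spatial descriptor from a reference mix and apply it to different content. The sketch below is hypothetical and only illustrative; per-channel RMS gains stand in for the paper's learned latent variables, and the function names are invented for this example, not taken from the paper.

```python
import math

def channel_gains(channels):
    # Toy "spatial image": RMS level of each channel, normalized so the
    # loudest channel has gain 1. Stands in for the learned latent variables.
    rms = [math.sqrt(sum(s * s for s in ch) / len(ch)) for ch in channels]
    peak = max(rms) or 1.0
    return [r / peak for r in rms]

def apply_gains(mono, gains):
    # Pan a mono source into len(gains) channels using the reference's
    # spatial descriptor, i.e., transfer the spatial image to new content.
    return [[g * s for s in mono] for g in gains]

# Reference mix: a source panned hard left (left loud, right quiet).
reference = [[1.0, 0.8, 1.0], [0.1, 0.08, 0.1]]
gains = channel_gains(reference)            # roughly [1.0, 0.1]
upmixed = apply_gains([0.5, -0.5], gains)   # new content, same spatial image
```

The VAE in the paper replaces these hand-crafted gains with latent variables inferred from data, which is what makes the spatial description invariant to the underlying music.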

Sound Examples

Link to the Google Drive
It contains two sub-folders, “Blind upmix” and “Transferring upmix.” Each .wav file has five channels in the order Left, Right, Center, Rear Left, Rear Right, which must be mapped to the appropriate speakers for playback, e.g., using Adobe Audition. We recommend a standard 5.1-channel surround system for playback; alternatively, you can listen to each channel individually.
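To inspect the channels individually without a surround setup, the interleaved samples can be split per channel with the Python standard library. This is a minimal sketch assuming 16-bit PCM files; the file name and helper names below are hypothetical, and it builds a small dummy 5-channel file so the example is self-contained.

```python
import struct
import wave

# Channel order used in the release, per the page above.
CHANNELS = ["Left", "Right", "Center", "Rear Left", "Rear Right"]

def split_channels(path):
    # Read an interleaved 16-bit multichannel .wav and return a dict
    # mapping each channel name to its list of integer samples.
    with wave.open(path, "rb") as w:
        n_ch = w.getnchannels()
        frames = w.readframes(w.getnframes())
    # Samples are interleaved frame by frame: ch0, ch1, ..., ch0, ch1, ...
    flat = struct.unpack("<%dh" % (len(frames) // 2), frames)
    return {name: list(flat[i::n_ch]) for i, name in enumerate(CHANNELS[:n_ch])}

def write_demo(path, n_frames=3):
    # Write a tiny 5-channel demo file where channel i holds the constant i.
    with wave.open(path, "wb") as w:
        w.setnchannels(5)
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(44100)
        for _ in range(n_frames):
            w.writeframes(struct.pack("<5h", *range(5)))

write_demo("demo_5ch.wav")
chans = split_channels("demo_5ch.wav")
print(chans["Rear Right"])  # [4, 4, 4]
```

The same `split_channels` helper works on the released five-channel files, letting you save or audition any single channel as mono.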

ICASSP 2022 Virtual Presentation