SAIGE is a group of researchers, led by Minje Kim, working on machine learning to solve real-world signal processing problems.
We live in a world full of complex problems. For an AI system to solve them, it needs a complex model that can closely approximate each problem. In this era of AI, we can finally seem to afford such complex models, thanks to technological advances in computing power, theoretical advances in machine learning algorithms, and the availability of big data. As a result, AI has begun to compete with human intelligence on some problems.
However, we believe that AI is still missing many powerful features that natural intelligence surely possesses. For example, does your brain need ten times more sugar when it solves a problem that is ten times more difficult? Maybe a little more, but not that much. Does a mosquito need much energy to sense its target and fly to it (imagine a mosquito-sized drone that could do the same)? We know that the intelligent systems found in nature are not only effective but also efficient. One of SAIGE’s main research agendas is building machine learning models that run more efficiently at test time and in hardware. Such systems range from deep neural networks and probabilistic topic models that are defined and operate in a bitwise fashion, to psychoacoustically informed cost functions for training less complex models that still produce perceptually equivalent results.
Another important intelligent behavior is collaboration. It is a rather abstract concept, and not a straightforward one for a computational model to mimic. Still, we have found some interesting applications that can benefit from collaboration between devices and sensors. For example, we have been interested in consolidating many different audio signals recorded by various devices to recover the common dominant source of the audio scene. Since each recording contains both the dominant source of interest and its own artifacts (e.g., additive noise, reverberation, band-pass filtering), a naïve average of the recordings is not a good solution to this problem. We call this kind of problem collaborative audio enhancement. Collaboration in nature can also happen between different sensors, as when we recognize someone’s emotion by looking at her facial expression and listening to her voice. Therefore, building machine learning models that fuse the decisions made from different kinds of sensor signals is another of our research directions.