Academy & Industry Research Collaboration Center (AIRCC)

Volume 12, Number 16, September 2022

A Context-Aware Intelligent System to Automate the Conversion of 2D Audio to 3D Audio using Signal Processing and Machine Learning


Bolin Gao¹ and Yu Sun², ¹Fairmont Preparatory Academy, USA, ²California State Polytechnic University, USA


As virtual reality technologies emerge, the ability to create visually immersive experiences has drastically improved [1]. However, to accompany the visual immersion, audio must also become more immersive [2]. This is where 3D audio comes in. 3D audio simulates sounds arriving from specific directions, producing a more realistic experience [3]. At present, there is a lack of tools that let users design immersive audio experiences that fully exploit the capabilities of 3D audio.

This paper proposes and implements the following systems [4]:

1. Automatic separation of stems from the incoming audio file, with the option for users to upload their own stems
2. A simulated environment in which the separated stems are automatically placed
3. A user interface for manipulating the simulated positions of the separated stems

We applied our application to a few selected audio files to conduct a qualitative evaluation of our approach. The results show that our approach successfully separated the stems and simulated a three-dimensional sound effect.
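
To make the idea of placing a stem at a simulated position concrete, the following is a minimal sketch only, not the method used in this paper. It assumes a deliberately simplified spatial model: an interaural time difference from the Woodworth approximation plus a constant-power level difference, in place of the measured Head Related Transfer Functions that a full system would convolve with each stem. The function name `place_stem`, the head radius, and the azimuth convention (-90 degrees is hard left, +90 degrees is hard right) are all illustrative assumptions.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius (illustrative value)
SPEED_OF_SOUND = 343.0   # metres per second in air at room temperature


def place_stem(mono, azimuth_deg, sample_rate=44100):
    """Return (left, right) sample lists for a mono stem at a given azimuth.

    Hypothetical helper: approximates direction with a time delay and a
    level difference between the ears. It is NOT an HRTF convolution.
    """
    # Clamp to the frontal half-plane, where the Woodworth formula applies.
    theta = math.radians(max(-90.0, min(90.0, azimuth_deg)))

    # Interaural time difference (Woodworth approximation), in samples.
    itd_seconds = (HEAD_RADIUS_M / SPEED_OF_SOUND) * (abs(theta) + math.sin(abs(theta)))
    delay = round(itd_seconds * sample_rate)

    # Constant-power panning gains: left^2 + right^2 == 1 for every angle.
    pan = (theta + math.pi / 2) / 2          # maps [-90, 90] degrees to [0, pi/2]
    left_gain, right_gain = math.cos(pan), math.sin(pan)

    near = list(mono) + [0.0] * delay        # ear facing the source
    far = [0.0] * delay + list(mono)         # opposite ear hears it later

    if theta >= 0:                           # source on the right
        left, right = far, near
    else:                                    # source on the left
        left, right = near, far

    return [left_gain * s for s in left], [right_gain * s for s in right]
```

Calling `place_stem(samples, 45)` on each separated stem with a different azimuth, then summing the resulting left and right channels, would give a rough stereo approximation of stems distributed around the listener. A user interface like the one described above would then only need to update each stem's azimuth.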


3D Audio, Signal Processing, Head-Related Transfer Functions.