2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM)

Abstract

This paper presents a data-driven framework for speech source localization (SSL) using a deep neural network (DNN), which directly constructs the nonlinear regression mapping between the extracted feature and the direction-of-arrival (DOA) of an indoor speech source. The proposed method combines a feature-extraction front-end with a regression-network back-end. First, since the DOA information contained in the steering vector of the speech source can be represented by the eigenvector associated with the signal subspace, this eigenvector is extracted by eigenanalysis and used as the input feature. Second, a regression DNN is adopted to model the nonlinear relationship between the eigenvector and the source direction, with a time delay neural network (TDNN) chosen as the basic network architecture. Experiments conducted in simulated and real environments using an eight-channel circular array indicate the superiority and potential of the proposed method for SSL.
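
The eigenvector front-end described above can be illustrated with a minimal sketch. It is not the authors' implementation: the narrowband far-field signal model, the array radius, the analysis frequency, the noise level, and the function names `steering_vector` and `principal_eigenvector_feature` are all assumptions made for illustration. The sketch simulates one source on an eight-channel circular array, estimates the spatial covariance matrix, and takes the eigenvector of the largest eigenvalue as the feature that a regression network could consume.

```python
import numpy as np


def steering_vector(doa_deg, n_mics=8, radius=0.05, freq=1000.0, c=343.0):
    """Far-field narrowband steering vector of a uniform circular array.

    radius (m), freq (Hz) and c (m/s) are illustrative assumptions.
    """
    theta = np.deg2rad(doa_deg)
    mic_angles = 2.0 * np.pi * np.arange(n_mics) / n_mics
    # Relative propagation delay at each microphone for a plane wave.
    delays = radius * np.cos(theta - mic_angles) / c
    return np.exp(2j * np.pi * freq * delays)


def principal_eigenvector_feature(snapshots):
    """Map (n_mics, n_frames) complex snapshots to a real feature vector.

    The eigenvector of the largest eigenvalue of the sample covariance
    matrix spans the signal subspace when a single source is present.
    """
    n_frames = snapshots.shape[1]
    cov = snapshots @ snapshots.conj().T / n_frames
    # eigh returns eigenvalues in ascending order for Hermitian input.
    _, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]
    # Eigenvectors are defined up to a phase; fix it using the first entry.
    v = v * np.exp(-1j * np.angle(v[0]))
    # Stack real and imaginary parts so a real-valued network can use it.
    return np.concatenate([v.real, v.imag])


# Simulate one source at 60 degrees with weak sensor noise (assumed values).
rng = np.random.default_rng(0)
a = steering_vector(60.0)
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)
noise = 0.1 * (rng.standard_normal((8, 200)) + 1j * rng.standard_normal((8, 200)))
x = np.outer(a, s) + noise

feature = principal_eigenvector_feature(x)
```

In this sketch the feature has 16 real entries (real and imaginary parts of the eight-element eigenvector). In a full pipeline, a sequence of such vectors computed over successive frames would form the input to the TDNN regressor, which is not shown here.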