Proceedings of International Conference on Acoustics, Speech and Signal Processing (CASSP'02)
Download PDF

Abstract

Far-field speaker identification is very challenging since varying recording conditions often result in un-matching training and testing situations. Although the widely used Gaussian Mixture Models (GMM) approach achieves reasonable good results when training and testing conditions match, its performance degrades dramatically under un-matching conditions. In this paper we propose a new approach for far-field speaker identification: the usage of multilingual phone strings derived from phone recognizers in eight different languages. The experiments are carried out on a database of 30 speakers recorded with eight different microphone distances. The results show that the multi-lingual phone string approach is robust against un-matching conditions and significantly outperforms the GMMs. On 10-second test chunks, the average closed-set identification performance achieves 96.7% on variable distance data.
Like what you’re reading?
Already a member?
Get this article FREE with a new membership!

Related Articles