2004 IEEE International Conference on Acoustics, Speech, and Signal Processing

Abstract

We propose a mathematical model of the relation between the formant frequencies of different speakers and show that, under this affine model, speaker differences separate out as translation factors when a "Mel-like" warping is applied. Using speech data, we estimate the parameters of this warping function and show that it is close to the usual Mel formula. The model is motivated by the shift-based non-uniform speaker-normalization method of Rohit Sinha and S. Umesh (see Proc. IEEE ICASSP, 2002), which improves on conventional maximum-likelihood-based speaker normalization methods. We therefore provide a unified mathematical framework that connects the relationship between the formants of speakers to the method of removing speaker differences (which involves Mel-warping), and we substantiate it with recognition experiments.
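The central claim, that an affine relation between speakers' formants becomes a pure translation after a Mel-like warping, can be illustrated numerically. The sketch below is not taken from the paper. It assumes one plausible form of the affine model, (F_a + c) = alpha * (F_b + c), and uses the standard Mel formula 2595 * log10(1 + f/700) as the warping. The function names, the value of alpha, and the formant values are hypothetical and chosen only for illustration.

```python
import math

MEL_SCALE = 2595.0   # scale constant of the standard Mel formula
MEL_OFFSET = 700.0   # offset (Hz) of the standard Mel formula


def mel(f_hz, c=MEL_OFFSET, scale=MEL_SCALE):
    """Standard Mel warping: scale * log10(1 + f / c)."""
    return scale * math.log10(1.0 + f_hz / c)


def affine_speaker_map(f_hz, alpha, c=MEL_OFFSET):
    """ASSUMED affine model (not confirmed by the abstract):
    (F_a + c) = alpha * (F_b + c), solved for F_a."""
    return alpha * (f_hz + c) - c


# Illustrative formant frequencies (Hz) for a hypothetical speaker B.
formants_b = [500.0, 1500.0, 2500.0]
alpha = 1.15  # hypothetical speaker-dependent factor

# Formants of a hypothetical speaker A under the assumed affine model.
formants_a = [affine_speaker_map(f, alpha) for f in formants_b]

# After warping, the difference between speakers is the same for every formant.
shifts = [mel(fa) - mel(fb) for fa, fb in zip(formants_a, formants_b)]

# Under the assumed model the shift equals scale * log10(alpha).
expected_shift = MEL_SCALE * math.log10(alpha)

for fb, fa, s in zip(formants_b, formants_a, shifts):
    print(f"F_b={fb:7.1f} Hz  F_a={fa:7.1f} Hz  mel shift={s:.6f}")
print(f"expected shift = {expected_shift:.6f}")
```

Under this assumed model the shift is independent of frequency because mel(F_a) = scale * log10(alpha * (F_b + c) / c) = scale * log10(alpha) + mel(F_b). The paper estimates the warping parameters from speech data, so its fitted constants may differ from the textbook values used here.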