Acoustics, Speech, and Signal Processing, IEEE International Conference on
Abstract

This paper investigates techniques for minimizing the impact of non-speech events on the performance of large vocabulary continuous speech recognition (LVCSR) systems. An experimental study is presented that investigates whether the careful manual labeling of disfluency and background events in conversational speech can be used to provide an additional level of supervision in training HMM acoustic models and statistical language models. First, techniques are investigated for incorporating explicitly labeled disfluency and background events directly into the acoustic HMM. Second, phrase-based statistical language models are trained from utterance transcriptions which include labeled instances of these events. Finally, it is shown that significant word accuracy and run-time performance improvements are obtained for both sets of techniques on a telephone-based spoken language understanding task.