Information, Intelligence, and Systems, International Conference on

Abstract

Language models are used extensively in state-of-the-art speech recognition systems to help determine the probability of a hypothesized word sequence. These probabilities, along with the acoustic model scores, allow the system to constrain the search space during recognition to only those word sequences that have a reasonable chance of being correct. Determining these probabilities requires knowledge of the entire problem space. In speech recognition, however, acquiring that knowledge is an unreasonable, if not impossible, task, especially when one is using the SWITCHBOARD Corpus. Many statistical and rule-based approaches have been applied to this problem in order to arrive at a language model that minimizes the word error rate (WER) of the recognizer. One technique includes part-of-speech (POS) information in the language model [1][2]. This paper discusses the task of tagging the SWITCHBOARD Corpus with POS information in the usual manner and the problems encountered when trying to conform conversational speech to these tags.