2013 Fourth International Conference on Emerging Security Technologies (EST)
Abstract

Existing lexical punctuation prediction methods are mainly trained on clean standard data and therefore generalize poorly to practical automatic speech recognition (ASR) systems, whose transcripts contain ubiquitous errors. To bridge the gap between clean training data and noisy test data, we propose three random (3R) data augmentation strategies applied to the training set at both the word and phoneme levels: random word deletion (RWD), random word substitution (RWS), and random phoneme edition (RPE). Specifically, we contribute an acoustically similar vocabulary, built from phoneme-level edits, for acoustically similar word substitution. In addition, we are the first to introduce the RoBERTa-large model into the punctuation prediction task to capture semantics and long-distance dependencies in language. Extensive experiments on the English IWSLT2011 dataset yield a new state of the art compared with prevalent punctuation prediction methods.
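The word-level strategies above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the deletion/substitution probabilities and the toy acoustically similar vocabulary are assumptions for demonstration only.

```python
import random

def random_word_deletion(tokens, p=0.1, rng=None):
    """RWD sketch: drop each token independently with probability p
    (p=0.1 is an assumed value, not from the paper)."""
    rng = rng or random.Random(0)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else list(tokens)  # never return an empty sequence

def random_word_substitution(tokens, similar_vocab, p=0.1, rng=None):
    """RWS sketch: with probability p, replace a token with an
    acoustically similar word from a (toy) confusion vocabulary."""
    rng = rng or random.Random(0)
    out = []
    for t in tokens:
        if t in similar_vocab and rng.random() < p:
            out.append(rng.choice(similar_vocab[t]))
        else:
            out.append(t)
    return out
```

RPE would operate analogously on phoneme sequences, with insert/delete/substitute edits producing the acoustically similar vocabulary used by RWS.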