Abstract
Linear predictive coding (LPC) and mel-scale frequency cepstral coefficient (MFCC) analysis, the most common signal processing techniques for speech recognition, both assume that the speech signal is stationary within each analysis window and produce a representation based on the "stationary" frequency content of that window. This work uses a technique based on Cohen's (1989) class of generalized time-frequency representations (TFRs) to produce selected frequency representations that do not rely on an assumption of stationarity. Used in a speech recognition system, this representation improves recognition accuracy. The proposed approach requires designing a kernel that specifies the attributes of the representation. The considerations used in analyzing speech signals and the resulting kernel attributes are discussed. Comparisons with standard analysis techniques are presented. The significant computational requirements are also discussed.