Using the modulation wavelet transform for feature extraction in automatic speech recognition

Proc. of the International Conf. on Spoken Language Processing (ICSLP), Vol. 1, pp. 337-340, Beijing, 2000

K. Okada, T. Arai, N. Kanedera, Y. Momomura and Y. Murahara

Abstract: In this paper, we examine a robust feature extraction method for automatic speech recognition (ASR) in noise-distorted environments. Several perceptual experiments have shown that the modulation frequency band between 1 and 16 Hz is important for human speech recognition. Furthermore, it has been reported that the same modulation frequency band is important for ASR. In particular, splitting this band into several sub-bands by combining the coefficients of a multi-resolutional Fourier transform increased recognition performance. Combining the coefficients of a multi-resolutional Fourier transform corresponds to a wavelet transform. To test the effectiveness and efficiency of the wavelet transform, we therefore applied it in recognition experiments. This approach yielded an average increase of 3% in recognition accuracy compared to the standard approach using mel-frequency cepstral coefficients (MFCC) in several noise-distorted environments.
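
The idea of splitting the modulation frequency range into several bands with a wavelet transform can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar wavelet, the 100 Hz frame rate, the six decomposition levels, and the synthetic 4 Hz trajectory are all assumptions chosen for illustration, and the helper names `haar_dwt_levels` and `band_edges` are hypothetical. The sketch decomposes the temporal trajectory of one feature channel into octave-wide modulation bands and measures the energy in each.

```python
import numpy as np


def haar_dwt_levels(x, levels):
    """Orthonormal Haar decomposition of a 1-D trajectory.

    Returns a list of detail-coefficient arrays (finest level first)
    and the final approximation coefficients.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            # Pad odd lengths by repeating the last sample.
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return details, approx


def band_edges(frame_rate_hz, level):
    """Nominal modulation band in Hz covered by detail level `level` (1-based)."""
    return frame_rate_hz / 2 ** (level + 1), frame_rate_hz / 2 ** level


# Assumed frame rate: one feature vector every 10 ms.
frame_rate_hz = 100.0

# Synthetic stand-in for the trajectory of one feature channel:
# a pure 4 Hz modulation, 256 frames long.
t = np.arange(256) / frame_rate_hz
trajectory = np.sin(2 * np.pi * 4.0 * t)

details, approx = haar_dwt_levels(trajectory, levels=6)
energies = [float(np.sum(d ** 2)) for d in details]

for level, energy in enumerate(energies, start=1):
    low, high = band_edges(frame_rate_hz, level)
    print(f"level {level}: {low:.2f}-{high:.2f} Hz, energy {energy:.2f}")
```

Under these assumptions, detail levels 3 to 6 nominally cover about 0.8 to 12.5 Hz, which overlaps most of the 1 to 16 Hz range named in the abstract. The Haar wavelet has poor frequency selectivity, so a smoother wavelet would likely be preferred in practice.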