Proc. of the RISP International Workshop on Nonlinear Circuit and Signal Processing (NCSP), pp. 407-410, Hawaii, 2004
Improvement of the modulation wavelet transform in ASR
K. Okada, T. Arai, N. Kanedera and K. Asai
Abstract: In this paper, we examine robust feature extraction methods for automatic speech recognition (ASR) in noise-distorted environments. Several perceptual experiments have shown that the modulation frequency band between 1 and 10 Hz is important for ASR. Splitting this band into several subbands by combining the coefficients of a multi-resolutional Fourier transform notably increased recognition performance. We applied the wavelet transform to feature extraction in place of the multi-resolutional Fourier transform and called the resulting method the “modulation wavelet transform” (MWT). The previously proposed MWT covered modulation frequencies between 1 and 15 Hz, a wider range than the band identified as important. We therefore conducted speech recognition experiments using an MWT that covers modulation frequencies between 1 and 12 Hz, obtained by choosing center frequencies of 2.5, 5.0, and 7.5 Hz. This new set of subbands yielded a 3% increase in recognition accuracy over the previous results in several noise-distorted environments.
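As an illustration of the idea only, the following minimal sketch filters the temporal trajectory of a single feature (for example, the log energy of one spectral band over frames) with three band-pass wavelets centered at 2.5, 5.0, and 7.5 Hz. The abstract does not specify the wavelet shape, bandwidth, or frame rate, so the complex Morlet-style kernel, the `cycles` parameter, the 100 Hz frame rate, and the function names below are all assumptions rather than the authors' implementation.

```python
import numpy as np

FRAME_RATE_HZ = 100.0              # assumed: one feature frame every 10 ms
CENTER_FREQS_HZ = (2.5, 5.0, 7.5)  # center frequencies named in the abstract


def morlet_kernel(center_hz, frame_rate_hz=FRAME_RATE_HZ, cycles=3.0):
    """Complex Morlet-style kernel sampled at the frame rate.

    The wavelet shape and the `cycles` value are assumptions; the abstract
    does not state which mother wavelet was used.
    """
    sigma_t = cycles / (2.0 * np.pi * center_hz)  # Gaussian width in seconds
    half_len = int(np.ceil(4.0 * sigma_t * frame_rate_hz))
    t = np.arange(-half_len, half_len + 1) / frame_rate_hz
    kernel = np.exp(2j * np.pi * center_hz * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
    return kernel / np.sum(np.abs(kernel))  # unit gain at the center frequency


def modulation_wavelet_features(trajectory,
                                center_freqs_hz=CENTER_FREQS_HZ,
                                frame_rate_hz=FRAME_RATE_HZ):
    """Return an array of shape (n_bands, n_frames) of subband magnitudes.

    `trajectory` is a 1-D sequence of one feature over time and should be
    longer than the longest kernel (155 frames with the defaults above).
    """
    x = np.asarray(trajectory, dtype=float)
    x = x - x.mean()  # remove the DC component of the trajectory
    return np.stack([
        np.abs(np.convolve(x, morlet_kernel(f, frame_rate_hz), mode="same"))
        for f in center_freqs_hz
    ])


if __name__ == "__main__":
    # Synthetic trajectory modulated at 5 Hz, 4 seconds long.
    t = np.arange(400) / FRAME_RATE_HZ
    feats = modulation_wavelet_features(np.sin(2.0 * np.pi * 5.0 * t))
    print(feats.shape, feats.mean(axis=1))
```

In this sketch, a trajectory modulated at 5 Hz produces its largest average magnitude in the subband centered at 5.0 Hz, which is the kind of modulation-frequency selectivity the subband split is meant to provide.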