Continuous speech recognition based on the contribution of modulation spectrum

International Workshop on Speech Dynamics by Ear, Eye, Mouth and Machine, Technical Report of IEICE Japan, Vol. SP2003-54, pp. 67-72, 2003

N. Kanedera, T. Arai, K. Okada and K. Asai

Abstract: The Fourier transform of the time trajectory of a parameter such as the logarithmic spectrum or cepstrum is called the modulation spectrum. In this paper, we propose a new feature for automatic speech recognition based on the contribution of modulation frequency components. The contribution indicates the importance of each modulation frequency component for speech recognition. In the proposed method, the time trajectory of each transformed spectral component is filtered by a linear-phase FIR filter whose modulation-frequency characteristics are based on the contribution, as a substitute for the RASTA filter. The proposed feature has two important properties: (1) little phase distortion and (2) effective enhancement of the important modulation frequency components of speech according to the contribution, while attenuating most modulation frequency components of noise. Testing the proposed feature on the IPA98 task (a Japanese continuous speech recognition task) in noisy environments (SNR 10 dB) gave a relative improvement of 5% in word accuracy over MFCC with dynamic features. The results show that the proposed modulation filtering based on the contribution of modulation frequency components is effective.
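The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' filter: the gain curve `hypothetical_contribution`, the 100 Hz frame rate, and the 51-tap length are all assumptions chosen for illustration, and the design method (frequency sampling) is one simple way to obtain a symmetric, and therefore linear-phase, FIR filter from a desired modulation-frequency response.

```python
import math

def design_linear_phase_fir(gain, num_taps, frame_rate):
    """Frequency-sampling design of a symmetric (linear-phase) FIR filter.

    gain: function mapping a modulation frequency in Hz to a desired magnitude.
    num_taps must be odd so the filter has a single center tap.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    half = num_taps // 2
    # Sample the desired magnitude on the DFT grid from 0 Hz up to near Nyquist.
    samples = [gain(k * frame_rate / num_taps) for k in range(half + 1)]
    taps = []
    for n in range(num_taps):
        acc = samples[0]
        for k in range(1, half + 1):
            acc += 2.0 * samples[k] * math.cos(2.0 * math.pi * k * (n - half) / num_taps)
        taps.append(acc / num_taps)
    return taps

def filter_trajectory(trajectory, taps):
    """Zero-padded convolution, centered so the output stays aligned with the input."""
    half = len(taps) // 2
    out = []
    for t in range(len(trajectory)):
        acc = 0.0
        for j, h in enumerate(taps):
            i = t + half - j
            if 0 <= i < len(trajectory):
                acc += h * trajectory[i]
        out.append(acc)
    return out

def filter_features(frames, taps):
    """Filter the time trajectory of every feature dimension independently.

    frames: list of equal-length feature vectors, one per analysis frame.
    """
    dims = len(frames[0])
    cols = [filter_trajectory([f[d] for f in frames], taps) for d in range(dims)]
    return [[cols[d][t] for d in range(dims)] for t in range(len(frames))]

def hypothetical_contribution(freq_hz):
    # Placeholder gain curve (assumption): keep 1 to 16 Hz, suppress the rest.
    # A real system would use measured contribution values instead.
    return 1.0 if 1.0 <= freq_hz <= 16.0 else 0.0

# Assumed setup: 100 frames per second, 51 taps.
taps = design_linear_phase_fir(hypothetical_contribution, 51, 100.0)
```

Because the taps are symmetric about the center, every modulation frequency is delayed by the same amount, which is the "little phase distortion" property; centering the convolution removes that constant delay. With the placeholder curve, the gain at 0 Hz is zero, so a constant offset in a trajectory is removed away from the signal edges.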

Keywords: modulation frequency, continuous speech recognition
