Proc. of the IEEE International Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 2, pp. 613-616, Seattle, 1998

On properties of modulation spectrum for robust automatic speech recognition

N. Kanedera, H. Hermansky and T. Arai

Abstract: We report on the effect of band-pass filtering the time trajectories of spectral envelopes on speech recognition. Several filter types (linear-phase FIR, DCT, and DFT) are studied. The results indicate the relative importance of different components of the modulation spectrum of speech for ASR. The general conclusions are: (1) most of the useful linguistic information lies in modulation frequency components between 1 and 16 Hz, with the dominant component at around 4 Hz; (2) it is important to preserve the phase information in the modulation frequency domain; (3) features that include modulation spectrum components at around 4 Hz outperform the conventional delta features; (4) features that represent several modulation frequency bands with appropriate center frequencies and bandwidths improve recognition performance.
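
As an illustration of the kind of processing described above, the following is a minimal sketch of linear-phase FIR band-pass filtering applied to the time trajectory of a single spectral band. It is not the authors' implementation: the 100 Hz frame rate (10 ms frame step), the 101-tap length, the Hamming-windowed sinc design, and the function names bandpass_fir, filter_trajectory, and gain_at are all assumptions chosen for illustration. Only the 1 to 16 Hz band edges come from the abstract.

```python
import cmath
import math


def bandpass_fir(low_hz, high_hz, frame_rate_hz, num_taps):
    """Design a symmetric (hence linear-phase) band-pass FIR filter.

    Assumption: Hamming-windowed ideal band-pass response. The abstract
    only says "linear-phase FIR"; the design method here is illustrative.
    """
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be odd and at least 3")
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * (high_hz - low_hz) / frame_rate_hz
        else:
            ideal = (
                math.sin(2.0 * math.pi * high_hz * k / frame_rate_hz)
                - math.sin(2.0 * math.pi * low_hz * k / frame_rate_hz)
            ) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def filter_trajectory(trajectory, taps):
    """Filter one band's time trajectory, compensating the constant delay.

    The output has the same length as the input; samples beyond the ends
    of the trajectory are treated as zero.
    """
    mid = (len(taps) - 1) // 2
    length = len(trajectory)
    out = []
    for t in range(length):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = t + mid - j
            if 0 <= idx < length:
                acc += h * trajectory[idx]
        out.append(acc)
    return out


def gain_at(taps, freq_hz, frame_rate_hz):
    """Magnitude response of the filter at one modulation frequency."""
    omega = 2.0 * math.pi * freq_hz / frame_rate_hz
    return abs(sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps)))


if __name__ == "__main__":
    frame_rate = 100.0  # assumed frame rate in Hz (10 ms frame step)
    taps = bandpass_fir(1.0, 16.0, frame_rate, 101)
    # Hypothetical trajectory: a 4 Hz modulation plus a 30 Hz modulation.
    trajectory = [
        math.sin(2.0 * math.pi * 4.0 * t / frame_rate)
        + math.sin(2.0 * math.pi * 30.0 * t / frame_rate)
        for t in range(400)
    ]
    filtered = filter_trajectory(trajectory, taps)
    print("gain at 4 Hz:", gain_at(taps, 4.0, frame_rate))
    print("gain at 30 Hz:", gain_at(taps, 30.0, frame_rate))
```

In a recognizer front end, a filter like this would be applied independently to each band's trajectory (for example, log energies per critical band) before the features are passed on. The symmetric taps keep the phase response linear, so every modulation component is delayed by the same amount.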

