Proc. of the International Conf. on Spoken Language Processing (ICSLP), Vol. 3, pp. 774-777, Beijing, 2000
On the important modulation-frequency bands of speech for human speaker recognition
T. Arai, M. Takahashi, N. Kanedera, Y. Takano and Y. Murahara
Abstract: Using perceptual experiments, we investigated which range of modulation frequency components of the mel-frequency cepstral coefficients (MFCC) carries the most important information for speaker identification. We conducted two experiments using an MFCC-based re-synthesis scheme with two types of excitation. In Experiment I, speech sounds were re-synthesized from the extracted pitch and white noise. In Experiment II, speech sounds were re-synthesized from white noise only, to avoid including pitch information. For each experiment, the original speech sounds were uttered by two sets of five professors. A total of 44 students (16 for Exp. I and 28 for Exp. II) who attended the professors’ classes participated in the experiments. We analyzed the experimental results to estimate the relative importance of different modulation frequencies in speaker recognition. The results show that the most important speaker information lay in modulation frequency components from 2 to 8 Hz for both Exp. I (pitch-excited) and Exp. II (noise-excited). The results also show that including modulation frequency components around 0 Hz made some contribution. Hence, we conclude that dynamic features, as well as static features, are important for human speaker identification.
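To illustrate what restricting MFCC trajectories to a band of modulation frequencies means, here is a minimal sketch. It is not the filtering procedure used in the experiments, which the abstract does not specify. It assumes each cepstral coefficient is treated as a time series sampled at the frame rate and band-limited with an ideal FFT mask. The function name `bandlimit_modulation`, the 100 Hz frame rate, and the synthetic trajectory are all hypothetical choices made for the example.

```python
import numpy as np

def bandlimit_modulation(mfcc, frame_rate_hz, low_hz, high_hz):
    """Keep only modulation frequencies in [low_hz, high_hz].

    mfcc: array of shape (n_frames, n_coeffs), one row per analysis frame.
    Each column (one cepstral coefficient over time) is filtered independently.
    """
    n_frames = mfcc.shape[0]
    # Spectrum of each coefficient trajectory along the time (frame) axis.
    spectrum = np.fft.rfft(mfcc, axis=0)
    # Modulation frequency in Hz of each FFT bin.
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate_hz)
    # Ideal band-pass mask (an assumption; a real system might use FIR filters).
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[~keep, :] = 0.0
    return np.fft.irfft(spectrum, n=n_frames, axis=0)

# Synthetic single-coefficient trajectory: a static offset (0 Hz),
# a slow 4 Hz modulation, and a fast 20 Hz modulation.
frame_rate_hz = 100.0                      # assumed: one frame every 10 ms
t = np.arange(200) / frame_rate_hz         # 2 seconds of frames
slow = np.sin(2 * np.pi * 4 * t)
fast = 0.5 * np.sin(2 * np.pi * 20 * t)
traj = (1.0 + slow + fast)[:, None]        # shape (200, 1)

# Retain only the 2 to 8 Hz band highlighted in the abstract.
filtered = bandlimit_modulation(traj, frame_rate_hz, 2.0, 8.0)
```

In this toy case the filtered trajectory retains only the 4 Hz component: the static offset near 0 Hz and the 20 Hz component are removed. Widening the band to start at 0 Hz would keep the static offset as well, which corresponds to the static-feature contribution mentioned above.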