Estimating number of speakers by the modulation characteristics of speech

Proc. of the IEEE International Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 2, pp. 197-200, Hong Kong, 2003

T. Arai

Abstract: A method for estimating the number of speakers in mixed speech signals was proposed. The algorithm was based on the modulation characteristics of speech, specifically that a single speech utterance typically has a distinct modulation pattern with a peak around 4-5 Hz. Having observed that the modulation peak decreases as the number of speakers increases, we designed the estimation algorithm to use the modulation-frequency region between 2 and 8 Hz. We obtained a novel parameter, which we called the “equivalent number of speakers,” to estimate the number of simultaneous speakers when speech signals contain multiple speakers.
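
To make the idea of a 2-8 Hz modulation region concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a crude envelope (full-wave rectification plus a roughly 20 ms moving average) and reports the fraction of envelope modulation power that falls between 2 and 8 Hz. The function name `modulation_band_ratio`, the `max_mod` cutoff, and the synthetic test signals are all hypothetical choices made for illustration; the abstract does not specify how the envelope is extracted or how the “equivalent number of speakers” is derived, and this sketch does not compute that parameter.

```python
import numpy as np


def modulation_band_ratio(signal, fs, lo=2.0, hi=8.0, max_mod=16.0):
    """Fraction of envelope modulation power in (0, max_mod] Hz that lies in [lo, hi] Hz.

    Illustrative assumption only: the envelope extraction and the normalization
    band are not taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    # Crude amplitude envelope: rectify, then smooth with a ~20 ms moving average.
    win = max(1, int(0.02 * fs))
    envelope = np.convolve(np.abs(signal), np.ones(win) / win, mode="same")
    envelope = envelope - envelope.mean()  # remove the DC component
    # Modulation spectrum: power spectrum of the envelope.
    power = np.abs(np.fft.rfft(envelope)) ** 2
    freqs = np.fft.rfftfreq(envelope.size, d=1.0 / fs)
    total = power[(freqs > 0) & (freqs <= max_mod)].sum()
    band = power[(freqs >= lo) & (freqs <= hi)].sum()
    return band / total if total > 0 else 0.0


# Synthetic check: noise modulated at 4 Hz (speech-like rate) vs. at 12 Hz.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(4 * fs) / fs
carrier = rng.standard_normal(t.size)
speech_like = (1 + np.sin(2 * np.pi * 4 * t)) * carrier
fast_mod = (1 + np.sin(2 * np.pi * 12 * t)) * carrier

r_speech = modulation_band_ratio(speech_like, fs)
r_fast = modulation_band_ratio(fast_mod, fs)
print(f"4 Hz modulated:  {r_speech:.3f}")
print(f"12 Hz modulated: {r_fast:.3f}")
```

A signal whose envelope fluctuates at a speech-like rate should concentrate its modulation power inside the 2-8 Hz band, so its ratio comes out higher than that of the 12 Hz case. How such a measure would map to a speaker count is left open here, since the abstract gives no formula.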
