Temporal constraints on speech intelligibility as deduced from exceedingly sparse spectral representations

Proc. of the European Conf. on Speech Communication and Technology (Eurospeech), Vol. 6, pp. 2687-2690, Budapest, 1999

R. Silipo, S. Greenberg and T. Arai

Abstract: A novel means of quantifying the contribution of specific spectral bands to intelligibility is described. The spectrum of spoken English sentences is partitioned into one-third octave bands (“slits”), and the contribution of each of four slits is ascertained both independently and in combination with other slits distributed across the spectrum. The intelligibility baseline (four concurrent slits) yields ca. 85% intelligibility. The current study demonstrates that intelligibility progressively declines as the two central slits (slits 2 and 3) are desynchronized by between 25 and 250 ms. Beyond 250 ms, intelligibility often declines even further but then begins to increase for greater degrees of asynchrony, suggesting the presence of a perceptual processing buffer of ca. 200-300 ms in duration. The utility of the spectral slit technique is also demonstrated for estimating the contribution to intelligibility of different regions of the modulation spectrum. The mid-frequency (10-25 Hz) modulations are shown to be of particular significance for encoding speech information above 1.5 kHz. Together, these two experiments demonstrate the power and utility of using circumscribed portions of the spectrum to evaluate quantitatively the contribution made by specific spectro-temporal properties of the speech signal.
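As a rough illustration of the two manipulations the abstract describes, the sketch below computes the edges of a one-third octave band and delays one band relative to the others. It is a minimal sketch under stated assumptions, not the authors' procedure: the 1000 Hz centre frequency and 16 kHz sample rate are hypothetical (the abstract does not give the slit frequencies or the sampling rate), and the function names are invented for this example.

```python
import math

def third_octave_edges(center_hz):
    """Return (low, high) edges of a one-third-octave band around center_hz."""
    half_width = 2.0 ** (1.0 / 6.0)  # one-sixth of an octave on either side
    return center_hz / half_width, center_hz * half_width

def delay_in_samples(delay_ms, sample_rate_hz):
    """Convert an onset delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate_hz / 1000.0)

def shift_band(samples, delay_samples):
    """Delay a band-limited signal by prepending zeros, keeping its length."""
    if delay_samples <= 0:
        return list(samples)
    return ([0.0] * delay_samples + list(samples))[:len(samples)]

# Hypothetical values, chosen only for illustration.
low, high = third_octave_edges(1000.0)
shift = delay_in_samples(25, 16000)  # smallest asynchrony named in the abstract
print(f"band edges: {low:.1f}-{high:.1f} Hz, delay: {shift} samples")
```

In a fuller version each slit would first be isolated with a band-pass filter at these edges, and only the delayed slits would be passed through `shift_band` before the bands are summed again.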
