Proc. of the International Congress on Acoustics, Vol. 4, pp. 2677-2678, Seattle, 1998
Speech intelligibility is highly tolerant of cross-channel spectral asynchrony
S. Greenberg and T. Arai
Abstract: A detailed auditory analysis of the short-term acoustic spectrum is generally considered essential for understanding spoken language. This assumption is called into question by the results of an experiment in which the spectrum of spoken sentences was partitioned into quarter-octave channels and the onset of each channel shifted in time relative to the others so as to desynchronize spectral information across the frequency plane. Intelligibility of sentential material (as measured in terms of word accuracy) is unaffected by a (maximum) onset jitter of 80 ms or less and remains high (> 75%) even for jitter intervals of 140 ms. Only when the jitter imposed across channels exceeds 200 ms does intelligibility fall below 50%. These results imply that the cues required to understand spoken language are not optimally specified in the short-term spectral domain, but may rather be based on some other set of representational cues such as the modulation spectrogram [S. Greenberg and B. Kingsbury, Proc. IEEE ICASSP, 1997, pp. 1647-1650]. Consistent with this hypothesis is the fact that intelligibility (as a function of onset-jitter interval) is highly correlated with the magnitude of the modulation spectrum between 3 and 6 Hz.
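The desynchronization procedure described in the abstract can be illustrated with a minimal sketch. Everything not stated above is an assumption made for illustration only: the function names, the 100-6400 Hz frequency range used in the usage note, the uniform distribution of onset shifts between zero and the maximum jitter, and the premise that the quarter-octave channel signals have already been produced by a filter bank (the filtering itself is not shown). It is not the authors' implementation.

```python
import math
import random


def quarter_octave_edges(f_lo, f_hi):
    """Band edges spaced a quarter octave apart, starting at f_lo (Hz).

    Only whole quarter-octave bands that fit below f_hi are kept.
    """
    n_bands = math.floor(4 * math.log2(f_hi / f_lo) + 1e-9)
    return [f_lo * 2 ** (k / 4) for k in range(n_bands + 1)]


def draw_onset_shifts(n_channels, max_jitter_ms, seed=0):
    """One onset shift per channel, in milliseconds.

    Assumption: shifts are drawn uniformly from [0, max_jitter_ms];
    the abstract specifies only the maximum jitter.
    """
    rng = random.Random(seed)
    return [rng.uniform(0.0, max_jitter_ms) for _ in range(n_channels)]


def delay_channel(samples, shift_ms, sample_rate_hz):
    """Delay a channel's onset by prepending silence (zeros)."""
    n_pad = round(shift_ms * sample_rate_hz / 1000)
    return [0.0] * n_pad + list(samples)


def desynchronize(channels, shifts_ms, sample_rate_hz):
    """Shift each channel by its own delay, then sum the channels.

    `channels` is a list of already band-filtered signals, one list of
    samples per quarter-octave channel (hypothetical input format).
    """
    delayed = [
        delay_channel(ch, shift, sample_rate_hz)
        for ch, shift in zip(channels, shifts_ms)
    ]
    length = max(len(d) for d in delayed)
    return [sum(d[i] for d in delayed if i < len(d)) for i in range(length)]
```

As a usage example, `quarter_octave_edges(100.0, 6400.0)` yields 25 edges (24 channels over six octaves), and `draw_onset_shifts(24, 80.0)` assigns each channel a shift of at most 80 ms, the largest jitter at which the abstract reports no loss of intelligibility. With all shifts set to zero, `desynchronize` reduces to a plain sum of the channels.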