Effective speech processing for various impaired listeners

Proc. of the International Congress on Acoustics, Vol. II, pp. 1389-1392, Kyoto, 2004 (Invited Paper)

T. Arai, K. Yasu and N. Hodoshima

Abstract: Normal-hearing listeners can understand speech under many types of degradation because speech is redundant in the spectral and temporal domains. Hearing-impaired listeners have less of this capability, so speech signal processing for hearing impairment needs to preserve important landmarks when enhancing a speech signal. Hearing impairment is characterized by high-frequency hearing loss, an elevated threshold of hearing, a compressed dynamic range, more severe temporal masking, and a loss of spectral resolution due to the spread of masking. Many spectral and temporal enhancement techniques have therefore been proposed. In this paper we discuss effective speech processing techniques that we have developed to spectrally and temporally enhance speech signals for various types of impaired listeners.

For the spectral enhancement of speech, we have proposed two approaches: critical-band-based frequency compression and formant enhancement. In critical-band-based frequency compression, the band-limited signal in each critical band is compressed along the frequency axis. For formant enhancement, we proposed a technique based on the FFT and linear predictive coding (LPC). These techniques reduce the interference between adjacent subbands and increase the difference between the peaks and valleys of a speech spectrum. In addition, suppressing spectral power at lower frequencies reduces the masking of higher frequencies.
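
The idea of increasing the difference between spectral peaks and valleys can be illustrated with a minimal sketch. This is not the FFT/LPC technique of the paper: as a stand-in, it assumes a moving average of the log-magnitude spectrum as a smooth baseline and expands each bin's deviation from that baseline. The function name and the parameters `factor` and `smooth_bins` are hypothetical.

```python
import numpy as np

def enhance_spectral_contrast(frame, factor=1.5, smooth_bins=9):
    """Expand peak-to-valley differences in the spectrum of one frame.

    Assumption: the baseline is a moving average of the log-magnitude
    spectrum (smooth_bins must be odd); the paper's method uses FFT and LPC.
    """
    spectrum = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)

    # Smooth baseline: moving average with edge padding so the length is kept.
    kernel = np.ones(smooth_bins) / smooth_bins
    padded = np.pad(log_mag, smooth_bins // 2, mode="edge")
    baseline = np.convolve(padded, kernel, mode="valid")

    # factor > 1 pushes peaks up and valleys down relative to the baseline.
    enhanced_log = baseline + factor * (log_mag - baseline)

    # Keep the original phase and return to the time domain.
    enhanced = np.exp(enhanced_log) * np.exp(1j * np.angle(spectrum))
    return np.fft.irfft(enhanced, n=len(frame))
```

With `factor=1.0` the frame is returned essentially unchanged, so the parameter gives direct control over how strongly peaks and valleys are separated.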

For the temporal enhancement of speech, we proposed two approaches: modulation filtering and steady-state suppression. Both techniques emphasize the temporal dynamics of speech in the time domain. We previously found that the frequencies of temporal dynamics, or modulation frequencies, that are important for speech perception lie between 1 and 16 Hz, especially 4 Hz (Arai et al., JASA, 1999). We therefore proposed modulation filtering to emphasize these modulation frequencies. We further discuss steady-state suppression (Arai et al., Acoust. Sci. & Tech., 2002) for improving speech intelligibility for hearing-impaired listeners.
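
Emphasizing the 1-16 Hz modulation band can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes an already extracted temporal envelope of one frequency band (sampled at `env_rate` Hz), isolates its 1-16 Hz components with an FFT mask, and adds them back with a gain. The function names and the `gain` parameter are hypothetical.

```python
import numpy as np

def bandpass_envelope(envelope, env_rate, low_hz=1.0, high_hz=16.0):
    """Keep only modulation components between low_hz and high_hz."""
    spectrum = np.fft.rfft(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / env_rate)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * mask, n=len(envelope))

def emphasize_modulations(envelope, env_rate, gain=2.0, low_hz=1.0, high_hz=16.0):
    """Scale the low_hz..high_hz modulation band by `gain`.

    Components outside the band (including the mean level) are unchanged.
    """
    band = bandpass_envelope(envelope, env_rate, low_hz, high_hz)
    return envelope + (gain - 1.0) * band

# Usage: an envelope with a 4 Hz and a 30 Hz modulation, sampled at 100 Hz.
t = np.arange(200) / 100.0
envelope = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t) + 0.2 * np.cos(2 * np.pi * 30 * t)
emphasized = emphasize_modulations(envelope, env_rate=100.0, gain=2.0)
```

In this example only the 4 Hz component is doubled; the 30 Hz component and the mean level pass through unchanged. A full system would also need to clip negative envelope values and reapply the modified envelope to the band's carrier signal, which is omitted here.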
