Improving Speech Intelligibility in Reverberant Environments

Background

In public spaces (e.g. multipurpose halls, train stations and airports) where public address systems transmit speech via loudspeakers, listeners receive the speech signals with reverberation. Speech is sometimes difficult to understand in such reverberant environments, especially for people with hearing impairments, elderly people and non-native listeners. The reverberant tail of a speech segment masks the segments that follow it (i.e. overlap-masking (Nabelek et al., 1989)), and this degrades speech intelligibility. It has been pointed out that when a preceding segment has strong energy (e.g. a vowel), the following segment (e.g. a consonant) can be significantly smeared by reverberation (Arai et al., 2001, 2002). Comparing the original and reverberant speech signals in Figure 1 shows that the envelope of the reverberant speech signal is smeared by reverberation.

Figure 1. Original (left) and reverberant (right) speech signals
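
As a rough illustration of overlap-masking, the following Python sketch convolves a toy "vowel, gap, consonant" signal with a synthetic room impulse response and measures how much of the vowel's reverberant tail lands in the consonant's time slot. It is not the measurement procedure of the cited studies. The exponentially decaying noise model, the names `synthetic_rir`, `convolve` and `energy`, the `tail_gain` parameter, the 1 s reverberation time and the low sampling rate are all assumptions chosen for this example.

```python
import math
import random


def synthetic_rir(rt60, fs, tail_gain=0.1, seed=0):
    """Crude room impulse response: direct sound plus decaying noise.

    The amplitude envelope of the noise falls by 60 dB after rt60 seconds.
    (Hypothetical model for illustration only.)
    """
    rng = random.Random(seed)
    n = int(rt60 * fs)
    decay = math.log(1000.0) / rt60  # a 60 dB amplitude drop is a factor of 1000
    rir = [tail_gain * rng.gauss(0.0, 1.0) * math.exp(-decay * i / fs)
           for i in range(n)]
    rir[0] = 1.0  # direct sound
    return rir


def convolve(x, h):
    """Direct-form linear convolution (adequate for short toy signals)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # silent samples contribute nothing
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def energy(x, start, stop):
    """Sum of squared samples in x[start:stop]."""
    return sum(v * v for v in x[start:stop])


fs = 1000  # Hz, deliberately low so the pure-Python convolution stays fast
vowel = [math.sin(2 * math.pi * 100 * n / fs) for n in range(200)]  # strong segment
gap = [0.0] * 100
consonant = [0.1 * math.sin(2 * math.pi * 300 * n / fs) for n in range(50)]  # weak segment

rir = synthetic_rir(rt60=1.0, fs=fs)
dry = vowel + gap + consonant
wet = convolve(dry, rir)

# Reverberant tail of the vowel alone, evaluated in the consonant's time slot.
tail_only = convolve(vowel, rir)
masker = energy(tail_only, 300, 350)
target = energy(consonant, 0, 50)
print(f"vowel tail energy in consonant slot: {masker:.3f}")
print(f"consonant energy:                    {target:.3f}")
```

In the dry signal the gap between the two segments is silent, whereas in the reverberant signal it is filled by the decaying tail of the vowel. This is the smearing of the envelope that the paragraph above describes.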

Approaches

The goal of this study is to achieve “barrier-free listening environments” in reverberant public spaces, that is, to provide intelligible speech signals not only for young people but also for elderly people, people with hearing impairments and non-native listeners. We have studied this problem from two sides: the public address system side and the talker side (see Figure 2).

Figure 2. Two approaches to providing “barrier-free listening environments”

On the public address system side, we have proposed “pre-processing” (i.e. processing a speech signal before it is sent from the loudspeakers). Pre-processing might be beneficial in public spaces where different kinds of people listen to speech signals, because listeners do not need to wear special listening devices to reduce the effect of reverberation.

We have proposed two pre-processing approaches: modulation filtering (e.g. Kusumoto et al., 1999, 2005) and steady-state suppression (Arai et al., 2001, 2002; Hodoshima et al., 2006). Modulation filtering alters the temporal dynamics of speech (i.e. its temporal modulation). This approach enhances particular low-frequency components of the temporal modulation (i.e. below 16 Hz), which are important for speech perception (Houtgast and Steeneken, 1985).
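
The idea of enhancing low-frequency temporal modulation can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the filter design of the cited papers: the frame-based RMS envelope, the uniform gain of 2 applied to all modulation components up to 16 Hz, and the function names `frame_rms`, `enhance_modulation` and `apply_envelope` are all hypothetical choices made for this example.

```python
import cmath
import math


def frame_rms(x, frame_len):
    """RMS amplitude of consecutive non-overlapping frames (a coarse envelope)."""
    n_frames = len(x) // frame_len
    return [
        math.sqrt(sum(v * v for v in x[k * frame_len:(k + 1) * frame_len]) / frame_len)
        for k in range(n_frames)
    ]


def dft(seq):
    """Plain O(N^2) discrete Fourier transform; fine for ~100 envelope samples."""
    n = len(seq)
    return [sum(seq[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT, returning the real part."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def enhance_modulation(env, env_rate, f_max=16.0, gain=2.0):
    """Boost envelope modulations in (0, f_max] Hz; DC and faster rates are unchanged."""
    n = len(env)
    spec = dft(env)
    for k in range(1, n):
        freq = min(k, n - k) * env_rate / n  # modulation frequency of bin k
        if freq <= f_max:
            spec[k] *= gain
    # An amplitude envelope must not become negative.
    return [max(e, 0.0) for e in idft(spec)]


def apply_envelope(x, frame_len, old_env, new_env, eps=1e-9):
    """Rescale each frame so that its RMS follows new_env instead of old_env."""
    y = list(x)
    for k, (old, new) in enumerate(zip(old_env, new_env)):
        g = new / (old + eps)
        for i in range(k * frame_len, (k + 1) * frame_len):
            y[i] = x[i] * g
    return y


def depth(env):
    """Modulation depth of an envelope."""
    return (max(env) - min(env)) / (max(env) + min(env))


fs = 8000
frame_len = 80                 # 10 ms frames, so the envelope is sampled at 100 Hz
env_rate = fs / frame_len
# Toy input: a 500 Hz tone with a 4 Hz amplitude modulation, 1 s long.
x = [(1.0 + 0.3 * math.cos(2 * math.pi * 4 * n / fs)) * math.sin(2 * math.pi * 500 * n / fs)
     for n in range(fs)]

old_env = frame_rms(x, frame_len)
new_env = enhance_modulation(old_env, env_rate)
y = apply_envelope(x, frame_len, old_env, new_env)
print(f"modulation depth before: {depth(old_env):.2f}, after: {depth(new_env):.2f}")
```

The sketch operates on a single full-band envelope and changes the gain abruptly at frame boundaries, which is adequate for showing the principle but would need smoothing in any practical use.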

Steady-state suppression suppresses the steady-state portions of speech (e.g. vowel nuclei), which have high energy, in order to reduce overlap-masking (see Figure 3). The information in the steady-state portions of a speech signal is relatively unimportant compared to that in transitions (Furui, 1986). This approach can therefore reduce the effect of overlap-masking while degrading speech intelligibility as little as possible.

Figure 3. Original (left) and steady-state suppressed (right) signals of the word /aka/
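
The principle of steady-state suppression can be illustrated with the following Python sketch. It is a simplified stand-in rather than the published algorithm: steadiness is judged here from frame-to-frame changes in log energy instead of a spectral transition measure, and the thresholds, the attenuation factor of 0.4 and the names `frame_log_energy`, `steady_frames` and `suppress_steady_state` are assumptions made for this example. The toy input imitates the energy pattern of a word such as /aka/ (vowel, closure, burst, vowel) with pure tones.

```python
import math


def frame_log_energy(x, frame_len, floor=1e-10):
    """Log10 of the mean squared amplitude in consecutive non-overlapping frames."""
    n_frames = len(x) // frame_len
    return [
        math.log10(sum(v * v for v in x[k * frame_len:(k + 1) * frame_len]) / frame_len + floor)
        for k in range(n_frames)
    ]


def steady_frames(log_e, change_thresh=0.1, energy_drop=1.0):
    """Flag frames that are loud and whose log energy barely changes.

    A frame counts as steady when it lies within energy_drop (log10 units) of
    the loudest frame and differs from both neighbours by less than change_thresh.
    """
    peak = max(log_e)
    flags = []
    for k, e in enumerate(log_e):
        prev_e = log_e[k - 1] if k > 0 else e
        next_e = log_e[k + 1] if k + 1 < len(log_e) else e
        change = max(abs(e - prev_e), abs(next_e - e))
        flags.append(e > peak - energy_drop and change < change_thresh)
    return flags


def suppress_steady_state(x, frame_len, factor=0.4, **kwargs):
    """Attenuate steady frames by `factor`; transitions are left untouched."""
    flags = steady_frames(frame_log_energy(x, frame_len), **kwargs)
    y = list(x)
    for k, steady in enumerate(flags):
        if steady:
            for i in range(k * frame_len, (k + 1) * frame_len):
                y[i] = x[i] * factor
    return y, flags


fs = 8000
frame_len = 80  # 10 ms frames


def tone(freq, amp, n):
    return [amp * math.sin(2 * math.pi * freq * i / fs) for i in range(n)]


# Vowel (200 ms), closure (60 ms), weak burst (20 ms), vowel (200 ms).
x = tone(200, 1.0, 1600) + [0.0] * 480 + tone(1000, 0.3, 160) + tone(200, 1.0, 1600)
y, flags = suppress_steady_state(x, frame_len)
print("".join("S" if f else "." for f in flags))  # S marks suppressed frames
```

Only the interior of each vowel is attenuated. The frames at the vowel offset and onset, the closure and the burst keep their original level, so the energy available to mask the following consonant is reduced while the transitions are preserved.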

On the talker side, we have studied speech signals that are robust to reverberation. Speech intelligibility varies across talkers, and within an individual talker it varies with speaking style (e.g. clear, conversational) and speaking rate (slow, normal, fast). This approach seeks the characteristics of intelligible speech signals as well as the effects of clear speech and slowed speaking rate in reverberation (e.g. Hodoshima et al., 2007).

Major findings

Different public spaces have different room conditions, so the optimum approach is likely to differ from one space to another. We have therefore studied both the public address system side and the talker side under various listening conditions.

Below are our major findings:

1) The public address system side

  • Modulation filtering improved consonant identification for young people with normal hearing in reverberation (Kusumoto et al., 2005).
  • People with severe hearing loss preferred speech signals processed by modulation filtering, finding them easier to hear than unprocessed speech signals in reverberation (Kusumoto et al., 1999, 2000).
  • Steady-state suppression significantly improved consonant identification (e.g. Arai et al., 2007; Hodoshima et al., 2005, 2006, 2008a; Miyauchi et al., 2005; Nakata et al., 2006):
    • both in simulated reverberant environments and in a lecture hall (reverberation times of 0.7-1.3 s),
    • both for young people with normal hearing and for elderly people,
    • at both normal and slowed speaking rates.

2) The talker side (Hodoshima et al., 2007, 2008b)

  • “Clear” speech was more intelligible than “conversational” speech when speech signals, uttered by talkers who were instructed to speak as if they were in a reverberant environment, were grouped according to young listeners’ hearing impressions.

Future work

We believe that our research contributes to realizing “barrier-free listening environments” for elderly people, people with hearing impairments and non-native listeners, as well as to designing algorithms for hearing aids (Kobayashi et al., 2008).

Speech demos (to be updated soon)

References

  1. S. Furui, “On the role of spectral transition for speech perception,” J. Acoust. Soc. Am., 80(4), 1016-1025, 1986.
  2. T. Houtgast and H. J. M. Steeneken, “A review of MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria,” J. Acoust. Soc. Am., 77(3), 1069-1077, 1985.
  3. A. K. Nabelek, T. R. Letowski and F. M. Tucker, “Reverberant overlap- and self-masking in consonant identification,” J. Acoust. Soc. Am., 86(4), 1259-1265, 1989.
  4. T. Arai, K. Kinoshita, N. Hodoshima, A. Kusumoto and T. Kitamura, “Effects of suppressing steady-state portions of speech on intelligibility in reverberant environments,” Proc. Autumn Meet. Acoust. Soc. Jpn., 1, 449-450, 2001 (in Japanese).
  5. T. Arai, K. Kinoshita, N. Hodoshima, A. Kusumoto and T. Kitamura, “Effects of suppressing steady-state portions of speech on intelligibility in reverberant environments,” Acoust. Sci. Tech., 23(4), 229-232, 2002.
  6. T. Arai, K. Yasu and N. Hodoshima, “Effective speech processing for various impaired listeners,” Proc. International Congress on Acoustics, II, 1389-1392, 2004 (Invited Paper).
  7. T. Arai, “Padding zero into steady-state portions of speech as a preprocess for improving intelligibility in reverberant environments,” Acoust. Sci. Tech., 26(5), 459-461, 2005.
  8. T. Arai and N. Hodoshima, “Temporally enhanced speech is more intelligible in reverberant environments,” Proc. WESPAC, 2006 (Invited Paper).
  9. T. Arai, “Preprocessing speech against reverberation,” J. Acoust. Soc. Am., 120(5), 3323, 2006 (Invited Paper).
  10. T. Arai, Y. Nakata, N. Hodoshima and K. Kurisu, “Decreasing speaking-rate with steady-state suppression to improve speech intelligibility in reverberant environments,” Acoust. Sci. Tech., 28(4), 282-285, 2007.
  11. T. Arai, Y. Murakami, N. Hayashi, N. Hodoshima and K. Kurisu, “Inverse correlation of intelligibility of speech in reverberation with the amount of overlap-masking,” Acoust. Sci. Tech., 28(6), 438-441, 2007.
  12. N. Hayashi, T. Arai, N. Hodoshima, Y. Miyauchi and K. Kurisu, “Steady-state pre-processing for improving speech intelligibility in reverberant environments: Evaluation in a hall with an electrical reverberator,” Proc. Interspeech, 1741-1744, 2005.
  13. N. Hayashi, N. Hodoshima, T. Arai and K. Kurisu, “Influence of Deutlichkeit value and reverberation time on improved speech intelligibility in reverberant environments because of steady-state suppression,” J. Acoust. Soc. Am., 120(5), 3360, 2006.
  14. N. Hodoshima, T. Arai and A. Kusumoto, “Enhancing temporal dynamics of speech to improve intelligibility in reverberant environments,” Proc. Forum Acusticum, 2002.
  15. N. Hodoshima, T. Inoue, T. Arai and A. Kusumoto, “Suppressing steady-state portions of speech for improving intelligibility in various reverberant environments,” Proc. China-Japan Joint Conference on Acoustics, 199-202, 2002.
  16. N. Hodoshima, T. Arai, T. Inoue, K. Kinoshita and A. Kusumoto, “Improving speech intelligibility by steady-state suppression as pre-processing in small to medium sized halls,” Proc. Eurospeech, 1365-1368, 2003.
  17. N. Hodoshima, T. Arai, T. Inoue, K. Kinoshita and A. Kusumoto, “Improving intelligibility of speech by steady-state suppression as pre-processing in small to medium sized halls,” International Workshop on Speech Dynamics by Ear, Eye, Mouth and Machine, Technical Report of IEICE Japan, SP2003-53, 61-66, 2003.
  18. N. Hodoshima, T. Inoue, T. Arai, A. Kusumoto and K. Kinoshita, “Suppressing steady-state portions of speech for improving intelligibility in various reverberant environments,” Acoust. Sci. Tech., 25(1), 58-60, 2004.
  19. N. Hodoshima, T. Goto, N. Ohata, T. Inoue and T. Arai, “The effect of pre-processing for improving speech intelligibility in the Sophia University lecture hall,” Proc. International Congress on Acoustics, III, 2389-2392, 2004.
  20. N. Hodoshima, T. Goto, N. Ohata, T. Inoue and T. Arai, “The effect of pre-processing approach for improving speech intelligibility in a hall: Comparison between diotic and dichotic listening conditions,” Acoust. Sci. Tech., 26(2), 212-214, 2005.
  21. N. Hodoshima, T. Arai, A. Kusumoto and K. Kinoshita, “Improving syllable identification by a preprocessing method reducing overlap-masking in reverberant environments,” J. Acoust. Soc. Am., 119(6), 4055-4064, 2006.
  22. N. Hodoshima and T. Arai, “Investigating an optimum suppression rate of steady-state portions of speech that improves intelligibility the most as a pre-processing approach in reverberant environments,” J. Acoust. Soc. Am., 118(3), 1930, 2005.
  23. N. Hodoshima, D. Behne and T. Arai, “Steady-state suppression in reverberation: A comparison of native and nonnative speech perception,” Proc. Interspeech, 873-876, 2006.
  24. N. Hodoshima, T. Arai and P. Svensson, “The effect of a preprocessing approach improving speech intelligibility in reverberation considering a public-address system and room acoustics,” J. Acoust. Soc. Am., 120(5), 3359, 2006.
  25. N. Hodoshima, D. Behne and T. Arai, “The effect of the steady-state suppression on consonant identification by native and non-native listeners in reverberant environments,” International Workshop on Frontiers in Speech and Hearing Research, Technical Report of IEICE Japan, SP2005-165, 15-20, 2006.
  26. N. Hodoshima and T. Arai, “Effect of talker variability on speech perception by elderly people in reverberation,” in Auditory signal processing in hearing-impaired listeners, International Symposium on Auditory and Audiological Research, edited by T. Dau, J. M. Buchholz, J. M. Harte and T. U. Christiansen (Centertryk A/S, Holbaek), 383-387, 2007.
  27. N. Hodoshima, Y. Miyauchi, K. Yasu and T. Arai, “Steady-state suppression for improving syllable identification in reverberant environments: A case study in an elderly person,” Acoust. Sci. Tech., 28(1), 53-55, 2007.
  28. N. Hodoshima, P. Svensson and T. Arai, “Preprocessing effects on speech intelligibility in reverberation using mixed natural and electroacoustical sounds,” Proc. Japan-China Joint Conference of Acoustics, 2007.
  29. N. Hodoshima, T. Arai and K. Kurisu, “Effects of training, style, and rate of speaking on speech perception of young people in reverberation,” Acoustics 08 Paris, 2393-2397, 2008.
  30. N. Hodoshima, W. Yoshida and T. Arai, “Improving consonant identification in noise and reverberation by steady-state suppression as a preprocessing approach,” Proc. Interspeech, 1793-1796, 2008.
  31. T. Goto, T. Inoue, N. Ohata, N. Hodoshima and T. Arai, “The effect of pre-processing for improving speech intelligibility in the Sophia University lecture hall,” Proc. Autumn Meet. Acoust. Soc. Jpn., 1, 613-614, 2003 (in Japanese, Poster Award).
  32. T. Kitamura, K. Kinoshita, T. Arai, A. Kusumoto and Y. Murahara, “Designing modulation filters for improving speech intelligibility in reverberant environments,” Proc. ICSLP, 3, 586-589, 2000.
  33. K. Kobayashi, Y. Hatta, K. Yasu, S. Minamihata, N. Hodoshima, T. Arai and M. Shindo, “Improving speech intelligibility for elderly listeners by steady-state suppression,” International Workshop on Frontiers in Speech and Hearing Research, Technical Report of IEICE Japan, SP2005-168, 31-36, 2006.
  34. K. Kobayashi, K. Yasu, N. Hodoshima, T. Arai and M. Shindo, “A study of syllable enhancement for elderly listeners by suppressing energy of steady-state portions of vowels,” J. Acoust. Soc. Jpn., 64(5), 278-289, 2008 (in Japanese).
  35. A. Kusumoto, T. Arai, T. Kitamura, M. Takahashi and Y. Murahara, “Speech processing on the room acoustics for the hearing-impaired,” Proc. Autumn Meet. Acoust. Soc. Jpn., 1, 389-390, 1999 (in Japanese).
  36. A. Kusumoto, T. Arai, T. Kitamura, M. Takahashi and Y. Murahara, “Modulation enhancement of speech as a preprocessing for reverberant chambers with the hearing-impaired,” Proc. IEEE ICASSP, 2, 853-856, 2000.
  37. A. Kusumoto, T. Arai, K. Kinoshita, N. Hodoshima and N. Vaughan, “Modulation enhancement of speech by a pre-processing algorithm for improving intelligibility in reverberant environments,” Speech Communication, 45(2), 101-113, 2005.
  38. Y. Miyauchi, N. Hodoshima, K. Yasu, N. Hayashi, T. Arai and M. Shindo, “A preprocessing technique for improving speech intelligibility in reverberant environments: The effect of steady-state suppression on elderly people,” Proc. Interspeech, 2769-2772, 2005.
  39. Y. Miyauchi and T. Arai, “Energy suppression of steady-state portions of vowels while maintaining the energy of consonants better improves speech intelligibility for elderly listeners in reverberation,” J. Acoust. Soc. Am., 120(5), 3346-3347, 2006.
  40. Y. Nakata, Y. Murakami, N. Hodoshima and T. Arai, “Slowed speech spreading into reverberant environments; steady-state suppression improves speech intelligibility,” J. Acoust. Soc. Am., 120(5), 3360, 2006.
  41. Y. Nakata, Y. Murakami, N. Hodoshima, N. Hayashi, Y. Miyauchi, T. Arai and K. Kurisu, “The effects of speech-rate slowing for improving speech intelligibility in reverberant environments,” International Workshop on Frontiers in Speech and Hearing Research, Technical Report of IEICE Japan, SP2005-166, 21-24, 2006.
  42. K. Takahashi, K. Yasu, N. Hodoshima, T. Arai and K. Kurisu, “Enhancing speech in reverberation by steady-state suppression,” Proc. International Congress on Acoustics, 2007.