Human language identification with reduced segmental information

Acoustical Science and Technology, Vol. 23, No. 3, pp. 143-153, 2002

M. Komatsu, K. Mori, T. Arai, M. Aoyagi and Y. Murahara

Abstract: We conducted human language identification experiments with Japanese and bilingual subjects, using signals with reduced segmental information. American English and Japanese excerpts from the OGI Multi-Language Telephone Speech Corpus were processed by spectral-envelope removal (SER), vowel extraction from SER (VES) and temporal-envelope modulation (TEM). The processed excerpts of speech were provided as stimuli for perceptual experiments. We calculated D indices from the subjects’ responses; the index ranges from -2 to +2, where positive and negative values indicate correct and incorrect responses, respectively. With the SER signal, where the spectral envelope is eliminated, humans could still identify the languages fairly successfully. The overall D index of Japanese subjects for this signal was +1.17. With the TEM signal, composed of white-noise-driven intensity envelopes from several frequency bands, the D index rose from +0.29 to +1.69 as the number of bands increased from 1 to 4. Results varied depending on the stimulus language. Japanese and bilingual subjects scored differently from each other. These results indicate that humans can identify languages using signals with drastically reduced segmental information. The results also suggest variation due to the phonetic typologies of the languages and the subjects’ knowledge.

Keywords: Language identification, Human perception, Segmentals, Suprasegmentals, Prosody, OGI Multi-Language Telephone Speech Corpus
