Proc. of the European Conf. on Speech Communication and Technology (Eurospeech), Vol. 1, pp. 391-394, Budapest, 1999
Human language identification with reduced spectral information
K. Mori, N. Toba, T. Harada, T. Arai, M. Komatsu, M. Aoyagi and Y. Murahara
Abstract: We conducted human language identification (LID) experiments using signals with reduced segmental information, seeking the cues that underlie humans' remarkable LID ability and that may be applicable to the development of robust automatic LID. American English and Japanese excerpts from the OGI-TS corpus were processed by (1) spectral-envelope removal (SER) and (2) temporal-envelope modulation (TEM). With the SER signal, in which the spectral envelope is eliminated, listeners could still identify the languages fairly successfully (85.2%). With the TEM signal, composed of white-noise-driven intensity envelopes from several frequency bands combined into one signal, the identification rate rose from 62.5% to 93.8% as the number of bands increased from 1 to 4. These results, though obtained with a limited number of languages, indicate that humans can identify languages from signals whose segmental information (in acoustic terms, spectral information) is greatly reduced.
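To make the TEM description concrete, the following is a minimal sketch of one way such a signal could be built: the speech is split into frequency bands, each band's intensity envelope is extracted, and each envelope modulates white noise limited to the same band before the bands are summed. It is not the authors' implementation. The function names (`bandpass`, `band_envelope`, `tem_signal`), the FFT-based brick-wall filtering, the 50 Hz envelope cutoff, and the band edges in the usage example are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def bandpass(x, fs, lo, hi):
    """Brick-wall band-pass via the FFT (an assumed, simplistic filter)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = (freqs >= lo) & (freqs < hi)
    return np.fft.irfft(np.where(mask, spectrum, 0), n=len(x))


def band_envelope(x, fs, lo, hi, env_cutoff=50.0):
    """Intensity envelope of one band: rectify, then low-pass.

    env_cutoff (Hz) is a hypothetical smoothing parameter.
    """
    rectified = np.abs(bandpass(x, fs, lo, hi))
    envelope = bandpass(rectified, fs, 0.0, env_cutoff)
    # Brick-wall smoothing can ring slightly below zero; clamp it.
    return np.maximum(envelope, 0.0)


def tem_signal(x, fs, band_edges, seed=0):
    """Sum of band-limited white noise carriers, each modulated by the
    envelope of the corresponding speech band."""
    rng = np.random.default_rng(seed)
    out = np.zeros(len(x))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        envelope = band_envelope(x, fs, lo, hi)
        noise = rng.standard_normal(len(x))
        carrier = bandpass(noise, fs, lo, hi)
        out += envelope * carrier
    return out


if __name__ == "__main__":
    fs = 8000  # assumed telephone-band sampling rate
    t = np.arange(fs) / fs
    # Synthetic stand-in for speech: a 300 Hz tone with slow amplitude modulation.
    x = np.sin(2 * np.pi * 300 * t) * (1 + np.sin(2 * np.pi * 3 * t))
    one_band = tem_signal(x, fs, [0, 4000])
    four_bands = tem_signal(x, fs, [0, 600, 1500, 2500, 4000])
    print(one_band.shape, four_bands.shape)
```

With a single band, only the overall intensity contour survives. Adding bands restores a coarse picture of how energy is distributed across frequency over time, which is consistent with the reported rise in identification rate from 1 to 4 bands.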