Towards language identification using phoneme-based features

Proc. of the IEEE International Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 1, pp. 289-292, Adelaide, 1994

K. M. Berkling, T. Arai and E. Barnard

Abstract: This paper presents an analysis of the phonemic language identification system introduced in [5], now extended to recognize German in addition to English and Japanese. In this system, language identification is based on features derived from a superset of the phonemes of all three languages. As we increase the number of languages, the need to reduce the feature space becomes apparent. Practical analysis of single-feature statistics in conjunction with linguistic knowledge leads to a 90% reduction of the feature space with only a 5% loss in performance. Thus, the system discriminates between Japanese and English with 84.1% accuracy based on only 15 features, compared to 84.6% based on the complete set of 318 phonemic features (or 83.6% using 333 broad-category features [4]). Results indicate that a language identification system may be designed based on linguistic knowledge and then implemented with a neural network of appropriate complexity.
