Title :
LZW Based Distance Measures for Spoken Language Identification
Author :
Basavaraja, S.V. ; Sreenivas, T.V.
Author_Institution :
Dept. of Electr. Commun. Eng., Indian Inst. of Sci., Bangalore
Abstract :
We present a new approach to spoken language modeling for language identification (LID) using the Lempel-Ziv-Welch (LZW) algorithm. The LZW technique is applicable to any kind of tokenization of the speech signal. Because the LZW algorithm efficiently extracts variable-length symbol strings from the training data, the LZW codebook captures the essentials of a language effectively. We develop two new deterministic measures for LID based on the LZW algorithm: (i) a compression ratio score (LZW-CR) and (ii) a weighted discriminant score (LZW-WDS). To assess these measures, we consider error-free tokenization of speech as well as artificially induced noise in the tokenization. It is shown that for a 6-language LID task on the OGI-TS database with clean tokenization, the new model (LZW-WDS) performs slightly better than the conventional bigram model. For noisy tokenization, which is the more realistic case, LZW-WDS significantly outperforms the bigram technique.
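The idea behind a compression-ratio score can be illustrated with a minimal Python sketch. The abstract does not give the exact definition of LZW-CR, so everything here is an assumption made for illustration: the function names are hypothetical, the score is taken to be tokens per parsed phrase, and the toy token strings stand in for real tokenized speech. One LZW codebook is built per language from training tokens, and a test sequence is assigned to the language whose fixed codebook parses it into the fewest phrases.

```python
def build_lzw_codebook(tokens):
    """Return the set of phrases an LZW pass adds to its dictionary."""
    codebook = {(t,) for t in tokens}  # seed with the single tokens seen in training
    phrase = ()
    for t in tokens:
        candidate = phrase + (t,)
        if candidate in codebook:
            phrase = candidate       # keep extending the current match
        else:
            codebook.add(candidate)  # new variable-length phrase
            phrase = (t,)
    return codebook


def compression_ratio(tokens, codebook):
    """Tokens per phrase under a greedy longest-match parse (assumed score)."""
    phrases, i = 0, 0
    while i < len(tokens):
        j = i + 1
        # Extend the match while the longer phrase is still in the codebook.
        # A token unseen in training is consumed as a one-token phrase.
        while j < len(tokens) and tuple(tokens[i:j + 1]) in codebook:
            j += 1
        phrases += 1
        i = j
    return len(tokens) / phrases


def identify(tokens, codebooks):
    """Pick the language whose codebook compresses the tokens best."""
    return max(codebooks, key=lambda lang: compression_ratio(tokens, codebooks[lang]))


# Toy data (hypothetical): characters stand in for speech tokens.
codebooks = {
    "A": build_lzw_codebook(list("abababababab")),
    "B": build_lzw_codebook(list("aabbaabbaabb")),
}
test_tokens = list("abababab")
best = identify(test_tokens, codebooks)
```

Because an LZW dictionary is prefix-closed, the greedy extension in `compression_ratio` finds the longest matching phrase at each position. The weighted discriminant score (LZW-WDS) is not sketched here, since the abstract does not describe how its weights are formed.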
Keywords :
natural languages; speech processing; speech recognition; LZW-WDS technique; Lempel-Ziv-Welch algorithm; OGI-TS database; error-free tokenization; speech signal; spoken language identification; weighted discriminant score; Databases; Electric variables measurement; Maximum likelihood estimation; Natural languages; Neural networks; Noise measurement; Speech enhancement; Stochastic processes; TV; Training data;
Conference_Title :
IEEE Odyssey 2006: The Speaker and Language Recognition Workshop
Conference_Location :
San Juan
Print_ISBN :
1-4244-0471-1
Electronic_ISBN :
1-4244-0472-X
DOI :
10.1109/ODYSSEY.2006.248103