DocumentCode
2696895
Title
LZW Based Distance Measures for Spoken Language Identification
Author
Basavaraja, S.V. ; Sreenivas, T.V.
Author_Institution
Dept. of Electr. Commun. Eng., Indian Inst. of Sci., Bangalore
fYear
2006
fDate
28-30 June 2006
Firstpage
1
Lastpage
6
Abstract
We present a new approach to spoken language modeling for language identification (LID) using the Lempel-Ziv-Welch (LZW) algorithm. The LZW technique is applicable to any kind of tokenization of the speech signal. Because the LZW algorithm efficiently extracts variable-length symbol strings from the training data, the LZW codebook captures the essentials of a language effectively. We develop two new deterministic measures for LID based on the LZW algorithm, namely (i) the compression ratio score (LZW-CR) and (ii) the weighted discriminant score (LZW-WDS). To assess these measures, we consider error-free tokenization of speech as well as artificially induced noise in the tokenization. It is shown that for a six-language LID task on the OGI-TS database with clean tokenization, the new model (LZW-WDS) performs slightly better than the conventional bigram model. For noisy tokenization, which is the more realistic case, LZW-WDS significantly outperforms the bigram technique.
Keywords
natural languages; speech processing; speech recognition; LZW-WDS technique; Lempel-Ziv-Welch algorithm; OGI-TS database; error-free tokenization; speech signal; spoken language identification; weighted discriminant score; Databases; Electric variables measurement; Maximum likelihood estimation; Natural languages; Neural networks; Noise measurement; Speech enhancement; Stochastic processes; TV; Training data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Speaker and Language Recognition Workshop, 2006. IEEE Odyssey 2006: The
Conference_Location
San Juan
Print_ISBN
1-4244-0471-1
Electronic_ISBN
1-4244-0472-X
Type
conf
DOI
10.1109/ODYSSEY.2006.248103
Filename
4013520