Title :
Efficient out-of-vocabulary term detection by n-gram array indices with distance from a syllable lattice
Author :
Iwami, Keisuke ; Fujii, Yasuhisa ; Yamamoto, Kazumasa ; Nakagawa, Seiichi
Author_Institution :
Dept. of Comput. Sci. & Eng., Toyohashi Univ. of Technol., Toyohashi, Japan
Abstract :
For spoken document retrieval, it is very important to consider out-of-vocabulary (OOV) words and mis-recognition of spoken words. Therefore, sub-word unit based recognition and retrieval methods have been proposed. This paper describes a Japanese spoken document retrieval system that is robust to OOV words and mis-recognition of sub-word units. We used individual syllables as the sub-word units in continuous speech recognition and an n-gram sequence of syllables in a recognized syllable-based lattice. We propose an n-gram indexing/retrieval method with distance in the syllable lattice to address OOV words, recognition errors, and high-speed retrieval. We applied this method to a 44-hour academic lecture presentation database, and OOV words were detected with an F-value of 0.58 in less than 2.5 milliseconds.
Keywords :
document handling; indexing; information retrieval; natural language processing; speech recognition; vocabulary; Japanese spoken document retrieval system; academic lecture presentation database; continuous speech recognition; high speed retrieval; individual syllables; n-gram array indices; n-gram indexing retrieval method; n-gram sequence; out of vocabulary term detection; spoken document retrieval; subword unit based recognition; syllable based lattice; syllable lattice; Indexing; Out-of-Vocabulary; mis-recognition; n-gram; spoken term retrieval; syllable recognition;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
Print_ISBN :
978-1-4577-0538-0
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2011.5947645