Regional Information Center for Science and Technology - Unsupervised word segmentation from noisy input

DocumentCode :

672396

Title :

Unsupervised word segmentation from noisy input

Author :

Heymann, Jahn ; Walter, O. ; Haeb-Umbach, Reinhold ; Raj, Bhiksha

Author_Institution :

Dept. of Commun. Eng., Univ. of Paderborn, Paderborn, Germany

Year :

2013

Date :

8-12 Dec. 2013

Firstpage :

458

Lastpage :

463

Abstract :

In this paper we present an algorithm for the unsupervised segmentation of a character or phoneme lattice into words. Using a lattice as input rather than a single string accounts for the uncertainty of the character/phoneme recognizer about the true label sequence. An example application is the discovery of lexical units from the output of an error-prone phoneme recognizer in a zero-resource setting, where neither the lexicon nor the language model is known. A recently published approach based on Weighted Finite State Transducers (WFSTs) is shown to suffer from an issue: language model probabilities of known words are computed incorrectly. Fixing this issue greatly improves precision and recall, but at the cost of increased computational complexity, so the corrected approach is practical only for single input strings. To allow for lattice input, and thus for errors in the character/phoneme recognizer, we propose a computationally efficient, suboptimal two-stage approach, which is shown to significantly improve word segmentation performance compared to the earlier WFST approach.
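
To make the task in the abstract concrete, the following is a minimal sketch of the simplest case only: splitting a single character string into words by dynamic programming when a word-level unigram model is already given. It is not the WFST-based method or the two-stage lattice approach described in the paper, where neither the lexicon nor the language model is known in advance. The names `segment`, `toy_logprob`, and `LEXICON`, the toy probabilities, and the per-character penalty for unknown words are all assumptions made for illustration.

```python
import math


def segment(chars, word_logprob, max_len=8):
    """Return the highest-scoring split of `chars` into words.

    `word_logprob` maps a candidate word to a log-probability; the score of a
    segmentation is the sum over its words (a unigram model, by assumption).
    """
    n = len(chars)
    best = [-math.inf] * (n + 1)  # best[i]: best score for chars[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + word_logprob(chars[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the string to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(chars[start:end])
        end = start
    return words[::-1]


# Hypothetical toy lexicon with made-up unigram probabilities.
LEXICON = {"the": 0.4, "dog": 0.3, "barks": 0.3}


def toy_logprob(word, char_penalty=math.log(1e-3)):
    """Known words use the lexicon; unknown words pay a per-character penalty."""
    if word in LEXICON:
        return math.log(LEXICON[word])
    return len(word) * char_penalty


if __name__ == "__main__":
    print(segment("thedogbarks", toy_logprob))
```

In the unsupervised setting of the paper, the word probabilities would themselves have to be learned from the data, and the input would be a lattice of alternative label sequences rather than one string; this sketch fixes both to keep the segmentation step visible.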

Keywords :

probability; speech recognition; unsupervised learning; word processing; character recognizer; computationally efficient suboptimal two-stage approach; error-prone phoneme recognizer; label sequence; language model probabilities; lexical unit discovery; noisy input; phoneme lattice; unsupervised word segmentation algorithm; word segmentation performance; zero-resource setting; Acoustics; Computational modeling; Context; Lattices; Probability; Speech; Transducers; Automatic speech recognition; Unsupervised learning;

Language :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on

Conference_Location :

Olomouc

Type :

conf

DOI :

10.1109/ASRU.2013.6707773

Filename :

6707773

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=672396