Title :
Iterative Bayesian word segmentation for unsupervised vocabulary discovery from phoneme lattices
Author :
Heymann, Jahn ; Walter, O. ; Haeb-Umbach, Reinhold ; Raj, Bhiksha
Author_Institution :
Dept. of Commun. Eng., Univ. of Paderborn, Paderborn, Germany
Abstract :
In this paper we present an algorithm for the unsupervised segmentation into words of a lattice produced by a phoneme recognizer. Using a lattice rather than a single phoneme string accounts for the recognizer's uncertainty about the true label sequence. An example application is the discovery of lexical units from the output of an error-prone phoneme recognizer in a zero-resource setting, where neither the lexicon nor the language model (LM) is known. We propose a computationally efficient iterative approach that alternates between two steps: first, the most probable string is extracted from the lattice using a phoneme LM learned on the segmentation result of the previous iteration; second, word segmentation is performed on the extracted string using a word and phoneme LM that is learned alongside the new segmentation. We present results on lattices produced by a phoneme recognizer on the WSJ-CAM0 dataset and show that our approach delivers better segmentation performance than an earlier approach found in the literature, in particular for higher-order language models.
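The two-step alternation described in the abstract can be sketched in Python as follows. This is a minimal illustration under assumptions that are not taken from the paper. The lattice is a toy DAG of `(src, dst, phoneme, acoustic_logprob)` edges with topologically ordered node ids. Step 1 uses an add-one smoothed phoneme bigram LM. Step 2 swaps the paper's word and phoneme LM for a simple unigram word model smoothed toward a geometric-length phoneme base distribution, decoded with Viterbi and re-estimated from the previous iteration's segmentation (a hard-EM style simplification). All function names and parameters (`lm_weight`, `alpha`, `p_stop`, `max_len`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Assumed toy lattice format: list of (src, dst, phoneme, acoustic_logprob)
# edges forming a DAG; node 0 is the start, the largest node id is the end,
# and node ids are topologically ordered.


def train_phoneme_bigram(phoneme_seqs):
    """Return an add-one smoothed phoneme bigram log-prob function."""
    bigrams, history = Counter(), Counter()
    vocab = {"</s>"}
    for seq in phoneme_seqs:
        prev = "<s>"
        for p in list(seq) + ["</s>"]:
            bigrams[(prev, p)] += 1
            history[prev] += 1
            vocab.add(p)
            prev = p
    size = len(vocab)

    def logprob(prev, p):
        return math.log((bigrams[(prev, p)] + 1) / (history[prev] + size))

    return logprob


def best_path(edges, lm_logprob, lm_weight=1.0):
    """Step 1: Viterbi search over (node, last phoneme) states, combining
    acoustic scores with the phoneme bigram LM."""
    end = max(dst for _, dst, _, _ in edges)
    out = defaultdict(list)
    for src, dst, ph, ac in edges:
        out[src].append((dst, ph, ac))
    # best[node][last_phoneme] = (score, backpointer (prev_node, prev_phoneme))
    best = defaultdict(dict)
    best[0]["<s>"] = (0.0, None)
    for node in range(end):
        for last, (score, _) in best[node].items():
            for dst, ph, ac in out[node]:
                s = score + ac + lm_weight * lm_logprob(last, ph)
                if ph not in best[dst] or s > best[dst][ph][0]:
                    best[dst][ph] = (s, (node, last))

    def final_score(ph):
        return best[end][ph][0] + lm_weight * lm_logprob(ph, "</s>")

    last, node, phones = max(best[end], key=final_score), end, []
    while best[node][last][1] is not None:
        phones.append(last)
        node, last = best[node][last][1]
    return phones[::-1]


def segment(phones, word_counts, n_phone_types, alpha=1.0, p_stop=0.5,
            max_len=8):
    """Step 2 (simplified stand-in): Viterbi segmentation under a unigram
    word model; word_counts is a Counter keyed by phoneme tuples."""
    total = sum(word_counts.values())

    def word_logprob(w):
        # Base distribution: geometric word length, uniform phonemes.
        base = p_stop * (1 - p_stop) ** (len(w) - 1) / n_phone_types ** len(w)
        return math.log((word_counts[w] + alpha * base) / (total + alpha))

    n = len(phones)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            s = best[i] + word_logprob(tuple(phones[i:j]))
            if s > best[j]:
                best[j], back[j] = s, i
    words, j = [], n
    while j > 0:
        words.append(tuple(phones[back[j]:j]))
        j = back[j]
    return words[::-1]


def iterate(lattices, n_iters=3):
    """Alternate steps 1 and 2, re-estimating both models from the
    previous iteration's segmentation."""
    phone_types = {ph for edges in lattices for _, _, ph, _ in edges}
    segmentations = [[] for _ in lattices]
    strings = []
    for _ in range(n_iters):
        prev = [[p for w in seg for p in w] for seg in segmentations if seg]
        lm = train_phoneme_bigram(prev)  # no data yet: acoustic scores only
        strings = [best_path(edges, lm) for edges in lattices]
        counts = Counter(w for seg in segmentations for w in seg)
        segmentations = [segment(s, counts, len(phone_types)) for s in strings]
    return strings, segmentations
```

In the first iteration there is no previous segmentation, so the phoneme LM in this sketch assigns log-probability 0 to every transition and `best_path` simply returns the acoustically best string. In later iterations the LM trained on the previous segmentation can override weak acoustic evidence, which is the effect the lattice-based approach is meant to exploit.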
Keywords :
Bayes methods; iterative methods; speech recognition; unsupervised learning; WSJ-CAM0 dataset; automatic speech recognition; higher-order language models; iterative Bayesian word segmentation; phoneme lattices; phoneme recognizer; segmentation performance; true label sequence; unsupervised vocabulary discovery; Acoustics; Computational modeling; Hidden Markov models; Iterative methods; Lattices; Speech; Vocabulary; Automatic speech recognition; Unsupervised learning; Word Segmentation;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854364