Title :
Zero resource spoken audio corpus analysis
Author :
Harwath, David F. ; Hazen, Timothy J. ; Glass, James R.
Author_Institution :
MIT Lincoln Lab., Lexington, MA, USA
Abstract :
Zero-resource speech processing involves the automatic analysis of a collection of speech data in a completely unsupervised fashion without the benefit of any transcriptions or annotations of the data. In this paper, our zero-resource system seeks to automatically discover important words, phrases and topical themes present in an audio corpus. This system employs a segmental dynamic time warping (S-DTW) algorithm for acoustic pattern discovery in conjunction with a probabilistic model which treats the topic and pseudo-word identity of each discovered pattern as hidden variables. By applying an Expectation-Maximization (EM) algorithm, our system estimates the latent probability distributions over the pseudo-words and topics associated with the discovered patterns. Using this information, we produce acoustic summaries of the dominant topical themes of the audio document collection.
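Illustrative note (not part of the original record): the abstract names segmental dynamic time warping as the pattern-discovery step but gives no algorithmic detail. The sketch below shows only plain, whole-sequence DTW, the building block that segmental variants constrain and apply to sub-regions of two utterances; it is not the authors' S-DTW implementation. The function name `dtw_distance`, the use of Euclidean frame distance, and the representation of each utterance as a list of equal-length float tuples are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Return the plain DTW alignment cost between two feature sequences.

    `a` and `b` are lists of equal-dimension tuples of floats (hypothetical
    stand-ins for acoustic feature frames). This is whole-sequence DTW,
    not the segmental variant named in the abstract.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = math.dist(a[i - 1], b[j - 1])  # Euclidean distance
            # Extend the cheapest of: insertion, deletion, or match
            cost[i][j] = frame_dist + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]


# Toy usage: the second sequence is a time-stretched copy of the first.
x = [(0.0,), (1.0,), (2.0,)]
y = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
stretched_cost = dtw_distance(x, y)
```

Because DTW may repeat frames on either side, a time-stretched copy of a sequence aligns at zero cost, which is why warping-based matching can tolerate speaking-rate differences between repeated acoustic patterns.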
Keywords :
audio signal processing; document handling; expectation-maximisation algorithm; speech processing; time warp simulation; EM algorithm; S-DTW algorithm; acoustic pattern discovery; audio document collection; dominant topical themes; expectation-maximization algorithm; probabilistic model; probability distributions; pseudoword identity; segmental dynamic time warping algorithm; speech data collection; zero resource spoken audio corpus analysis; zero-resource speech processing; Acoustics; Computers; Data models; Glass; Heuristic algorithms; Speech; Speech processing; Zero-resource speech processing; speech summarization; spoken term discovery;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639335