DocumentCode
1489591
Title
Unsupervised Motif Acquisition in Speech via Seeded Discovery and Template Matching Combination
Author
Muscariello, Armando ; Gravier, Guillaume ; Bimbot, Frédéric
Author_Institution
IRISA, INRIA Rennes Bretagne Atlantique, Rennes, France
Volume
20
Issue
7
fYear
2012
Firstpage
2031
Lastpage
2044
Abstract
This paper describes and evaluates a computational architecture to discover and collect occurrences of speech repetitions, or motifs, in a totally unsupervised fashion, that is, in the absence of acoustic, lexical, or pronunciation modeling and of training material. In the last few years, this task has attracted increasing interest from the speech community because of a) its potential applicability in spoken document processing (as a preliminary step to summarization, topic clustering, etc.) and b) its novel methodology, which defines a new paradigm for speech processing that circumvents the issues common to all supervised, trained technologies. The contributions of the proposed system are twofold: 1) the design of a discovery strategy that detects repetitions by extending matches of motif fragments, called seeds; 2) the implementation of template matching techniques to detect acoustically close segments, based on dynamic time warping (DTW) and self-similarity matrix (SSM) comparison of speech templates, in contrast to the decoding procedures of model-based recognition systems. The architecture is thoroughly evaluated on several hours of French broadcast news shows under various parameter settings and acoustic features, namely mel-frequency cepstral coefficients (MFCCs) and different types of posteriorgrams: Gaussian mixture model (GMM)-based and phone-based posteriors, in both language-matched and language-mismatched conditions. The evaluation highlights a) the improved robustness of the system that jointly employs DTW and SSM and b) the significant impact of language-specific features on acoustic similarity detection based on template matching.
Keywords
Gaussian processes; decoding; matrix algebra; speech processing; speech recognition; DTW; GMM; Gaussian mixture model; MFCC; SSM; acoustic modeling; computational architecture; decoding procedures; dynamic time warping; language-matched conditions; lexical modeling; mel-frequency cepstral coefficients; model-based recognition systems; phone-based posteriors; posterior-grams; pronunciation modeling; seeded discovery; self-similarity matrix; speech community; speech processing; spoken document processing; template matching combination; training material; unsupervised motif acquisition; Acoustics; Materials; Pattern matching; Pragmatics; Speech; Speech processing; Speech recognition; Dynamic programming; histogram of oriented gradients; pattern matching; unsupervised learning; word discovery
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2012.2194283
Filename
6179978