Title :
Detection of nonlinguistic vocalizations using ALISP sequencing
Author :
Pammi, Sathish ; Khemiri, Houssemeddine ; Petrovska-Delacretaz, Dijana ; Chollet, Gerard
Author_Institution :
Inst. Mines-Telecom, Telecom ParisTech, Paris, France
Abstract :
In this paper, we present a generic methodology to detect nonlinguistic vocalizations using ALISP (Automatic Language Independent Speech Processing), a data-driven audio segmentation approach. Using Maximum Likelihood Linear Regression (MLLR) and Maximum A Posteriori (MAP) techniques, the proposed method adapts ALISP models, which then facilitate detection of local regions of nonlinguistic vocalizations with the standard Viterbi decoding algorithm. We also illustrate how a simple majority voting scheme, using a sliding window on ALISP sequences, can help automatically eliminate outliers from the Viterbi-predicted sequence. We evaluate the performance of our method on the detection of laughter, a nonlinguistic vocalization, in comparison with global acoustic models such as GMMs, left-to-right HMMs and ergodic HMMs. The results indicate that adapted ALISP acoustic models perform better than global acoustic models in terms of F-measure. Moreover, our majority voting scheme on ALISP sequences further improves the performance, yielding in total an increase of 19.6%, 8.1% and 5.6% in F-measure over the global acoustic models GMMs, left-to-right HMMs and ergodic HMMs, respectively.
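The sliding-window majority voting step can be illustrated with a minimal sketch. It assumes that each decoded ALISP segment has already been mapped to a class label (here the hypothetical labels "L" for laughter and "S" for other audio); the function name `majority_vote_smooth`, the default window size and the tie-breaking rule are illustrative assumptions, not the parameters or implementation used by the authors.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote_smooth(labels: Sequence[str], window: int = 5) -> List[str]:
    """Replace each label by the most frequent label in a centered window.

    Hypothetical sketch: the window size, boundary handling and tie-breaking
    rule are assumptions, not the settings reported in the paper.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed: List[str] = []
    for i, current in enumerate(labels):
        # The window is truncated at the start and end of the sequence.
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        neighbourhood = list(labels[lo:hi])
        counts = Counter(neighbourhood)
        best = max(counts.values())
        if counts[current] == best:
            # On a tie (or a clear win), keep the current label.
            smoothed.append(current)
        else:
            # Otherwise take the first winning label in window order.
            smoothed.append(next(lab for lab in neighbourhood if counts[lab] == best))
    return smoothed


if __name__ == "__main__":
    # Hypothetical decoded sequence with two isolated outliers (indices 2 and 8).
    decoded = ["S", "S", "L", "S", "S", "L", "L", "L", "S", "L", "L"]
    print(majority_vote_smooth(decoded, window=3))
```

With a window of 3, the isolated "L" at index 2 and the isolated "S" at index 8 are both overwritten by their neighbours, leaving one contiguous non-laughter region followed by one laughter region. A wider window removes longer outlier runs but also risks erasing genuinely short laughter events, so the window length trades off robustness against temporal resolution.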
Keywords :
Gaussian processes; Viterbi decoding; audio signal processing; hidden Markov models; maximum likelihood decoding; maximum likelihood estimation; regression analysis; speech coding; speech recognition; ALISP sequencing; GMM; Gaussian mixture models; MAP technique; MLLR technique; Viterbi decoding algorithm; automatic language independent speech processing; data driven audio segmentation approach; ergodic HMM; global acoustic model; hidden Markov models; laughter detection method; left to right HMM; majority voting scheme; maximum a posteriori technique; maximum likelihood linear regression technique; nonlinguistic vocalization detection; sliding window; Acoustics; Adaptation models; Hidden Markov models; Speech; Training; Vectors; Viterbi algorithm; ALISP sequencing; acoustic models; audio segmentation; model adaptation;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639132