• DocumentCode
    1691298
  • Title
    Detection of nonlinguistic vocalizations using ALISP sequencing
  • Author
    Pammi, Sathish ; Khemiri, Houssemeddine ; Petrovska-Delacretaz, Dijana ; Chollet, Gerard
  • Author_Institution
    Inst. Mines-Telecom, Telecom ParisTech, Paris, France
  • fYear
    2013
  • Firstpage
    7557
  • Lastpage
    7561
  • Abstract
    In this paper, we present a generic methodology to detect nonlinguistic vocalizations using ALISP (Automatic Language Independent Speech Processing), which is a data-driven audio segmentation approach. Using Maximum Likelihood Linear Regression (MLLR) and Maximum A Posteriori (MAP) techniques, the proposed method adapts ALISP models, which then facilitate detection of local regions of nonlinguistic vocalizations with the standard Viterbi decoding algorithm. We also illustrate how a simple majority voting scheme, using a sliding window on ALISP sequences, can help eliminate outliers from the Viterbi-predicted sequence automatically. We evaluate the performance of our method on detection of laughter, a nonlinguistic vocalization, in comparison with global acoustic models such as GMMs, left-to-right HMMs and ergodic HMMs. The results indicate that adapted ALISP acoustic models perform better than global acoustic models in terms of F-measure. Moreover, our majority voting scheme on ALISP sequences further improves the performance, yielding, in total, an increase of 19.6%, 8.1% and 5.6% in F-measure over the global acoustic models (GMMs, left-to-right HMMs and ergodic HMMs, respectively).
  • Keywords
    Gaussian processes; Viterbi decoding; audio signal processing; hidden Markov models; maximum likelihood decoding; maximum likelihood estimation; regression analysis; speech coding; speech recognition; ALISP sequencing; GMM; Gaussian mixture models; MAP technique; MLLR technique; Viterbi decoding algorithm; automatic language independent speech processing; data driven audio segmentation approach; ergodic HMM; global acoustic model; laughter detection method; left to right HMM; majority voting scheme; maximum a posteriori technique; maximum likelihood linear regression technique; nonlinguistic vocalization detection; sliding window; Acoustics; Adaptation models; Speech; Training; Vectors; Viterbi algorithm; acoustic models; audio segmentation; model adaptation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2013.6639132
  • Filename
    6639132
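
The abstract mentions a majority voting scheme that slides a window over the Viterbi-predicted sequence to remove outliers. The following is a minimal sketch of that general idea, not the authors' implementation: the function name `majority_vote_smooth`, the window size, the centered window, the tie-breaking rule, and the labels "S" (speech) and "L" (laughter) are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def majority_vote_smooth(labels, window=5):
    """Relabel each position with the most frequent label in a centered window.

    Assumptions (not taken from the paper): the window is centered and
    truncated at the sequence boundaries, votes are counted on the original
    (unsmoothed) labels, and a tie keeps the original label when it is among
    the most frequent ones.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i, current in enumerate(labels):
        start = max(0, i - half)
        stop = min(len(labels), i + half + 1)
        counts = Counter(labels[start:stop])
        top = max(counts.values())
        winners = [label for label, count in counts.items() if count == top]
        # Keep the current label on a tie; otherwise take the majority label.
        smoothed.append(current if current in winners else winners[0])
    return smoothed


if __name__ == "__main__":
    # Hypothetical per-segment predictions: "S" = speech, "L" = laughter.
    predicted = list("SSLSSLLLSLL")
    print("".join(majority_vote_smooth(predicted, window=3)))
```

With a window of 3, the isolated "L" at the third position and the isolated "S" inside the laughter run are both overruled by their neighbors, which is the kind of outlier removal the abstract describes. A larger window removes longer spurious runs at the cost of blurring the boundaries of real events.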