• DocumentCode
    1187299
  • Title

    Speech and crosstalk detection in multichannel audio

  • Author

    Wrigley, Stuart N. ; Brown, Guy J. ; Wan, Vincent ; Renals, Steve

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Sheffield, UK
  • Volume
    13
  • Issue
    1
  • fYear
    2005
  • Firstpage
    84
  • Lastpage
    91
  • Abstract
    The analysis of scenarios in which a number of microphones record the activity of speakers, such as in a round-table meeting, presents a number of computational challenges. For example, if each participant wears a microphone, speech from both the microphone's wearer (local speech) and from other participants (crosstalk) is received. The recorded audio can be broadly classified in four ways: local speech, crosstalk plus local speech, crosstalk alone and silence. We describe two experiments related to the automatic classification of audio into these four classes. The first experiment attempted to optimize a set of acoustic features for use with a Gaussian mixture model (GMM) classifier. A large set of potential acoustic features were considered, some of which have been employed in previous studies. The best-performing features were found to be kurtosis, "fundamentalness," and cross-correlation metrics. The second experiment used these features to train an ergodic hidden Markov model classifier. Tests performed on a large corpus of recorded meetings show classification accuracies of up to 96%, and automatic speech recognition performance close to that obtained using ground truth segmentation.
  • Keywords
    acoustic signal detection; crosstalk; hidden Markov models; microphones; pattern classification; speech recognition; Gaussian mixture model classifier; automatic audio classification; automatic speech recognition; cross-correlation metric; crosstalk detection; ergodic hidden Markov model classifier; kurtosis metric; local speech; microphone; multichannel audio; speech detection; Audio recording; Automatic speech recognition; Computer science; Crosstalk; Hidden Markov models; Laboratories; Microphones; Speech analysis; Speech recognition; Video recording;
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type

    jour

  • DOI
    10.1109/TSA.2004.838531
  • Filename
    1369314
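
  • Example
    The abstract names kurtosis and cross-correlation metrics among the best-performing features. The sketch below shows one plausible way to compute such per-frame values with NumPy. It is not taken from the paper: the function names (frame_kurtosis, peak_cross_correlation), the use of Pearson kurtosis, the energy normalization, and the assumption of roughly zero-mean frames are all illustrative choices, and the paper's exact feature definitions may differ.

```python
import numpy as np


def frame_kurtosis(frame):
    """Pearson kurtosis (fourth standardized moment) of one audio frame.

    Hypothetical helper: a Gaussian signal gives about 3, and peakier,
    more impulsive signals give larger values.
    """
    x = np.asarray(frame, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    if m2 == 0.0:
        return 0.0  # constant frame (e.g. digital silence): kurtosis undefined
    return float(np.mean(d ** 4) / m2 ** 2)


def peak_cross_correlation(local, other):
    """Peak of the energy-normalized cross-correlation between two channels.

    Hypothetical helper: returns (peak, lag). The peak lies in [-1, 1];
    a negative lag means `other` is a delayed copy of `local`.
    Frames are assumed to be roughly zero-mean (an assumption, not
    something stated in the abstract).
    """
    a = np.asarray(local, dtype=float)
    b = np.asarray(other, dtype=float)
    norm = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    if norm == 0.0:
        return 0.0, 0  # at least one channel is silent
    c = np.correlate(a, b, mode="full") / norm
    k = int(np.argmax(c))
    lag = k - (len(b) - 1)  # index 0 of `c` corresponds to lag -(len(b) - 1)
    return float(c[k]), lag


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.normal(size=4000)        # Gaussian reference signal
    peaky = rng.laplace(size=4000)       # heavier-tailed, more speech-like
    print(frame_kurtosis(noise), frame_kurtosis(peaky))

    local = np.array([0, 0, 1, 2, 1, 0, 0, 0], dtype=float)
    delayed = np.roll(local, 2)          # same pulse arriving two samples later
    print(peak_cross_correlation(local, delayed))
```

    Values like these would be computed per frame and per channel, then passed to a classifier such as the GMM or ergodic HMM mentioned in the abstract; that stage is not sketched here.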