• DocumentCode
    705368
  • Title
    Speech detection on broadcast audio
  • Author
    Zubari, Unal ; Ozan, Ezgi Can ; Acar, Banu Oskay ; Ciloglu, Tolga ; Esen, Ersin ; Ates, Tugrul K. ; Onur, Duygu Oskay
  • Author_Institution
    Video & Audio Process. Group, TUBITAK UZAY, Ankara, Turkey
  • fYear
    2010
  • fDate
    23-27 Aug. 2010
  • Firstpage
    85
  • Lastpage
    89
  • Abstract
    Speech boundary detection contributes to the performance of speech-based applications such as speech recognition and speaker recognition. The speech boundary detector implemented in this study works on broadcast audio as a pre-processor module of a keyword spotter. Speech boundary detection is handled in three steps. In the first step, audio data is segmented into homogeneous regions in an unsupervised manner. In the second step, an ACTIVITY/NON-ACTIVITY decision is made for each region. In the third step, ACTIVITY regions are classified as Speech/Nonspeech via Gaussian Mixture Model (GMM) based classification. The GMMs are trained using a novel feature, Spectral Flow Direction (SFD), and an improved multi-band harmonicity feature, in addition to the widely used Mel Frequency Cepstral Coefficients (MFCCs).
  • Keywords
    Gaussian processes; audio signal processing; broadcasting; cepstral analysis; decision theory; mixture models; speaker recognition; GMM based classification; Gaussian mixture model; MFCC; SFD; activity-nonactivity decision; audio data; broadcast audio; keyword spotter; mel frequency cepstral coefficients; multiband harmonicity feature; preprocessor module; speaker recognition; spectral flow direction; speech boundary detection; speech boundary detector; speech recognition; Feature extraction; Harmonic analysis; Mel frequency cepstral coefficient; Speech; Speech processing; Speech recognition; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2010 18th European Signal Processing Conference
  • Conference_Location
    Aalborg
  • ISSN
    2219-5491
  • Type
    conf
  • Filename
    7096641
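
The last two steps described in the abstract (an ACTIVITY/NON-ACTIVITY decision per region, then GMM-based Speech/Nonspeech classification of ACTIVITY regions) can be sketched roughly as below. This is a minimal illustration only, not the authors' implementation. It assumes the regions have already been segmented, that the activity decision is a simple energy threshold (the record does not say how the decision is made), and that the GMMs are diagonal-covariance models with already trained parameters. The function names, the `energy_threshold` parameter, and the toy feature vectors are all hypothetical; the SFD, harmonicity, and MFCC features themselves are not computed here.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) components."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(l - peak) for l in logs))


def classify_region(frames, speech_gmm, nonspeech_gmm):
    """Label a region by its summed frame log-likelihood under each model."""
    speech = sum(gmm_log_likelihood(f, speech_gmm) for f in frames)
    nonspeech = sum(gmm_log_likelihood(f, nonspeech_gmm) for f in frames)
    return "Speech" if speech > nonspeech else "Nonspeech"


def detect_speech(regions, energy_threshold, speech_gmm, nonspeech_gmm):
    """Label pre-segmented regions, each given as (mean_energy, frames).

    Assumption: the activity decision is a plain energy threshold.
    """
    labels = []
    for energy, frames in regions:
        if energy < energy_threshold:
            labels.append("NON-ACTIVITY")
        else:
            labels.append(classify_region(frames, speech_gmm, nonspeech_gmm))
    return labels


# Toy two-dimensional models and regions (hypothetical values).
speech_gmm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])]
nonspeech_gmm = [(1.0, [5.0, 5.0], [1.0, 1.0])]
regions = [
    (0.01, [[0.0, 0.0]]),
    (1.0, [[0.2, 0.1], [0.9, 1.1]]),
    (1.0, [[5.1, 4.9]]),
]
labels = detect_speech(regions, 0.1, speech_gmm, nonspeech_gmm)
```

With these toy inputs, the first region falls below the energy threshold and is never passed to the classifier, while the other two are labeled by whichever model assigns the higher total log-likelihood to their frames.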