• DocumentCode
    748094
  • Title
    A Speech/Music Discriminator of Radio Recordings Based on Dynamic Programming and Bayesian Networks
  • Author
    Pikrakis, Aggelos ; Giannakopoulos, Theodoros ; Theodoridis, Sergios
  • Author_Institution
    Dept. of Inf. & Telecommun., Univ. of Athens, Athens
  • Volume
    10
  • Issue
    5
  • fYear
    2008
  • Firstpage
    846
  • Lastpage
    857
  • Abstract
    This paper presents a multistage system for speech/music discrimination based on a three-step procedure. The first step is a computationally efficient scheme that applies a region-growing technique to a 1-D feature sequence extracted from the raw audio stream. This scheme serves as a preprocessing stage and yields segments with high music and speech precision, at the expense of leaving certain parts of the audio recording unclassified. The unclassified parts of the audio stream are then fed to a more computationally demanding scheme, which treats speech/music discrimination of radio recordings as a probabilistic segmentation task whose solution is obtained by means of dynamic programming. This scheme seeks the sequence of segments and respective class labels (i.e., speech/music) that maximizes the product of posterior class probabilities, given the data that form the segments. To this end, a Bayesian network combiner is embedded as a posterior probability estimator. In the final stage, a boundary-correction algorithm is applied to remove possible errors at the boundaries of the previously generated segments (speech or music). The proposed system has been tested on radio recordings from various sources, and the overall system accuracy is approximately 96%. Performance results are also reported on a musical genre basis, and a comparison with existing methods is given.
  • Keywords
    Bayes methods; audio signal processing; dynamic programming; radio networks; 1D feature sequence; Bayesian networks; audio recording; probabilistic segmentation task; radio recordings; raw audio stream; speech/music discriminator; speech-music discrimination
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2008.922870
  • Filename
    4540196
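
  • Note
    The abstract describes the second stage as a search for the segment sequence and class labels that maximize the product of posterior class probabilities, solved by dynamic programming. The following is a minimal sketch of that general idea, not the authors' implementation. The names `dp_segment` and `toy_posterior`, the `min_len` and `max_len` parameters, and the scalar "music-likeness" frames are all assumptions made for illustration; in particular, `toy_posterior` is a hypothetical stand-in for the Bayesian network combiner, whose structure the abstract does not give.

```python
import math


def toy_posterior(segment, label):
    """Hypothetical stand-in for the posterior estimator P(label | segment).

    Assumes each frame is a scalar in [0, 1] expressing "music-likeness";
    this is NOT the Bayesian network combiner described in the paper.
    """
    mean = sum(segment) / len(segment)
    p_music = min(max(mean, 1e-6), 1.0 - 1e-6)  # keep away from 0 and 1
    return p_music if label == "music" else 1.0 - p_music


def dp_segment(frames, posterior, classes=("speech", "music"),
               min_len=1, max_len=None):
    """Return [(start, end, label), ...] maximizing the product of posteriors.

    best[e] holds the highest sum of log-posteriors over all segmentations
    of frames[0:e]; back[e] remembers the last segment of that solution.
    """
    n = len(frames)
    if max_len is None:
        max_len = n
    best = [-math.inf] * (n + 1)
    back = [None] * (n + 1)
    best[0] = 0.0  # empty prefix: product of zero factors is 1, log is 0

    for end in range(1, n + 1):
        for length in range(min_len, min(max_len, end) + 1):
            start = end - length
            if best[start] == -math.inf:
                continue  # prefix cannot be segmented under the constraints
            for label in classes:
                p = posterior(frames[start:end], label)
                if p <= 0.0:
                    continue
                score = best[start] + math.log(p)
                if score > best[end]:
                    best[end] = score
                    back[end] = (start, label)

    if n > 0 and back[n] is None:
        raise ValueError("no segmentation satisfies the length constraints")

    # Walk the back-pointers from the end of the stream to recover segments.
    segments = []
    end = n
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    segments.reverse()
    return segments


if __name__ == "__main__":
    frames = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]
    print(dp_segment(frames, toy_posterior))
```

    Working with sums of log-probabilities rather than the raw product avoids numerical underflow on long recordings. Because every extra segment multiplies in another factor no greater than 1, the criterion itself discourages over-segmentation in this sketch.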