• DocumentCode
    3700131
  • Title
    Audio event recognition based on DBN features from multiple filter-bank representations
  • Author
    Feng Guo; Xiaoou Chen; Deshun Yang
  • Author_Institution
    Inst. of Comput. Sci. &
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In audio event classification and detection, the representation of the audio itself is important. Many researchers have applied Deep Belief Networks (DBNs) to learn new representations of audio. The mel filter-bank feature, which is based on the mel scale, is commonly used as the low-level representation of the audio when pre-processing input for a DBN. However, the mel bands used in this feature may not be sufficient to represent the diverse audio events of the real world comprehensively, which makes it difficult for a DBN to learn good audio features. In this paper, two steps are taken to explore and tackle the problem. In the first step, we compare the effects of different arrangements of frequency bands on DBN feature learning for audio event recognition. The arrangements considered are mel bands, bark bands, linear bands and pyramid bands. In the second step, to exploit the different classification capabilities of the DBN features on different audio events, we adopt the AdaBoost algorithm to fuse them. We conduct experiments on real datasets collected from the findsound website, and the results verify that our proposed audio event classification system, which uses diverse features selected by AdaBoost from all sets of DBN features, outperforms a system that uses only one kind of DBN feature set.
  • Keywords
    "Feature extraction","Event detection","Training","Machine learning","Multimedia communication","Streaming media","Fuses"
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia Signal Processing (MMSP), 2015 IEEE 17th International Workshop on
  • Type
    conf
  • DOI
    10.1109/MMSP.2015.7340807
  • Filename
    7340807
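
The abstract compares several arrangements of frequency bands as input to DBN feature learning. As a rough illustration of how three of those arrangements differ, the sketch below computes band-edge frequencies that are equally spaced on the mel, bark and linear scales. It is not the authors' implementation: the band count (40), the frequency range (0 to 8000 Hz), the function names, and the choice of the common 2595·log10(1 + f/700) mel formula and Traunmüller's bark approximation are all assumptions made for this example. Pyramid bands are omitted because the abstract does not define them.

```python
import math


def hz_to_mel(f):
    # Common mel-scale formula (assumed; the record does not say which variant is used).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def hz_to_bark(f):
    # Traunmueller's approximation of the bark scale (assumed variant).
    return 26.81 * f / (1960.0 + f) - 0.53


def bark_to_hz(z):
    # Algebraic inverse of hz_to_bark.
    return 1960.0 * (z + 0.53) / (26.28 - z)


def identity(x):
    return x


def band_edges(n_bands, f_min, f_max, to_scale=identity, from_scale=identity):
    """Return n_bands + 2 edge frequencies (Hz), equally spaced on the given scale.

    Triangular filter i would span edges[i] .. edges[i + 2], peaking at edges[i + 1].
    """
    lo, hi = to_scale(f_min), to_scale(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [from_scale(lo + i * step) for i in range(n_bands + 2)]


# Hypothetical settings chosen for illustration only.
N_BANDS, F_MIN, F_MAX = 40, 0.0, 8000.0

mel_edges = band_edges(N_BANDS, F_MIN, F_MAX, hz_to_mel, mel_to_hz)
bark_edges = band_edges(N_BANDS, F_MIN, F_MAX, hz_to_bark, bark_to_hz)
linear_edges = band_edges(N_BANDS, F_MIN, F_MAX)

if __name__ == "__main__":
    for name, edges in (("mel", mel_edges), ("bark", bark_edges), ("linear", linear_edges)):
        widths = [b - a for a, b in zip(edges, edges[1:])]
        print(f"{name:>6}: first band step {widths[0]:.1f} Hz, last band step {widths[-1]:.1f} Hz")
```

Mel and bark spacing give narrow bands at low frequencies and wide bands at high frequencies, whereas linear spacing keeps the width constant. This difference in resolution is the kind of variation the abstract's comparison is concerned with.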