• DocumentCode
    1784895
  • Title
    Unsupervised audio segmentation based on Restricted Boltzmann Machines
  • Author
    Pikrakis, Aggelos
  • Author_Institution
    Dept. of Inf., Univ. of Piraeus, Piraeus, Greece
  • fYear
    2014
  • fDate
    7-9 July 2014
  • Firstpage
    311
  • Lastpage
    314
  • Abstract
    In this paper, the Conditional Restricted Boltzmann Machine (CRBM) is employed for unsupervised audio segmentation. The CRBM acts as a temporal model and learns, from a maximum likelihood perspective, the temporal relationships among feature vectors extracted from a large corpus of training data. After the CRBM has been trained, we quantify the correlation between the hidden-layer neuron activations for successive feature vectors by means of an appropriately defined similarity function. A simple thresholding scheme is then applied to the output of the similarity function to automatically segment the audio recording. Our experiments were carried out on a large corpus of documentaries. We provide an interpretation of the segmentation results and comment on the segmentation efficiency of the method.
  • Keywords
    Boltzmann machines; audio recording; audio signal processing; maximum likelihood estimation; CRBM; audio recording; conditional restricted Boltzmann machine; feature vectors; hidden layer; maximum likelihood perspective; segmentation efficiency; similarity function; temporal modeling method; temporal relationships; thresholding scheme; training data; unsupervised audio segmentation; Audio recording; Correlation; Feature extraction; Speech; Speech processing; Training; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information, Intelligence, Systems and Applications, IISA 2014, The 5th International Conference on
  • Conference_Location
    Chania
  • Type
    conf
  • DOI
    10.1109/IISA.2014.6878838
  • Filename
    6878838
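
The segmentation step summarized in the abstract (compare hidden-layer activations for successive feature vectors, then threshold the similarity) might look roughly like the sketch below. The abstract does not specify the similarity function, so cosine similarity is used here purely as an assumption. The function names, the weight matrix `W`, the bias `b`, and the threshold value are all hypothetical, and the activation function omits the conditioning on past frames that an actual CRBM would apply.

```python
import numpy as np

def hidden_activations(features, W, b):
    """Sigmoid hidden-unit activations, one row per feature vector.

    Simplification (assumption): a real CRBM also conditions the hidden
    biases on previous frames; that term is omitted here.
    """
    return 1.0 / (1.0 + np.exp(-(features @ W + b)))

def successive_similarity(H, eps=1e-12):
    """Cosine similarity between activations of consecutive frames.

    Cosine similarity is an assumed stand-in for the paper's
    'appropriately defined similarity function'.
    """
    prev, curr = H[:-1], H[1:]
    num = np.sum(prev * curr, axis=1)
    den = np.linalg.norm(prev, axis=1) * np.linalg.norm(curr, axis=1) + eps
    return num / den

def segment_boundaries(similarity, threshold):
    """Frame indices that start a new segment.

    similarity[t - 1] compares frame t - 1 with frame t, so a value
    below the threshold marks frame t as a boundary.
    """
    return np.flatnonzero(similarity < threshold) + 1

# Toy activations: two frames of one pattern, then two of another.
H = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
boundaries = segment_boundaries(successive_similarity(H), threshold=0.5)
```

In this toy input the only dissimilar pair is frames 1 and 2, so frame 2 is reported as the start of a new segment. On real data the threshold would need tuning.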