DocumentCode
1784895
Title
Unsupervised audio segmentation based on Restricted Boltzmann Machines
Author
Pikrakis, Aggelos
Author_Institution
Dept. of Inf., Univ. of Piraeus, Piraeus, Greece
fYear
2014
fDate
7-9 July 2014
Firstpage
311
Lastpage
314
Abstract
In this paper, the Conditional Restricted Boltzmann Machine (CRBM) is employed for unsupervised audio segmentation. The CRBM acts as a temporal modeling method and learns, from a maximum likelihood perspective, the temporal relationships among feature vectors extracted from a large corpus of training data. After the CRBM has been trained, we quantify the correlation between the hidden-layer neuron activations for successive feature vectors by means of an appropriately defined similarity function. A simple thresholding scheme is then applied to the output of the similarity function to segment the audio recording automatically. Our experiments were carried out on a large corpus of documentaries. We provide an interpretation of the segmentation results and comment on the segmentation efficiency of the method.
Keywords
Boltzmann machines; audio recording; audio signal processing; maximum likelihood estimation; CRBM; conditional restricted Boltzmann machine; feature vectors; hidden layer; maximum likelihood perspective; segmentation efficiency; similarity function; temporal modeling method; temporal relationships; thresholding scheme; training data; unsupervised audio segmentation; Audio recording; Correlation; Feature extraction; Speech; Speech processing; Training; Vectors
fLanguage
English
Publisher
IEEE
Conference_Titel
Information, Intelligence, Systems and Applications, IISA 2014, The 5th International Conference on
Conference_Location
Chania
Type
conf
DOI
10.1109/IISA.2014.6878838
Filename
6878838