DocumentCode :
1784895
Title :
Unsupervised audio segmentation based on Restricted Boltzmann Machines
Author :
Pikrakis, Aggelos
Author_Institution :
Dept. of Inf., Univ. of Piraeus, Piraeus, Greece
fYear :
2014
fDate :
7-9 July 2014
Firstpage :
311
Lastpage :
314
Abstract :
In this paper, the Conditional Restricted Boltzmann Machine (CRBM) is employed in the context of unsupervised audio segmentation. The CRBM acts as a temporal modeling method and learns, from a maximum likelihood perspective, the temporal relationships among the feature vectors extracted from a large corpus of training data. After the CRBM has been trained, we quantify the correlation between the activations of the hidden-layer neurons for successive feature vectors by means of an appropriately defined similarity function. A simple thresholding scheme is then applied to the output of the similarity function to segment the audio recording automatically. Our experiments have been carried out on a large corpus of documentaries. We provide an interpretation of the segmentation results and comment on the segmentation efficiency of the method.
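The post-training step described in the abstract (compare hidden-layer activations for successive feature vectors, then threshold the similarity) might be sketched as follows. This is an illustration only, not the author's implementation: the abstract does not define the similarity function, so cosine similarity is an assumption here, and the names `cosine_similarity`, `segment_boundaries`, and the threshold value are hypothetical. Training the CRBM and computing its hidden activations are not shown; the input is assumed to be one activation vector per frame.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two activation vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero activation vector as dissimilar
    return dot / (norm_a * norm_b)


def segment_boundaries(
    hidden_activations: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical value; would need tuning on real data
) -> List[int]:
    """Return frame indices t where similarity between frames t-1 and t drops below threshold."""
    boundaries = []
    for t in range(1, len(hidden_activations)):
        sim = cosine_similarity(hidden_activations[t - 1], hidden_activations[t])
        if sim < threshold:
            boundaries.append(t)
    return boundaries


# Toy hidden-layer activations (made up for illustration): frames 0-1 are
# alike, frames 2-3 are alike, and the pattern changes between frames 1 and 2.
toy_activations = [
    [0.9, 0.1, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.8],
    [0.2, 0.8, 0.9],
]
boundaries = segment_boundaries(toy_activations, threshold=0.5)
print(boundaries)
```

On this toy input the only detected boundary is at frame index 2, where the activation pattern changes. With real data, the threshold and the choice of similarity measure would have to follow the definitions given in the paper itself.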
Keywords :
Boltzmann machines; audio recording; audio signal processing; maximum likelihood estimation; CRBM; audio recording; conditional restricted Boltzmann machine; feature vectors; hidden layer; maximum likelihood perspective; segmentation efficiency; similarity function; temporal modeling method; temporal relationships; thresholding scheme; training data; unsupervised audio segmentation; Audio recording; Correlation; Feature extraction; Speech; Speech processing; Training; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information, Intelligence, Systems and Applications, IISA 2014, The 5th International Conference on
Conference_Location :
Chania
Type :
conf
DOI :
10.1109/IISA.2014.6878838
Filename :
6878838