Title :
Combining Structural Analysis and Computer Vision Techniques for Automatic Speech Summarization
Author :
Sert, Mustafa ; Baykal, Buyurman ; Yazici, Adnan
Author_Institution :
Dept. of Comput. Eng., Baskent Univ., Ankara
Abstract :
Just as verse and chorus sections appear as repetitive structures in musical audio, the key concept (or topic) of some speech recordings (e.g., presentations and lectures) may also repeat over time. Accurate detection of these repetitions may therefore help automatic speech summarization. Based on this motivation, we consider the applicability of music structural analysis methods to speech summary generation. Our method transforms a 1-D time-domain speech signal into a 2-D image representation, namely a (dis)similarity matrix, and detects possible repetitions within the matrix using suitable computer vision techniques. The method does not transcribe the speech signal into words, phrases, or sentences. Hence, it can be regarded as a speech-to-speech summarization method, in which summarization results are presented as speech instead of text. Furthermore, the method needs no prior knowledge of the language or grammar of the speech signal. Experiments show that our method can capture the main theme of speech signals when compared with the ideal transcription sections defined by experts, and computational analysis shows that the proposed method performs well.
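The pipeline outlined in the abstract (1-D signal, then 2-D (dis)similarity matrix, then repetition detection) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features, the distance measure, or the computer vision techniques used. The sketch assumes short-time magnitude spectra as features, cosine distance as the dissimilarity, and a simple diagonal-averaging step as a stand-in for the repetition detector. All function names and parameters (`frame_features`, `dissimilarity_matrix`, `best_repetition_lag`, `frame_len`, `hop`, `min_lag`) are hypothetical.

```python
import numpy as np

def frame_features(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames; return magnitude spectra.

    Assumption: magnitude spectra stand in for whatever features the paper uses.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i of idx holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1))

def dissimilarity_matrix(features):
    """Pairwise cosine distance between frames: the 2-D 'image' of the signal."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against silent frames
    return 1.0 - unit @ unit.T

def best_repetition_lag(dissim, min_lag=1):
    """Return the lag (in frames) whose diagonal has the lowest mean dissimilarity.

    Assumption: a repeated passage shows up as a low-valued stripe parallel to
    the main diagonal; averaging each diagonal is a crude stripe detector.
    """
    n = dissim.shape[0]
    lags = range(min_lag, n // 2)
    means = [np.mean(np.diagonal(dissim, offset=lag)) for lag in lags]
    return min_lag + int(np.argmin(means))
```

For a signal built by repeating one segment, the returned lag multiplied by `hop` gives the repetition period in samples. A real detector would need to locate where each stripe starts and ends, which this sketch does not attempt.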
Keywords :
image representation; matrix algebra; speech processing; time-domain analysis; 1D time-domain speech signal; 2D image representation; automatic speech summarization; computational analysis; computer vision; dissimilarity matrix; music structural analysis methods; speech recordings; speech summary generation; speech-to-speech summarization method; transcription sections; Audio recording; Computational complexity; Computer vision; Image representation; Natural languages; Signal analysis; Speech analysis; Speech synthesis; Synthesizers; Time domain analysis; Audio content analysis; key-concept detection; speech summarization;
Conference_Titel :
Multimedia, 2008. ISM 2008. Tenth IEEE International Symposium on
Conference_Location :
Berkeley, CA
Print_ISBN :
978-0-7695-3454-1
Electronic_ISBN :
978-0-7695-3454-1
DOI :
10.1109/ISM.2008.90