DocumentCode
730909
Title
Language-resource independent speech segmentation using cues from a spectrogram image
Author
Su Jun Leow ; Eng Siong Chng ; Chin-Hui Lee
Author_Institution
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear
2015
fDate
19-24 April 2015
Firstpage
5813
Lastpage
5817
Abstract
In this paper, we apply image processing techniques to the speech spectrogram to perform phoneme segmentation. The proposed method relies solely on visual cues in the spectrogram and needs no language-specific training data. We evaluate it on the TIMIT corpus and obtain results comparable to those of other unsupervised speech segmentation techniques. We also fuse its output with that of hidden Markov models (HMM) and HMM-based forced alignment to investigate whether image features can provide an additional feature representation for speech processing tasks. With fusion, up to 10% absolute improvement in segmentation accuracy over the HMM baselines can be obtained. The results are promising and suggest strong potential for applying image-based features to speech processing.
Keywords
hidden Markov models; image segmentation; speech processing; TIMIT corpus; image processing technique; image-based features; language-resource independent speech segmentation; spectrogram image; speech phoneme segmentation; speech processing task; speech spectrogram; Spectrogram; Speech; Speech recognition; Visualization; image processing; low-resource languages; spectrogram processing; speech segmentation
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location
South Brisbane, QLD
Type
conf
DOI
10.1109/ICASSP.2015.7179086
Filename
7179086