Title :
Language-resource independent speech segmentation using cues from a spectrogram image
Author :
Su Jun Leow ; Eng Siong Chng ; Chin-Hui Lee
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
Abstract :
In this paper, we apply image processing techniques to the speech spectrogram to perform phoneme segmentation. The proposed method relies solely on visual cues in the spectrogram and needs no language-specific training data. We evaluate the method on the TIMIT corpus and find that its results are comparable to those of other unsupervised speech segmentation techniques. We also fuse the results with those obtained by hidden Markov models (HMM) and HMM-based forced alignment to investigate whether image features can provide an additional feature representation for speech processing tasks. With the fusion, up to 10% absolute improvement in segmentation accuracy over the HMM baselines can be obtained. The results are promising and suggest strong potential for applying image-based features to speech processing.
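The abstract does not spell out which image processing steps are used, so the following is only a rough illustration of what a visual cue in a spectrogram might look like, not the authors' method. It is a minimal sketch that treats a spectrogram as a small image (frames by frequency bands), measures how sharply the picture changes between adjacent frames, and marks the sharpest changes as candidate boundaries. The function names `edge_strength` and `pick_boundaries`, the threshold value, and the toy spectrogram are all assumptions made for this example.

```python
def edge_strength(spec):
    """Summed absolute change between each frame and the previous one.

    `spec` is a list of frames, each a list of band energies.
    The first frame has no predecessor, so its strength is 0.0.
    """
    strength = [0.0]
    for prev, cur in zip(spec, spec[1:]):
        strength.append(sum(abs(c - p) for p, c in zip(prev, cur)))
    return strength


def pick_boundaries(strength, threshold):
    """Return frame indices that are local maxima above `threshold`."""
    boundaries = []
    for t in range(1, len(strength) - 1):
        is_peak = strength[t] >= strength[t - 1] and strength[t] > strength[t + 1]
        if is_peak and strength[t] > threshold:
            boundaries.append(t)
    return boundaries


# Hypothetical toy spectrogram: three steady "sounds" of five frames each,
# with energy concentrated in different frequency bands.
low = [1.0, 0.1, 0.1, 0.1]
mid = [0.1, 1.0, 1.0, 0.1]
high = [0.1, 0.1, 0.1, 1.0]
spec = [low] * 5 + [mid] * 5 + [high] * 5

strength = edge_strength(spec)
boundaries = pick_boundaries(strength, threshold=1.0)
print(boundaries)  # [5, 10]
```

On real speech the spectrogram would come from a short-time Fourier transform, and the threshold would need tuning; the sketch only shows that abrupt visual changes along the time axis can be located without any language-specific training data.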
Keywords :
hidden Markov models; image segmentation; speech processing; TIMIT corpus; image processing technique; image-based features; language-resource independent speech segmentation; spectrogram image; speech phoneme segmentation; speech processing task; speech spectrogram; Spectrogram; Speech; Speech recognition; Visualization; image processing; low-resource languages; spectrogram processing; speech segmentation
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7179086