  • DocumentCode
    730909
  • Title
    Language-resource independent speech segmentation using cues from a spectrogram image
  • Author
    Su Jun Leow ; Eng Siong Chng ; Chin-Hui Lee
  • Author_Institution
    Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    5813
  • Lastpage
    5817
  • Abstract
    In this paper, we use image processing techniques on the speech spectrogram to perform speech phoneme segmentation. The proposed method relies solely on visual cues in the spectrogram, without the need for language-specific training data. The results are evaluated on the TIMIT corpus and compared with those of other unsupervised speech segmentation techniques, with comparable results obtained. We also fuse the results with those obtained by hidden Markov models (HMM) and HMM-based forced alignment to investigate whether image features can provide an additional feature representation for speech processing tasks. With the fusion, up to 10% absolute improvement in segmentation accuracy over the HMM baselines can be obtained. The results are promising and suggest a strong potential for applying image-based features to speech processing.
  • Keywords
    hidden Markov models; image segmentation; speech processing; TIMIT corpus; image processing technique; image-based features; language-resource independent speech segmentation; spectrogram image; speech phoneme segmentation; speech processing task; speech spectrogram; Spectrogram; Speech; Speech recognition; Visualization; image processing; low-resource languages; spectrogram processing; speech segmentation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7179086
  • Filename
    7179086