• DocumentCode
    3702117
  • Title

    Phonetic segmentation of speech using STEP and t-SNE

  • Author

    Adriana Stan;Cassia Valentini-Botinhao;Mircea Giurgiu;Simon King

  • Author_Institution
    Communications Department, Technical University of Cluj-Napoca, Romania
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    This paper introduces a first attempt to perform phoneme-level segmentation of speech based on a perceptual representation - the Spectro Temporal Excitation Pattern (STEP) - and a dimensionality reduction technique - the t-Distributed Stochastic Neighbour Embedding (t-SNE). The method searches for the true phonetic boundaries in the vicinity of those produced by an HMM-based segmentation. It looks for perceptually-salient spectral changes which occur at these phonetic transitions, and exploits t-SNE's ability to capture both local and global structure of the data. The method is intended to be used in any language and it is therefore not tailored to any particular dataset or language. Results show that this simple approach improves segmentation accuracy of unvoiced phonemes by 4% within a 5 ms margin, and 5% at a 10 ms margin. For the voiced phonemes, however, accuracy drops slightly.
  • Keywords
    "Hidden Markov models","Speech","Training","Acoustics","Manuals","Stochastic processes","Three-dimensional displays"
  • Publisher
    ieee
  • Conference_Titel
    Speech Technology and Human-Computer Dialogue (SpeD), 2015 International Conference on
  • Type

    conf

  • DOI
    10.1109/SPED.2015.7343105
  • Filename
    7343105