• DocumentCode
    231550
  • Title

    Reducing footprint of unit selection TTS system by removing linguistic segments with rarely selected units

  • Author

    Gruber, Martin ; Matousek, Jindrich ; Tihelka, Daniel ; Hanzlicek, Zdenek

  • Author_Institution
    Dept. of Cybernetics, Univ. of West Bohemia, Pilsen, Czech Republic
  • fYear
    2014
  • fDate
    19-23 Oct. 2014
  • Firstpage
    494
  • Lastpage
    499
  • Abstract
    This paper addresses reducing the size of speech corpora used in unit-selection TTS systems. The size of a speech corpus affects system requirements such as storage, memory demands, and computational complexity. For high-quality speech synthesis, the corpus usually consists of several thousand sentences, so an appropriate reduction of its size is likely to lower those requirements. This work compares the impact on synthetic speech quality of removing specific instances of different linguistic segment types from the original corpus. Four segment types are removed and compared: whole sentences, phrases, words, and diphones. Only segments with rarely selected units are removed, until the resulting footprint reaches a predefined size. The results confirm that synthetic speech generated by TTS systems using the reduced corpora is of slightly worse quality than speech produced by the system using the original full corpus. A comparison of the reductions based on the different linguistic segment types is also presented.
  • Keywords
    computational complexity; natural language processing; speech synthesis; Czech language; diphones; footprint reduction; high quality speech synthesis; linguistic segment type removal; memory demands; phrases; rarely selected units; speech corpus; storage demands; synthetic speech quality; system requirements; unit selection TTS system; whole sentences; words; Cybernetics; Databases; Measurement units; Pragmatics; Speech; TTS; reducing footprint; unit selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2014 12th International Conference on Signal Processing (ICSP)
  • Conference_Location
    Hangzhou
  • ISSN
    2164-5221
  • Print_ISBN
    978-1-4799-2188-1
  • Type

    conf

  • DOI
    10.1109/ICOSP.2014.7015054
  • Filename
    7015054
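
The reduction strategy summarized in the abstract (drop segments whose units are rarely selected until the footprint reaches a predefined size) can be illustrated with a minimal sketch. This is not the authors' implementation. The `Segment` class, the `usage` score (mean selection count per unit), and the `reduce_corpus` function are all hypothetical names and assumptions introduced here for illustration; the abstract does not state how selection frequency is measured or how segments are ranked.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """Hypothetical corpus segment: a sentence, phrase, word, or diphone."""
    seg_id: str
    size_bytes: int
    unit_counts: List[int]  # assumed: how often each unit in the segment was selected

    @property
    def usage(self) -> float:
        # Assumed ranking score: mean selection count over the segment's units.
        if not self.unit_counts:
            return 0.0
        return sum(self.unit_counts) / len(self.unit_counts)


def reduce_corpus(
    segments: List[Segment], target_bytes: int
) -> Tuple[List[Segment], List[str], int]:
    """Greedily remove the least-used segments until the footprint fits the target.

    Returns the kept segments, the ids of removed segments (in removal order),
    and the resulting footprint in bytes.
    """
    kept = sorted(segments, key=lambda s: s.usage)  # least used first
    total = sum(s.size_bytes for s in kept)
    removed: List[str] = []
    while kept and total > target_bytes:
        seg = kept.pop(0)
        total -= seg.size_bytes
        removed.append(seg.seg_id)
    return kept, removed, total
```

Running the same routine with segments defined at different granularities (sentences, phrases, words, diphones) and the same `target_bytes` would give reduced corpora of comparable footprint, which is the kind of comparison the abstract describes.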