• DocumentCode
    2260141
  • Title

    Document Structure Analysis and Text Normalization for Chinese Putonghua and Cantonese Text-to-Speech Synthesis

  • Author

    Zhou, Xinxin ; Wu, Zhiyong ; Yuan, Chun ; Zhong, Yuzhuo

  • Author_Institution
    Tsinghua-CUHK Joint Res. Center for Media Sci., Tsinghua Univ., Shenzhen
  • Volume
    1
  • fYear
    2008
  • fDate
    20-22 Dec. 2008
  • Firstpage
    477
  • Lastpage
    481
  • Abstract
    This paper describes our recent effort on document structure analysis (DSA) and text normalization (NORM) for Chinese Putonghua and Cantonese text-to-speech synthesis. A unified framework has been proposed, where DSA and NORM procedures are language-independent for the two dialects of Chinese. For document structure analysis, regular expressions have been utilized to detect and identify the non-standard words (NSWs) and punctuation related to document structure; a new document segmentation approach is then proposed by considering the information provided by NSWs and punctuation. For text normalization, a method that considers the contextual information is put forward to handle the ambiguity of the NSWs, symbols and punctuation.
  • Keywords
    natural language processing; speech synthesis; text analysis; Cantonese; Chinese Putonghua; document segmentation approach; document structure analysis; nonstandard-words; text normalization; text-to-speech synthesis; Digital signal processing; Engines; Flowcharts; Information analysis; Information technology; Intelligent structures; Natural languages; Signal synthesis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Information Technology Application, 2008. IITA '08. Second International Symposium on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-0-7695-3497-8
  • Type

    conf

  • DOI
    10.1109/IITA.2008.28
  • Filename
    4739619