DocumentCode
2260141
Title
Document Structure Analysis and Text Normalization for Chinese Putonghua and Cantonese Text-to-Speech Synthesis
Author
Zhou, Xinxin ; Wu, Zhiyong ; Yuan, Chun ; Zhong, Yuzhuo
Author_Institution
Tsinghua-CUHK Joint Res. Center for Media Sci., Tsinghua Univ., Shenzhen
Volume
1
fYear
2008
fDate
20-22 Dec. 2008
Firstpage
477
Lastpage
481
Abstract
This paper describes our recent work on document structure analysis (DSA) and text normalization (NORM) for Chinese Putonghua and Cantonese text-to-speech synthesis. A unified framework is proposed in which the DSA and NORM procedures are language-independent across the two dialects of Chinese. For document structure analysis, regular expressions are used to detect and identify the non-standard words (NSWs) and punctuation marks related to document structure; a new document segmentation approach is then proposed that uses the information provided by these NSWs and punctuation marks. For text normalization, a method that takes contextual information into account is put forward to resolve the ambiguity of NSWs, symbols and punctuation marks.
Keywords
natural language processing; speech synthesis; text analysis; Cantonese; Chinese Putonghua; document segmentation approach; document structure analysis; nonstandard-words; text normalization; text-to-speech synthesis; Digital signal processing; Engines; Flowcharts; Information analysis; Information technology; Intelligent structures; Natural languages; Signal synthesis
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Information Technology Application, 2008. IITA '08. Second International Symposium on
Conference_Location
Shanghai
Print_ISBN
978-0-7695-3497-8
Type
conf
DOI
10.1109/IITA.2008.28
Filename
4739619