DocumentCode
1855187
Title
Discrimination of Chinese quantitative style features based on text clustering
Author
Hou Renkui ; Jiang Minghu
Author_Institution
Lab. of Comput. Linguistics, Tsinghua Univ., Beijing, China
Volume
3
fYear
2012
fDate
21-25 Oct. 2012
Firstpage
2204
Lastpage
2207
Abstract
The programs “News Broadcast” and “Qiang Qiang Conversation between Three Individuals” differ in style: the former is broadcast-style, while the latter is conversational. This paper collects a corpus from both programs and selects sentence length, word length, and sentence-initial word part of speech (POS) as the features used to generate text vectors. The texts are then clustered using Euclidean distance and Ward's algorithm. The analysis shows that sentence length, word length, and sentence-initial word POS can serve as quantitative stylistic features of Chinese.
Keywords
broadcasting; feature extraction; natural language processing; pattern clustering; text analysis; Chinese quantitative style feature discrimination; Chinese quantitative stylistic characters; Euclidean distance; Qiang Qiang conversation between three individuals; news broadcast; sentence length; sentence-initial word POS; text clustering; text vector generation; Ward algorithm; word length; type of writing
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing (ICSP), 2012 IEEE 11th International Conference on
Conference_Location
Beijing
ISSN
2164-5221
Print_ISBN
978-1-4673-2196-9
Type
conf
DOI
10.1109/ICoSP.2012.6492018
Filename
6492018