DocumentCode :
1855187
Title :
Discrimination of Chinese quantitative style features based on text clustering
Author :
Hou Renkui ; Jiang Minghu
Author_Institution :
Lab. of Comput. Linguistics, Tsinghua Univ., Beijing, China
Volume :
3
fYear :
2012
fDate :
21-25 Oct. 2012
Firstpage :
2204
Lastpage :
2207
Abstract :
The styles of “News Broadcast” and “Qiang Qiang Conversation between Three Individuals” differ: the former is broadcast-style, while the latter is conversational. This paper collects a corpus from both programs and selects sentence length, word length, and sentence-initial word POS (part of speech) as the features used to generate text vectors. The texts are then clustered using Euclidean distance and Ward's algorithm. The analysis shows that sentence length, word length, and sentence-initial word POS can serve as quantitative stylistic features of Chinese.
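The clustering step named in the abstract can be illustrated with a minimal sketch. It assumes each text is reduced to three numbers (mean sentence length, mean word length, and the share of sentences that begin with a noun); this encoding, the standardization step, the function names, and all values are hypothetical and do not come from the paper. Ward's criterion is implemented directly: at each step, merge the two clusters whose union gives the smallest increase in within-cluster squared Euclidean distance.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical feature vectors, one row per text (invented for illustration):
# [mean sentence length, mean word length, share of noun-initial sentences]
texts = [
    [28.0, 1.9, 0.45],  # rows 0-2: imagined broadcast-style texts
    [30.5, 2.0, 0.50],
    [27.2, 1.8, 0.48],
    [12.1, 1.4, 0.20],  # rows 3-5: imagined conversational texts
    [10.8, 1.5, 0.18],
    [13.4, 1.4, 0.22],
]

def standardize(rows):
    """Z-score each column so no single feature dominates the distance (an assumption)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]

def centroid(points):
    return [mean(c) for c in zip(*points)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if point sets a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean distance
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_cluster(rows, k):
    """Agglomerative clustering with Ward's criterion, stopping at k clusters."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost(
                [rows[t] for t in clusters[p[0]]],
                [rows[t] for t in clusters[p[1]]],
            ),
        )
        clusters[i] += clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for t in members:
            labels[t] = label
    return labels

labels = ward_cluster(standardize(texts), k=2)
print(labels)
```

If the selected features separate the two styles, texts from the same program should receive the same label. This naive version recomputes centroids at every step, which is fine for a handful of texts but slow for a large corpus.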
Keywords :
broadcasting; feature extraction; natural language processing; pattern clustering; text analysis; Chinese quantitative style feature discrimination; Chinese quantitative stylistic characters; Euclidean distance; Qiang Qiang conversation between three individuals; news broadcast; sentence length; sentence-initial word POS; text clustering; text vector generation; Ward algorithm; word length; type of writing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signal Processing (ICSP), 2012 IEEE 11th International Conference on
Conference_Location :
Beijing
ISSN :
2164-5221
Print_ISBN :
978-1-4673-2196-9
Type :
conf
DOI :
10.1109/ICoSP.2012.6492018
Filename :
6492018