• DocumentCode
    1855187
  • Title
    Discrimination of Chinese quantitative style features based on text clustering
  • Author
    Hou Renkui ; Jiang Minghu
  • Author_Institution
    Lab. of Comput. Linguistics, Tsinghua Univ., Beijing, China
  • Volume
    3
  • fYear
    2012
  • fDate
    21-25 Oct. 2012
  • Firstpage
    2204
  • Lastpage
    2207
  • Abstract
    The styles of “News Broadcast” and “Qiang Qiang Conversation between Three Individuals” differ: the former is a broadcast-style program, while the latter is conversational. This paper collects a corpus from both programs and selects sentence length, word length, and the part of speech (POS) of the sentence-initial word as the features used to generate text vectors. The texts are then clustered using Euclidean distance and Ward's algorithm. The analysis shows that sentence length, word length, and sentence-initial word POS can serve as quantitative stylistic features of Chinese.
  • Keywords
    broadcasting; feature extraction; natural language processing; pattern clustering; text analysis; Chinese quantitative style feature discrimination; Chinese quantitative stylistic characters; Euclidean distance; Qiang Qiang conversation between three individuals; news broadcast; sentence length; sentence-initial word POS; text clustering; text vector generation; Ward algorithm; word length; type of writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing (ICSP), 2012 IEEE 11th International Conference on
  • Conference_Location
    Beijing
  • ISSN
    2164-5221
  • Print_ISBN
    978-1-4673-2196-9
  • Type
    conf
  • DOI
    10.1109/ICoSP.2012.6492018
  • Filename
    6492018
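
The pipeline summarized in the abstract (text vectors built from sentence length, word length, and sentence-initial word POS, then clustered with Euclidean distance and Ward's algorithm) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the input format (pre-segmented sentences of (word, POS) pairs), the use of per-text means and POS shares as vector components, the tag set, the absence of feature scaling, and the toy sentences themselves. The clustering is a naive pure-Python version of Ward's criterion, which at each step merges the two clusters whose union gives the smallest increase in within-cluster squared Euclidean error.

```python
from itertools import combinations

def text_vector(sentences, pos_tags):
    """Build one text vector: mean sentence length (in words), mean word
    length (in characters), and the share of sentences that open with each
    tag in pos_tags. This feature layout is an assumption, not the paper's."""
    n_sent = len(sentences)
    words = [w for s in sentences for w, _ in s]
    mean_sent_len = len(words) / n_sent
    mean_word_len = sum(len(w) for w in words) / len(words)
    initial = [s[0][1] for s in sentences]
    pos_share = [initial.count(t) / n_sent for t in pos_tags]
    return [mean_sent_len, mean_word_len] + pos_share

def ward_cost(a, b):
    """Increase in within-cluster squared error if clusters a and b merge."""
    (ca, na, _), (cb, nb, _) = a, b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq_dist

def ward_cluster(vectors, k):
    """Agglomerate until k clusters remain; return sorted lists of indices."""
    # Each cluster is (centroid, size, member indices).
    clusters = [(list(v), 1, [i]) for i, v in enumerate(vectors)]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]))
        (ca, na, ma), (cb, nb, mb) = clusters[i], clusters[j]
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((centroid, na + nb, ma + mb))
    return sorted(sorted(members) for _, _, members in clusters)

# Invented toy texts (not from the paper's corpus): two broadcast-like,
# two conversation-like. Tags: n noun, v verb, r pronoun, t time word,
# b distinguishing word, d adverb, a adjective, y particle.
broadcast_1 = [[("国务院", "n"), ("今天", "t"), ("召开", "v"), ("常务", "b"), ("会议", "n")],
               [("会议", "n"), ("研究", "v"), ("部署", "v"), ("经济", "n"), ("工作", "n")]]
broadcast_2 = [[("全国", "n"), ("人民", "n"), ("代表", "n"), ("大会", "n"), ("今天", "t"), ("开幕", "v")],
               [("代表", "n"), ("审议", "v"), ("政府", "n"), ("工作", "n"), ("报告", "n")]]
talk_1 = [[("我", "r"), ("觉得", "v"), ("对", "a")],
          [("你", "r"), ("说", "v"), ("呢", "y")]]
talk_2 = [[("他", "r"), ("不", "d"), ("去", "v")],
          [("我", "r"), ("也", "d"), ("是", "v")]]

texts = [broadcast_1, broadcast_2, talk_1, talk_2]
vectors = [text_vector(t, ["n", "r"]) for t in texts]
labels = ward_cluster(vectors, 2)
print(labels)  # [[0, 1], [2, 3]]
```

On this toy input the two broadcast-like texts end up in one cluster and the two conversation-like texts in the other. For real corpora, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` is a more practical choice than this quadratic-per-merge loop, and scaling the features before clustering may matter because sentence length and POS shares live on different ranges.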