• DocumentCode
    2539330
  • Title
    Research on Text Clustering Based on Concept Weight
  • Author
    Li, Yuqin ; Lv, Xueqiang ; Liu, Yufang ; Shi, Shuicai
  • Author_Institution
    Chinese Inf. Process. Res. Center, Beijing Inf. Sci. & Technol. Univ., Beijing, China
  • fYear
    2010
  • fDate
    13-15 Dec. 2010
  • Firstpage
    232
  • Lastpage
    235
  • Abstract
    Building on research into how the weights of feature words in texts are calculated and how semantic similarity between words is measured, we propose a concept-weight-based method for calculating feature-word weights. It addresses two problems of the text vector space model: the semantic association among text features and the prevalence of high dimensionality. The method reduces both the semantic loss of the feature set and the dimension of the text vectors, which improves the text vector space model and the quality of text clustering. Experimental results show that the method is feasible: in the final evaluation, concept-weight-based text clustering scored about 22 percentage points higher on the F1 index than clustering without concept weights.
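    The abstract does not give the paper's weighting formula. The sketch below only illustrates the general idea under stated assumptions: words are mapped to shared concepts through a hypothetical dictionary `CONCEPT_OF`, and each concept is weighted with a TF-IDF-style product of concept frequency (CF) and log(N / concept document frequency). Merging synonyms into one concept is what shrinks the vector dimension. The names `CONCEPT_OF`, `to_concepts`, and `concept_weights` are illustrative, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical word-to-concept map (an assumption; the paper's actual
# concept source, e.g. a thesaurus or ontology, is not stated in this record).
CONCEPT_OF = {
    "car": "vehicle",
    "automobile": "vehicle",
    "doctor": "physician",
    "physician": "physician",
}


def to_concepts(tokens):
    """Map each token to its concept; unmapped tokens stand for themselves."""
    return [CONCEPT_OF.get(t, t) for t in tokens]


def concept_weights(docs):
    """Return one {concept: weight} dict per document.

    Weight = concept frequency in the document * log(N / number of
    documents containing the concept). This TF-IDF-style formula is an
    assumed stand-in for the paper's concept weight.
    """
    concept_counts = [Counter(to_concepts(d)) for d in docs]
    n = len(docs)
    cdf = Counter()  # concept document frequency
    for counts in concept_counts:
        cdf.update(counts.keys())
    return [
        {c: f * math.log(n / cdf[c]) for c, f in counts.items()}
        for counts in concept_counts
    ]


docs = [
    ["car", "automobile", "road"],
    ["doctor", "hospital"],
    ["physician", "car"],
]
weights = concept_weights(docs)
```

    With these three toy documents, six distinct words collapse into four concepts (vehicle, road, physician, hospital), and "car" and "automobile" reinforce one another as "vehicle" instead of splitting their weight across two dimensions.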
  • Keywords
    feature extraction; pattern clustering; set theory; text analysis; word processing; concept weight-based text clustering; feature set; feature word; semantic association phenomenon; text vector space model; Data mining; Data models; Electronic mail; Feature extraction; Information processing; Information science; Semantics; Concept Document Frequency; Concept Frequency; Concept Weight; Text Clustering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Genetic and Evolutionary Computing (ICGEC), 2010 Fourth International Conference on
  • Conference_Location
    Shenzhen
  • Print_ISBN
    978-1-4244-8891-9
  • Electronic_ISBN
    978-0-7695-4281-2
  • Type
    conf
  • DOI
    10.1109/ICGEC.2010.64
  • Filename
    5715412