• DocumentCode
    2450311
  • Title

    The Similarity Computing of Documents Based on VSM

  • Author

    Guo, Qinglin

  • Author_Institution
    Sch. of Comput. Sci. & Technol., North China Electr. Power Univ., Beijing
  • fYear
    2008
  • fDate
    July 28 2008-Aug. 1 2008
  • Firstpage
    585
  • Lastpage
    586
  • Abstract
    The precision and efficiency of document similarity computation are the foundation of other document-processing tasks. In this paper, the DF and TF-IDF algorithms are improved. First, DF's time complexity is linear, which suits the processing of large document collections, but it has the drawback that it may delete some exceptionally useful features, so we compensate by adding the counts of words that occur at important positions. Second, we adjust the feature weights using the result of the feature selection phase. In this way, we improve the precision of document similarity without adding much time or space complexity.
  • Keywords
    computational complexity; document handling; TF-IDF algorithms; VSM; documents similarity computing; feature selection phase; mass documents processing; space complexity; time complexity; Application software; Computer applications; Computer science; Data mining; Entropy; Frequency; Information retrieval; Internet; Mutual information; Organizing; TF-IDF; documents similarity; feature selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Software and Applications, 2008. COMPSAC '08. 32nd Annual IEEE International
  • Conference_Location
    Turku
  • ISSN
    0730-3157
  • Print_ISBN
    978-0-7695-3262-2
  • Electronic_ISBN
    0730-3157
  • Type

    conf

  • DOI
    10.1109/COMPSAC.2008.196
  • Filename
    4591626
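
  • Illustration
    The abstract describes DF-based feature selection that also counts words at important places, followed by TF-IDF weighting in a vector space model. The following is a minimal sketch of that general idea, not the paper's actual algorithm. Treating titles as the "important places", the names `select_features`, `title_bonus` and `min_df`, and the smoothed IDF formula are all assumptions made for illustration. The paper's second step (adjusting weights by the feature selection result) is not specified in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def document_frequency(docs):
    """Count, for each term, the number of token lists that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def select_features(bodies, titles, min_df=2, title_bonus=1):
    """DF-based selection with a bonus for title occurrences.

    Assumption: titles stand in for the abstract's "important places";
    title_bonus and min_df are hypothetical parameters.
    """
    df = document_frequency(bodies)
    title_df = document_frequency(titles)
    terms = set(df) | set(title_df)
    return {t for t in terms if df[t] + title_bonus * title_df[t] >= min_df}


def tfidf_vector(tokens, features, df, n_docs):
    """Sparse TF-IDF vector over the selected features (smoothed IDF, assumed)."""
    tf = Counter(t for t in tokens if t in features)
    return {
        t: count * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
        for t, count in tf.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

    With this sketch, a term that falls below the DF threshold in the document bodies can still be kept as a feature if it also appears in a title, which is one way to read the abstract's remedy for DF deleting useful low-frequency features.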