DocumentCode
2450311
Title
The Similarity Computing of Documents Based on VSM
Author
Guo, Qinglin
Author_Institution
Sch. of Comput. Sci. & Technol., North China Electr. Power Univ., Beijing
fYear
2008
fDate
July 28 2008-Aug. 1 2008
Firstpage
585
Lastpage
586
Abstract
The precision and efficiency of document similarity computation are the foundation of other document-processing tasks. In this paper, the DF and TF-IDF algorithms are improved. First, DF's time complexity is linear, which suits the processing of large document collections, but it has the drawback that exceptional but useful features may be deleted; we compensate for this by adding the counts of words that occur at important positions. Second, we adjust the feature weights using the result of the feature selection phase. In this way, we improve the precision of document similarity without adding much time or space complexity.
Keywords
computational complexity; document handling; TF-IDF algorithms; VSM; documents similarity computing; feature selection phase; mass documents processing; space complexity; time complexity; Application software; Computer applications; Computer science; Data mining; Entropy; Frequency; Information retrieval; Internet; Mutual information; Organizing; TF-IDF; documents similarity; feature selection
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Software and Applications, 2008. COMPSAC '08. 32nd Annual IEEE International
Conference_Location
Turku
ISSN
0730-3157
Print_ISBN
978-0-7695-3262-2
Electronic_ISBN
0730-3157
Type
conf
DOI
10.1109/COMPSAC.2008.196
Filename
4591626