DocumentCode :
2450311
Title :
The Similarity Computing of Documents Based on VSM
Author :
Guo, Qinglin
Author_Institution :
Sch. of Comput. Sci. & Technol., North China Electr. Power Univ., Beijing
fYear :
2008
fDate :
July 28 2008-Aug. 1 2008
Firstpage :
585
Lastpage :
586
Abstract :
The precision and efficiency of document similarity computation are the foundation of, and the key to, other document processing tasks. In this paper, the DF and TF-IDF algorithms are improved. First, DF's time complexity is linear, which suits large-scale document processing, but it has the drawback that exceptionally useful features may be deleted, so we compensate for this by adding the count of words that occur at important places. Second, we adjust the feature weights using the result of the feature selection phase. In this way, we improve the precision of document similarity without adding much time or space complexity.
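The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes, not the author's method. It assumes that "important places" means the document title, that the position bonus is a fixed integer added to the document frequency, and that the weight adjustment is a simple scaling by the selection score; the names `select_features`, `tfidf_vectors`, `cosine`, `min_score`, and `title_bonus` are all hypothetical.

```python
import math
from collections import Counter

def select_features(docs, min_score=2, title_bonus=1):
    """DF-style selection; docs is a list of (title_tokens, body_tokens)."""
    score = Counter()
    for title, body in docs:
        for term in set(title) | set(body):
            score[term] += 1            # plain document frequency
        for term in set(title):
            score[term] += title_bonus  # assumed bonus for an "important place"
    return {t: s for t, s in score.items() if s >= min_score}

def tfidf_vectors(docs, features):
    """Smoothed TF-IDF, scaled by the selection score (an assumed adjustment)."""
    n = len(docs)
    df = Counter()
    for title, body in docs:
        df.update(t for t in set(title) | set(body) if t in features)
    top = max(features.values(), default=1)
    vectors = []
    for title, body in docs:
        tf = Counter(t for t in title + body if t in features)
        vec = {}
        for term, freq in tf.items():
            idf = math.log((n + 1) / (df[term] + 1)) + 1
            vec[term] = freq * idf * (features[term] / top)
        vectors.append(vec)
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors in the vector space model."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy corpus, invented for illustration only.
docs = [
    (["vsm", "similarity"], ["vsm", "document", "similarity", "tfidf"]),
    (["vsm"], ["document", "similarity", "vector"]),
    (["power"], ["grid", "electric", "power"]),
]
features = select_features(docs)
vectors = tfidf_vectors(docs, features)
```

In this toy corpus the term "power" occurs in only one document, so a plain DF threshold of 2 would drop it; the assumed title bonus keeps it, which is the kind of rescue the abstract describes. Selection and weighting each take one pass over the corpus.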
Keywords :
computational complexity; document handling; TF-IDF algorithms; VSM; documents similarity computing; feature selection phase; mass documents processing; space complexity; time complexity; Application software; Computer applications; Computer science; Data mining; Entropy; Frequency; Information retrieval; Internet; Mutual information; Organizing; TF-IDF; VSM; documents similarity; feature selection;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Software and Applications, 2008. COMPSAC '08. 32nd Annual IEEE International
Conference_Location :
Turku
ISSN :
0730-3157
Print_ISBN :
978-0-7695-3262-2
Electronic_ISBN :
0730-3157
Type :
conf
DOI :
10.1109/COMPSAC.2008.196
Filename :
4591626