Title :
The Similarity Computing of Documents Based on VSM
Author_Institution :
Sch. of Comput. Sci. & Technol., North China Electr. Power Univ., Beijing
Date :
July 28 2008-Aug. 1 2008
Abstract :
The precision and efficiency of document similarity computation are the foundation of, and the key to, other document processing tasks. In this paper, the document frequency (DF) and TF-IDF algorithms are improved. First, DF has linear time complexity, which suits the processing of massive document collections, but it has the drawback that exceptionally useful features may be discarded; we compensate for this by additionally counting words that occur at important positions. Second, we adjust the weight of each feature using the result of the feature selection phase. In this way, we improve the precision of document similarity without adding much time or space complexity.
Keywords :
computational complexity; document handling; TF-IDF algorithms; VSM; documents similarity computing; feature selection phase; mass documents processing; space complexity; time complexity; Application software; Computer applications; Computer science; Data mining; Entropy; Frequency; Information retrieval; Internet; Mutual information; Organizing; TF-IDF; documents similarity; feature selection
Conference_Title :
Computer Software and Applications, 2008. COMPSAC '08. 32nd Annual IEEE International
Conference_Location :
Turku
Print_ISBN :
978-0-7695-3262-2
Electronic_ISBN :
0730-3157
DOI :
10.1109/COMPSAC.2008.196
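
Illustrative sketch :
The abstract names two steps, a DF-based feature selection that also counts words at important positions and a TF-IDF weight adjusted by the selection result, but it gives no formulas. The Python sketch below is one possible reading, not the authors' method. It assumes that the title is the "important place", that each title occurrence adds a fixed title_bonus to the selection score, and that the TF-IDF weight is multiplied by the ratio score/DF. All function names, parameters, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def selection_scores(bodies, titles, title_bonus=1.0):
    """Return (df, score): plain document frequency and a position-boosted score."""
    df, score = Counter(), Counter()
    for body, title in zip(bodies, titles):
        for word in set(body) | set(title):
            df[word] += 1
            score[word] += 1
        for word in set(title):
            # Assumption: a title occurrence counts extra toward selection.
            score[word] += title_bonus
    return df, score


def select_features(score, threshold):
    """Keep words whose boosted score reaches the threshold (one linear pass)."""
    return {word for word, s in score.items() if s >= threshold}


def weighted_vectors(bodies, titles, features, df, score):
    """Build sparse TF-IDF vectors, scaled by the assumed factor score/df."""
    n_docs = len(bodies)
    vectors = []
    for body, title in zip(bodies, titles):
        counts = Counter(w for w in body + title if w in features)
        total = sum(counts.values()) or 1
        vec = {}
        for word, count in counts.items():
            idf = math.log((1 + n_docs) / (1 + df[word])) + 1.0  # smoothed IDF
            vec[word] = (count / total) * idf * (score[word] / df[word])
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(word, 0.0) for word, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (made up for illustration): token lists for titles and bodies.
titles = [["vector", "space"], ["vector", "model"], ["cooking"]]
bodies = [
    ["document", "similarity", "vector"],
    ["document", "similarity", "model"],
    ["pasta", "recipe", "sauce"],
]

df, score = selection_scores(bodies, titles)
features = select_features(score, threshold=2)
vectors = weighted_vectors(bodies, titles, features, df, score)
sim_01 = cosine(vectors[0], vectors[1])
sim_02 = cosine(vectors[0], vectors[2])
```

In this toy corpus the word "space" occurs in only one document, so a plain DF threshold of 2 would discard it; the title bonus keeps it as a feature. Both passes stay linear in the total number of tokens, in line with the abstract's claim that little time or space complexity is added.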