DocumentCode :
2764708
Title :
Document vector compression and its application in document clustering
Author :
Fox, T.W.
Author_Institution :
Intelligent Engines, Calgary Univ., Alta.
fYear :
2005
fDate :
1-4 May 2005
Firstpage :
2029
Lastpage :
2032
Abstract :
Document clustering organizes documents into groups such that each group contains documents with similar content. The majority of document clustering algorithms require a vector representation for each document. Each vector has well over 10,000 elements. Consequently, the memory required during clustering can be extremely high when clustering hundreds of thousands of documents. This paper introduces document vector compression, which is based on the discrete cosine transform (DCT). Document vector compression reduces the run-time memory requirements by as much as 60%. Unlike other document vector reduction techniques, document vector compression does not degrade the final cluster quality (total F-measure).
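The abstract names the DCT as the basis of the compression but does not say how coefficients are selected, quantized, or stored. The following is only a minimal sketch of the general idea, assuming an orthonormal DCT-II and simple truncation to the leading coefficients; the function names and the keep_ratio parameter are hypothetical and are not taken from the paper.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(coeffs, n):
    """Inverse of dct2; coefficients beyond len(coeffs) are treated as zero."""
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        for k in range(1, len(coeffs)):
            s += coeffs[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def compress(vec, keep_ratio=0.4):
    """Keep only the leading fraction of DCT coefficients (assumed scheme)."""
    k = max(1, int(len(vec) * keep_ratio))
    return dct2(vec)[:k]


def decompress(coeffs, n):
    """Approximate the original n-element vector from the kept coefficients."""
    return idct2(coeffs, n)


# Toy 10-element "document vector"; real vectors have well over 10,000 elements.
doc_vector = [0.0, 0.2, 0.5, 0.1, 0.0, 0.0, 0.3, 0.7, 0.0, 0.4]
stored = compress(doc_vector, keep_ratio=0.4)
approx = decompress(stored, len(doc_vector))
```

With keep_ratio=0.4 the stored representation holds 40% as many values as the original vector, which is the kind of saving the abstract's 60% figure suggests. Whether truncation alone preserves cluster quality is not something this sketch demonstrates.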
Keywords :
data compression; discrete cosine transforms; document image processing; image coding; image representation; DCT; discrete cosine transform; document clustering algorithms; document vector compression; run-time memory requirements; vector representation; Arithmetic; Clustering algorithms; Compaction; Data mining; Degradation; Discrete cosine transforms; Engines; Frequency; Information retrieval; Runtime;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electrical and Computer Engineering, 2005. Canadian Conference on
Conference_Location :
Saskatoon, Sask.
ISSN :
0840-7789
Print_ISBN :
0-7803-8885-2
Type :
conf
DOI :
10.1109/CCECE.2005.1557384
Filename :
1557384