DocumentCode
3464472
Title
Significant term extraction by Higher Order SVD
Author
Manna, Sukanya ; Petres, Zoltán ; Gedeon, Tom
Author_Institution
Dept. of Comput. Sci., Australian Nat. Univ., Canberra, ACT
fYear
2009
fDate
30-31 Jan. 2009
Firstpage
63
Lastpage
68
Abstract
In this paper, we present a novel method for estimating term importance, called tensor term indexing (TTI). It extracts significant terms from a single document as well as from a coherent collection of documents. The basic idea of this approach is to represent the whole document collection as a term-sentence-document tensor and to employ higher-order singular value decomposition (HOSVD) for important term extraction. TTI uses a lower-rank approximation to reduce noise by eliminating anecdotal terms, to mitigate synonymy by merging the dimensions associated with terms that have similar meanings, and to mitigate polysemy, since components of polysemous words that point in the "right" direction are added to the components of words that share a similar meaning. Our evaluation shows that the TTI model can extract significant terms relevant to a topic from a small number of documents, which term frequency and inverse document frequency (tf-idf) cannot.
Keywords
approximation theory; document handling; indexing; information retrieval; singular value decomposition; higher order singular value decomposition; inverse document frequency; rank approximation technique; significant term extraction; tensor term indexing; term frequency; term sentence document tensor; Automation; Computer science; Data mining; Databases; Frequency; Indexing; Information retrieval; Law; Singular value decomposition; Tensile stress;
fLanguage
English
Publisher
IEEE
Conference_Titel
Applied Machine Intelligence and Informatics, 2009. SAMI 2009. 7th International Symposium on
Conference_Location
Herľany
Print_ISBN
978-1-4244-3801-3
Electronic_ISBN
978-1-4244-3802-0
Type
conf
DOI
10.1109/SAMI.2009.4956610
Filename
4956610