• DocumentCode
    3464472
  • Title

    Significant term extraction by Higher Order SVD

  • Author

    Manna, Sukanya ; Petres, Zoltán ; Gedeon, Tom

  • Author_Institution
    Dept. of Comput. Sci., Australian Nat. Univ., Canberra, ACT
  • fYear
    2009
  • fDate
    30-31 Jan. 2009
  • Firstpage
    63
  • Lastpage
    68
  • Abstract
    In this paper, we present a novel method for measuring term importance, called tensor term indexing (TTI). It extracts significant terms from a single document as well as from a coherent collection of documents. The basic idea of this approach is to represent the whole document collection as a term-sentence-document tensor and to employ higher-order singular value decomposition (HOSVD) for important term extraction. TTI uses a lower-rank approximation technique to reduce noise by eliminating anecdotal terms, to mitigate synonymy by merging the dimensions associated with terms that have similar meanings, and to mitigate polysemy, since components of polysemous words that point in the "right" direction are added to the components of words that share a similar meaning. Our evaluation shows that the TTI model can extract significant terms relevant to a topic from a small number of documents, which term frequency and inverse document frequency (tf-idf) cannot.
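    The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes raw term counts as tensor entries, zero-padding for documents with fewer sentences, and a term score equal to the summed absolute values of the low-rank reconstruction; the abstract specifies none of these choices. All function names (`build_tensor`, `truncated_hosvd`, `rank_terms`) and the toy corpus are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize `tensor` so that rows index the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along the given mode."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Return (core, factors) of a rank-truncated HOSVD."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Left singular vectors of each unfolding give that mode's factor.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild the (approximated) tensor from core and factor matrices."""
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_product(approx, u, mode)
    return approx


def build_tensor(docs):
    """Build a term x sentence x document count tensor.

    `docs` is a list of documents, each a list of sentences, each a list
    of tokens. Shorter documents are zero-padded (an assumption).
    """
    vocab = sorted({tok for doc in docs for sent in doc for tok in sent})
    index = {tok: i for i, tok in enumerate(vocab)}
    max_sentences = max(len(doc) for doc in docs)
    tensor = np.zeros((len(vocab), max_sentences, len(docs)))
    for k, doc in enumerate(docs):
        for j, sent in enumerate(doc):
            for tok in sent:
                tensor[index[tok], j, k] += 1.0
    return tensor, vocab


def rank_terms(docs, ranks):
    """Rank terms by their mass in the low-rank tensor (assumed scoring)."""
    tensor, vocab = build_tensor(docs)
    ranks = tuple(min(r, n) for r, n in zip(ranks, tensor.shape))
    core, factors = truncated_hosvd(tensor, ranks)
    approx = reconstruct(core, factors)
    scores = np.abs(approx).sum(axis=(1, 2))
    order = np.argsort(-scores)
    return [(vocab[i], float(scores[i])) for i in order]


# Hypothetical toy corpus: two documents, two sentences each.
toy_docs = [
    [["tensor", "term", "indexing"], ["term", "extraction", "tensor"]],
    [["term", "frequency"], ["tensor", "decomposition", "term"]],
]
for term, score in rank_terms(toy_docs, ranks=(2, 2, 2))[:3]:
    print(term, round(score, 3))
```

    Truncating each mode independently is what distinguishes this from a plain matrix SVD on a term-document matrix: the sentence mode is kept as its own dimension, so co-occurrence within sentences influences the reduced representation.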
  • Keywords
    approximation theory; document handling; indexing; information retrieval; singular value decomposition; higher order singular value decomposition; inverse document frequency; rank approximation technique; significant term extraction; tensor term indexing; term frequency; term sentence document tensor; Automation; Computer science; Data mining; Databases; Frequency; Indexing; Information retrieval; Law; Singular value decomposition; Tensile stress;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Applied Machine Intelligence and Informatics, 2009. SAMI 2009. 7th International Symposium on
  • Conference_Location
    Herľany
  • Print_ISBN
    978-1-4244-3801-3
  • Electronic_ISBN
    978-1-4244-3802-0
  • Type

    conf

  • DOI
    10.1109/SAMI.2009.4956610
  • Filename
    4956610