• DocumentCode
    3466412
  • Title
    Random-Walk Term Weighting for Improved Text Classification
  • Author
    Hassan, Samer ; Mihalcea, Rada ; Banea, Carmen
  • Author_Institution
    Univ. of North Texas, Denton
  • fYear
    2007
  • fDate
    17-19 Sept. 2007
  • Firstpage
    242
  • Lastpage
    249
  • Abstract
    This paper describes a new approach for estimating term weights in a document, and shows how the new weighting scheme can be used to improve the accuracy of a text classifier. The method uses term co-occurrence as a measure of dependency between word features. A random-walk model is applied on a graph encoding words and co-occurrence dependencies, resulting in scores that represent a quantification of how a particular word feature contributes to a given context. Experiments performed on three standard classification datasets show that the new random-walk based approach outperforms the traditional term frequency approach of feature weighting.
  • Keywords
    text analysis; dataset classification; frequency approach; random-walk term weighting; text classification; text classifier; Computer science; Context modeling; Encoding; Frequency estimation; Text categorization; Text processing; Weight measurement;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantic Computing, 2007. ICSC 2007. International Conference on
  • Conference_Location
    Irvine, CA
  • Print_ISBN
    978-0-7695-2997-4
  • Type
    conf
  • DOI
    10.1109/ICSC.2007.56
  • Filename
    4338355
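
The Abstract describes a random walk over a graph of words linked by co-occurrence, producing a per-term score. The sketch below illustrates that general idea only. It is not the authors' implementation: the function name `random_walk_term_weights`, the undirected graph, the co-occurrence `window`, the `damping` value of 0.85, and the fixed iteration count are all assumptions borrowed from common PageRank-style practice, since the record does not specify them.

```python
from collections import defaultdict


def random_walk_term_weights(tokens, window=2, damping=0.85, iterations=50):
    """Score terms with a PageRank-style walk over a co-occurrence graph.

    All parameters are illustrative assumptions, not values from the paper.
    """
    # Link two distinct terms when they appear within `window` consecutive tokens.
    neighbors = defaultdict(set)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                neighbors[term].add(other)
                neighbors[other].add(term)

    vocabulary = sorted(set(tokens))
    scores = {term: 1.0 for term in vocabulary}

    # Repeatedly let each term collect an equal share of its neighbors' scores.
    for _ in range(iterations):
        updated = {}
        for term in vocabulary:
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[term])
            updated[term] = (1.0 - damping) + damping * incoming
        scores = updated
    return scores


if __name__ == "__main__":
    weights = random_walk_term_weights(["a", "b", "a", "c", "a", "d"])
    for term, weight in sorted(weights.items(), key=lambda item: -item[1]):
        print(f"{term}\t{weight:.3f}")
```

In a classifier, scores like these could stand in for raw term frequency as feature weights, which is the comparison the Abstract reports. How the paper normalizes or combines the scores is not stated in this record.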