  • DocumentCode
    3258989
  • Title
    Automatic Keyword Extraction Using Linguistic Features
  • Author
    Hu, Xinghua ; Wu, Bin
  • Author_Institution
    Baskin Sch. of Eng., California Univ., Santa Cruz, CA
  • fYear
    2006
  • fDate
    Dec. 2006
  • Firstpage
    19
  • Lastpage
    23
  • Abstract
    This paper describes a novel keyword extraction algorithm, position weight (PW), that uses linguistic features to represent the importance of a word's position in a document. Topical terms and their previous-term and next-term co-occurrence collections are extracted. Three methods are used to measure the degree of correlation between a topical term and its co-occurrence terms: term frequency inverse term frequency (TFITF), position weight inverse position weight (PWIPW), and chi-square (χ²). The co-occurrence terms that have the highest degree of correlation and exceed a co-occurrence frequency threshold are combined with the original topical term to form a final keyword. Because the algorithm has linear computational complexity, the documents in a large corpus or on the boundless Web can be quickly represented in vector space by sets of keywords, which makes fast and effective large-scale information retrieval possible.
  • Keywords
    computational complexity; data mining; document handling; information retrieval; CHI-square; automatic keyword extraction; boundless Web; cooccurrence collections; cooccurrence frequency threshold; cooccurrence terms; large corpus; linear computational complexity; linguistic features; position weight inverse position weight; term frequency inverse term frequency; topical terms; vector space; word position; Computational complexity; Content based retrieval; Data mining; Feature extraction; Frequency measurement; Information retrieval; Large-scale systems; Position measurement; Vectors; World Wide Web
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops, 2006. ICDM Workshops 2006. Sixth IEEE International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    0-7695-2702-7
  • Type
    conf
  • DOI
    10.1109/ICDMW.2006.36
  • Filename
    4063591