• DocumentCode
    3571124
  • Title

    An Improved Approach to Bengali Keyphrase Extraction

  • Author

    Sarkar, Kamal

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Jadavpur Univ., Kolkata, India
  • fYear
    2014
  • Firstpage
    283
  • Lastpage
    288
  • Abstract
    This paper presents a new approach to automatically extracting key phrases from a Bengali document. The approach has two steps: (1) candidate key phrase identification based on shallow parsing, which uses lexical information and case markers, and (2) selection of the best candidates using a ranking method that combines statistical and linguistic features. The feature set includes term frequency, the position of the phrase's first occurrence, named entity information, and lexical information. The system has been tested on a collection of Bengali news documents. The experimental results show that it performs better than the existing approaches to which it is compared.
  • Keywords
    document handling; grammars; information analysis; natural language processing; Bengali document; Bengali keyphrase extraction; Bengali news document; candidate key phrase identification; case marker; lexical information; linguistic feature; named entity information; phrase first occurrence; ranking method; shallow parsing; statistical feature; term frequency; Boosting; Data mining; Feature extraction; Information technology; Ranking (statistics); Time-frequency analysis; Training; Bengali; Case markers; Keyphrase Extraction; Named entities; Shallow parsing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Emerging Applications of Information Technology (EAIT), 2014 Fourth International Conference of
  • Type

    conf

  • DOI
    10.1109/EAIT.2014.60
  • Filename
    7052060