An Improved Approach to Bengali Keyphrase Extraction

Author

Sarkar, Kamal

Author_Institution

Dept. of Comput. Sci. & Eng., Jadavpur Univ., Kolkata, India

fYear

2014

Firstpage

283

Lastpage

288

Abstract

This paper presents a new approach to automatically extracting key phrases from a Bengali document. The approach has two main steps: (1) shallow-parsing-based identification of candidate key phrases using lexical information and case markers, and (2) selection of the best candidates using a ranking method that combines statistical and linguistic features. The feature set includes term frequency, the position of the phrase's first occurrence, named entity information, and lexical information. The proposed system has been tested on a collection of Bengali news documents. The experimental results show that it performs better than the existing approaches to which it is compared.
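
The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's scoring formula, so everything here is an assumption made for illustration: the `Candidate` fields, the linear combination, the weights `w_tf`, `w_pos`, and `w_ne`, and the sample phrases are all hypothetical. The lexical feature is omitted because the abstract does not say how it is computed.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate key phrase with three of the features named in the abstract."""
    phrase: str
    term_frequency: int    # occurrences of the phrase in the document
    first_position: int    # 0-based index of the sentence of first occurrence
    is_named_entity: bool  # whether the phrase was tagged as a named entity


def score(c, num_sentences, w_tf=1.0, w_pos=1.0, w_ne=1.0):
    """Hypothetical linear score; the weights are illustrative, not from the paper."""
    # Earlier first occurrence gives a position score closer to 1.
    position_score = 1.0 - c.first_position / max(num_sentences, 1)
    ne_score = 1.0 if c.is_named_entity else 0.0
    return w_tf * c.term_frequency + w_pos * position_score + w_ne * ne_score


def rank_candidates(candidates, num_sentences, top_k=5):
    """Return the top_k phrases ordered by descending score."""
    ranked = sorted(candidates, key=lambda c: score(c, num_sentences), reverse=True)
    return [c.phrase for c in ranked[:top_k]]


# Placeholder candidates standing in for phrases found by the shallow parser.
example_candidates = [
    Candidate("city council", 3, 0, True),
    Candidate("last week", 1, 5, False),
    Candidate("budget plan", 2, 1, False),
]
top_phrases = rank_candidates(example_candidates, num_sentences=10, top_k=2)
```

In this sketch a frequent, early, named-entity phrase outranks a rare, late one; a real system would need to tune or learn the weights rather than fix them at 1.0.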

Keywords

document handling; grammars; information analysis; natural language processing; Bengali document; Bengali keyphrase extraction; Bengali news document; candidate key phrase identification; case marker; lexical information; linguistic feature; named entity information; phrase first occurrence; ranking method; shallow parsing; statistical feature; term frequency; Boosting; Data mining; Feature extraction; Information technology; Ranking (statistics); Time-frequency analysis; Training; Bengali; Case markers; Keyphrase Extraction; Named entities; Shallow parsing;

fLanguage

English

Publisher

IEEE

Conference_Titel

Emerging Applications of Information Technology (EAIT), 2014 Fourth International Conference of

Type

conf

DOI

10.1109/EAIT.2014.60

Filename

7052060