Title :
An Improved Approach to Bengali Keyphrase Extraction
Author_Institution :
Dept. of Comput. Sci. & Eng., Jadavpur Univ., Kolkata, India
Abstract :
This paper presents a new approach for automatically extracting key phrases from a Bengali document. The approach has two steps: (1) shallow-parsing-based identification of candidate key phrases using lexical information and case markers, and (2) selection of the best candidates with a ranking method that combines statistical and linguistic features. The feature set includes term frequency, position of the phrase's first occurrence, named entity information and lexical information. The proposed system has been tested on a collection of Bengali news documents. The experimental results show that it performs better than the existing approaches to which it is compared.
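The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give the scoring formula, so the linear combination, the weights (`w_tf`, `w_pos`, `w_ne`), the `Candidate` type and the function names below are all hypothetical. Candidate identification by shallow parsing is assumed to have happened already, and the lexical-information feature is omitted because the abstract does not say how it is computed.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate key phrase with precomputed features (hypothetical layout)."""
    phrase: str
    term_frequency: int      # how often the phrase occurs in the document
    first_position: int      # 0-based index of the sentence of first occurrence
    is_named_entity: bool    # whether the phrase was tagged as a named entity


def score(c, num_sentences, w_tf=1.0, w_pos=1.0, w_ne=1.0):
    """Combine the features linearly; weights are assumed, not from the paper."""
    # Phrases that first appear earlier in the document get a higher position score.
    position_score = 1.0 - c.first_position / max(num_sentences, 1)
    ne_score = 1.0 if c.is_named_entity else 0.0
    return w_tf * c.term_frequency + w_pos * position_score + w_ne * ne_score


def rank_candidates(candidates, num_sentences, top_k=5):
    """Return the top_k phrases ordered by descending score."""
    ranked = sorted(candidates, key=lambda c: score(c, num_sentences), reverse=True)
    return [c.phrase for c in ranked[:top_k]]


# Usage with placeholder phrases in a 10-sentence document.
candidates = [
    Candidate("phrase C", term_frequency=1, first_position=9, is_named_entity=False),
    Candidate("phrase A", term_frequency=3, first_position=0, is_named_entity=True),
    Candidate("phrase B", term_frequency=3, first_position=5, is_named_entity=False),
]
top_two = rank_candidates(candidates, num_sentences=10, top_k=2)
```

A weighted sum is only one way to combine statistical and linguistic features; a learned ranker would fit the same interface.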
Keywords :
document handling; grammars; information analysis; natural language processing; Bengali document; Bengali keyphrase extraction; Bengali news document; candidate key phrase identification; case marker; lexical information; linguistic feature; named entity information; phrase first occurrence; ranking method; shallow parsing; statistical feature; term frequency; Boosting; Data mining; Feature extraction; Information technology; Ranking (statistics); Time-frequency analysis; Training; Bengali; Case markers; Keyphrase Extraction; Named entities; Shallow parsing;
Conference_Title :
Emerging Applications of Information Technology (EAIT), 2014 Fourth International Conference of
DOI :
10.1109/EAIT.2014.60