Title :
Automatic detection of subject/object drops in Bengali
Author :
Das, Aruneema ; Garain, U. ; Senapati, Apurbalal
Author_Institution :
CVPR Unit, Indian Stat. Inst. Kolkata, Kolkata, India
Abstract :
This paper presents a pioneering attempt at the automatic detection of drops in Bengali. The dominant drops in Bengali are subject, object and verb drops. Bengali is a pro-drop language, and pro-drops fall under subject/object drops, which this research concentrates on. The detection algorithm makes use of off-the-shelf Bengali NLP tools such as a POS tagger, a chunker and a dependency parser. Simple linguistic rules are first applied to quickly annotate a dataset of 8,455 sentences, which are then manually checked. The corrected dataset is then used to train two classifiers that label each sentence as either containing a drop or not. Features previously used by other researchers are considered. The two classifiers show comparable overall performance. As a by-product, the study generates another useful NLP resource (apart from the drop-annotated dataset): a classification of Bengali verbs (all morphological variants of 881 root verbs) by transitivity, which is in turn used as a feature by the classifiers.
Keywords :
grammars; linguistics; natural language processing; pattern classification; word processing; Bengali; POS tagger; automatic drops detection; chunker; dependency parser; linguistic rules; off-the-shelf Bengali NLP tools; pro-drop language; sentence classification; subject-object drops; verb drops; Classification; Dependency parsing; POS tagging; Subject/Object drop
Conference_Title :
Asian Language Processing (IALP), 2014 International Conference on
Conference_Location :
Kuching
DOI :
10.1109/IALP.2014.6973488