Conference record number :
3297
Paper title :
Arabic Named Entity Recognition Using Boosting Method
Title in another language :
Arabic Named Entity Recognition Using Boosting Method
Authors :
Bagher Sajadi Mohamad, Department of Computer Engineering, Central Tehran Branch, Islamic Azad University, Tehran, Iran; Minaei Behrooz, Department of Robotics and Artificial Intelligence, University of Science and Technology, Tehran, Iran
Keywords :
Arabic Named Entity Recognition , Using Boosting Method , External Resources , NER Task , Feature Selection , Prediction Model , Corpus , Natural Language Processing (NLP) , MSA , ANERCorp
Conference title :
19th International Symposium on Artificial Intelligence and Signal Processing
Abstract (English) :
Abstract: In Natural Language Processing (NLP) studies,
developing resources and tools contributes to the breadth
and effectiveness of research in each language. In recent years,
Arabic Named Entity Recognition (ANER) has attracted the attention
of NLP researchers. While most of this research is based
on Modern Standard Arabic (MSA), in this paper we focus on
Classical Arabic (CA) literature. We present a corpus called
NoorCorp, with 200k labeled words for research purposes, which was
annotated manually by human experts. We also collected
about 18k proper names from old Hadith books as a gazetteer,
called NoorGazet. Using ensemble learning, we develop
a new approach for extracting named entities (NEs), including
person, location, and organization. The AdaBoost.M2 algorithm, an
implementation of the multiclass boosting method, is applied to train
the prediction model. Results show that the method outperforms
the decision tree used as its base classifier. We
used tokenization, part-of-speech (POS) tagging, and base phrase
chunking (BPC) to overcome linguistic obstacles in Arabic. An
overall F-measure of 96.04 is obtained. In addition, we
studied the effect of preprocessing and external resources
on the system results. Finally, the proposed approach is applied
to ANERCorp, an MSA corpus, and the results are compared
with those obtained on NoorCorp.
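
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F-measure can be computed for an NER tagger. It is an illustration under stated assumptions, not the authors' evaluation procedure: the abstract does not say whether scoring is per token or per entity, so this sketch assumes token-level, micro-averaged scoring. The function name `f_measure`, the label strings `PER`, `LOC`, `ORG`, and the non-entity label `O` are hypothetical choices made for this example.

```python
from typing import Sequence


def f_measure(gold: Sequence[str], pred: Sequence[str], outside: str = "O") -> float:
    """Token-level, micro-averaged F-measure in percent (an assumed scheme)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # True positive: the predicted entity label equals the gold entity label.
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != outside)
    n_pred = sum(1 for p in pred if p != outside)  # tokens predicted as entities
    n_gold = sum(1 for g in gold if g != outside)  # tokens that are entities in gold
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    # Harmonic mean of precision and recall, scaled to a percentage.
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, not taken from NoorCorp or ANERCorp.
gold = ["PER", "O", "LOC", "O", "ORG", "PER"]
pred = ["PER", "O", "LOC", "LOC", "O", "PER"]
score = f_measure(gold, pred)
```

In this toy case three of the four predicted entity tokens are correct and three of the four gold entity tokens are found, so precision and recall are both 0.75. An entity-level variant would instead count a prediction as correct only when the whole entity span and its type match.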