Conference record number :
3297
Article title :
Arabic Named Entity Recognition Using Boosting Method
Title in another language :
Arabic Named Entity Recognition Using Boosting Method
Authors :
Bagher Sajadi Mohamad, Department of Computer Engineering, Central Tehran Branch, Islamic Azad University, Iran - Tehran ; Minaei Behrooz, Department of Robotics and Artificial Intelligence, University of Science and Technology, Iran - Tehran
Keywords :
Arabic Named Entity Recognition , Using Boosting Method , External Resources , NER Task , Feature Selection , Prediction Model , Corpus , Natural Language Processing (NLP) , MSA , ANERCorp
Publication year :
Aban 1396 (October/November 2017)
Conference title :
19th International Symposium on Artificial Intelligence and Signal Processing
Abstract (English) :
Abstract: In Natural Language Processing (NLP), developing resources and tools extends the reach and effectiveness of research in each language. In recent years, Arabic Named Entity Recognition (ANER) has attracted the attention of NLP researchers. While most of this research is based on Modern Standard Arabic (MSA), this paper focuses on Classical Arabic (CA) literature. We present NoorCorp, a corpus of 200k labeled words for research purposes, annotated manually by human experts. We also collected about 18k proper names from old Hadith books into a gazetteer called NoorGazet. Using ensemble learning, we develop a new approach for extracting named entities (NEs), including person, location, and organization. The AdaBoost.M2 algorithm, an implementation of the multiclass boosting method, is applied to train the prediction model. Results show that the method outperforms the decision tree used as its base classifier. We use tokenization, part-of-speech (POS) tagging, and base phrase chunking (BPC) to overcome linguistic obstacles in Arabic. An overall F-measure of 96.04 is obtained. In addition, we study the effect of preprocessing and external resources on the system's results. Finally, the proposed approach is applied to ANERCorp, an MSA corpus, and the results are compared with those obtained on NoorCorp.
Country :
Iran
Number of pages :
8
From page :
1
To page :
8