Title of article

Mehr: A Persian Coreference Resolution Corpus

Author/Authors

Haji Mohammadi, Hassan (Department of Computer Engineering, Islamic Azad University, North Tehran Branch); Talebpour, Alireza (Department of Computer Engineering, Shahid Beheshti University); Mahmoudi Aznaveh, Ahmad (Department of Computer Engineering, Shahid Beheshti University); Yazdani, Samaneh (Department of Computer Engineering, Islamic Azad University, North Tehran Branch)

From page

407

To page

416

Abstract

Coreference resolution is one of the essential tasks of natural language processing. The task identifies all expressions in a text that refer to the same real-world entity. Coreference resolution supports other areas of natural language processing, such as information extraction, machine translation, and question answering. This article presents a new Persian coreference resolution corpus, named the Mehr corpus. The article's primary goal is to develop a Persian coreference corpus that resolves some of the shortcomings of the previous Persian corpus while maintaining high inter-annotator agreement. The corpus annotates coreference relations for noun phrases, named entities, pronouns, and nested named entities. Two baseline pronoun resolution systems are developed, and their results are reported. The corpus comprises 400 documents and about 170k tokens. Annotation was performed with the WebAnno tool.

Keywords

Natural Language Processing , Mention , Anaphora resolution , Antecedent

Journal title

Journal of Artificial Intelligence and Data Mining

Record number

2754447

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=2754447