Title :
JANA: An Arabic human-human dialogues corpus
Author :
Elmadany, AbdelRahim A. ; Abdou, Sherif M. ; Gheith, Mervat
Author_Institution :
Dept. of Comput. Sci., Cairo Univ., Cairo, Egypt
Abstract :
We present JANA, a multi-genre corpus of Arabic dialogues labeled for Arabic Dialogues Language Understanding (ADLU) at the utterance level. This paper describes progress in the development of a human-human dialogue corpus of spontaneous Arabic Spoken Dialogues (SD) and Instant Messages (IM). We collected dialogues from call centers of different genres, such as Banks, Flights, and Mobile Network providers; these dialogues consist of transcribed phone calls and instant messages containing inquiries about the services the call centers provide. In addition, the annotation schema and the manual segmentation of turns are described. The collected data consist of approximately 3001 turns with an average of 6.7 words per turn and 4725 utterances with an average of 4.3 words per utterance, for a total of 20311 words; the corpus will be made freely available for academic and nonprofit research.
Keywords :
natural language processing; ADLU; Arabic dialogues language understanding; Arabic human-human dialogues corpus; Arabic spontaneous spoken dialogues; IM; JANA; SD; annotation schema; human-human dialogue corpus; instant messages; multigenre corpus; transcribed phone calls; Decision support systems; Economic indicators; Annotated corpus; Arabic Dialogues Corpus; Arabic Language Understanding; Dialogues Acts;
Conference_Title :
Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on
Conference_Location :
Kolkata
DOI :
10.1109/ReTIS.2015.7232903