Conference record number :
3297
Paper title :
SAZEH: A Wide Coverage Persian Constituency Tree Bank and Parser
Title in another language :
SAZEH: A Wide Coverage Persian Constituency Tree Bank and Parser
Authors :
Tabatabayi Seifi Shohreh (RCDAT: Research Center for Development of Advanced Technologies, Speech Group, Tehran, Iran) , Sarraf Rezaee Iman (RCDAT: Research Center for Development of Advanced Technologies, Speech Group, Tehran, Iran)
Keywords :
Natural Language Processing , Constituency Parser , Constituency Treebank
Publication year :
Aban 1396 (Solar Hijri calendar; October-November 2017)
Conference title :
19th International Symposium on Artificial Intelligence and Signal Processing
English abstract :
Constituency parsing is a basic operation in many NLP tasks, such as translation, information extraction, and abstractive summarization. Training a probabilistic parser requires a wide-coverage constituency treebank. SAZEH is the first large-volume Persian constituency treebank, with more than 21,000 parsed trees and 627,000 tokens. Its sentences average 30 words in length and are drawn from the Peykare Corpus, which already has POS tags. The Berkeley Lexical Parser was trained on the SAZEH corpus, and the best F-measure attained on the test portion of the corpus is 81.65% using gold POS tags.
Country :
Iran
Number of pages :
4
From page :
1
To page :
4