The Ezafe Construction in Persian: A Corpus-Based Study

English title

The Corpus-Based Study of Ezafe Construction in Persian

Authors

Nassajian, Minoo (Sharif University of Technology, Languages and Linguistics Center); Shojaei, Razieh (University of Tehran); Bahrani, Mohammad (Allameh Tabataba'i University, Department of Computer Science)

Pages

From page

161

To page

182

Keywords

Ezafe marker, Ezafe construction, dependency grammar, Ezafe insertion rules, Persian text processing

Abstract (translated from Persian)

The Ezafe construction has long been significant across linguistic theories, including phonetics, morphology, and syntax, and Iranian linguists have proposed differing analyses of it. Because the Ezafe vowel (kasre) is not represented in writing, it creates considerable ambiguity in the analysis and comprehension of Persian texts and poses many challenges for language processing applications, including part-of-speech tagging, named-entity recognition, coreference resolution, text-to-speech conversion, machine translation, and syntactic parsing. Identifying the position of this element is therefore among the most important challenges in processing Persian text. The present study examines the Ezafe construction analytically, on the basis of a corpus, and from the perspective of dependency grammar. Because dependency grammar is simple, uses little computer memory, and speeds up processing, it holds considerable importance in text-processing research within computational linguistics and offers the best theoretical basis for studies of this kind. The study therefore uses this grammar to propose a rule-based method for identifying words that carry the Ezafe marker in Persian texts. To this end, all sample constructions containing the Ezafe marker are extracted from the Uppsala Persian Dependency Corpus and analyzed within the framework of dependency grammar. This analysis yields only seven logical rules for inserting the Ezafe marker in non-verbal phrases: noun phrases, adjective phrases, prepositional phrases, adverb phrases, phrases with more than one post-dependent, phrases with more than one phrasal dependent, and coordinate constructions. On the basis of these rules, the position of the Ezafe marker can be identified in various dependency corpora and in computational systems based on dependency parsing. The study also points out Ezafe positions that have not been addressed in previous theoretical or computational work.

English abstract

The Ezafe construction is considered one of the most important issues in various linguistic theories, including phonetics, morphology, and syntax, and many Iranian linguists have analyzed the phenomenon from these different perspectives. The Ezafe marker is usually not written in Persian text. As a result, it not only introduces a high degree of ambiguity in reading, analyzing, and understanding Persian documents, but also causes serious difficulties for a large number of natural language processing (NLP) tasks, such as part-of-speech (POS) tagging, named-entity recognition (NER), coreference resolution, text-to-speech conversion, machine translation, and syntactic parsing. Determining the positions of Ezafe in a given sentence is therefore viewed as a controversial and challenging issue, especially in these applications. Using a corpus-based analysis and dependency grammar, the current paper sets out to study Ezafe positions. Because dependency grammar allows simple parsing, uses little memory, and speeds up computation, it is regarded as one of the important and practical grammars in the field of computational linguistics. Accordingly, this study uses a rule-based method within this framework to recognize Ezafe positions. For this purpose, all Ezafe constructions found in the Uppsala Persian Dependency Corpus (UPDC) are analyzed on the basis of dependency relations. In the next step, only seven Ezafe rules are formulated, covering non-verbal phrases: noun phrases, adjective phrases, prepositional phrases, adverb phrases, phrases with more than one post-modifier, phrases with more than one phrasal post-modifier, and coordinations. The proposed rules can be used in Persian dependency corpora and in a great number of language processing tasks that are based on dependency relations. In addition, the present research elaborates on Ezafe positions that have not been mentioned in previous theoretical and computational studies.

Year of publication

1398 (Solar Hijri calendar)

Journal title

Pazhuhesh-ha-ye Zabani (literally, Language Research)

PDF file

7585723

Link to this record

https://search.isc.ac/dl/search/defaultta.aspx?DTC=8&DC=1056002