Regional Information Center for Science and Technology - Two-Step Break-Cull Model for Automatic Indexing of Persian Texts

Record number:

750791

Article title:

Two-Step Break-Cull Model for Automatic Indexing of Persian Texts

Subtitle:

Two Steps Break-Cull Model for Automatic Indexing of Persian Texts

Authors:

Tavakolizadeh Ravari, Mohammad (author), Assistant Professor, Department of Information Science and Knowledge Studies

Holdings information:

Quarterly, year 1394, issue 80

Journal rank:

Scientific-research

Number of pages:

From page:

To page:

Keywords:

Automatic indexing, Persian language, break-cull model

Persian abstract (translated):

Purpose: Because some linguistic problems are language-specific, native automatic indexing models need to be designed around the characteristics of each language. These models should be designed with the exhaustivity and specificity of indexing in mind. This paper introduces the two-step break-cull model for automatic indexing of Persian articles and assesses its capability. The algorithm is first explained in detail, and the consistency of its results with author keywords is then measured. Method: The Persian automatic indexing model is introduced together with an explanation of its steps and the problems associated with them. The model is evaluated with the inclusion index, which is used to determine the percentage of consistency between indexers. To this end, the consistency between the index terms produced by implementing the model's algorithm and the keywords assigned by the articles' authors is examined. Findings: In 90 percent of cases, the term the model identified as the highest-weighted term in an article was similar to that article's first author keyword. Overall, consistency between the model's results and author keywords was 76 percent, which appears acceptable compared with previous work. Originality/Value: The primary value of this work is that it addresses automatic indexing with regard to the characteristics of the Persian language. The model is assumed to be implemented with regular expressions, which many programming languages support and which reduce the need to install and use database tables for text processing. It also solves the problem of setting the upper threshold for main terms. In addition, a dedicated algorithm sets the lower limit, so that the number of culled terms no longer depends on the length of the text. This capability guarantees the exhaustivity and specificity of indexing.
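
The evaluation described in the abstract compares the model's index terms with author keywords through an inclusion index. This record does not give the paper's formula, so the sketch below assumes one common formulation: the number of shared terms divided by the size of the smaller term set. The function name `inclusion_index`, the exact-match comparison after lowercasing, and the sample term lists are all illustrative assumptions, not taken from the paper.

```python
def inclusion_index(model_terms, author_keywords):
    """Shared terms divided by the size of the smaller term set.

    Assumed formulation: the catalogue record does not state the
    formula actually used in the paper.
    """
    # Normalise by trimming whitespace and lowercasing (an assumption;
    # the paper speaks of "similar" terms, which may be a looser match).
    a = {t.strip().lower() for t in model_terms if t.strip()}
    b = {t.strip().lower() for t in author_keywords if t.strip()}
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical term lists, for illustration only.
model_terms = ["automatic indexing", "Persian language", "term weighting", "stop words"]
author_keywords = ["Automatic indexing", "Persian language", "break-cull model"]
score = inclusion_index(model_terms, author_keywords)
```

Here two of the three author keywords also appear among the model's terms, so `score` is two thirds. Matching on "similar" rather than identical terms would need a looser comparison than the set intersection used here.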

English abstract:

Purpose: Each language poses its own problems, so automatic indexing models need to be tailored to each language. Such models should address the exhaustivity and specificity of indexing. This paper introduces and evaluates a model suited to automatic indexing of Persian texts. The model breaks the text into candidate-term particles and culls the most appropriate ones through a special term-weighting method. Methodology: The model is introduced by presenting its steps and the problems that may arise in running them. Evaluation is based on the inclusion index, which is used to determine inter-indexer consistency. Accordingly, the consistency between the index terms produced by the model and the author keywords is determined. Findings: In 90% of articles, the highest-weighted term is similar to the first author keyword. The overall consistency between the model's results and author keywords is 76%. Compared with prior work, the model's performance is acceptable. Originality/Value: The primary value of this paper is that it addresses automatic indexing with regard to the problems of the Persian language. The model is well suited to implementation with regular expressions, which are supported by many programming languages. This reduces the need to create database tables for text manipulation and processing. In addition, the model solves the problem of setting an upper threshold for selecting final terms, and a further algorithm makes it possible to set the lower one. As a result, the number of culled terms does not depend on the text length. This guarantees the exhaustivity and specificity of indexing.
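
The two steps named in the abstract, breaking the text into candidate terms and culling the best ones by weight, can be illustrated with regular expressions. The following is a minimal sketch under loud assumptions: the break points (a toy English stop list plus punctuation), the raw-frequency weight, and the fixed `top_n` cutoff are stand-ins chosen for illustration. The paper's own Persian break rules, weighting method, and upper and lower thresholds are not described in this record.

```python
import re
from collections import Counter

# Hypothetical break points: a toy English stop list standing in for a
# Persian one, plus runs of punctuation. None of this comes from the paper.
STOP_WORDS = ["the", "of", "and", "is", "a", "an", "for", "in", "to", "with"]
BREAK_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, STOP_WORDS)) + r")\b|[^\w\s-]+",
    flags=re.IGNORECASE,
)


def break_text(text):
    """Step 1 (break): split the text at break points into candidate terms."""
    pieces = BREAK_PATTERN.split(text.lower())
    # Collapse inner whitespace and drop empty fragments.
    return [" ".join(p.split()) for p in pieces if p.strip()]


def cull_terms(candidates, top_n=3):
    """Step 2 (cull): keep the highest-weighted candidates.

    Raw frequency is a stand-in weight, and top_n is a stand-in cutoff.
    """
    return Counter(candidates).most_common(top_n)


text = (
    "Automatic indexing of Persian texts is a task. "
    "The model for automatic indexing is suited to Persian texts."
)
candidates = break_text(text)
top_terms = cull_terms(candidates, top_n=2)
```

With this toy input, the two candidates that occur twice, "automatic indexing" and "persian texts", are the ones kept. Everything runs on in-memory strings and the standard `re` module, which is in the spirit of the abstract's remark that regular expressions reduce the need for database tables.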

Publication year:

1394

Journal title:

Research on Information Science and Public Libraries

Holdings information:

Quarterly, serial issue 80, year 1394

Link to this document:

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=8&DC=750791