Article title:
موضوع بندي متن هاي متراكم
Title in another language:
Topic Specific of the Dense Documents
Authors:
قاسم آقايي ، ناصر (author)
Keywords:
Dense text, Texts, Text processing, Pronoun referencing, Dependency relations, Topic specific, Topic classification of dense texts
English abstract:
This paper examines text documents in terms of their topic density and divides them into two groups: dense and sparse. Dense documents are texts that cover a wide domain of topics and therefore have a high topic density (for example, religious books, encyclopedias, and magazine archives). We show that a) traditional methods cannot be used for topic classification of dense texts, and b) the proposed method (Nasir) can be applied efficiently to dense texts.
In this research, we use dependency relations, paths, triple databases, and statistical text processing methods to extract important words and insert them into a clustering index. We also describe a method for finding the referents of pronouns in dense texts.
In addition, a prototype system called Nasir was implemented based on the suggested methods. The results of running it on dense Persian texts show that the quality of indexing and searching improved significantly.