DocumentCode :
2261043
Title :
A method for stemming and eliminating common words for Persian text summarization
Author :
Berenjkoob, Marzieh ; Mehri, Razieh ; Khosravi, Hadi ; Nematbakhsh, Mohammad Ali
Author_Institution :
Dept. of Comput. Eng., Univ. of Isfahan, Isfahan, Iran
fYear :
2009
fDate :
24-27 Sept. 2009
Firstpage :
1
Lastpage :
6
Abstract :
With the rapid growth of documents and electronic texts in the Persian language, fast methods for retrieving texts from large document collections are crucial. Persian text summarization, which conveys the main concept of a text in minimal size, is an effective solution. One of the steps in Persian text summarization is stemming and eliminating common words. The aim of this research is to stem words from Persian documents so that they can be used more efficiently in text summarization; the proposed method eliminates common words and stems keywords. A combination of existing word-network techniques was used to create a Persian database from the Dehkhoda dictionary. The summarization algorithm is based on statistical techniques: each sentence is given an importance weight, and the sentences with the highest weights are used for the summary. By comparing against the results of other algorithms on Persian texts, we concluded that our technique extracts the roots of words with higher precision.
Keywords :
natural language processing; statistical analysis; text analysis; Dehkhoda dictionary; Persian language; Persian text summarization; common words elimination; common words stemming; statistical technique; Data mining; Databases; Dictionaries; Frequency measurement; Information retrieval; Natural language processing; Ontologies; Statistical analysis; Text recognition; Database; Text Summarization; common words; stemming;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-4538-7
Electronic_ISBN :
978-1-4244-4540-0
Type :
conf
DOI :
10.1109/NLPKE.2009.5313836
Filename :
5313836