DocumentCode :
3017140
Title :
Categorized Text Document Summarization in the Kannada Language by sentence ranking
Author :
Jayashree, R. ; Srikanta, M.K. ; Anami, Basavaraj S.
Author_Institution :
Dept. of Comput. Sci., PES Inst. of Technol., Bangalore, India
fYear :
2012
fDate :
27-29 Nov. 2012
Firstpage :
776
Lastpage :
781
Abstract :
The growth of the Internet has given rise to the need for better Information Retrieval (IR) techniques that help in obtaining relevant information faster. Text summarization is one such technique; it aims to produce a quick and concise summary of a text. Of late, keyword-based summarization has drawn wide attention from researchers in the Natural Language Processing community. The algorithm we have developed extracts keywords from Kannada text documents by combining GSS (Galavotti, Sebastiani, Simi) [13] coefficients and IDF (Inverse Document Frequency) with TF (Term Frequency), and later uses these keywords for summarization. The main objective of our work is to assign a weight to each word in a sentence; the weight of a sentence is the sum of the weights of all its words, and based on these sentence scores we choose the top 'm' sentences. A document from a given category is selected from our database, custom built for this purpose. The files are obtained from Kannada Webdunia, a Kannada portal that offers political news, cinema news, sports news, shopping and jokes. A summary is generated according to the number of sentences requested by the user. Finally, we compare the machine-generated summary with a human-written summary. Another objective of this work is to perform feature extraction through the removal of stop words. For removing stop words, we present a novel technique that finds structurally similar words in a document.
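The sentence-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the GSS coefficient is combined with TF and IDF, so this sketch uses a plain smoothed TF x IDF word weight and leaves GSS out. The function names `word_weights` and `summarize`, whitespace tokenization, and the smoothing formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_weights(doc_tokens, corpus):
    """Return a TF * IDF weight for every word in doc_tokens.

    corpus is a list of token lists (one per document).
    Assumption: GSS coefficients are omitted because the abstract
    does not specify how they are combined with TF and IDF.
    """
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    weights = {}
    for word, count in tf.items():
        # Number of corpus documents containing the word.
        df = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF (an assumed variant) to avoid division by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[word] = (count / len(doc_tokens)) * idf
    return weights


def summarize(sentences, corpus, m):
    """Pick the m highest-scoring sentences, kept in document order."""
    tokenized = [s.split() for s in sentences]
    doc_tokens = [w for sent in tokenized for w in sent]
    weights = word_weights(doc_tokens, corpus)
    # Sentence weight = sum of the weights of all its words.
    scores = [sum(weights[w] for w in sent) for sent in tokenized]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:m])]
```

Here `m` plays the role of the user-supplied number of summary sentences. Returning the chosen sentences in their original order is a design choice of this sketch, made so the summary reads naturally; the abstract does not say how the selected sentences are ordered.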
Keywords :
Internet; classification; feature extraction; natural language processing; portals; relevance feedback; text analysis; GSS; Galavotti-Sebastiani-Simi coefficients; IDF method; IR techniques; Kannada Webdunia; Kannada language; Kannada portal; Kannada text documents; categorized text document summarization; cinema news; inverse document frequency method; jokes; keyword extraction; keyword-based summary; machine generated summary; political news; relevant information retrieval; sentence ranking; shopping; sports news; stop word removal; term frequency; Data mining; Encoding; Humans; GSS coefficient; IDF; Keywords; Ranking; TF; sentence; summary; word weight
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Systems Design and Applications (ISDA), 2012 12th International Conference on
Conference_Location :
Kochi
ISSN :
2164-7143
Print_ISBN :
978-1-4673-5117-1
Type :
conf
DOI :
10.1109/ISDA.2012.6416635
Filename :
6416635