Title :
A summarizer system based on a semantic analysis of web documents
Author :
Florence, Angelin ; Padmadas, Vijaya
Author_Institution :
Thadomal Shahani Eng. Coll., Mumbai, India
Abstract :
The availability of the web and search engines has made searching easier. Information overload is one of the major problems, and it calls for algorithms and tools that give faster access. Electronic documents are a major source of business and academic information. To utilize these online documents effectively, it is crucial to be able to extract their summaries. A summarization system is one solution to this problem. This project proposes a summarizer system that performs summarization of multiple documents. The input text documents are passed through a parser, which generates a parse tree for each sentence. RDF triples in the form of subject, verb and object are extracted from each sentence by analyzing the typed dependencies generated by the parser. The semantic distance between each pair of sentences is computed and stored in a matrix. The measure adopted to compute semantic distance is the Wu and Palmer distance. A clustering algorithm is applied to the extracted subject, verb and object space, and the extracted RDF triples are grouped into clusters. The important sentences for the final summary are then selected using a sentence selection algorithm.
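The Wu and Palmer measure named in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption: the toy is-a taxonomy `PARENT` stands in for a real lexical resource such as WordNet, the root is given depth 1, distance is taken as one minus the Wu-Palmer similarity, and all function names are hypothetical. It compares single concepts, not whole sentences or triples.

```python
# Toy is-a taxonomy (child -> parent). Hypothetical data standing in for
# a lexical database such as WordNet; not taken from the paper.
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
}


def path_to_root(concept):
    """Return the chain from the concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1 (an assumed convention)."""
    return len(path_to_root(concept))


def wu_palmer_similarity(a, b):
    """2 * depth(lcs) / (depth(a) + depth(b)), lcs = deepest common ancestor."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node that is also an ancestor of a
    # is the deepest common subsumer.
    lcs = next(c for c in path_to_root(b) if c in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def wu_palmer_distance(a, b):
    """Assumed conversion from similarity to distance: 1 - similarity."""
    return 1.0 - wu_palmer_similarity(a, b)


def distance_matrix(concepts):
    """Pairwise distance matrix, analogous to the matrix described above."""
    return [[wu_palmer_distance(a, b) for b in concepts] for a in concepts]


if __name__ == "__main__":
    for row in distance_matrix(["dog", "cat", "car"]):
        print([round(d, 3) for d in row])
```

In this toy taxonomy, "dog" and "cat" share the ancestor "animal" and so end up closer to each other than either is to "car", which is the property a clustering step over such distances would rely on.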
Keywords :
document handling; information analysis; information retrieval systems; pattern clustering; RDF triples; Web documents; Wu-Palmer distance; academic information; business information; clustering algorithm; document extraction; electronic documents; information overload; resource description framework; search engines; semantic analysis; semantic distance; sentence selection algorithm; summarizer system; Algorithm design and analysis; Clustering algorithms; Fires; Resource description framework; Semantics; Sustainable development; Text analysis; NLP; RDF; Semantic analysis; Summarization; parse tree;
Conference_Title :
Technologies for Sustainable Development (ICTSD), 2015 International Conference on
Conference_Location :
Mumbai
DOI :
10.1109/ICTSD.2015.7095851