DocumentCode
526118
Title
Croatian web text summarizer (CroWebSum)
Author
Preradovic, Nives Mikelic ; Ljubesic, Nikola ; Boras, Damir
Author_Institution
Dept. of Inf. Sci., Univ. of Zagreb, Zagreb, Croatia
fYear
2010
fDate
21-24 June 2010
Firstpage
109
Lastpage
114
Abstract
The paper describes automatic summarization of newspaper texts in the Croatian language. The goal of CroWebSum is to generate high-quality extracts that are coherent and retain the relevant information from the original text. A preliminary evaluation shows that extracts 10% the size of the original text have good coherence, while extracts 5% the size of the original still convey the most relevant information. CroWebSum also performed quite well when cutting news down to SMS size (at most 160 characters). The research led us to conclude that, to improve text summarization for Croatian, we should develop a technique that uses context vectors to calculate the semantic similarity between terms in the document, as well as a pronoun resolution algorithm.
Keywords
natural language processing; text analysis; CroWebSum; Croatian Web text summarizer; Croatian language; newspaper text summarization; Coherence; Data mining; Equations; Feature extraction; Frequency measurement; Mathematical model; Strontium; Newspaper text summarizer; SweSum; extract; inflected language
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology Interfaces (ITI), 2010 32nd International Conference on
Conference_Location
Cavtat/Dubrovnik
ISSN
1330-1012
Print_ISBN
978-1-4244-5732-8
Type
conf
Filename
5546374