DocumentCode
2734230
Title
Intelligence Gathering from Online NEWS Documents
Author
Suryanarayanan, Mahalakshmi G. ; Selvaraju, Sendhilkumar
Author_Institution
Dept. of Comput. Sci. & Eng., Anna Univ., Chennai
fYear
2006
fDate
6-6 Dec. 2006
Firstpage
436
Lastpage
441
Abstract
The volume of Web documents grows every day, which makes it increasingly difficult to browse their contents, especially news articles of one's own interest. An automatic information retrieval (IR) system designed specifically to retrieve Web-based news articles of personal interest would therefore be valuable. This paper proposes a Web-based focused IR system that collects news articles about any chosen celebrity, reducing the time users spend browsing the Web. The system is supplied with a static, domain-specific ontological framework describing the focus of retrieval. The implemented system retrieves the focused information from a tagged news corpus by automatically classifying the articles into various categories using a set of heuristics. The classified corpus is then subjected to information filtering, through which the relevant semantic information in each article is organized and presented as output. Precision and recall statistics are tabulated for the politics and sports domains. By supplying an alternative domain ontology, domain-independent news information retrieval can be achieved.
Keywords
Internet; document handling; information retrieval; information retrieval systems; ontologies (artificial intelligence); statistical analysis; Web documents; Web-based news retrieval; automatic information retrieval; domain ontology; intelligence gathering; online news documents; semantic information; static domain-specific ontological framework; Computer science; Databases; Filtration; Indexing; Information filtering; Information filters; Information retrieval; Ontologies; Search engines; Statistics
fLanguage
English
Publisher
IEEE
Conference_Titel
Digital Information Management, 2006 1st International Conference on
Conference_Location
Bangalore
Print_ISBN
1-4244-0682-X
Type
conf
DOI
10.1109/ICDIM.2007.369234
Filename
4221926