DocumentCode
2407864
Title
WISDOM from Light-Weight Information Retrieval
Author
Bracewell, David B. ; Gustafson, Steven ; Moitra, Abha ; Steuben, Gregg
Author_Institution
GE Global Res., Niskayuna, NY, USA
fYear
2010
fDate
20-22 Aug. 2010
Firstpage
347
Lastpage
354
Abstract
This paper presents a light-weight information retrieval and analysis architecture that addresses the complex task of gathering, combining, and storing documents to enable in-depth analysis. The growing interest in mining the Internet for conversation topics, opinions, and influencers has resulted in many free and commercial products. At the heart of such capability are two core technologies: information retrieval and text mining. While search engines and technologies like RSS make gathering information easier, they, like text mining, still require significant consideration when applied in mission-critical situations. For example, different search engines retrieve irrelevant results, and it is difficult, if not impossible, to know whether all relevant results have been found. Also, significant analysis of such documents will usually require the fusion of other information sources, a task that most search engines, at least, do not support. We have developed a system and architecture for light-weight document and information retrieval to enable focused and deep analysis of text, authors and publishers, and the networks that they form with each other through citations and other reference and co-occurrence analysis. While it is both intuitive and obvious that such a system is necessary for in-depth analysis, it is nontrivial to construct such a system out of the many moving pieces, data sources, and technologies. We present the architecture, discuss the decision steps, and demonstrate analyses that are enabled by the system.
Keywords
data mining; information retrieval; search engines; text analysis; Internet; WISDOM; cooccurrence analysis; light-weight document retrieval; light-weight information retrieval; search engines; text mining; Data mining; Feeds; Google; Information services; Internet; Search engines; Web sites; Information Retrieval; Natural Language Processing; Open Source Intelligence Gathering; Text Mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Social Computing (SocialCom), 2010 IEEE Second International Conference on
Conference_Location
Minneapolis, MN
Print_ISBN
978-1-4244-8439-3
Electronic_ISBN
978-0-7695-4211-9
Type
conf
DOI
10.1109/SocialCom.2010.57
Filename
5591252