• DocumentCode
    2407864
  • Title
    WISDOM from Light-Weight Information Retrieval
  • Author
    Bracewell, David B. ; Gustafson, Steven ; Moitra, Abha ; Steuben, Gregg
  • Author_Institution
    GE Global Res., Niskayuna, NY, USA
  • fYear
    2010
  • fDate
    20-22 Aug. 2010
  • Firstpage
    347
  • Lastpage
    354
  • Abstract
    This paper presents a light-weight information retrieval and analysis architecture that addresses the complex task of gathering, combining, and storing documents to enable in-depth analysis. The growing interest in mining the Internet for conversation topics, opinions, and influencers has resulted in many free and commercial products. At the heart of such capabilities are two core technologies: information retrieval and text mining. While search engines and technologies like RSS make gathering information easier, they, like text mining, still require careful consideration when applied in mission-critical situations. For example, search engines retrieve irrelevant results, and it is difficult, if not impossible, to know whether all relevant results have been found. Also, significant analysis of such documents usually requires fusing other information sources, a task that most search engines, at least, do not support. We have developed a system and architecture for light-weight document and information retrieval to enable focused and deep analysis of text, authors and publishers, and the networks they form with each other through citations, other references, and co-occurrence analysis. While it is intuitive that such a system is necessary for in-depth analysis, it is nontrivial to construct one out of the many moving pieces, data sources, and technologies involved. We present the architecture, discuss the design decisions, and demonstrate the analyses enabled by the system.
  • Keywords
    data mining; information retrieval; search engines; text analysis; Internet; WISDOM; cooccurrence analysis; light-weight document retrieval; light-weight information retrieval; search engines; text mining; Data mining; Feeds; Google; Information services; Internet; Search engines; Web sites; Information Retrieval; Natural Language Processing; Open Source Intelligence Gathering; Text Mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Social Computing (SocialCom), 2010 IEEE Second International Conference on
  • Conference_Location
    Minneapolis, MN
  • Print_ISBN
    978-1-4244-8439-3
  • Electronic_ISBN
    978-0-7695-4211-9
  • Type
    conf
  • DOI
    10.1109/SocialCom.2010.57
  • Filename
    5591252