  • DocumentCode
    2711239
  • Title
    Iterative Set Expansion of Named Entities Using the Web
  • Author
    Wang, Richard C.; Cohen, William W.
  • Author_Institution
    Language Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    1091
  • Lastpage
    1096
  • Abstract
    Set expansion refers to expanding a partial set of "seed" objects into a more complete set. One system that does set expansion is SEAL (set expander for any language), which expands entities automatically by utilizing resources from the Web in a language-independent fashion. In a previous study, SEAL showed good set expansion performance using three seed entities; however, when given a larger set of seeds (e.g., ten), SEAL's expansion method performs poorly. In this paper, we present iterative SEAL (iSEAL), which allows a user to provide many seeds. Briefly, iSEAL makes several calls to SEAL, each call using a small number of seeds. We also show that iSEAL can be used in a "bootstrapping" manner, where each call to SEAL uses a mixture of user-provided and self-generated seeds. We show that the bootstrapping version of iSEAL obtains better results than SEAL even when using fewer user-provided seeds. In addition, we compare the performance of various ranking algorithms used in iSEAL, and show that the choice of ranking method has a small effect on performance when all seeds are user-provided, but a large effect when iSEAL is bootstrapped. In particular, we show that random walk with restart is nearly as good as Bayesian sets with user-provided seeds, and performs best with bootstrapped seeds.
  • Keywords
    Bayes methods; Internet; iterative methods; Bayesian sets; Web; bootstrapped seeds; bootstrapping version; iSEAL; iterative SEAL; iterative set expansion; named entities; random walk; seed entities; self-generated seeds; set expander; Bayesian methods; Data mining; HTML; Markup languages; Motion pictures; Natural languages; Seals; TV; USA Councils; Watches; bootstrapping; seal; set expansion
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
  • Conference_Location
    Pisa
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3502-9
  • Type
    conf
  • DOI
    10.1109/ICDM.2008.145
  • Filename
    4781230
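The Abstract describes iSEAL as many small SEAL calls whose results are merged and ranked, optionally bootstrapped by mixing user-provided seeds with self-generated ones, and it reports random walk with restart as the best ranker in the bootstrapped setting. The Python sketch below illustrates that loop under stated assumptions; it is not the authors' implementation. `TOY_LISTS`, `seal_call`, `iseal`, and every parameter (seeds per call, user seeds per call, restart probability `alpha`, iteration counts) are hypothetical. SEAL's real wrapper induction over Web pages is replaced by a toy rule (a list is "found" when it contains two distinct seeds). The list-entity graph, and the choice to restart the walk only from user-provided seeds, are likewise assumptions rather than the paper's exact construction.

```python
from collections import defaultdict

# Hypothetical toy "web": each key stands in for one list (wrapper) found on a
# page, mapped to the entities it would extract. Not data from the paper.
TOY_LISTS = {
    "page1": {"ford", "toyota", "honda", "nissan"},
    "page2": {"toyota", "honda", "mazda", "subaru"},
    "page3": {"ford", "toyota", "chevrolet", "dodge"},
    "page4": {"mazda", "subaru", "kia", "hyundai"},
    "page5": {"apple", "banana", "honda"},  # noisy list sharing only one seed
}


def seal_call(seeds, lists):
    """Hypothetical stand-in for one SEAL call: return (list_id, entity) edges
    for every list that contains at least two distinct seeds."""
    seed_set = set(seeds)
    return [(lid, e) for lid, ents in lists.items()
            if len(ents & seed_set) >= 2 for e in ents]


def random_walk_with_restart(graph, restart_nodes, alpha=0.15, iters=50):
    """Power iteration for random walk with restart on an undirected graph
    (adjacency sets). At each step the walker jumps back to a uniformly chosen
    restart node with probability alpha."""
    restart = set(restart_nodes)
    nodes = set(graph) | restart
    r = {n: (1.0 / len(restart) if n in restart else 0.0) for n in nodes}
    p = dict(r)
    for _ in range(iters):
        nxt = {n: alpha * r[n] for n in nodes}
        for u, mass in p.items():
            nbrs = graph.get(u)
            if nbrs:
                share = (1 - alpha) * mass / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            else:
                # Dangling node (e.g. a seed no call found): return its
                # walking mass to the restart nodes so total mass stays 1.
                for s in restart:
                    nxt[s] += (1 - alpha) * mass / len(restart)
        p = nxt
    return p


def iseal(user_seeds, lists, seeds_per_call=3, user_per_call=1,
          iterations=4, bootstrap=True):
    """Iterative expansion: many small SEAL calls merged into one list-entity
    graph, ranked by RWR restarting from the user-provided seeds. With
    bootstrap=True, later calls mix user seeds with top-ranked extracted ones."""
    graph = defaultdict(set)
    scores = {}
    used = 0  # user seeds are consumed round-robin
    for _ in range(iterations):
        # First call (or non-bootstrap mode): use only user-provided seeds.
        k = user_per_call if (bootstrap and scores) else seeds_per_call
        seeds = [user_seeds[(used + j) % len(user_seeds)] for j in range(k)]
        used += k
        if bootstrap:
            # Self-generated seeds: best-ranked entities the user did not supply.
            ranked = sorted((n for n in scores
                             if n[0] == "ent" and n[1] not in user_seeds),
                            key=lambda n: -scores[n])
            seeds += [n[1] for n in ranked[:max(0, seeds_per_call - len(seeds))]]
        for lid, ent in seal_call(seeds, lists):
            graph[("list", lid)].add(("ent", ent))
            graph[("ent", ent)].add(("list", lid))
        scores = random_walk_with_restart(graph, [("ent", s) for s in user_seeds])
    ranked = [(n[1], s) for n, s in scores.items()
              if n[0] == "ent" and n[1] not in user_seeds]
    return sorted(ranked, key=lambda t: -t[1])
```

For example, `iseal(["ford", "toyota", "honda"], TOY_LISTS)` returns the non-seed entities as `(name, score)` pairs, highest first. The noisy `page5` list is never reached, because no call ever contains two of its entities. The abstract's other ranker, Bayesian sets, would take the place of `random_walk_with_restart` here; it is not sketched.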