DocumentCode
2711239
Title
Iterative Set Expansion of Named Entities Using the Web
Author
Wang, Richard C. ; Cohen, William W.
Author_Institution
Language Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA
fYear
2008
fDate
15-19 Dec. 2008
Firstpage
1091
Lastpage
1096
Abstract
Set expansion refers to expanding a partial set of "seed" objects into a more complete set. One system that does set expansion is SEAL (Set Expander for Any Language), which expands entities automatically by utilizing resources from the Web in a language-independent fashion. In a previous study, SEAL showed good set expansion performance using three seed entities; however, when given a larger set of seeds (e.g., ten), SEAL's expansion method performs poorly. In this paper, we present iterative SEAL (iSEAL), which allows a user to provide many seeds. Briefly, iSEAL makes several calls to SEAL, each call using a small number of seeds. We also show that iSEAL can be used in a "bootstrapping" manner, where each call to SEAL uses a mixture of user-provided and self-generated seeds. We show that the bootstrapping version of iSEAL obtains better results than SEAL even when using fewer user-provided seeds. In addition, we compare the performance of various ranking algorithms used in iSEAL, and show that the choice of ranking method has a small effect on performance when all seeds are user-provided, but a large effect when iSEAL is bootstrapped. In particular, we show that random walk with restart is nearly as good as Bayesian sets with user-provided seeds, and performs best with bootstrapped seeds.
Keywords
Bayes methods; Internet; iterative methods; Bayesian sets; Web; bootstrapped seeds; bootstrapping version; iSEAL; iterative SEAL; iterative set expansion; named entities; random walk; seed entities; self-generated seeds; set expander; Bayesian methods; Data mining; HTML; Markup languages; Motion pictures; Natural languages; Seals; TV; USA Councils; Watches; bootstrapping; seal; set expansion
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
Conference_Location
Pisa
ISSN
1550-4786
Print_ISBN
978-0-7695-3502-9
Type
conf
DOI
10.1109/ICDM.2008.145
Filename
4781230