Title :
Statistical Entity Extraction From the Web
Author :
Nie, Zaiqing ; Wen, Ji-Rong ; Ma, Wei-Ying
Author_Institution :
Microsoft Research Asia, Beijing, P. R. China
Abstract :
Webpages and databases contain many kinds of valuable semantic information about real-world entities. Extracting and integrating this entity information from the Web is of great significance. Compared to traditional information extraction problems, web entity extraction must address several new challenges to take full advantage of the unique characteristics of the Web. In this paper, we introduce our recent work on statistical extraction of structured entities, named entities, entity facts, and relations from the Web. We also briefly introduce iKnoweb, an interactive knowledge mining framework for entity information integration. We use two novel web applications, Microsoft Academic Search (aka Libra) and EntityCube, as working examples.
Keywords :
Data mining; Feature extraction; Information retrieval; Knowledge representation; Layout; Search engines; Semantics; Visualization; Web pages; Crowdsourcing; entity extraction; entity relationship mining; entity search; interactive knowledge mining; named entity extraction; natural language processing; web page segmentation
Journal_Title :
Proceedings of the IEEE
DOI :
10.1109/JPROC.2012.2191369