DocumentCode :
420228
Title :
Finding authoritative people from the Web
Author :
Harada, Masanori ; Sato, Shin-ya ; Kazama, Kazuhiro
Author_Institution :
Network Innovation Labs., Nippon Telegraph & Telephone Corp., Tokyo, Japan
fYear :
2004
fDate :
7-11 June 2004
Firstpage :
306
Lastpage :
313
Abstract :
Today's Web is so huge and diverse that it arguably reflects the real world. For this reason, searching the Web is a promising approach to finding things in the real world. We present NEXAS, an extension to Web search engines that attempts to find real-world entities relevant to a topic. Its basic idea is to extract proper names from the Web pages retrieved for the topic. A main advantage of this approach is that users can query any topic and learn about relevant real-world entities without dedicated databases for the topic. In particular, we focus on an application for finding authoritative people from the Web. This application is practically important because once personal names are obtained, they can lead users from the Web to managed information stored in digital libraries. To explore effective ways of finding people, we first examine the distribution of Japanese personal names by analyzing about 50 million Japanese Web pages. We observe that personal names appear frequently on the Web, but the distribution is highly influenced by automatically generated texts. To remedy the bias and find widely acknowledged people accurately, we utilize the number of Web servers containing a name instead of the number of Web pages. We show its effectiveness by an experiment covering a wide range of topics. Finally, we demonstrate several examples and suggest possible applications.
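The server-count idea in the abstract can be illustrated with a minimal sketch. It is not the NEXAS implementation: the function name `rank_names_by_server_count`, the input shape (URL paired with already-extracted names), and the sample URLs and names are all assumptions made for illustration, and name extraction itself is left out. The sketch only shows why counting distinct hosts dampens the effect of one server that generates many pages mentioning the same name.

```python
from collections import defaultdict
from urllib.parse import urlparse


def rank_names_by_server_count(pages):
    """Rank names by the number of distinct hosts whose pages mention them.

    `pages` is an iterable of (url, names) pairs, where `names` holds the
    personal names already extracted from that page (hypothetical input;
    the extraction step is outside the scope of this sketch).
    """
    hosts_per_name = defaultdict(set)
    for url, names in pages:
        host = urlparse(url).hostname
        if host is None:
            continue  # skip URLs without a host component
        for name in set(names):
            hosts_per_name[name].add(host)
    # Sort by descending server count, then by name for a stable order.
    return sorted(
        ((name, len(hosts)) for name, hosts in hosts_per_name.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical data: "Name A" appears on three pages of a single server,
# "Name B" on one page each of two different servers.
pages = [
    ("http://auto.example.com/page1", ["Name A"]),
    ("http://auto.example.com/page2", ["Name A"]),
    ("http://auto.example.com/page3", ["Name A"]),
    ("http://site1.example.org/", ["Name B"]),
    ("http://site2.example.net/", ["Name B"]),
]
ranking = rank_names_by_server_count(pages)
```

With a page count, "Name A" would rank first (three pages against two); with the server count, "Name B" ranks first because it is mentioned on two independent hosts.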
Keywords :
Internet; data mining; digital libraries; file servers; information filters; information storage; natural languages; query processing; search engines; text analysis; Japanese Web pages; Japanese personal names; NEXAS Web search engine extension; Web authoritative people searching; Web mining; Web servers; dedicated databases; digital libraries; information storage management; proper name extraction; question answering; real-world entities; text analysis; Books; Content based retrieval; Information retrieval; Motion pictures; Permission; Search engines; Software libraries; Telegraphy; Telephony; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Digital Libraries, 2004. Proceedings of the 2004 Joint ACM/IEEE Conference on
Print_ISBN :
1-58113-832-6
Type :
conf
DOI :
10.1109/JCDL.2004.1336140
Filename :
1336140