Title of article :
A case for automated large-scale semantic annotation
Author/Authors :
Dill, Stephen and Eiron, Nadav and Gibson, David and Gruhl, Daniel and Guha, R. and Jhingran, Anant and Kanungo, Tapas and McCurley, Kevin S. and Rajagopalan, Sridhar and Tomkins, Andrew and Tomlin, John A. and Zien, Jason Y.
Issue Information :
Serial issue, year 2003
Pages :
18
From page :
115
To page :
132
Abstract :
This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages, and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. To our knowledge, this is the largest scale semantic tagging effort to date. We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large-scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.
Keywords :
Information retrieval, Large text datasets, Data mining, Text analytics, Automated semantic tagging
Journal title :
Web Semantics: Science, Services and Agents on the World Wide Web
Serial Year :
2003
Record number :
1447061