DocumentCode :
473325
Title :
RAD: A Scalable Framework for Annotator Development
Author :
Khaitan, Sanjeet ; Ramakrishnan, Ganesh ; Joshi, Sachindra ; Chalamalla, Anup
Author_Institution :
India Subsidiary, InfoSpace Inc., Bangalore
fYear :
2008
fDate :
7-12 April 2008
Firstpage :
1624
Lastpage :
1627
Abstract :
Developments in semantic search technology have motivated the need for efficient and scalable entity annotation techniques. We demonstrate RAD, a tool for Rapid Annotator Development on a document collection. RAD builds on a recent approach (Ramakrishnan et al., 2006) that translates entity annotation rules into equivalent operations on the inverted index of the collection, directly generating an annotation index that can be used in search applications. To make the framework scalable, we use an industrial-strength indexer, Lucene (http://lucene.apache.org), and introduce some modifications to its API. The index also serves as a suitable representation for quick comparison against an indexed ground truth of annotations on the same collection, which lets us evaluate the precision and recall of the annotations. RAD achieves at least an order of magnitude speedup over the standard approach of annotating one document at a time, as adopted by GATE (Cunningham et al., 2002). The speedup factor grows with the size of the collection, making RAD scalable. We cache intermediate results from the index operations, enabling quick updates of the annotation index as well as speedy evaluation when rules are modified. This makes RAD suitable for rapid and interactive development of annotators.
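The idea of running an annotation rule against the inverted index, instead of scanning one document at a time, can be illustrated with a minimal sketch. This is not RAD's implementation and it does not use the Lucene API. The toy index, the two dictionaries, the rule "a title token immediately followed by a surname token", and all function names are assumptions made up for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased token to a set of (doc_id, position) postings."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token].add((doc_id, pos))
    return index

def postings_for(index, dictionary):
    """Union the postings of every dictionary term (an OR over index entries)."""
    result = set()
    for term in dictionary:
        result |= index.get(term, set())
    return result

def followed_by(left, right):
    """Positional join: keep left postings whose next position is in right.

    Returns annotation spans as (doc_id, start, end), with end exclusive.
    """
    return {(d, p, p + 2) for (d, p) in left if (d, p + 1) in right}

def precision_recall(predicted, gold):
    """Compare a predicted annotation set against a ground-truth set."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical toy collection and dictionaries (assumptions, not from the paper).
docs = {
    1: "Dr Smith met Mr Jones",
    2: "Mr Green called Dr Who",
    3: "the doctor met Smith",
}
TITLES = {"dr", "mr"}
SURNAMES = {"smith", "jones", "green"}

index = build_index(docs)

# The rule runs once over postings lists; no document text is rescanned.
title_postings = postings_for(index, TITLES)
surname_postings = postings_for(index, SURNAMES)
annotations = followed_by(title_postings, surname_postings)

# Hypothetical ground truth, stored in the same span representation.
gold = {(1, 0, 2), (1, 3, 5), (2, 0, 2), (2, 3, 5)}
precision, recall = precision_recall(annotations, gold)
```

In this sketch the intermediate sets `title_postings` and `surname_postings` play the role of cached results: if only the surname dictionary changes, the title postings can be reused and only the join is recomputed.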
Keywords :
document handling; indexing; semantic Web; Lucene; annotation index; document collection; document-at-a-time; entity annotation; inverted index; magnitude speedup; rapid annotator development; semantic search technology; Computational complexity; Costs; Dictionaries; Gold; Large-scale systems; Measurement standards; Performance evaluation; Speech; User interfaces; Visualization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location :
Cancun
Print_ISBN :
978-1-4244-1836-7
Electronic_ISBN :
978-1-4244-1837-4
Type :
conf
DOI :
10.1109/ICDE.2008.4497637
Filename :
4497637