DocumentCode :
2080669
Title :
Probabilistic declarative information extraction
Author :
Wang, Daisy Zhe ; Michelakis, Eirinaios ; Franklin, Michael J. ; Garofalakis, Minos ; Hellerstein, Joseph M.
Author_Institution :
EECS, Univ. of California, Berkeley, CA, USA
fYear :
2010
fDate :
1-6 March 2010
Firstpage :
173
Lastpage :
176
Abstract :
Unstructured text represents a large fraction of the world's data. It often contains snippets of structured information (e.g., people's names and zip codes). Information Extraction (IE) techniques identify such structured information in text. In recent years, database research has pursued IE on two fronts: declarative languages and systems for managing IE tasks, and probabilistic databases for querying the output of IE. In this paper, we make the first step to merge these two directions, without loss of statistical robustness, by implementing a state-of-the-art statistical IE model, Conditional Random Fields (CRF), in the setting of a Probabilistic Database that treats statistical models as first-class data objects. We show that the Viterbi algorithm for CRF inference can be specified declaratively in recursive SQL. We also show the performance benefits relative to a standalone open-source Viterbi implementation. This work opens up the optimization opportunities for queries involving both inference and relational operators over IE models.
Keywords :
SQL; inference mechanisms; maximum likelihood estimation; probability; relational databases; CRF inference; Viterbi algorithm; conditional random fields; database research; declarative information extraction; declarative languages; inference operators; open-source Viterbi implementation; probabilistic databases; recursive SQL; relational operators; unstructured text; Data mining; Database languages; Database systems; Learning systems; Merging; Open source software; Relational databases; Robustness; Uncertainty; Viterbi algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering (ICDE), 2010 IEEE 26th International Conference on
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-5445-7
Electronic_ISBN :
978-1-4244-5444-0
Type :
conf
DOI :
10.1109/ICDE.2010.5447844
Filename :
5447844