DocumentCode
2079616
Title
GenerIE: Information extraction using database queries
Author
Tari, Luis ; Phan Huy Tu ; Hakenberg, Jörg ; Chen, Yi ; Son, Tran Cao ; Gonzalez, Graciela ; Baral, Chitta
Author_Institution
Dept. of Comput. Sci. & Eng., Arizona State Univ., Tempe, AZ, USA
fYear
2010
fDate
1-6 March 2010
Firstpage
1121
Lastpage
1124
Abstract
Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules. A major drawback of such an approach is that whenever a new extraction goal emerges or a module is improved, extraction has to be re-applied from scratch to the entire text corpus, even though only a small part of the corpus might be affected. In this demonstration proposal, we describe a novel paradigm for information extraction: we store the parse trees output by text processing in a database, and then express extraction needs as queries, which the database can evaluate and optimize. Compared with existing approaches, database queries for information extraction enable generic extraction and minimize reprocessing. However, such an approach also poses many technical challenges, such as language design, optimization and automatic query generation. We will present the opportunities and challenges that we encountered while building GenerIE, a system that implements this paradigm.
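The idea of storing parse trees once and expressing each extraction goal as a database query can be illustrated with a small sketch. This is not GenerIE's actual storage schema or query language, which the abstract does not specify. It assumes a hypothetical relational table `node` (one row per parse-tree node, linked to its parent by `parent_id`), plain SQL in SQLite, and invented example sentences and verbs. The point it shows: when the extraction goal changes (here, a new verb of interest), only the query changes, and the stored parse trees are reused without re-parsing.

```python
import itertools
import sqlite3

# Hypothetical example corpus: constituency parse trees as nested tuples.
# A leaf is (POS tag, word); an internal node is (label, child, child, ...).
CORPUS = [
    ("S", ("NP", ("NN", "RAD51")), ("VP", ("VBZ", "binds"), ("NP", ("NN", "BRCA2")))),
    ("S", ("NP", ("NN", "TP53")), ("VP", ("VBZ", "regulates"), ("NP", ("NN", "MDM2")))),
]

# Hypothetical schema: every tree node becomes one row; leaves carry a word.
SCHEMA = "CREATE TABLE node (sent_id INTEGER, node_id INTEGER, parent_id INTEGER, tag TEXT, word TEXT)"

# Subject-verb-object pattern over the stored trees: S -> NP VP, VP -> VB* NP.
QUERY = """
SELECT subj.word, v.word, obj.word
FROM node AS s
JOIN node AS np1    ON np1.sent_id = s.sent_id AND np1.parent_id = s.node_id AND np1.tag = 'NP'
JOIN node AS subj   ON subj.sent_id = s.sent_id AND subj.parent_id = np1.node_id AND subj.word IS NOT NULL
JOIN node AS vp     ON vp.sent_id = s.sent_id AND vp.parent_id = s.node_id AND vp.tag = 'VP'
JOIN node AS v      ON v.sent_id = s.sent_id AND v.parent_id = vp.node_id AND v.tag LIKE 'VB%'
JOIN node AS obj_np ON obj_np.sent_id = s.sent_id AND obj_np.parent_id = vp.node_id AND obj_np.tag = 'NP'
JOIN node AS obj    ON obj.sent_id = s.sent_id AND obj.parent_id = obj_np.node_id AND obj.word IS NOT NULL
WHERE s.tag = 'S' AND v.word IN ({placeholders})
ORDER BY s.sent_id
"""


def flatten(sent_id, tree):
    """Turn one nested-tuple parse tree into (sent_id, node_id, parent_id, tag, word) rows."""
    rows = []
    counter = itertools.count(1)  # node ids restart for each sentence

    def visit(node, parent_id):
        node_id = next(counter)
        tag, *children = node
        if len(children) == 1 and isinstance(children[0], str):
            rows.append((sent_id, node_id, parent_id, tag, children[0]))  # leaf
        else:
            rows.append((sent_id, node_id, parent_id, tag, None))  # internal node
            for child in children:
                visit(child, node_id)

    visit(tree, None)
    return rows


def load_corpus(conn, trees):
    """Parse output is stored once; later extraction goals only issue queries."""
    conn.execute(SCHEMA)
    for sent_id, tree in enumerate(trees, start=1):
        conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", flatten(sent_id, tree))


def extract_triples(conn, verbs):
    """Run the subject-verb-object query restricted to the given verbs."""
    placeholders = ", ".join("?" for _ in verbs)
    return conn.execute(QUERY.format(placeholders=placeholders), list(verbs)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_corpus(conn, CORPUS)
    print(extract_triples(conn, ["binds", "inhibits"]))
    # A new extraction goal: add a verb and re-query; nothing is re-parsed.
    print(extract_triples(conn, ["binds", "inhibits", "regulates"]))
```

With the first verb list only the RAD51/BRCA2 sentence matches; adding "regulates" also returns the TP53/MDM2 triple from the same stored rows. A real system would need indexes on `(sent_id, parent_id)` and a richer tree encoding, which this sketch leaves out.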
Keywords
query processing; text analysis; tree data structures; GenerIE; automatic query generation; database queries; generic extraction; information extraction; information extraction systems; text processing; Biomedical engineering; Biomedical informatics; Computer science; Data engineering; Data mining; Database languages; Pipelines; Proposals; Tagging; Text processing;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Engineering (ICDE), 2010 IEEE 26th International Conference on
Conference_Location
Long Beach, CA
Print_ISBN
978-1-4244-5445-7
Electronic_ISBN
978-1-4244-5444-0
Type
conf
DOI
10.1109/ICDE.2010.5447773
Filename
5447773