DocumentCode
2057998
Title
Generating Semantics for the Life Sciences via Text Analytics
Author
Buyko, Ekaterina ; Hahn, Udo
Author_Institution
Jena Univ. Language & Inf. Eng. (JULIE) Lab., Friedrich-Schiller-Univ. Jena, Jena, Germany
fYear
2011
fDate
18-21 Sept. 2011
Firstpage
193
Lastpage
196
Abstract
The life sciences have a strong need for carefully curated, semantically rich fact repositories. Knowledge harvesting from unstructured textual sources is currently performed by highly skilled curators who manually feed semantics into such databases as a result of deep understanding of the documents chosen to populate such repositories. As this is a slow and costly process, we here advocate an automatic approach to the generation of database contents which is based on JREX, a high-performance relation extraction system. As a real-life example, we target REGULONDB, the world's largest manually curated reference database for the transcriptional regulation network of E. coli. We investigate in our study the performance of automatic knowledge capture from various literature sources, such as PUBMED abstracts and associated full text articles. Our results show that we can, indeed, automatically re-create a considerable portion of the REGULONDB database by processing the relevant literature sources. Hence, this approach might help curators widen the knowledge acquisition bottleneck in this field.
Keywords
biology computing; database management systems; knowledge acquisition; text analysis; JREX; REGULONDB database; automatic knowledge capture; database content generation; knowledge acquisition bottleneck; knowledge harvesting; life sciences; manual curated reference database; relation extraction system; semantic generation; text analytics; unstructured textual sources; Abstracts; Databases; Gene expression; Radio frequency; Semantics; Syntactics; Biomedical Text Mining; Event Extraction; Information Extraction;
fLanguage
English
Publisher
IEEE
Conference_Titel
Semantic Computing (ICSC), 2011 Fifth IEEE International Conference on
Conference_Location
Palo Alto, CA
Print_ISBN
978-1-4577-1648-5
Electronic_ISBN
978-0-7695-4492-2
Type
conf
DOI
10.1109/ICSC.2011.75
Filename
6061353