• DocumentCode
    2057998
  • Title
    Generating Semantics for the Life Sciences via Text Analytics
  • Author
    Buyko, Ekaterina ; Hahn, Udo
  • Author_Institution
    Jena Univ. Language & Inf. Eng. (JULIE) Lab., Friedrich-Schiller-Univ. Jena, Jena, Germany
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    193
  • Lastpage
    196
  • Abstract
    The life sciences have a strong need for carefully curated, semantically rich fact repositories. Knowledge harvesting from unstructured textual sources is currently performed by highly skilled curators who manually feed semantics into such databases as a result of deep understanding of the documents chosen to populate such repositories. As this is a slow and costly process, we here advocate an automatic approach to the generation of database contents which is based on JREX, a high performance relation extraction system. As a real-life example, we target REGULONDB, the world's largest manually curated reference database for the transcriptional regulation network of E. coli. We investigate in our study the performance of automatic knowledge capture from various literature sources, such as PUBMED abstracts and associated full text articles. Our results show that we can, indeed, automatically re-create a considerable portion of the REGULONDB database by processing the relevant literature sources. Hence, this approach might help curators widen the knowledge acquisition bottleneck in this field.
  • Keywords
    biology computing; database management systems; knowledge acquisition; text analysis; JREX; REGULONDB database; automatic knowledge capture; database content generation; knowledge acquisition bottleneck; knowledge harvesting; life sciences; manual curated reference database; relation extraction system; semantic generation; text analytics; unstructured textual sources; Abstracts; Databases; Gene expression; Radio frequency; Semantics; Syntactics; Biomedical Text Mining; Event Extraction; Information Extraction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantic Computing (ICSC), 2011 Fifth IEEE International Conference on
  • Conference_Location
    Palo Alto, CA
  • Print_ISBN
    978-1-4577-1648-5
  • Electronic_ISBN
    978-0-7695-4492-2
  • Type
    conf
  • DOI
    10.1109/ICSC.2011.75
  • Filename
    6061353