Title :
Using combinatory categorial grammar to extract biomedical information
Author_Institution :
Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol., Yusong-Gu, South Korea
Abstract :
Extracting information from biology databases manually can be an overwhelming task. GenBank, the US National Institutes of Health database containing all publicly available DNA sequences, has more than 14 billion bases in 13 million genetic-sequence records. Medline, a literature database available through PubMed, has over 11 million journal citations. In a May 2001 search request for "cytokine" (regulatory proteins in the immune system), PubMed returned 296,556 articles. Given the quantity and complexity of biomedical literature, demands for computational tools to extract specific information are increasing. The author reviews biomedical information extraction methods and presents research done by KAIST's natural language processing group on a system that shows encouraging performance using combinatory categorial grammar as a natural language grammar formalism.
Keywords :
bibliographic systems; category theory; grammars; information retrieval; medical information systems; natural languages; GenBank; KAIST; Medline; PubMed; bioinformatics; biology databases; biomedical information extraction; combinatory categorial grammar; computational tools; genetic-sequence records; literature database; natural language grammar formalism; natural language processing; publicly available DNA sequences; Amino acids; Biomedical measurements; DNA; Data mining; Databases; Electric shock; Muscles; Natural language processing; Natural languages; Proteins;
Journal_Title :
Intelligent Systems, IEEE
DOI :
10.1109/5254.972092