Title :
Fine-Grained Protein Mutation Extraction from Biological Literature
Author :
Wang, Rui ; Siu, Shirley W I ; Bockmann, R.A.
Author_Institution :
Comput. Linguistics, Saarland Univ., Saarbrucken
Abstract :
Automatic extraction of experimental data on protein mutants from large volumes of biological text can help build the corresponding databases and thereby facilitate related research. Mutation extraction cannot be fully solved by surface pattern matching alone; it also requires linguistic analysis of the plain text. Building on an existing regular expression method, we improve mutation extraction by applying dependency parsing, a technique from natural language processing (NLP). Furthermore, we extract data on experimental measurements from the texts and relate them to the identified mutations. Our method was evaluated on MedLine abstracts. The results show great potential for future exploration.
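As an illustration of the surface pattern matching that the abstract takes as its starting point, the following is a minimal sketch of a regular expression for point-mutation mentions written as wild-type residue, position, mutant residue (for example "A123T" or "Gly45Asp"). The pattern, the names MUTATION_RE and find_mutations, and the sample sentences are assumptions made for this example; they are not the expressions or code used by the authors, and the dependency parsing step is not shown.

```python
import re

# Hypothetical sketch: NOT the authors' pattern, only one common surface form
# (wild-type residue, sequence position, mutant residue).
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
THREE = "|".join(THREE_TO_ONE)

# Either one-letter codes (A123T) or three-letter codes (Gly45Asp).
MUTATION_RE = re.compile(
    rf"\b(?:(?P<wt1>[{ONE_LETTER}])(?P<pos1>[1-9]\d*)(?P<mt1>[{ONE_LETTER}])"
    rf"|(?P<wt3>{THREE})(?P<pos3>[1-9]\d*)(?P<mt3>{THREE}))\b"
)


def find_mutations(text):
    """Return (wild_type, position, mutant) tuples using one-letter codes."""
    results = []
    for m in MUTATION_RE.finditer(text):
        if m.group("wt1"):
            wt, pos, mt = m.group("wt1"), m.group("pos1"), m.group("mt1")
        else:
            wt = THREE_TO_ONE[m.group("wt3")]
            pos = m.group("pos3")
            mt = THREE_TO_ONE[m.group("mt3")]
        results.append((wt, int(pos), mt))
    return results


if __name__ == "__main__":
    print(find_mutations("The A123T and Gly45Asp mutants were less stable."))
    # A cell-line name with the same shape is matched as well.
    print(find_mutations("Experiments were performed in T47D cells."))
```

The second call shows why such a pattern is not sufficient on its own: a token like "T47D" has the same surface form as a mutation, so deciding whether it denotes one needs information from the surrounding sentence.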
Keywords :
bioinformatics; data mining; grammars; natural language processing; pattern matching; proteins; text analysis; MedLine abstract; automatic data extraction; biological text mining; dependency parsing technique; fine-grained protein mutation extraction; linguistic analysis; regular expression method; surface pattern matching; Biology computing; Biomembranes; Computational biology; Computational linguistics; Databases; Genetic mutations; Protein engineering; Stability; mutation extraction; text mining
Conference_Title :
Electronic Computer Technology, 2009 International Conference on
Conference_Location :
Macau
Print_ISBN :
978-0-7695-3559-3
DOI :
10.1109/ICECT.2009.10