Title :
Extracting Protein Interactions from Text with the Unified AkaneRE Event Extraction System
Author :
Sætre, R. ; Yoshida, Kazuhiro ; Miwa, Makoto ; Matsuzaki, Takuya ; Kano, Yoshinobu ; Tsujii, Junichi
Author_Institution :
Dept. of Inf. Sci., Univ. of Tokyo, Tokyo, Japan
Abstract :
Currently, relation extraction (RE) and event extraction (EE) are the two main streams of biological information extraction. In 2009, the majority of these RE and EE research efforts were centered on the BioCreative II.5 Protein-Protein Interaction (PPI) challenge and the “BioNLP event extraction shared task.” Although these challenges took somewhat different approaches, they share the same ultimate goal of extracting bio-knowledge from the literature. This paper compares the two challenge task definitions and presents a unified system that was successfully applied in both of these and in several other PPI extraction task settings. The AkaneRE system has three parts: a core engine for RE, a pool of modules for specific solutions, and a configuration language to adapt the system to different tasks. The core engine is based on machine learning, using either support vector machines or statistical classifiers with features extracted from the given training data. The specific modules solve tasks such as sentence boundary detection, tokenization, stemming, part-of-speech tagging, parsing, named entity recognition, generation of potential relations, generation of machine learning features for each relation, and finally, assignment of confidence scores and ranking of candidate relations. With these components, the AkaneRE system produces state-of-the-art results, and the system is freely available for academic purposes at http://www-tsujii.is.s.u-tokyo.ac.jp/satre/akane/.
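The module chain named in the abstract (sentence boundary detection, tokenization, entity recognition, candidate relation generation, feature generation, confidence scoring and ranking) can be illustrated with a minimal sketch. This is not the AkaneRE implementation: every function name, the dictionary-lookup entity recognizer, the placeholder protein names, and the hand-set weights are assumptions made for illustration, and a simple logistic scorer stands in for the trained support vector machine or statistical classifier.

```python
import math
import re
from itertools import combinations


def split_sentences(text):
    # Naive sentence boundary detection: split on whitespace after ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Keep alphanumeric tokens (hyphens allowed); punctuation is dropped.
    return re.findall(r"[A-Za-z0-9\-]+", sentence)


def find_proteins(tokens, lexicon):
    # Stand-in for named entity recognition: plain dictionary lookup.
    return [(i, tok) for i, tok in enumerate(tokens) if tok in lexicon]


def candidate_pairs(mentions):
    # Every unordered pair of mentions in a sentence is a potential relation.
    return list(combinations(mentions, 2))


def features(tokens, pair):
    # Bag of lowercased words between the two mentions, plus token distance.
    (i, _), (j, _) = pair
    feats = {f"between={w.lower()}": 1.0 for w in tokens[i + 1:j]}
    feats["distance"] = float(j - i)
    return feats


def score(feats, weights, bias=0.0):
    # Logistic score in (0, 1); a stand-in for a trained classifier's confidence.
    z = bias + sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def extract(text, lexicon, weights, bias=0.0):
    # Run the whole chain and return candidates ranked by confidence.
    ranked = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        for pair in candidate_pairs(find_proteins(tokens, lexicon)):
            confidence = score(features(tokens, pair), weights, bias)
            ranked.append((confidence, pair[0][1], pair[1][1]))
    return sorted(ranked, reverse=True)


if __name__ == "__main__":
    # Hypothetical protein names and weights, chosen only for the demonstration.
    lexicon = {"ProtA", "ProtB", "ProtC", "ProtD"}
    weights = {"between=binds": 2.0, "distance": -0.1}
    text = "ProtA binds ProtB. ProtC was measured with ProtD."
    for confidence, first, second in extract(text, lexicon, weights):
        print(f"{confidence:.3f}\t{first}\t{second}")
```

In this sketch, swapping the lexicon, the feature function, or the weights plays the role that the abstract assigns to the configuration language: the same core chain is reused while the task-specific parts change.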
Keywords :
bioinformatics; feature extraction; learning (artificial intelligence); natural language processing; pattern classification; support vector machines; text analysis; AkaneRE system; BioCreative II.5 protein protein interaction challenge; BioNLP event extraction shared task; Unified AkaneRE Event Extraction System; biological information extraction; entity recognition; machine learning; protein interactions extraction; relation extraction; statistical classifiers; Bioinformatics; Data mining; Databases; Engines; Feature extraction; Machine learning; Proteins; Support vector machine classification; Support vector machines; Training data; Text mining; bioinformatics (genome or protein) databases; language parsing and understanding; Algorithms; Computational Biology; Data Mining; Databases, Genetic; Information Storage and Retrieval; Natural Language Processing; Protein Interaction Mapping
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2010.46