Title :
Information extraction from nanotoxicity related publications
Author :
Lemin Xiao ; Kaizhi Tang ; Xiong Liu ; Hui Yang ; Zheng Chen ; Ruimin Xu
Author_Institution :
Intell. Autom. Inc., Rockville, MD, USA
Abstract :
High-quality experimental data are important when developing predictive models for studying nanomaterial environmental impact (NEI). Because raw data from experimental laboratories and manufacturing workplaces are usually proprietary and small in scale, extracting information from publications is an attractive alternative for collecting data. We developed an information extraction system that extracts useful information from full-text nanotoxicity-related publications. The system consists of five components: transformation of raw data into a machine-readable format, data preprocessing, ontology-based named entity recognition, rule-based extraction of numerical attributes from both tables and unstructured text, and extraction of relations among entities and attributes. Applied to a dataset of 94 publications, the system achieved acceptable accuracy. By storing the extracted data in a table organized according to the relations among them, we obtained a dataset that can be used to predict nanomaterial environmental impact. Such a system is unique in the current nanomaterial community and can help nanomaterial scientists and practitioners quickly locate the information they need without spending a lot of time reading articles.
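The last three components (ontology-based named entity recognition, rule-based numerical attribute extraction, and relation extraction into a table) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ontology (ONTOLOGY), the particle-size regular expression (SIZE_RULE), the sentence-level co-occurrence rule for relations, and the Record row type are all hypothetical assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical toy ontology: lowercase surface form -> canonical nanomaterial name.
ONTOLOGY = {
    "zinc oxide": "ZnO",
    "zno": "ZnO",
    "titanium dioxide": "TiO2",
    "tio2": "TiO2",
}

# Hypothetical extraction rule: a number followed by a particle-size unit.
SIZE_RULE = re.compile(r"(\d+(?:\.\d+)?)\s*(nm|µm)\b")


@dataclass
class Record:
    """One row of the output table (assumed schema)."""
    material: str
    attribute: str
    value: float
    unit: str


def recognize_entities(sentence: str) -> list[str]:
    """Dictionary (ontology) lookup; returns sorted canonical names found."""
    text = sentence.lower()
    found = {
        canonical
        for term, canonical in ONTOLOGY.items()
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    }
    return sorted(found)


def extract_sizes(sentence: str) -> list[tuple[float, str]]:
    """Apply the numeric rule and return (value, unit) pairs."""
    return [(float(value), unit) for value, unit in SIZE_RULE.findall(sentence)]


def extract_relations(text: str) -> list[Record]:
    """Relate each material to each size mentioned in the same sentence."""
    records = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for material in recognize_entities(sentence):
            for value, unit in extract_sizes(sentence):
                records.append(Record(material, "size", value, unit))
    return records


if __name__ == "__main__":
    abstract = ("ZnO nanoparticles with a mean diameter of 20 nm were tested. "
                "TiO2 particles (25.5 nm) showed lower toxicity.")
    for row in extract_relations(abstract):
        print(row)
```

Sentence-level co-occurrence is only a simple baseline for relation extraction; it would pair every material with every size in a sentence, and it does not cover the table extraction the abstract also mentions.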
Keywords :
data mining; medical computing; nanomedicine; numerical analysis; toxicology; data preprocessing; full-text nanotoxicity; information extraction system; machine readable format; nanomaterial community; nanomaterial environmental impact; nanotoxicity related publications; ontology-based named entity recognition; predictive models; raw data transformation; rule-based numerical attribute extraction; Data mining; Information retrieval; Nanoparticles; Ontologies; Pattern matching; Shape; XML; Nanoinformatics; data mining; information extraction; named entity recognition; nanotoxicity; relation extraction;
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
Conference_Location :
Shanghai
DOI :
10.1109/BIBM.2013.6732723