Title :
Classification and characterization of clinical finding expressions in medical literature
Author :
Okumura, Takashi ; Tateisi, Yuka ; Aramaki, Eiji
Author_Institution :
National Institute of Public Health, Japan
Abstract :
Text mining of clinical findings has been employed to extract clinical information from “electronic medical records” without labor-intensive work by medical experts. However, the automated building of a disease ontology requires acquiring knowledge of clinical findings documented in “medical literature”, which calls for a separate strategy. This study performs a preliminary analysis of clinical finding expressions in medical literature to enable the automated acquisition of disease knowledge. To this end, we selected free-text descriptions of 20 diseases and annotated the texts to extract expressions of clinical findings. This yielded 1368 expressions with varying lengths and syntactic features, along with 161 annotator comments. The comments drew attention to certain types of expressions, which were further classified into 10 categories. In-depth analyses of their syntactic and semantic characteristics were also performed, leading to the following observations. First, expressions of clinical findings follow certain syntactic and semantic patterns that can be exploited for appropriate knowledge acquisition. Second, clinical knowledge may guide the knowledge acquisition process in a top-down manner. Third, natural language processing of medical literature requires specific considerations compared with the processing of health records, namely, i) distinction of subjects, ii) handling of generalized knowledge, and iii) processing of expressions describing examination results. This preliminary survey of expressions in medical literature provides helpful insights for future corpus design.
Keywords :
data mining; diseases; electronic health records; natural language processing; ontologies (artificial intelligence); pattern classification; text analysis; annotator comments; automated disease knowledge acquisition; clinical finding expression characterization; clinical finding expression classification; clinical information extraction; clinical knowledge; corpus design; disease ontology; electronic medical records; expression extraction; expression processing; free-text format; generalized knowledge handling; medical literature; semantic characteristics; syntactic characteristics; text mining; Diseases; Informatics; Knowledge acquisition; Medical diagnostic imaging; Microscopy; Semantics; Syntactics
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
Conference_Location :
Shanghai
DOI :
10.1109/BIBM.2013.6732552