DocumentCode :
1784845
Title :
Developing a linguistically annotated corpus of Chinese electronic medical record
Author :
Zhipeng Jiang ; Fangfang Zhao ; Yi Guan
Author_Institution :
Dept. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
fYear :
2014
fDate :
2-5 Nov. 2014
Firstpage :
307
Lastpage :
310
Abstract :
Electronic medical records (EMRs) are the material basis of smart healthcare, and their automatic analysis depends on natural language processing (NLP) technologies. Syntactic analysis, a basic NLP technology, can be used to convert the free text of EMRs into structured text. However, research on syntactic analysis of Chinese electronic medical records (CEMRs), and even on Chinese word segmentation and part-of-speech (POS) tagging for them, remains essentially unexplored because no annotated CEMR corpus exists. To resolve this problem, we propose an annotation scheme that spans Chinese word segmentation through syntactic analysis, and we build the first syntactically annotated corpus of CEMRs. By analyzing the annotated CEMRs, we find that they have stronger grammatical regularity and a particular statistical distribution. We exploit these findings to improve the Stanford parser and to develop a state-of-the-art Chinese word segmentation and POS tagging system for CEMRs. The evaluation results show that statistical machine learning models benefit substantially from the annotated CEMR corpus.
Keywords :
electronic health records; grammars; health care; learning (artificial intelligence); natural language processing; statistical distributions; text analysis; CEMR; Chinese electronic medical record; Chinese word segmentation; NLP technologies; POS tagging system; Stanford parser; free text conversion; grammatical regularity; linguistically annotated corpus development; natural language processing technologies; part-of-speech tagging; smart healthcare; statistical distribution; statistical machine learning models; Electronic medical records; Guidelines; Informatics; Syntactics; Tagging; Training; syntactic analysis
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
Type :
conf
DOI :
10.1109/BIBM.2014.6999174
Filename :
6999174