• DocumentCode
    1784845
  • Title

    Developing a linguistically annotated corpus of Chinese electronic medical record

  • Author

    Zhipeng Jiang ; Fangfang Zhao ; Yi Guan

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
  • fYear
    2014
  • fDate
    2-5 Nov. 2014
  • Firstpage
    307
  • Lastpage
    310
  • Abstract
    Electronic medical records (EMR) are the material basis of smart healthcare, and their automatic analysis depends on natural language processing (NLP) technologies. Syntactic analysis, a basic NLP technology, can be used to convert the free text of EMR into structured text. However, research on syntactic analysis of Chinese electronic medical records (CEMR), and even on Chinese word segmentation and part-of-speech (POS) tagging for them, has barely begun because no annotated CEMR corpus exists. To resolve this problem, we propose an annotation scheme that covers Chinese word segmentation through syntactic analysis, and we build the first syntactically annotated corpus of CEMR. By analyzing the annotated CEMR, we find that it has stronger grammatical regularity and a particular statistical distribution. These findings are used to improve the Stanford parser and to develop a state-of-the-art Chinese word segmentation and POS tagging system for CEMR. The evaluation results show that statistical machine learning models benefit substantially from the annotated CEMR.
  • Keywords
    electronic health records; grammars; health care; learning (artificial intelligence); natural language processing; statistical distributions; text analysis; CEMR; Chinese electronic medical record; Chinese word segmentation; NLP technologies; POS tagging system; Stanford parser; free text conversion; grammatical regularity; linguistically annotated corpus development; natural language processing technologies; part-of-speech tagging; smart healthcare; statistical distribution; statistical machine learning models; Electronic medical records; Guidelines; Informatics; Syntactics; Tagging; Training; CEMR; Chinese word segmentation; part-of-speech tagging; syntactic analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
  • Conference_Location
    Belfast
  • Type

    conf

  • DOI
    10.1109/BIBM.2014.6999174
  • Filename
    6999174