• DocumentCode
    1800043
  • Title
    Development of large-scale TCM corpus using hybrid named entity recognition methods for clinical phenotype detection: An initial study
  • Author
    Lizhi Feng ; Xuezhong Zhou ; Haixun Qi ; Runshun Zhang ; YingHui Wang ; Baoyan Liu
  • Author_Institution
    Sch. of Comput. & Inf. Technol. & Beijing Key Lab. of Traffic Data Anal. & Min., Beijing Jiaotong Univ., Beijing, China
  • fYear
    2014
  • fDate
    9-12 Dec. 2014
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    Clinical data are among the core data resources of traditional Chinese medicine (TCM), because TCM is a clinically based discipline. However, most TCM clinical data, such as electronic medical records, are still stored as free text. Because the TCM field lacks a large-scale annotated corpus, this paper aims to develop an annotation system for building a TCM clinical text corpus. To reduce manual labor, we implement three named entity recognition methods (a supervised machine learning method, an unsupervised method, and structured data comparison) to assist the batch annotation of clinical records before manual checking. We developed the system in Java and have used it to efficiently curate more than 2,000 chief complaint records.
  • Keywords
    Java; electronic health records; natural language processing; text analysis; unsupervised learning; TCM clinical text corpus; annotation system; batch annotations; clinical data; clinical phenotype detection; clinical records; clinically based medicine; core data repositories; electronic medical record; large-scale annotation corpus; manual checking; named entity recognition methods; structured data comparison; supervised machine learning method; traditional Chinese medicine; unsupervised method; Data mining; Databases; Hidden Markov models; Manuals; Medical diagnostic imaging; Standards; Training; named entity recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence in Big Data (CIBD), 2014 IEEE Symposium on
  • Conference_Location
    Orlando, FL
  • Type
    conf
  • DOI
    10.1109/CIBD.2014.7011532
  • Filename
    7011532