• DocumentCode
    2545668
  • Title
    Ontology-Based Temporal Relation Modeling with MapReduce Latent Dirichlet Allocations for Big EHR Data
  • Author
    Dingcheng Li ; Cui Tao ; Hongfang Liu ; Chute, C.
  • Author_Institution
    Biomed. Stat. & Inf., Mayo Clinic, Rochester, MN, USA
  • fYear
    2012
  • fDate
    1-3 Nov. 2012
  • Firstpage
    708
  • Lastpage
    715
  • Abstract
    In this paper, we propose a model called Temporal & Coreference Topic Modeling (TCTM) to automatically annotate large-scale Electronic Health Records (EHR) with respect to the Time Event Ontology (TEO). TCTM, based on Latent Dirichlet Allocations (LDA) and integrated into the MapReduce framework, inherently addresses the twin problems of data sparseness and high dimensionality. As a non-parametric Bayesian model, it can flexibly add new attributes or features. Side information associated with corpora, such as section header, timestamp, sentence distance, event distance, or disease category in clinical notes, makes latent topics more interpretable and more biased toward coreferring events. Furthermore, TCTM integrates Hidden Markov Model LDA (HMM-LDA) to obtain the power of both sequential modeling and exchangeability. A MapReduce-based variational method is employed for parameter estimation and inference, thus enabling TCTM to overcome the bottleneck brought by big data.
  • Keywords
    hidden Markov models; inference mechanisms; medical information systems; ontologies (artificial intelligence); parallel processing; MapReduce framework; MapReduce latent Dirichlet allocation; big EHR data; clinical note; data dimensionality; data sparseness; disease category information; electronic health record; event distance information; hidden Markov model LDA; inference; nonparametric Bayesian model; ontology-based temporal relation modeling; parameter estimation; section header information; sentence distance information; sequential exchangeability; sequential modeling; temporal-and-coreference topic modeling; time event ontology; timestamp information; Computational modeling; Data handling; Data storage systems; Hidden Markov models; Information management; Tin; Wireless sensor networks; MapReduce; event coreference resolution; latent Dirichlet allocations; temporal relation annotation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud and Green Computing (CGC), 2012 Second International Conference on
  • Conference_Location
    Xiangtan
  • Print_ISBN
    978-1-4673-3027-5
  • Type
    conf
  • DOI
    10.1109/CGC.2012.112
  • Filename
    6382894