  • DocumentCode
    177869
  • Title
    Regularizing Topic Discovery in EMRs with Side Information by Using Hierarchical Bayesian Models
  • Author
    Cheng Li ; Rana, S. ; Dinh Phung ; Venkatesh, S.
  • Author_Institution
    Center for Pattern Recognition & Data Analytics, Deakin Univ., Melbourne, VIC, Australia
  • fYear
    2014
  • fDate
    24-28 Aug. 2014
  • Firstpage
    1307
  • Lastpage
    1312
  • Abstract
    We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wd-dCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). Typically, an EMR dataset consists of multiple patients (documents), and each patient contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient MCMC inference method for the wd-dCRF. We evaluate it on a real-world medical dataset of about 1000 patients with PolyVascular disease. Compared with the popular topic analysis tool, the hierarchical Dirichlet process (HDP), our model discovers topics that are superior in terms of both qualitative and quantitative measures.
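    A minimal Python sketch of the two ingredients the abstract names, under stated assumptions: a word-to-word distance computed on a tree of diagnosis codes, and a distance-dependent CRP-style prior over which word each word links to. The paper introduces its own distance functions; this sketch swaps in plain path length through the lowest common ancestor and an exponential decay exp(-d/a) instead. The toy hierarchy, its code names, and the helpers `tree_distance` and `link_probs` (with parameters `alpha` and `a`) are all hypothetical.

```python
import math

# Hypothetical toy hierarchy (child -> parent); NOT a real diagnosis-code tree.
PARENT = {
    "root": None,
    "A": "root", "B": "root",
    "A1": "A", "A2": "A", "B1": "B",
    "A1a": "A1", "A1b": "A1", "A2a": "A2", "B1a": "B1",
}


def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(u, v):
    """Assumed distance: edges on the path u -> lowest common ancestor -> v."""
    steps_from_u = {n: i for i, n in enumerate(ancestors(u))}
    for j, n in enumerate(ancestors(v)):
        if n in steps_from_u:  # first shared ancestor seen from v is the LCA
            return steps_from_u[n] + j
    raise ValueError("nodes are not in the same tree")


def link_probs(i, words, alpha=1.0, a=1.0):
    """ddCRP-style prior: word i links to word j with weight exp(-d/a),
    or to itself with weight alpha (hypothetical decay and parameters)."""
    weights = [
        alpha if j == i else math.exp(-tree_distance(words[i], w) / a)
        for j, w in enumerate(words)
    ]
    total = sum(weights)
    return [w / total for w in weights]
```

    For example, `link_probs(0, ["A1a", "A1b", "B1a"])` gives the sibling code "A1b" a higher prior link weight than "B1a", which sits in a different branch, so tree-close codes tend to end up in the same topic. The sketch covers only the prior; the likelihood term and the MCMC sampler are omitted.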
  • Keywords
    Bayes methods; Markov processes; Monte Carlo methods; diseases; document handling; electronic health records; tree data structures; EMRs dataset; HDP; MCMC technique; PolyVascular disease; diagnosis codes; document corpus; electronic medical records; hierarchical Bayesian models; hierarchical Dirichlet process; inference method; real world medical dataset; semantic tree structure; semantically-coherent disease topic discovery; side information; topic discovery regularization; wd-dCRF; word-distance-dependent Chinese restaurant franchise; word-to-word distances; Correlation; Diseases; Hospitals; Indexes; Medical diagnostic imaging; Predictive models; medical application; readmission; topic analysis; tree structure; words
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2014 22nd International Conference on
  • Conference_Location
    Stockholm
  • ISSN
    1051-4651
  • Type
    conf
  • DOI
    10.1109/ICPR.2014.234
  • Filename
    6976944