DocumentCode :
177869
Title :
Regularizing Topic Discovery in EMRs with Side Information by Using Hierarchical Bayesian Models
Author :
Cheng Li ; Rana, S. ; Dinh Phung ; Venkatesh, S.
Author_Institution :
Center for Pattern Recognition & Data Analytics, Deakin Univ., Melbourne, VIC, Australia
fYear :
2014
fDate :
24-28 Aug. 2014
Firstpage :
1307
Lastpage :
1312
Abstract :
We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wd-dCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). Typically, an EMR dataset consists of several patients (documents), and each patient record contains many diagnosis codes (words). We exploit the side information available as a semantic tree structure among the diagnosis codes to discover semantically coherent disease topics. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient inference method for the wd-dCRF using an MCMC technique. We evaluate the model on a real-world medical dataset consisting of about 1000 patients with PolyVascular disease. Compared with the hierarchical Dirichlet process (HDP), a popular topic analysis tool, our model discovers topics that are superior on both qualitative and quantitative measures.
Keywords :
Bayes methods; Markov processes; Monte Carlo methods; diseases; document handling; electronic health records; tree data structures; EMRs dataset; HDP; MCMC technique; PolyVascular disease; diagnosis codes; document corpus; electronic medical records; hierarchical Bayesian models; hierarchical Dirichlet process; inference method; real world medical dataset; semantic tree structure; semantically-coherent disease topic discovery; side information; topic discovery regularization; wd-dCRF; word-distance-dependent Chinese restaurant franchise; word-to-word distances; Correlation; Hospitals; Indexes; Medical diagnostic imaging; Predictive models; medical application; readmission; topic analysis; tree structure; words
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition (ICPR), 2014 22nd International Conference on
Conference_Location :
Stockholm
ISSN :
1051-4651
Type :
conf
DOI :
10.1109/ICPR.2014.234
Filename :
6976944