• DocumentCode
    268647
  • Title
    Extending Association Rule Summarization Techniques to Assess Risk of Diabetes Mellitus
  • Author
    Simon, György J. ; Caraballo, Pedro J. ; Therneau, Terry M. ; Cha, Steven S. ; Castro, M. Regina ; Li, Peter W.
  • Author_Institution
    Inst. for Health Inf., Univ. of Minnesota, Minneapolis, MN, USA
  • Volume
    27
  • Issue
    1
  • fYear
    2015
  • fDate
    Jan. 2015
  • Firstpage
    130
  • Lastpage
    141
  • Abstract
    Early detection of patients with elevated risk of developing diabetes mellitus is critical to the improved prevention and overall clinical management of these patients. We aim to apply association rule mining to electronic medical records (EMR) to discover sets of risk factors and their corresponding subpopulations that represent patients at particularly high risk of developing diabetes. Given the high dimensionality of EMRs, association rule mining generates a very large set of rules, which we need to summarize for easy clinical use. We reviewed four association rule set summarization techniques and conducted a comparative evaluation to provide guidance regarding their applicability, strengths, and weaknesses. We proposed extensions to incorporate risk of diabetes into the process of finding an optimal summary. We evaluated these modified techniques on a real-world prediabetic patient cohort. We found that all four methods produced summaries that described subpopulations at high risk of diabetes, with each method having its own clear strengths. For our purpose, our extension to the Bottom-Up Summarization (BUS) algorithm produced the most suitable summary. The subpopulations identified by this summary covered most high-risk patients, had low overlap, and were at very high risk of diabetes.
  • Keywords
    data mining; electronic health records; patient diagnosis; risk management; BUS algorithm; EMR; association rule mining; association rule summarization techniques; bottom-up summarization; diabetes mellitus risk assessment; electronic medical records; high-risk patients; prediabetic patient cohort; Clustering; Data mining; Database Applications; Database Management; Information Technology and Systems; Mathematics of Computing; Probability and Statistics; Statistical computing; Survival analysis; association rules; association rule summarization; classification; survival analysis;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2013.76
  • Filename
    6514877