Title :
A cost-effective, case-control study on the association between breast cancer and pregnancy through web mining
Author :
Hong-Jun Yoon; Songhua Xu; Georgia Tourassi
Author_Institution :
Comput. Sci. & Eng. Div., Oak Ridge Nat. Lab., Oak Ridge, TN, USA
Abstract :
We report a case-control epidemiological study conducted by mining people's stories from the Internet. Our overarching goal is to test whether mining openly available personal stories from the Internet is a cost-effective way to make reliable epidemiological discoveries. As a case study, we focus on the association between breast cancer risk and pregnancy, which is well established through controlled clinical survey studies. Specifically, we automatically collected and mined 30,000 online obituary articles using a series of tailored cyber-informatics tools that we developed. Replicating a case-control study design, we analyzed the collected data and confirmed with statistical significance that parity is associated with lower breast cancer risk. Our web mining study provides promising preliminary evidence that online content mining can be a cost-effective and reliable way for epidemiological knowledge discovery.
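The case-control comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state which statistical test the authors used, so the odds ratio with a Woolf (log-based) 95% confidence interval and a Pearson chi-square test shown here are assumptions, and the function name `odds_ratio_2x2` is hypothetical. The counts are invented purely for illustration and are not the study's data.

```python
import math


def odds_ratio_2x2(a, b, c, d):
    """Summarize a 2x2 case-control table.

    a: exposed cases      (e.g. parous women with breast cancer)
    b: unexposed cases    (nulliparous women with breast cancer)
    c: exposed controls   (parous women without breast cancer)
    d: unexposed controls (nulliparous women without breast cancer)
    Returns (odds ratio, (ci_low, ci_high), chi-square statistic, p-value).
    """
    odds_ratio = (a * d) / (b * c)

    # Woolf's method: standard error of log(OR), then a 95% interval.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, (ci_low, ci_high), chi2, p_value


# Hypothetical counts for illustration only; NOT taken from the study.
or_, ci, chi2, p = odds_ratio_2x2(a=300, b=100, c=350, d=50)
print(f"OR={or_:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f}), chi2={chi2:.2f}, p={p:.2g}")
```

An odds ratio below 1 whose confidence interval excludes 1 would be read as exposure (here, parity) being associated with lower odds of the outcome, which is the direction of the association the abstract reports.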
Keywords :
Internet; cancer; data mining; epidemics; medical computing; Web mining; breast cancer risk; case-control epidemiological study; case-control study design; epidemiological discoveries; epidemiological knowledge discovery; online content mining; online obituary articles mining; people stories mining; pregnancy; tailored cyber-informatics tools; breast cancer; History; Obituaries; case-control study; epidemiology; obituary
Conference_Title :
Biomedical Sciences and Engineering Conference (BSEC), 2013
Conference_Location :
Oak Ridge, TN
Print_ISBN :
978-1-4799-2118-8
DOI :
10.1109/BSEC.2013.6618493