Title :
Data Mining in MEDLINE for Disease-Disease Associations Via Second Order Co-Occurrence
Author :
Modest Von Korff;Bernard Deffarges;Thomas Sander
Author_Institution :
Res. Inf. Manage., Actelion Pharmaceuticals Ltd., Allschwil, Switzerland
Abstract :
DDMiner, a new method for mining disease-disease associations in MEDLINE, is presented together with its first results. DDMiner searches for co-occurrences of gene names and disease terms and finds relationships between diseases by word-vector similarity calculations. All records in PubMed were labeled with around 40,000 gene and protein names and around 4,000 disease terms. Each disease term was described by a word vector whose length equaled the number of gene names; each field in the vector represented a gene or a protein, and its value was derived from the number of publications in which that gene occurred together with the disease term. Disease-disease associations were then computed by comparing these vectors with a vector-similarity measure. Five diseases were examined together with their nearest-neighbor diseases to show the validity of the approach. In all five examples, every disease-disease association found could be validated in the medical literature. These results show that mining for disease-disease associations by second-order co-occurrence is a powerful tool for medical science.
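The abstract does not state the exact field weighting or the similarity measure, so the following is only a minimal sketch of the second-order co-occurrence idea, not DDMiner's implementation. It assumes log(1 + count) weighting and cosine similarity (both assumptions), and the function names and toy records are hypothetical. Two diseases come out as similar because they co-occur with the same genes, even if they are never mentioned together.

```python
import math
from collections import defaultdict


def build_disease_vectors(records, genes):
    """Map each disease term to a vector indexed by gene.

    records: iterable of (set_of_genes, set_of_diseases), one per publication.
    Field value: log(1 + co-occurrence count); this weighting is an assumption.
    """
    index = {g: i for i, g in enumerate(genes)}
    counts = defaultdict(lambda: [0] * len(genes))
    for rec_genes, rec_diseases in records:
        for d in rec_diseases:
            vec = counts[d]
            for g in rec_genes:
                if g in index:
                    vec[index[g]] += 1  # one more paper with gene g and disease d
    return {d: [math.log1p(c) for c in vec] for d, vec in counts.items()}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_diseases(vectors, query, k=5):
    """Rank all other diseases by similarity to the query disease."""
    q = vectors[query]
    scored = [(d, cosine(q, v)) for d, v in vectors.items() if d != query]
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored[:k]


# Hypothetical toy corpus: four "publications" with tagged genes and diseases.
records = [
    ({"APP", "PSEN1"}, {"Alzheimer's disease"}),
    ({"APP", "MAPT"}, {"Alzheimer's disease", "Frontotemporal dementia"}),
    ({"MAPT"}, {"Frontotemporal dementia"}),
    ({"INS"}, {"Diabetes mellitus"}),
]
genes = ["APP", "PSEN1", "MAPT", "INS"]
vectors = build_disease_vectors(records, genes)
neighbors = nearest_diseases(vectors, "Alzheimer's disease", k=2)
```

In this toy data, Frontotemporal dementia ranks first because it shares APP and MAPT with Alzheimer's disease, while Diabetes mellitus shares no genes and scores zero. On a full corpus with around 40,000 genes, a sparse-vector representation would be the natural choice instead of dense lists.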
Keywords :
"Proteins","Alzheimer's disease","Indexing","Filtering"
Conference_Title :
Computational Intelligence, 2015 IEEE Symposium Series on
Print_ISBN :
978-1-4799-7560-0
DOI :
10.1109/SSCI.2015.54