Title :
Statistical Approaches to Identifying Androgen Response Elements
Author :
Li, Li ; Heber, Steffen ; Zhang, Qiang ; Andersen, Melvin E.
Abstract :
DNA-binding transcription factors play an integral role in regulating gene expression. Transcription factor binding sites (TFBSs) in gene promoter regions can be predicted with computational methods such as Support Vector Machines (SVM), Hidden Markov Models (HMM), and Random Forests (RF), all of which summarize sequence patterns learned from experimentally determined TFBSs. The androgen receptor (AR), a ligand-dependent transcription factor, plays an important role in male reproductive function by regulating gene transcription through direct binding to androgen response elements (AREs) in the promoters of target genes. The aim of this study is to use data mining tools to identify and characterize AREs from sequence information. HMM, SVM, and RF were explored as three statistical approaches to improve the prediction of putative AREs in the human genome. Cross-validation results indicated that all three models identified AREs with good sensitivity and specificity, each achieving an accuracy of at least 80%. This is the first study to apply all three methods (HMM, SVM, and RF) to the construction of ARE prediction models.
Keywords :
Bioinformatics; Data mining; Gene expression; Genomics; Hidden Markov models; Humans; Statistical analysis; Support vector machines;
Conference_Title :
Seventh IEEE International Conference on Data Mining Workshops (ICDM Workshops 2007)
Conference_Location :
Omaha, NE
Print_ISBN :
978-0-7695-3019-2
Electronic_ISBN :
978-0-7695-3033-8
DOI :
10.1109/ICDMW.2007.81