DocumentCode :
2960255
Title :
Averaging measurement strategies for identifying single nucleotide polymorphisms from redundant data sets
Author :
Wang, Tai-Chun ; Taheri, Javid ; Zomaya, Albert Y.
Author_Institution :
Centre for Distrib. & High Performance Comput., Univ. of Sydney, Sydney, NSW, Australia
fYear :
2011
fDate :
27-30 Dec. 2011
Firstpage :
67
Lastpage :
74
Abstract :
Single nucleotide polymorphism (SNP) studies have been an active research topic in the life sciences in recent years. Because SNPs are abundant, stable, and sometimes related to specific diseases, they have been widely selected as biomarkers for multi-purpose research. As traditional methods for identifying SNPs are time-consuming and expensive, discovering SNPs from expressed sequence tags (ESTs) has become an efficient alternative. However, most EST databases do not store quality/trace files together with EST reads, so methods such as Phard, which require the corresponding sequence quality files, are not suitable for these data. Computational methods that can obtain reliable SNPs without trace/quality information are therefore still essential. We have developed a pipeline framework, called PFSNP, to reveal reliable SNPs from EST data sets without the accompanying trace/quality files. PFSNP deploys several strategies in this framework, including a modified neighborhood quality standard measurement and fuzzy logic. It also automatically adjusts the sliding window to fit the differing conditions of the data sets. PFSNP is demonstrated by identifying SNPs from two subgroups of Oryza sativa, using two different strategies, as well as from zebrafish. Based on our experimental results, PFSNP obtains more reliable results than existing methods.
Keywords :
bioinformatics; data mining; diseases; fuzzy logic; pipeline processing; Oryza sativa; PFSNP; expressed sequence tags; fuzzy logic; life sciences; modified neighborhood quality standard measurement; multi-purpose research; pipeline framework; redundant data sets; sequences quality files; single nucleotide polymorphism identification; slide window; Data mining; Databases; Fuzzy logic; Fuzzy systems; Genomics; Pipelines; Reliability;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Systems and Applications (AICCSA), 2011 9th IEEE/ACS International Conference on
Conference_Location :
Sharm El-Sheikh
ISSN :
2161-5322
Print_ISBN :
978-1-4577-0475-8
Electronic_ISBN :
2161-5322
Type :
conf
DOI :
10.1109/AICCSA.2011.6126593
Filename :
6126593