Regional Information Center for Science and Technology - Bio-M: Data mining on HCV genotype 1 core sequences

DocumentCode :

3108665

Title :

Bio-M: Data mining on HCV genotype 1 core sequences

Author :

Rakshmy, C.S. ; Abdul Nazeer, K.A. ; Chandra, S. S. Vinod

Author_Institution :

Dept. of Comput. Sci., Nat. Inst. of Technol., Calicut, India

fYear :

2012

fDate :

18-20 July 2012

Firstpage :

Lastpage :

Abstract :

Hepatitis C Virus (HCV) has become a major risk factor for the development of Hepatocellular Carcinoma (HCC). A framework comprising clustering, feature selection and classification has been developed to identify genomic markers in HCV sequences that are associated with HCC. A new hash-table-based method for extracting features from genomic sequences has been proposed. It requires less memory than Generalized Suffix Tree based methods. Biomarkers are selected as features, and a Random Forest (RF) classifier is trained on these biomarkers. RF is used to classify HCV sequences as associated with HCC or not. Using the HCV sequence data available from the European HCV Database (euHCVdb) and Los Alamos National Laboratory, we show that the performance of RF is comparable to that of an SVM classifier.
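The abstract does not describe the hash-table feature extraction in detail, so the following is only a minimal sketch of one common approach, assuming that features are counts of fixed-length substrings (k-mers). The function names, the choice of k = 3, and the toy sequences are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def kmer_counts(sequence, k=3):
    """Count every length-k substring of `sequence` in a hash table.

    Counter is a dict subclass, so each lookup and update is a hash-table
    operation; no suffix tree is built.
    """
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def feature_vector(sequence, vocabulary, k=3):
    """Map a sequence to a fixed-length count vector over `vocabulary`."""
    counts = kmer_counts(sequence, k)
    # Missing k-mers count as 0 (Counter returns 0 for absent keys).
    return [counts[kmer] for kmer in vocabulary]


# Toy sequences invented for illustration; they are not real HCV data.
toy_sequences = ["ATGAGCACG", "ATGAGCACA"]

# Shared, sorted vocabulary of all k-mers seen in the toy set.
vocabulary = sorted(set().union(*(kmer_counts(s) for s in toy_sequences)))

# One count vector per sequence; such vectors could be passed to a
# classifier such as a random forest or an SVM.
vectors = [feature_vector(s, vocabulary) for s in toy_sequences]
```

Under this assumption the table holds one entry per distinct k-mer that actually occurs in a sequence, which is one plausible reason a hash-based method could use less memory than a generalized suffix tree.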

Keywords :

biology computing; data mining; diseases; feature extraction; genomics; medical computing; microorganisms; pattern classification; pattern clustering; support vector machines; tree data structures; trees (mathematics); Bio-M; European HCV Database; HCV genotype 1 core sequences; Los Alamos National Laboratory; SVM classifier; data mining; euHCVdb; feature extraction; generalized suffix tree based methods; genomic marker identification; genomic sequences; hash tables; hepatitis C virus; hepatocellular carcinoma; random forest classifier; Accuracy; Bioinformatics; Feature extraction; Genomics; Radio frequency; Vegetation;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Science & Engineering (ICDSE), 2012 International Conference on

Conference_Location :

Cochin, Kerala

Print_ISBN :

978-1-4673-2148-8

Type :

conf

DOI :

10.1109/ICDSE.2012.6282307

Filename :

6282307

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3108665