• DocumentCode
    721218
  • Title

    A novel semi-supervised approach for protein sequence classification

  • Author

    Chaturvedi, Bharti ; Patil, Nagamma

  • Author_Institution
    Dept. of Inf. Technol., Nat. Inst. of Technol. Karnataka, Mangalore, India
  • fYear
    2015
  • fDate
    12-13 June 2015
  • Firstpage
    1158
  • Lastpage
    1162
  • Abstract
    Bioinformatics is an emerging research area, and classifying protein sequence datasets is a major challenge for researchers. This paper deals with supervised and semi-supervised classification of human protein sequences. Amino acid composition (AAC) is used for feature extraction from the protein sequences. Classification techniques such as Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbour (KNN), Random Forest, and Decision Tree are used to classify the protein sequence dataset. Among these classifiers, SVM reported the best result, with the highest accuracy. The limitation of SVM is that it works only with supervised (labeled) datasets; it does not work with unsupervised or semi-supervised datasets (unlabeled data, or a large amount of unlabeled data mixed with a small amount of labeled data). A novel semi-supervised support vector machine (SSVM) classifier is proposed that works with a combination of labeled and unlabeled data. The results show that the proposed approach gives higher accuracy on the semi-supervised dataset. Principal component analysis (PCA) is used for feature reduction of the protein sequences. The proposed SSVM with PCA increases accuracy by about 5 to 10%.
  • Keywords
    Bayes methods; bioinformatics; decision trees; feature extraction; pattern classification; principal component analysis; proteins; support vector machines; AAC; KNN; PCA; SSVM classifier; amino acid composition; bioinformatics; decision tree; feature extraction; feature reduction; human protein sequence; k-nearest neighbour; naive Bayes; principal component analysis; protein sequence classification; protein sequence dataset classification; random forest; semisupervised classification; semisupervised support vector machine classifier; Accuracy; Amino acids; Feature extraction; Principal component analysis; Protein sequence; Support vector machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advance Computing Conference (IACC), 2015 IEEE International
  • Conference_Location
    Bangalore
  • Print_ISBN
    978-1-4799-8046-8
  • Type

    conf

  • DOI
    10.1109/IADCC.2015.7154885
  • Filename
    7154885
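
The abstract names amino acid composition (AAC) as the feature extraction step but does not give its definition. The sketch below assumes the common form of AAC: the fraction of each of the 20 standard amino acids in a sequence, giving a 20-dimensional feature vector. The function name `amino_acid_composition` and the choice to ignore non-standard residue codes are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (assumed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue.

    Hypothetical helper: characters outside the 20 standard codes
    (for example 'X' or gaps) are ignored, which is an assumption.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No standard residues: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


features = amino_acid_composition("ACCA")
```

A vector like `features` could then be passed to a feature reduction step such as PCA and on to a classifier, as the abstract describes.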