Title :
Semi-supervised Learning of Alternatively Spliced Exons Using Co-training
Author :
Tangirala, Karthik ; Caragea, Doina
Author_Institution :
CIS Dept., Kansas State Univ., Manhattan, KS, USA
Abstract :
Alternative splicing is a phenomenon that gives rise to multiple mRNA transcripts from a single gene. It is believed that a large number of genes undergo alternative splicing. Predicting alternative splicing events is a problem of great interest, as it can aid the understanding of transcript diversity. Supervised machine learning approaches can be used to predict alternative splicing events at the genome level. However, supervised approaches require large amounts of labeled data to learn accurate classifiers. While large amounts of genomic data are produced by new sequencing technologies, labeling these data can be costly and time-consuming. Therefore, semi-supervised learning approaches that can make use of large amounts of unlabeled data, in addition to small amounts of labeled data, are highly desirable. In this work, we study the usefulness of a semi-supervised learning approach, co-training, for classifying exons as alternatively spliced or constitutive. The co-training algorithm uses two views of the data to iteratively learn two classifiers that inform each other, at each step, with their most confident predictions on the unlabeled data. We consider two sets of features for constructing views for the problem of predicting alternatively spliced exons: exonic splicing enhancers and intronic regulatory sequences. We use the Naive Bayes Multinomial algorithm as the base classifier in our study. Experimental results show that using the unlabeled data can yield better classifiers than those obtained from the small amount of labeled data alone.
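The co-training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tiny multinomial Naive Bayes class, the `co_train` function, its `rounds` and `per_round` parameters, and the motif tokens (`e1`, `i1`, ...) are all hypothetical stand-ins. It also simplifies the usual scheme by letting each view promote its top-scoring examples regardless of class, rather than a fixed number per class.

```python
import math
from collections import Counter


class MultinomialNB:
    """Tiny multinomial Naive Bayes over token lists, with Laplace smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for d, y in zip(docs, labels):
            self.counts[y].update(d)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = self.prior[c]
            for t in doc:
                if t in self.vocab:  # tokens never seen in training are ignored
                    s += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            scores[c] = s
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}


def fit_views(labeled):
    """Train one classifier per view on the current labeled set."""
    y = [lab for _, lab in labeled]
    return [MultinomialNB().fit([x[v] for x, _ in labeled], y) for v in (0, 1)]


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: list of ((view1, view2), label); unlabeled: list of (view1, view2)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picked = {}
        for v, clf in enumerate(fit_views(labeled)):
            scored = []
            for i, x in enumerate(pool):
                proba = clf.predict_proba(x[v])
                label = max(proba, key=proba.get)
                scored.append((proba[label], i, label))
            # Each view promotes its most confident predictions to the labeled set.
            for _conf, i, label in sorted(scored, reverse=True)[:per_round]:
                picked.setdefault(i, label)
        labeled += [(pool[i], lab) for i, lab in picked.items()]
        pool = [x for i, x in enumerate(pool) if i not in picked]
    return fit_views(labeled), labeled


# Toy data: view 1 = exonic motif tokens, view 2 = intronic motif tokens (made up).
labeled = [
    ((["e1", "e2"], ["i1"]), "AS"),    # alternatively spliced
    ((["e3", "e4"], ["i2"]), "CON"),   # constitutive
]
unlabeled = [
    (["e1", "e5"], ["i1", "i3"]),
    (["e3", "e6"], ["i2", "i4"]),
]
clfs, grown = co_train(labeled, unlabeled, rounds=2, per_round=1)
assigned = {tuple(x[0]): lab for x, lab in grown}
```

After the loop, `grown` holds the original labeled exons plus the pseudo-labeled ones, and `clfs` holds the two view-specific classifiers retrained on that enlarged set. In practice the two views' predictions would typically be combined, and the number of examples promoted per round would be tuned.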
Keywords :
Bayes methods; bioinformatics; genetics; learning (artificial intelligence); pattern classification; polynomials; alternative splicing event prediction; alternatively spliced exons prediction; classifier; co-training algorithm; data labeling; exonic splicing enhancer; genes; genomic data; intronic regulatory sequence; mRNA transcripts; naive Bayes multinomial algorithm; semisupervised learning; sequencing technology; supervised machine learning approach; Bioinformatics; Genomics; Kernel; Machine learning; Prediction algorithms; Splicing; Training; alternative splicing; alternatively spliced and constitutive exons; co-training; semi-supervised learning;
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2011 IEEE International Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4577-1799-4
DOI :
10.1109/BIBM.2011.87