• DocumentCode
    3388848
  • Title
    Directed-Information Based Feature-Selection for Tissue-Specific Sequences
  • Author
    Rao, Arvind ; Hero, Alfred O., III ; States, David J. ; Engel, James Douglas
  • Author_Institution
    Departments of EECS, Bioinformatics and Cell and Developmental Biology, University of Michigan, Ann Arbor
  • fYear
    2007
  • fDate
    26-29 Aug. 2007
  • Firstpage
    210
  • Lastpage
    214
  • Abstract
    Motif discovery for the identification of functional regulatory elements underlying gene expression is a challenging problem. Sequence inspection often provides valuable clues to the discovery of novel motifs (including transcription factor sites) with uncharacterized function in gene expression. Given the complexity underlying tissue-specific gene expression, several motifs are putatively responsible for gene expression in a certain cell type. This has important implications for understanding fundamental biological processes such as development and disease progression. In this work, we present an approach to the identification of motifs (not necessarily transcription factor binding sites) and examine its application to several questions in current bioinformatics research. These motifs discriminate tissue-specific genomic regions from those that are not tissue-specific. We propose the use of directed information for such classification-constrained feature selection, and then use the selected features with a support vector machine (SVM) classifier to characterize the tissue-specificity of any sequence of interest. This analysis yields several interesting novel motifs that merit further experimental characterization. The last part of this paper presents a framework for exploring the relationship between such discriminatory transcription factor motifs and the corresponding tissue-specificity, using both sequence and expression modalities.
  • Keywords
    Bioinformatics; Cells (biology); Databases; Gene expression; Genomics; Hydrogen; Proteins; Sequences; Support vector machine classification; Support vector machines; Directed Information; comparative genomics; tissue-specific genes; transcriptional regulation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Statistical Signal Processing, 2007. SSP '07. IEEE/SP 14th Workshop on
  • Conference_Location
    Madison, WI, USA
  • Print_ISBN
    978-1-4244-1198-6
  • Electronic_ISBN
    978-1-4244-1198-6
  • Type
    conf
  • DOI
    10.1109/SSP.2007.4301249
  • Filename
    4301249