• DocumentCode
    761590
  • Title
    Nonparametric Estimation of the Number of Unique Sequences in Biological Samples
  • Author
    Xu, Changjiang ; Xu, Luzhou ; Yu, Fahong ; Tan, Weihong ; Moroz, Leonid L. ; Li, Jian
  • Author_Institution
    Dept. of Telecommun. Eng., Nanjing Univ. of Posts & Telecommun.
  • Volume
    54
  • Issue
    10
  • fYear
    2006
  • Firstpage
    3759
  • Lastpage
    3767
  • Abstract
    Large-scale determination of uniquely expressed genes (or mRNAs) in specific cells and tissues is a challenging problem in computational and functional genomics. We consider nonparametric approaches for estimating the number of unique, nonredundant sequences in biological samples. By introducing the moments of species' abundance in a population, we analyze the relative abundance of species in the population and present a lower bound estimator and a so-called medial estimator for the number of distinct species in the population. The lower bound estimator is applicable to populations with small coefficients of variation (CV). The medial estimator works well for populations with relatively large CV, especially gene expression data. Simulation analysis shows that the medial estimator performs better than existing methods. Finally, we apply our nonparametric approaches to estimate the number of expressed mRNAs in a normal colon epithelial tissue as well as the number of unique clones in an amplified cDNA sample prepared from the CNS of the sea-slug Aplysia.
  • Keywords
    DNA; biological tissues; genetics; sequences; statistical analysis; amplified cDNA sample; biological samples; coefficients of variation; computational genomics; functional genomics; gene expression data; lower bound estimator; mRNA; medial estimator; nonparametric estimation; normal colon epithelial tissue; sea-slug Aplysia; species abundance; unique sequences; uniquely expressed genes; Analytical models; Bioinformatics; Biology computing; Cloning; Colon; Data analysis; Gene expression; Genomics; Large-scale systems; Performance analysis; Aplysia; expressed sequence tags; genomics; nonparametric estimation; relative abundance of species; transcriptome
  • fLanguage
    English
  • Journal_Title
    Signal Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1053-587X
  • Type
    jour
  • DOI
    10.1109/TSP.2006.880211
  • Filename
    1703845