• DocumentCode
    18495
  • Title

    Perfect Phylogeny Problems with Missing Values

  • Author

    Kirkpatrick, Barbara ; Stevens, Kristian

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Miami, Coral Gables, FL, USA
  • Volume
    11
  • Issue
    5
  • fYear
    2014
  • fDate
    Sept.-Oct. 1 2014
  • Firstpage
    928
  • Lastpage
    941
  • Abstract
    The perfect phylogeny problem is of central importance to both evolutionary biology and population genetics. Missing values are a common occurrence in both sequence and genotype data, but they make the problem of finding a perfect phylogeny NP-hard even for binary characters. We introduce new and efficient perfect phylogeny algorithms for broad classes of binary and multistate data with missing values. Specifically, we address binary missing data consistent with the rich data hypothesis (RDH) introduced by Halperin and Karp and give an efficient algorithm for enumerating phylogenies. This algorithm is useful for computing the probability of data with missing values under the coalescent model. In addition, we use the partition intersection (PI) graph and chordal graph theory to generalize the RDH to multistate characters with missing values. For a bounded number of states, we provide a fixed-parameter tractable algorithm for the perfect phylogeny problem with missing data. Utilizing the PI graph, we are able to show that under multiple biologically motivated models for character data, our generalized RDH holds with high probability, and we evaluate our results with extensive empirical analysis.
  • Keywords
    evolution (biological); genetics; graph theory; binary characters; binary missing data; binary-state data; chordal graph theory; evolutionary biology; extensive empirical analysis; fixed parameter tractable algorithm; genotype data; multiple biologically motivated models; multistate data; partition intersection graph; perfect phylogeny algorithms; perfect phylogeny problems; population genetics; rich data hypothesis; sequence; Bioinformatics; Biological system modeling; Computational biology; Data models; Genomics; Phylogeny; Perfect phylogeny; missing data; partition intersection graph; rich data hypothesis;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2014.2316005
  • Filename
    6819844