DocumentCode
1080299
Title
Counting All Possible Ancestral Configurations of Sample Sequences in Population Genetics
Author
Song, Y.S. ; Lyngso, R. ; Hein, J.
Author_Institution
Dept. of Computer Science, University of California, Davis, CA
Volume
3
Issue
3
fYear
2006
Firstpage
239
Lastpage
251
Abstract
Given a set D of input sequences, a genealogy for D can be constructed backward in time using such evolutionary events as mutation, coalescent, and recombination. An ancestral configuration (AC) can be regarded as the multiset of all sequences present at a particular point in time in a possible genealogy for D. The complexity of computing the likelihood of observing D depends heavily on the total number of distinct ACs of D, so it is of interest to estimate that number. For D consisting of binary sequences of finite length, we consider the problem of enumerating exactly all distinct ACs. We assume that the root sequence type is known and that the mutation process is governed by the infinite-sites model. When there is no recombination, we construct a general method of obtaining closed-form formulas for the total number of ACs. The enumeration problem becomes much more complicated when recombination is involved. In that case, we devise a method of enumeration based on counting contingency tables and construct a dynamic programming algorithm for the approach. Finally, we describe a method of counting the number of ACs that can appear in genealogies with at most a given number R of recombinations. Of particular interest is the case in which R is close to the minimum number of recombinations for D.
Keywords
biology computing; cellular biophysics; dynamic programming; genetics; molecular biophysics; molecular configurations; physiological models; ancestral configurations; coalescent; contingency tables; dynamic programming algorithm; evolutionary events; genealogy; infinite-sites model; mutation; population genetics; recombination; sample sequences; AC generators; Binary sequences; Binary trees; Genetic mutations; Heuristic algorithms; Mathematical model; Stochastic processes; Tree graphs; contingency table; enumeration; Biological Evolution; Chromosome Mapping; Evolution, Molecular; Genetic Variation; Genetics, Population; Models, Genetic; Models, Statistical; Pedigree; Phylogeny; Sample Size; Sequence Alignment; Sequence Analysis, DNA
fLanguage
English
Journal_Title
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
1545-5963
Type
jour
DOI
10.1109/TCBB.2006.31
Filename
1668023