Author_Institution :
Cheriton Sch. of Comput. Sci., Univ. of Waterloo, Waterloo, ON, Canada
Abstract :
Kinship inference identifies genealogical relationships among organisms, typically in naturally occurring populations, from genetic marker information. This task is crucial to conserving endangered species and to understanding the diversity of populations. Some of the simplest problems in this domain are sibgroup and half-sibgroup discovery. Natural objectives in this domain are statistical ones (such as maximum likelihood) and combinatorial ones (such as parsimony). Unfortunately, even with error-free data, the simplest combinatorial objective, minimizing the number of matings, is NP-hard to approximate; the statistical objectives are even more challenging. Here, a simple combinatorial approach to the problem is presented. By enumerating triplets of population members that could be siblings and triplets that could not be siblings, putative sibgroups are constructed greedily and merged until no further merging is possible. This simple algorithm performs comparably to or better than integer programming methods for the problem, in a tiny fraction of the runtime. Moreover, under a straightforward and standard probabilistic model of inheritance and mating, these methods find the correct sibgroups with high probability. Hence, the NP-hardness of the original problem is ameliorated in "typical" instances of the problem. This phenomenon is common to a large variety of bioinformatics problems, so a discussion of how to respond to this observation is presented.
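To make the idea of "could be siblings" concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes diploid organisms, error-free genotypes, and full siblings sharing two parents. The function names (`locus_ok`, `could_be_siblings`, `greedy_sibgroups`) and the toy data are hypothetical. The sketch uses a brute-force Mendelian compatibility check and a first-fit greedy grouping, which stands in for the triplet enumeration and merging described in the abstract. Any two individuals can always be full siblings, so triplets are the smallest sets for which the check is informative.

```python
from itertools import combinations_with_replacement, product

def locus_ok(genotypes):
    """True if some pair of parents could produce every genotype at one locus.

    Each genotype is an (allele, allele) pair. Assumption: a child takes
    one allele from each parent, with no genotyping errors or mutations.
    """
    alleles = sorted({a for g in genotypes for a in g})
    if len(alleles) > 4:  # two diploid parents carry at most four alleles
        return False
    parents = list(combinations_with_replacement(alleles, 2))
    for p1, p2 in product(parents, repeat=2):
        if all((x in p1 and y in p2) or (y in p1 and x in p2)
               for x, y in genotypes):
            return True
    return False

def could_be_siblings(individuals):
    """True if the individuals are compatible as full siblings at every locus."""
    n_loci = len(individuals[0])
    return all(locus_ok([ind[k] for ind in individuals])
               for k in range(n_loci))

def greedy_sibgroups(individuals):
    """First-fit greedy grouping: add each individual to the first
    group it is compatible with, otherwise start a new group."""
    groups = []
    for i, ind in enumerate(individuals):
        for g in groups:
            if could_be_siblings([individuals[j] for j in g] + [ind]):
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

# Toy data (hypothetical): one locus, two families with disjoint alleles.
demo = [((1, 3),), ((1, 4),), ((2, 3),), ((2, 4),),   # parents (1,2) x (3,4)
        ((5, 7),), ((6, 8),), ((5, 8),)]              # parents (5,6) x (7,8)

print(greedy_sibgroups(demo))  # [[0, 1, 2, 3], [4, 5, 6]]
# Three distinct homozygotes cannot share two parents:
print(could_be_siblings([((1, 1),), ((2, 2),), ((3, 3),)]))  # False
```

The first-fit result depends on the order in which individuals are processed, which is one reason a method built on triplets is preferable to this naive pass.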
Keywords :
bioinformatics; cellular biophysics; combinatorial mathematics; computational complexity; genetics; inference mechanisms; integer programming; maximum likelihood estimation; molecular biophysics; probability; NP-hardness; bioinformatics; combinatorial methods; endangered species; genealogical relationships; genetic marker information; half-sibgroup; inheritance; integer programming; kinship discovery; kinship inference; mating; maximum likelihood method; naturally-occurring populations; parsimony; population diversity; probabilistic model; sib group; statistical methods; Bioinformatics; Conferences; Genetics; Inference algorithms; Linear programming; Merging; Probabilistic logic; Chernoff bounds; conservation biology; population genetics; probabilistic analysis of algorithms;