Regional Information Center for Science and Technology - Computational Synteny Block: A framework to identify evolutionary events

Abstract:

Motivation: The identification and accurate description of large genomic rearrangements is crucial for studying evolutionary events among species and, implicitly, for defining breakpoints. Although a number of software tools are available for this task, they usually either a) require a collection of pre-computed non-conflicting High Scoring Pairs (HSPs) and gene annotations, or work at the protein level (which excludes non-coding regions); or b) need many parameters to adjust software behavior and performance; or c) have to deal with duplications, repeats and tandem repeats, which complicates the task of identifying rearrangements. Although many programs specialize in detecting these repetitions, they are not designed to identify the main genomic rearrangements.
Methods: The methodology we envisage starts with the detection of all HSPs by pairwise genome comparison. The second step resolves the conflicts generated by fragments that overlap in both sequences (double-overlapped fragments), ultimately yielding a collection of gapped fragments. In the third step, the quality measures (length, score, identities) of each gapped fragment are refined using a modified dynamic programming approach. This collection of refined gapped fragments is the input to a recursive process that identifies blocks of gapped fragments that maintain co-localization, regardless of whether they occur in coding or non-coding regions. The identification of interspersed repeats is an important step in the subsequent refinement of these blocks: it allows repetitions to be separated and, in turn, longer blocks to be correctly identified. Finally, groups of interspersed repeats, duplications, inversions and translocations are identified.
Results: The set of algorithms presented in this manuscript is able to detect and identify blocks of large rearrangements, taking into account interspersed repeats, tandem repeats and duplications, starting from a simple collection of ungapped local alignments. To the best of our knowledge, this is the first method to approach the whole process as a coherent workflow, thus outperforming current state-of-the-art software tools while also allowing the type of rearrangement to be classified. The results obtained are an important source of information for breakpoint refinement and characterization, as well as for estimating the evolutionary event frequencies to be used in inter-genome distance proposals, etc. Data sets and Supplementary Material are available at: http://bitlab-es.com/gecko-csb/.
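
To make the idea of "blocks of fragments that maintain co-localization" more concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation. It assumes each fragment is reduced to a start position in each genome, a length and a strand, and it swaps the recursive procedure described in Methods for a simple greedy chaining along (anti-)diagonals. The names `Fragment`, `diagonal`, `group_into_blocks` and `block_orientation`, and the thresholds `max_gap` and `max_diag_shift`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    """Hypothetical minimal fragment record (not the tool's real format)."""
    x_start: int  # start coordinate in genome X
    y_start: int  # leftmost coordinate in genome Y
    length: int
    strand: str   # "f" (forward) or "r" (reverse)


def diagonal(frag: Fragment) -> int:
    """Forward matches share y - x; reverse matches share x + y."""
    if frag.strand == "f":
        return frag.y_start - frag.x_start
    return frag.x_start + frag.y_start + frag.length


def group_into_blocks(fragments: List[Fragment],
                      max_gap: int,
                      max_diag_shift: int) -> List[List[Fragment]]:
    """Greedily chain fragments that stay close to the same diagonal.

    Assumption: a fragment extends a block when it has the same strand as
    the block's last fragment, starts at most `max_gap` positions after it
    in genome X, and its diagonal moved by at most `max_diag_shift`.
    """
    blocks: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: f.x_start):
        for block in reversed(blocks):  # try the most recent blocks first
            last = block[-1]
            gap = frag.x_start - (last.x_start + last.length)
            shift = abs(diagonal(frag) - diagonal(last))
            if (frag.strand == last.strand
                    and 0 <= gap <= max_gap
                    and shift <= max_diag_shift):
                block.append(frag)
                break
        else:
            blocks.append([frag])  # no compatible block: open a new one
    return blocks


def block_orientation(block: List[Fragment]) -> str:
    """Label a block by strand; reverse blocks are inversion candidates."""
    return "inverted" if block[0].strand == "r" else "collinear"


example = [
    Fragment(0, 0, 100, "f"),
    Fragment(120, 125, 80, "f"),
    Fragment(300, 900, 50, "r"),
    Fragment(360, 840, 50, "r"),
    Fragment(1000, 5000, 100, "f"),
]
blocks = group_into_blocks(example, max_gap=50, max_diag_shift=10)
labels = [block_orientation(b) for b in blocks]
```

In this toy input the first two forward fragments chain into one block, the two reverse fragments form an inversion candidate, and the distant forward fragment stays alone. A real pipeline would additionally need the overlap resolution, fragment refinement and repeat handling described in Methods, none of which this sketch attempts.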