DocumentCode :
1791716
Title :
TIDE: Inter-chromosomal translocation and insertion detection using embeddings
Author :
Vetro, Rosanne ; Farhoodi, Roshanak ; Kotla, Rohith ; Haspel, Nurit ; Weisman, David ; Rosen, Jacob ; Simovici, Dan
Author_Institution :
Dept. of Comput. Sci., Univ. of Massachusetts Boston, Boston, MA, USA
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
64
Lastpage :
70
Abstract :
Structural variations (SVs) are deletions, duplications and rearrangements of medium to large segments (>100 base pairs (bp)) of the genome. Such genomic mutations are often described as the primary cause of many diseases, including cancer. Breakpoint detection using next-generation sequencing (NGS) platforms remains an open problem, since computational methods for detecting SVs must accurately predict the precise location of breakpoints, which are typically spanned by a very small number of reads among the millions generated during sequencing. In this work, we propose a method called TIDE to identify reads from paired-end sequencing data that contain inter-chromosomal translocation or insertion breakpoints, which are specific types of SVs involving different chromosomes. To achieve this, we use discordant read pairs to narrow the search space and split prospective breakpoint-spanning reads into windows, each of which is then represented by a sequence of k-mer indexes that we call a fingerprint. We then apply a distance-preserving embedding algorithm to solve the approximate nearest neighbor problem of pairing the most similar fingerprints originating from the sample and the reference genome. Experimental results show the efficacy of the method in finding reads containing breakpoints that characterize the PAX8-PPARγ rearrangement found in thyroid cancer samples. We also compare our results with those of two recently published algorithms for detecting structural variation in clinical data.
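The windowing and fingerprinting steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the base-4 k-mer encoding, the non-overlapping windows, and the position-wise mismatch distance are all assumptions made for illustration. The paper's distance-preserving embedding and approximate nearest-neighbor search are replaced here by an exact brute-force search, purely to show what "pairing the most similar fingerprints" means.

```python
# Illustrative sketch only. All names and parameter choices are hypothetical
# and are not taken from the TIDE paper.

BASE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_index(kmer):
    """Encode a k-mer as a base-4 integer (assumed encoding: A=0, C=1, G=2, T=3)."""
    index = 0
    for base in kmer:
        index = index * 4 + BASE[base]
    return index


def fingerprint(window, k):
    """Return the sequence of k-mer indexes for every overlapping k-mer in a window."""
    return [kmer_index(window[i:i + k]) for i in range(len(window) - k + 1)]


def split_windows(read, size):
    """Split a read into consecutive non-overlapping windows (an assumed scheme)."""
    return [read[i:i + size] for i in range(0, len(read) - size + 1, size)]


def mismatch_distance(fp_a, fp_b):
    """Count positions at which two equal-length fingerprints differ."""
    return sum(a != b for a, b in zip(fp_a, fp_b))


def nearest_fingerprint(query, candidates):
    """Exact brute-force stand-in for the approximate nearest-neighbor step.

    Returns the position in `candidates` of the fingerprint closest to `query`.
    """
    return min(
        range(len(candidates)),
        key=lambda i: mismatch_distance(query, candidates[i]),
    )
```

In use, one would fingerprint each window of a candidate read and each window of the reference, then call `nearest_fingerprint` to pair them. A brute-force search like this scales poorly to millions of reads, which is presumably why the method relies on an embedding-based approximate search instead.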
Keywords :
cancer; cellular biophysics; genomics; medical computing; NGS platforms; PAX8-PPARγ rearrangement; SV; TIDE; approximate nearest neighbor problem; breakpoint detection; breakpoint-spanning reads; chromosomes; discordant read pairs; diseases; distance-preserving embedding algorithm; fingerprints; genome; genomic mutations; insertion breakpoints; insertion detection; inter-chromosomal translocation; k-mers indexes; next-generation sequencing; paired-end sequencing data; reads identification; sequencing process; structural variations; thyroid cancer; Bioinformatics; Biological cells; Cancer; Genomics; Indexes; Sequential analysis; Tides; Insertion; Locality sensitive hashing (LSH); Next-generation sequencing (NGS); Structural variations (SVs); Translocation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004395
Filename :
7004395