Title :
Using The Soybean Genome Database (SoyGD) To Display and Analyze a 1 Gbp Genome Sequence
Author :
Lightfoot, David ; Zhu, Mengxia Michelle
Author_Institution :
Southern Illinois Univ., Carbondale
Abstract :
Genomes such as that of Glycine max (soybean), which have been highly conserved following increases in ploidy (by duplication or hybridization), present challenges for bioinformatics and genome analysis. Since 2002, the Soybean Genome Database (SoyGD) genome browser at http://soybeangenome.siu.edu has integrated and served the publicly available soybean physical map, the BAC fingerprint database, and genomic data associated with the genetic map (1). Duplicated regions have been identified and catalogued with a-d suffixes on marker anchor names and with contig names that communicate ploidy (ctg>8000 are tetraploid, ctg>9000 are octoploid). DNA sequence data has been used to separate DNA marker anchors from homologs of DNA marker anchors in BAC pools. About 200 gene families were mapped by EST hybridization. About 23,000 minimum tiling path (MTP) BIBAC clones provided BAC end sequences (BES) to decorate the physical map; these were added to the database as separate tracks. Predicted gene models were developed for about 15% of the BES. From these models, candidate genes underlying disease resistance, seed yield, and seed protein, oil, or isoflavone content were detected and fine-mapped. In recent additions, 1 Gbp of genome sequence was made available by DOE in about 1500 scaffolds. Methods for display were improved by cross-referencing the BES and WGS with Arabidopsis (3). In genome evolution analyses more than a thousand additional microsatellite marker anchors were developed for contigs, 353 on the map and about 700 still in new microsatellite markers on the genetic map with contigs and associated features. About half of the markers mapped to regions of the genome that formed gaps in earlier maps, suggesting marker clustering biases. SoyGD represents the new build 5 of the physical map, with 800 contigs from the 76,749 publicly available fingerprinted clones. New QTL data has been incorporated from the newly released 'Essex' by 'Forrest' and 'Flyer' by 'Hartwig' RIL populations. Gene expression data has been added to the gene models represented in SoyGD. This work was supported by NSF project #9878635 and USB 2218-6218.
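The contig-number convention stated in the abstract (ctg>8000 tetraploid, ctg>9000 octoploid) can be illustrated with a minimal sketch. The name format `ctg<number>`, the function name `ploidy_class`, and the label `"unflagged"` for contigs at or below 8000 are assumptions made for illustration; the abstract does not specify how SoyGD stores or labels these values.

```python
import re


def ploidy_class(contig_name):
    """Classify a contig using the numeric thresholds quoted in the abstract.

    Assumes (hypothetically) that contig names look like 'ctg8123'.
    """
    match = re.fullmatch(r"ctg(\d+)", contig_name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized contig name: {contig_name!r}")
    number = int(match.group(1))
    if number > 9000:
        return "octoploid"
    if number > 8000:
        return "tetraploid"
    # The abstract gives no ploidy label for contig numbers of 8000 or below.
    return "unflagged"
```

For example, `ploidy_class("ctg8123")` returns `"tetraploid"` under these assumptions.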
Keywords :
DNA; biology computing; cellular biophysics; diseases; genetics; molecular biophysics; molecular configurations; proteins; BAC end sequences; BAC fingerprint database; DNA sequence; EST hybridization; Glycine max; bioinformatics; contigs; disease resistance; gene expression; genetic map; genome analysis; genome sequence; isoflavone content; oil content; seed protein; seed yield; soybean genome database; Amino acids; Bioinformatics; Cloning; DNA; Databases; Displays; Fingerprint recognition; Genetics; Genomics; Sequences;
Conference_Title :
Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4244-1509-0
DOI :
10.1109/BIBE.2007.4375771