Regional Information Center for Science and Technology - Database of repetitive elements in complete genomes and data mining using transcription factor binding sites

DocumentCode :

1216129

Title :

Database of repetitive elements in complete genomes and data mining using transcription factor binding sites

Author :

Horng, Jorng-Tzong ; Lin, F.M. ; Lin, J.H. ; Huang, H.D. ; Liu, B.J.

Author_Institution :

Dept. of Comput. Sci. & Information Eng., Nat. Central Univ., Chung-li, Taiwan

Volume :

Issue :

fYear :

2003

fDate :

6/1/2003

Firstpage :

Lastpage :

100

Abstract :

Approximately 43% of the human genome is occupied by repetitive elements, and the proportion is higher still in the rice genome, at around 51%. The analysis presented here indicates that repetitive elements in complete genomes may have played an important role in evolutionary genomics. In this study, a database called the Repeat Sequence Database is designed and implemented to store a comprehensive collection of repetitive sequences. See http://rsdb.csie.ncu.edu.tw for more information. The database contains direct, inverted, and palindromic repetitive sequences, each ranging in length from seven to many hundreds of nucleotides. The repetitive sequences in the database are explored with a mathematical algorithm that mines rules on how combinations of individual binding sites are distributed among them. Combinations of transcription factor binding sites in the repetitive sequences are obtained first, and data mining techniques are then applied to mine association rules from these combinations. The discovered associations are further pruned to remove insignificant ones, leaving a reduced set of associations. The mined association rules facilitate efforts to identify gene classes regulated by similar mechanisms and to predict regulatory elements accurately. Experiments are performed on several genomes, including C. elegans, human chromosome 22, and yeast.
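
The abstract does not state which mining algorithm or thresholds were used, so the following is only a minimal sketch of the general idea, assuming each repetitive sequence is treated as a "transaction" holding the set of transcription factor binding sites found in it. The function name `mine_pair_rules`, the threshold values, and the toy site labels are illustrative assumptions, not taken from the paper; the confidence cutoff merely stands in for the pruning step the abstract mentions.

```python
from itertools import combinations

def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) rules between
    single binding sites. Hypothetical helper, not the paper's algorithm."""
    n = len(transactions)
    item_count = {}   # number of repeats containing each site
    pair_count = {}   # number of repeats containing both sites of a pair
    for sites in transactions:
        unique = sorted(set(sites))
        for s in unique:
            item_count[s] = item_count.get(s, 0) + 1
        for a, b in combinations(unique, 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be of interest
        # Test the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Toy data (assumed for illustration): binding sites per repetitive sequence.
repeats = [
    {"GATA-1", "SP1"},
    {"GATA-1", "SP1", "AP-1"},
    {"SP1", "AP-1"},
    {"GATA-1", "SP1"},
]
rules = mine_pair_rules(repeats, min_support=0.5, min_confidence=0.8)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

Here support is the fraction of repeats containing both sites, and confidence is the fraction of repeats containing the antecedent that also contain the consequent. A rule such as `GATA-1 -> SP1` would read as "repeats carrying a GATA-1 site also tend to carry an SP1 site" in this toy data only.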

Keywords :

DNA; biology computing; data mining; C. elegans; Repeat Sequence Database; association rules; binding sites; data mining; direct repetitive sequences; evolutionary genomics; human chromosome 22; human genome; inverted repetitive sequences; palindromic repetitive sequences; repetitive elements; rice genome; yeast; Association rules; Bioinformatics; Data mining; Distributed databases; Evolution (biology); Genetics; Genomics; Humans; Sequences; Transaction databases; Algorithms; Animals; Binding Sites; Caenorhabditis elegans; Chromosome Mapping; Chromosomes, Human, Pair 22; Conserved Sequence; DNA; Database Management Systems; Databases, Genetic; Evolution, Molecular; Gene Expression Profiling; Gene Expression Regulation; Genome; Humans; Information Storage and Retrieval; Repetitive Sequences, Nucleic Acid; Sequence Alignment; Sequence Analysis, DNA; Species Specificity; Transcription Factors; Yeasts;

fLanguage :

English

Journal_Title :

Information Technology in Biomedicine, IEEE Transactions on

Publisher :

IEEE

ISSN :

1089-7771

Type :

jour

DOI :

10.1109/TITB.2003.811878

Filename :

1203137

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1216129