Title :
Gene Ontology Automatic Annotation Using a Domain Based Gene Product Similarity Measure
Author :
Popescu, Mihail ; Keller, James M. ; Mitchell, Joyce A.
Author_Institution :
Dept. of Health Manage. & Informatics, Missouri Univ., Columbia, MO
Abstract :
Recent years have seen an explosive growth in the amount of biological data available for analysis. The large volume of data collected makes it necessary to classify and sort such data automatically on a very large scale. Typically, investigators use computational sequence analysis tools to assign functions to newly found gene products. The problem is to find the functions of an unknown gene product given its amino acid sequence. In this work we search for functional similarity between gene products by matching the functional domains that they contain. The domain-based approach addresses the main problem of sequence-based similarity, namely that the region of a gene product matched by a query sequence may be unrelated to the function of that gene product. We use the hidden Markov representation of a gene product domain as described in the PFAM database, and then infer annotations that come from the Gene Ontology. To compute domain similarity between two gene products we introduce a fuzzy Jaccard similarity measure. We tested our domain-based similarity on the functional annotation of a set of 194 gene products extracted from the ENSEMBL Web site. We compared the domain similarity approach to the traditional way of performing functional annotation using sequence-based similarity (BLAST and Smith-Waterman). The annotation was performed in all cases using a fuzzy K-nearest neighbor algorithm. We found that our domain-based annotation was better than the most common BLAST approach, but not as good as the more complex Smith-Waterman technique. The domain-based annotation has a correct annotation rate of about 70% at a false annotation rate of 17%.
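The abstract does not define the fuzzy Jaccard measure or the fuzzy K-nearest neighbor rule, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each gene product is represented as a mapping from a domain identifier to a membership degree in [0, 1], uses one common fuzzy Jaccard form (sum of minima over sum of maxima), and replaces the fuzzy K-nearest neighbor algorithm with a simplified similarity-weighted vote. The function names, domain identifiers, and term labels are hypothetical.

```python
from collections import defaultdict


def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard similarity of two fuzzy sets of domains.

    a, b: dict mapping a domain id to a membership degree in [0, 1]
    (assumed representation; missing domains count as membership 0).
    """
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 0.0


def annotate_by_neighbors(query, labeled, k=3):
    """Simplified similarity-weighted K-nearest neighbor annotation.

    labeled: list of (domain_memberships, set_of_terms) pairs.
    Returns a dict mapping each candidate term to a score in [0, 1].
    """
    scored = [(fuzzy_jaccard(query, domains), terms) for domains, terms in labeled]
    # Keep the k most similar annotated gene products.
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in nearest)
    if total == 0:
        return {}
    votes = defaultdict(float)
    for sim, terms in nearest:
        for term in terms:
            votes[term] += sim
    # Normalize so a term shared by all neighbors scores 1.0.
    return {term: vote / total for term, vote in votes.items()}


if __name__ == "__main__":
    # Hypothetical domain ids and term labels, for illustration only.
    known = [
        ({"DOMAIN_A": 1.0}, {"TERM_1"}),
        ({"DOMAIN_B": 1.0}, {"TERM_2"}),
    ]
    print(annotate_by_neighbors({"DOMAIN_A": 1.0}, known, k=2))
```

In a sketch like this, a term would be assigned to the query gene product when its score exceeds a chosen threshold; varying that threshold trades correct annotations against false ones.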
Keywords :
biology computing; data analysis; genetics; hidden Markov models; ontologies (artificial intelligence); sorting; BLAST; ENSEMBL Web site; Gene Ontology automatic annotation; Smith-Waterman technique; amino acid sequence; computational sequence analysis tool; domain based gene product similarity measure; domain similarity approach; domain-based annotation; functional similarity; fuzzy Jaccard similarity measure; fuzzy K-nearest neighbor algorithm; hidden Markov representation; query sequence; sequence-based similarity; Amino acids; Bioinformatics; Biology computing; Biomedical informatics; Databases; Electric variables measurement; Engineering management; Genomics; Ontologies; Proteins;
Conference_Title :
Fuzzy Systems, 2005. FUZZ '05. The 14th IEEE International Conference on
Conference_Location :
Reno, NV
Print_ISBN :
0-7803-9159-4
DOI :
10.1109/FUZZY.2005.1452377