Title :
Evaluating topology-based metrics for GO term similarity measures
Author :
Jong Cheol Jeong ; Xue-wen Chen
Author_Institution :
Center for Bioinf., Univ. of Kansas, Lawrence, KS, USA
Abstract :
Defining semantic functional similarity measures provides an effective means to validate protein function prediction methods and to retrieve biologically relevant information from big biological data. It also improves understanding of the interrelationships between genes and gene products (GPs). Currently, one of the most commonly used tools for functionally annotating genes and GPs is the Gene Ontology (GO), which describes genes and GPs in a machine-readable language. Many methods based on GO topology have recently been reported for measuring the semantic similarity between two GO terms. However, a comprehensive assessment of these methods and general guidelines for validating them are lacking. In this paper, we collect a large dataset and evaluate five commonly used semantic similarity measures by their ability to estimate sequence similarity, phylogenetic profile similarity, and structural similarity. We further compare the measures in terms of their clustering performance using domains extracted from the SCOP database. We describe key aspects of these measures, discuss how their limitations may be addressed, and outline some open problems.
Keywords :
bioinformatics; evolution (biological); genetics; genomics; information retrieval; learning (artificial intelligence); ontologies (artificial intelligence); pattern clustering; proteins; semantic networks; GO term similarity measures; GO topology; Gene Ontology; SCOP database; big biological data; biologically relevant information retrieval; clustering performance; comprehensive assessment; functional GP annotation; functional gene annotation; gene interrelationship; gene product; general guidelines; machine-readable language; often-used semantic similarity measure methods; phylogenetic profile similarity; protein function prediction methods; semantic functional similarity measures; sequence similarity; structural similarity; topology-based metrics; Biological information theory; Correlation; Current measurement; Integrated circuits; Phylogeny; Proteins; gene ontology; gene products; semantic functional similarity;
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
Conference_Location :
Shanghai
DOI :
10.1109/BIBM.2013.6732457