Author_Institution :
Sch. of Mech. & Electron. Eng., Jing-De-Zhen Ceramic Inst., Jing-De-Zhen, China
Abstract :
In the protein universe, many proteins are composed of two or more polypeptide chains, generally referred to as subunits, which associate through noncovalent interactions and, occasionally, disulfide bonds. As the number of protein sequences entering data banks grows rapidly, we face a challenge: how to develop an automated method that identifies the quaternary attribute of a new polypeptide chain (i.e., whether it forms a monomer, a dimer, a trimer, or some other oligomer). This is important because the functions of proteins are closely related to their quaternary attribute. In this report, using two machine learning approaches, the nearest neighbor algorithm (NNA) and the covariant-discriminant algorithm (CDA), we developed a prediction system for protein quaternary structural type that incorporates functional domain composition (FunD) and pseudo-amino acid composition (PseAA). For comparison, we adopted a benchmark dataset that has been studied repeatedly. The overall accuracy achieved by this system is more than 89% in the jackknife test. Such a technique should improve the success rate of structural biology projects.
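The evaluation idea named in the abstract, a nearest neighbor classifier scored with a jackknife (leave-one-out) test, can be sketched roughly as follows. This is only an illustrative sketch and not the authors' system: it assumes plain 20-dimensional amino acid composition as the feature vector (a simplification standing in for FunD and PseAA), Euclidean distance, and a single nearest neighbor. The function names `composition`, `nearest_neighbor_label`, and `jackknife_accuracy` are hypothetical, and the demo vectors are toy values, not benchmark data.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Return the 20-dimensional amino acid frequency vector of a sequence.

    Assumption: plain composition is used here as a stand-in for the
    FunD and PseAA descriptors mentioned in the abstract.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_neighbor_label(query, train_vectors, train_labels):
    """Return the label of the training vector closest to the query."""
    best = min(range(len(train_vectors)),
               key=lambda i: dist(query, train_vectors[i]))
    return train_labels[best]


def jackknife_accuracy(vectors, labels):
    """Leave-one-out test: predict each sample from all remaining samples."""
    correct = 0
    for i in range(len(vectors)):
        rest_vectors = vectors[:i] + vectors[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        predicted = nearest_neighbor_label(vectors[i], rest_vectors, rest_labels)
        if predicted == labels[i]:
            correct += 1
    return correct / len(vectors)


if __name__ == "__main__":
    # Toy two-dimensional vectors for illustration only (not real proteins).
    toy_vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    toy_labels = ["monomer", "monomer", "dimer", "dimer"]
    print(jackknife_accuracy(toy_vectors, toy_labels))
```

In a jackknife test every sample is held out once and classified from the rest, so the reported accuracy uses each benchmark entry as an independent test case without a fixed train/test split.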
Keywords :
bioinformatics; covariance analysis; learning (artificial intelligence); molecular biophysics; molecular configurations; pattern classification; proteins; statistical testing; Jack-knife test; automated method; covariant-discriminant algorithm; data banks; disulfide bonds; functional domain; functional domain composition; machine learning approach; nearest neighbor classifier algorithm; noncovalent interactions; polypeptide chain; polypeptide chains; protein quaternary structural type prediction; protein sequences; pseudo amino acid composition; Amino acids; Benchmark testing; Ceramics; In vivo; Machine learning; Machine learning algorithms; Nearest neighbor searches; Protein engineering; Sequences; System testing;