Title :
Significance analysis of clustering high throughput biological data
Author :
Otu, Hasan H. ; Kolia, Shakirahmed ; Jones, Jon ; Osman, Osman ; Libermann, Towia A.
Author_Institution :
Beth Israel Deaconess Med. Center Genomics Center & Bioinformatics Core, Beth Israel Deaconess Med. Center & Harvard Med. Sch., Boston, MA
Abstract :
In the post-genomic era, the availability of complete genome sequences has given rise to high-throughput systems such as gene chips and protein arrays. These techniques revolutionize our understanding of biology by simultaneously probing thousands of biological entities at any given time. Unsupervised classification and clustering have emerged as important methods of analysis, which can be used to group samples with a similar molecular profile and/or molecules with a similar expression profile. However, techniques like hierarchical clustering, k-means, and self-organizing maps (SOM) have been used extensively with little attention to the significance of their results. We propose a general method that uses the bootstrap technique to assign confidence levels to clustering results of high-throughput biological data. We apply the proposed method to real genomics and proteomics data on Renal Cell Cancer (RCC), which is the most common malignancy of the adult kidney. We utilize protein profiles from IL-2 treatment responders and non-responders among metastatic RCC patients, obtained using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI TOF-MS). We also use gene expression data from Affymetrix HG-U133A chips for primary RCC tumors, inquiring into the Union Internationale Contre le Cancer's (UICC) TNM classification
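The bootstrap idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the resampling scheme, the clustering algorithm, or the confidence measure, so everything below is an assumption. The sketch resamples features (columns) with replacement, clusters the samples with average-linkage agglomerative clustering on Euclidean distance, and reports for each cluster found on the full data the fraction of bootstrap replicates in which exactly the same cluster reappears. The function names (`cluster_samples`, `bootstrap_confidence`) and the toy data are hypothetical.

```python
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cluster_samples(rows, k):
    """Average-linkage agglomerative clustering of the rows (samples),
    stopped when k clusters remain. Returns a set of frozensets of row indices."""
    n = len(rows)
    dist = {(i, j): euclidean(rows[i], rows[j]) for i, j in combinations(range(n), 2)}

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([i]) for i in range(n)]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(pair[0], pair[1]))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


def bootstrap_confidence(rows, k, n_boot=100, seed=0):
    """Assumed scheme: resample feature columns with replacement, re-cluster,
    and count how often each reference cluster is recovered exactly."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    reference = cluster_samples(rows, k)
    counts = {c: 0 for c in reference}
    for _ in range(n_boot):
        cols = [rng.randrange(n_features) for _ in range(n_features)]
        resampled = [[row[c] for c in cols] for row in rows]
        replicate = cluster_samples(resampled, k)
        for c in reference:
            if c in replicate:
                counts[c] += 1
    return {c: counts[c] / n_boot for c in reference}


# Made-up toy data: 6 samples x 6 features, forming two well-separated groups.
rows = [
    [1.0, 1.2, 0.9, 1.1, 1.0, 0.8],
    [1.1, 0.9, 1.0, 1.2, 0.9, 1.0],
    [0.9, 1.0, 1.1, 0.9, 1.1, 1.1],
    [5.0, 5.2, 4.9, 5.1, 5.0, 4.8],
    [5.1, 4.9, 5.0, 5.2, 4.9, 5.0],
    [4.9, 5.0, 5.1, 4.9, 5.1, 5.1],
]
confidence = bootstrap_confidence(rows, k=2, n_boot=100)
```

Here `confidence` maps each cluster of the full data (a frozenset of sample indices) to its recovery frequency. Because the two toy groups are far apart on every feature, both clusters are recovered in every replicate; on real expression or mass-spectrometry profiles the frequencies would typically fall below 1 and could serve as a rough confidence level for each cluster.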
Keywords :
DNA; biology computing; cancer; genetics; molecular biophysics; pattern classification; pattern clustering; sequences; time of flight mass spectroscopy; tumours; Affymetrix HG-U133A chips; RCC tumors; Renal Cell Cancer; adult kidney malignancy; biological data clustering; bootstrap technique; gene expression data; genome sequences; genomics data; metastatic RCC patients; protein profiles; proteomics data; significance analysis; surface-enhanced laser desorption/ionization time-of-flight mass spectrometry; unsupervised classification; unsupervised clustering; Availability; Bioinformatics; Cancer; Genomics; Medical treatment; Proteins; Proteomics; Self organizing feature maps; Sequences; Throughput;
Conference_Title :
Electro Information Technology, 2005 IEEE International Conference on
Conference_Location :
Lincoln, NE
Print_ISBN :
0-7803-9232-9
DOI :
10.1109/EIT.2005.1627001