Title :
Data-oriented research for bioresource utilization: A case study to investigate water uptake in cellulose using Principal Components
Author :
Liu Yi Ling ; Driemeier, C. ; Cesar, Roberto M.
Author_Institution :
CTBE/CNPEM, Brazilian Bioethanol Sci. & Technol. Lab., Campinas, Brazil
Abstract :
Bioresource utilization is an important interdisciplinary research area that integrates academic and industrial expertise across diverse scientific domains, including physics, chemistry, biology, and engineering. The present paper describes a cyber-infrastructure being created at the Brazilian Bioethanol Science and Technology Laboratory (CTBE) to assist scientists working in the field. One key element of the infrastructure is the LignoCel Platform, a tailor-made database for the upload, curation, and sharing of lignocellulose data. In particular, LignoCel allows users to query the data and export subsets that are then analyzed for knowledge extraction. The paper describes a case study in which scientists investigate the dimensions that relate cellulose structure and water uptake. Data analysis and dimensionality reduction using Principal Component Analysis (PCA) are employed. Different PCA-based measurements are extracted and visualized through automatically generated HTML pages made available to the domain scientists. In this case study, the workflow successfully provided dimensionality reduction of a data matrix originating from a heterogeneous set of materials. PCA scores and loadings are explored for data analysis and visualization. PCA reduced the 11 measured features (obtained from three different experimental techniques, and giving 55 possible pairwise combinations) into a two-dimensional PC1PC2 loadings plot representing 89% of data variance. Examples of the output produced by the system are available at http://data.bioetanol.org.br/~liu.ling/pca-lignocel/.
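The PCA step summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' workflow: the data are synthetic (random numbers standing in for an 11-feature matrix), the sample count of 40 is arbitrary, and the function name `pca` and all variable names are hypothetical. The sketch standardizes the features, obtains scores and loadings from a singular value decomposition, and computes the fraction of variance carried by the first two components, which is the quantity the abstract reports as 89% for the real data.

```python
import numpy as np
from math import comb


def pca(X):
    """PCA via SVD of the standardized matrix (rows = samples, columns = features)."""
    # Standardize so that features measured in different units are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s                    # coordinates of each sample on each PC
    loadings = Vt.T                   # column j holds the feature weights of PC j+1
    explained = s**2 / np.sum(s**2)   # fraction of total variance per PC
    return scores, loadings, explained


# Synthetic stand-in data (assumption): two latent factors plus noise.
rng = np.random.default_rng(0)
n_samples, n_features = 40, 11
latent = rng.normal(size=(n_samples, 2))
mixing = rng.normal(size=(2, n_features))
X = latent @ mixing + 0.3 * rng.normal(size=(n_samples, n_features))

scores, loadings, explained = pca(X)

# Number of distinct feature pairs one would otherwise inspect one by one.
n_pairs = comb(n_features, 2)

# Variance captured by the two-dimensional PC1/PC2 view.
pc12_variance = explained[:2].sum()
print(f"{n_pairs} feature pairs; PC1+PC2 explain {pc12_variance:.1%} of variance")
```

Plotting the first two columns of `loadings` against each other gives a loadings plot of the kind the abstract mentions; features that point in similar directions are positively correlated in the data.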
Keywords :
biofuel; data analysis; data visualisation; hypermedia markup languages; knowledge acquisition; organic compounds; principal component analysis; query processing; Brazilian Bioethanol Science and Technology Laboratory; CTBE; LignoCel platform; PCA loadings; PCA scores; PCA-based measurements; automatic HTML page generation; bioresource utilization; cellulose structure; cyber-infrastructure; data analysis; data matrix; data querying; data variance; data visualization; data-oriented research; dimensionality reduction; interdisciplinary research; knowledge extraction; lignocellulose data curation; lignocellulose data sharing; lignocellulose data upload; principal component analysis; scientific workflow; tailor-made database; two-dimensional PC1PC2 loadings; water uptake; Correlation; Data visualization; Databases; Laboratories; Loading; Materials; Principal component analysis; Principal Component Analysis; Scientific workflow; bioethanol; lignocellulose;
Conference_Title :
E-Science (e-Science), 2012 IEEE 8th International Conference on
Conference_Location :
Chicago, IL
Print_ISBN :
978-1-4673-4467-8
DOI :
10.1109/eScience.2012.6404485