Title :
How biological source capabilities may affect the data collection process
Author :
Lacroix, Zoé ; Edupuganti, Vidyadhari
Author_Institution :
Arizona State Univ., Tempe, AZ, USA
Abstract :
Scientific discovery relies in part on the collection of information related to multiple scientific objects (e.g., "retrieve all genes involved in brain cancer", "retrieve all citations related to diabetes"). Scientists query multiple data sources in order to explore relationships between scientific objects. Each data source provides specific capabilities that allow scientists to access, navigate, and analyze the data. This work addresses the impact of resource selection (data source and capability) on the data collection process, as it may significantly affect the quality and completeness of the data. We present preliminary research demonstrating that the data collection process depends on two orthogonal variables: the data sources involved in the process, and the capabilities selected at these resources. We report results for four commonly used biological resources: the NCBI Nucleotide, Protein, PubMed, and OMIM databases.
Keywords :
biology computing; data analysis; information resources; information retrieval; NCBI Nucleotide databases; OMIM databases; Protein databases; PubMed databases; biological resources; biological source capabilities; data access; data analysis; data collection; data completeness; data navigation; data quality; multiple data sources; resource selection; scientific discovery; Access protocols; Cancer; Data analysis; Databases; Diabetes; Diseases; Information retrieval; Navigation; Proteins; User interfaces
Conference_Title :
Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
Print_ISBN :
0-7695-2194-0
DOI :
10.1109/CSB.2004.1332511