Regional Information Center for Science and Technology - Using data mining techniques to learn layouts of flat-file biological datasets

DocumentCode :

2583023

Title :

Using data mining techniques to learn layouts of flat-file biological datasets

Author :

Sinha, Kaushik ; Zhang, Xuan ; Jin, Ruoming ; Agrawal, Gagan

Author_Institution :

Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA

fYear :

2005

fDate :

19-21 Oct. 2005

Firstpage :

177

Lastpage :

184

Abstract :

One of the major problems in biological data integration is that many data sources are stored as flat files, with a variety of different layouts. Integrating data from such sources can be an extremely time-consuming task. We have been developing data mining techniques to help learn the layout of a dataset in a semi-automatic way. In this paper, we focus on the problem of identifying delimiters for optional fields. Since these fields do not occur in every record, frequency-based methods are not able to identify the corresponding delimiters. We present a method that uses contrast analysis on the frequency of sequences to identify such delimiters and help complete the layout descriptions. We demonstrate the effectiveness of this technique using three flat-file biological datasets.
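
The abstract does not spell out the contrast analysis itself, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes that records are lists of text lines and that a delimiter is the first whitespace-separated token on a line. The function name `contrast_delimiters`, its thresholds, and the sample records are all hypothetical. A token that starts a line in nearly every record is treated as a mandatory delimiter; a token that is too rare for that test is still accepted as an optional-field delimiter when almost all of its occurrences are at the start of a line (high contrast between line-start frequency and overall frequency).

```python
from collections import Counter


def contrast_delimiters(records, mandatory_min=0.9, optional_min=0.2,
                        contrast_min=0.9):
    """Split leading line tokens into mandatory and optional delimiter candidates.

    records: list of records, each a list of text lines (assumed layout).
    All thresholds are illustrative values, not taken from the paper.
    """
    n = len(records)
    lead_support = Counter()  # number of records where the token starts a line
    lead_count = Counter()    # occurrences of the token at the start of a line
    total_count = Counter()   # occurrences of the token anywhere

    for rec in records:
        leads = set()
        for line in rec:
            tokens = line.split()
            if not tokens:
                continue
            leads.add(tokens[0])
            lead_count[tokens[0]] += 1
            total_count.update(tokens)
        lead_support.update(leads)

    mandatory, optional = [], []
    for tok, sup in lead_support.items():
        support = sup / n
        # Contrast: share of the token's occurrences that begin a line.
        contrast = lead_count[tok] / total_count[tok]
        if support >= mandatory_min:
            mandatory.append(tok)
        elif support >= optional_min and contrast >= contrast_min:
            optional.append(tok)
    return sorted(mandatory), sorted(optional)


# Hypothetical sample: "CC" is an optional field; "kinase" starts one wrapped
# line but mostly appears inside field values, so its contrast is low.
sample_records = [
    ["ID P1", "DE kinase", "SQ ACGT"],
    ["ID P2", "DE transporter", "CC note on P2", "SQ GGCA"],
    ["ID P3", "DE kinase", "kinase family member", "SQ TTAG"],
    ["ID P4", "DE ligase", "CC note on P4", "SQ CCGT"],
    ["ID P5", "DE kinase binding", "SQ ACGG"],
]

mandatory, optional = contrast_delimiters(sample_records)
print(mandatory, optional)
```

In this toy data a purely frequency-based test with a 90% threshold would miss `CC`, because it occurs in only two of five records; the positional contrast is what separates it from ordinary words such as `kinase`.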

Keywords :

biology computing; data mining; molecular biophysics; molecular configurations; flat files; biological data integration; contrast analysis; data mining; delimiters; fields; flat-file biological datasets; sequence frequency; Bioinformatics; Biological system modeling; Biology; Computer science; Data analysis; Data engineering; Data mining; Databases; Frequency; Sequences;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Bioinformatics and Bioengineering, 2005. BIBE 2005. Fifth IEEE Symposium on

Print_ISBN :

0-7695-2476-1

Type :

conf

DOI :

10.1109/BIBE.2005.56

Filename :

1544464

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2583023