DocumentCode
3187696
Title
E-Clean: A Data Cleaning Framework for Patient Data
Author
Mohamed, Hasimah Hj ; Kheng, Tee Leong ; Collin, Chee ; Lee, Ong Siong
Author_Institution
Sch. of Comput. Sci., Univ. Sains Malaysia, Pulau, Malaysia
fYear
2011
fDate
12-14 Dec. 2011
Firstpage
63
Lastpage
68
Abstract
We need to prepare quality data by pre-processing the raw data. Data cleaning, also called data cleansing or scrubbing, deals with detecting and removing errors and inconsistencies from data in order to improve data quality. Data cleaning systems are needed to support any changes in the structure, representation or content of data. The cleaning process has three parts: extracting invalid values, matching attributes with valid values, and applying a data cleaning algorithm. Our system uses the extract, transform and load (ETL) model as its main process model, which serves as a guideline for the implementation. In addition, parsing techniques are used to identify dirty data. Regular expressions are the method chosen for matching attributes. Among the available data cleaning algorithms, the k-Nearest Neighbor algorithm is selected for the data cleaning part of this project because it is simple to understand and easy to implement.
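The three-part process described in the abstract (extract invalid values, match attributes against valid patterns, repair with k-Nearest Neighbor) could look roughly like the Python sketch below. It is illustrative only and is not the E-Clean implementation: the attribute names (`gender`, `blood_type`, `age`, `weight_kg`), the regular expressions, the Euclidean distance, and the helper names `find_invalid`, `knn_impute` and `clean_record` are all assumptions, since the record does not specify them.

```python
import re
from collections import Counter

# Hypothetical validity patterns; the record does not list the actual expressions.
VALID_PATTERNS = {
    "gender": re.compile(r"(M|F)"),
    "blood_type": re.compile(r"(A|B|AB|O)[+-]"),
}


def find_invalid(record):
    """Return the attributes whose value does not fully match its pattern."""
    return [
        name
        for name, pattern in VALID_PATTERNS.items()
        if not pattern.fullmatch(str(record.get(name, "")))
    ]


def knn_impute(record, attribute, clean_records, k=3):
    """Pick a replacement value by majority vote among the k nearest clean records.

    Distance is Euclidean over two assumed numeric features, age and weight_kg.
    """
    def distance(other):
        return (
            (record["age"] - other["age"]) ** 2
            + (record["weight_kg"] - other["weight_kg"]) ** 2
        ) ** 0.5

    nearest = sorted(clean_records, key=distance)[:k]
    votes = Counter(neighbor[attribute] for neighbor in nearest)
    return votes.most_common(1)[0][0]


def clean_record(record, clean_records, k=3):
    """Return a copy of the record with every invalid attribute replaced."""
    cleaned = dict(record)
    for attribute in find_invalid(record):
        cleaned[attribute] = knn_impute(record, attribute, clean_records, k)
    return cleaned


# Made-up sample data for illustration only.
reference = [
    {"age": 30, "weight_kg": 70, "gender": "M", "blood_type": "O+"},
    {"age": 32, "weight_kg": 72, "gender": "M", "blood_type": "A+"},
    {"age": 29, "weight_kg": 68, "gender": "M", "blood_type": "O+"},
    {"age": 60, "weight_kg": 50, "gender": "F", "blood_type": "B-"},
    {"age": 62, "weight_kg": 52, "gender": "F", "blood_type": "AB+"},
]
dirty = {"age": 31, "weight_kg": 71, "gender": "male", "blood_type": "O+"}
repaired = clean_record(dirty, reference)
```

In this sketch the regular expressions only flag values as invalid; the replacement comes entirely from the neighbors' votes, so the result depends on which features and which k are chosen.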
Keywords
attribute grammars; data handling; medical administrative data processing; E-Clean; data cleaning algorithm; data cleansing; data inconsistency; data scrubbing; dirty data identification; error detection; error removal; k-nearest neighbor algorithm; matching attributes; parsing techniques; patient data; raw data pre-processing; Classification algorithms; Cleaning; Data mining; Databases; Knowledge based systems; Load modeling; Transforms; data cleaning; k-Nearest Neighbor; regular expression;
fLanguage
English
Publisher
ieee
Conference_Titel
Informatics and Computational Intelligence (ICI), 2011 First International Conference on
Conference_Location
Bandung
Print_ISBN
978-1-4673-0091-9
Type
conf
DOI
10.1109/ICI.2011.21
Filename
6141651