The impact of different fold for cross validation of missing values imputation method on hepatitis dataset

Author

Tri Astuti;Hanung Adi Nugroho;Teguh Bharata Adji

Author_Institution

Department of Informatics Engineering, STMIK Amikom Purwokerto, Indonesia

fYear

2015

Firstpage

51

Lastpage

55

Abstract

Hepatitis is a liver disease caused by hepatitis viruses. It is now a global health problem, including in Indonesia. Chronic hepatitis can lead to cirrhosis and liver cancer, so early diagnosis is needed. Several research works have developed computer-aided systems to improve the diagnosis of hepatitis. The University of California, Irvine (UCI) machine-learning repository provides a publicly accessible hepatitis dataset; however, the dataset contains many missing values. The existence of missing values in the dataset may affect the quality of the analysis results, so the missing values need to be handled. This paper analyses the performance of applying a varied number of folds in the cross-validation of missing-value imputation methods. The imputation method is combined with a feature selection method and a machine-learning algorithm on the hepatitis dataset. The results show that varying the number of folds in the k-fold cross-validation applied in the imputation method does not yield significant advantages.
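
The general idea of comparing fold counts for an imputation-plus-classifier pipeline can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not name the imputation method, feature selector, or classifier, so mean imputation and a nearest-centroid classifier are assumed stand-ins, feature selection is omitted, and the data are synthetic rather than the UCI hepatitis dataset. All function names are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def column_means(rows):
    """Mean of each column, ignoring missing entries (encoded as None)."""
    means = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        means.append(sum(present) / len(present) if present else 0.0)
    return means

def impute(rows, means):
    """Replace each None with the supplied mean of its column."""
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

def nearest_centroid_predict(train_X, train_y, test_X):
    """Assumed stand-in classifier: assign each row to the closest class centroid."""
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(c) / len(c) for c in zip(*members)]

    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda l: dist(x, centroids[l])) for x in test_X]

def cross_val_accuracy(X, y, k, seed=0):
    """k-fold accuracy; imputation means come from the training folds only."""
    folds = k_fold_indices(len(X), k, seed)
    correct = 0
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        means = column_means([X[j] for j in train_idx])
        train_X = impute([X[j] for j in train_idx], means)
        test_X = impute([X[j] for j in test_idx], means)
        preds = nearest_centroid_predict(train_X, [y[j] for j in train_idx], test_X)
        correct += sum(p == y[j] for p, j in zip(preds, test_idx))
    return correct / len(X)

def make_synthetic(n=100, missing_rate=0.2, seed=1):
    """Synthetic two-class data with randomly missing values (not real data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(label * 2.0, 1.0) for _ in range(4)]
        X.append([None if rng.random() < missing_rate else v for v in row])
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_synthetic()
    for k in (2, 5, 10):  # compare several fold counts
        print(k, round(cross_val_accuracy(X, y, k), 3))
```

Computing the imputation means inside each training split, rather than once on the full dataset, keeps information from the held-out fold out of the imputation step. Whether the paper handles this the same way is not stated in the abstract.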

Keywords

"Viruses (medical)","Computational modeling"

Publisher

ieee

Conference_Titel

Quality in Research (QiR), 2015 International Conference on

Print_ISBN

978-1-4799-6550-2

Type

conf

DOI

10.1109/QiR.2015.7374894

Filename

7374894