Title of article :
Repairing fractures between data using genetic programming-based feature extraction: A case study in cancer diagnosis
Author/Authors :
Jose G. Moreno-Torres, Xavier Llorà, David E. Goldberg, Rohit Bhargava
Issue Information :
Journal with serial issue number, 2013
Abstract :
Most model-building processes rest on an underlying assumption: a learned classifier should be usable to explain unseen data from the same problem. Although this assumption seems reasonable, it tends to fail for biological data: classifiers built from data generated with the same protocols in two different laboratories can turn out different and non-interchangeable. Generating data in the lab usually involves too many uncontrollable variables, as well as biological variation, and small differences can lead to very different data distributions, producing a fracture between data.
This paper presents a genetics-based machine learning approach that performs feature extraction on data from one laboratory to improve the classification performance of an existing classifier built from the data of a different laboratory that uses the same protocols. At the same time, it learns about the shape of the fractures between data that caused the poor behavior.
The experimental analysis over benchmark problems, together with a real-world problem on prostate cancer diagnosis, shows the good behavior of the proposed algorithm.
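The general idea of repairing a fracture by evolving a feature transformation for a frozen classifier can be illustrated with a toy sketch. This is not the authors' algorithm: it is a minimal, mutation-only evolutionary search over small expression trees, and every name in it (`frozen_classifier`, `evolve`, the synthetic shift `2*z + 3`) is a hypothetical assumption made for illustration only.

```python
import random
import operator

# Binary operators available to evolved expressions (an assumed, minimal set).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def frozen_classifier(x):
    """Stand-in for a classifier learned on lab A: positive iff x > 0."""
    return 1 if x > 0 else 0

def random_tree(rng, depth=2):
    """Grow a random expression over the input 'x' and float constants."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else float(rng.randint(-5, 5))
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Evaluate an expression tree on a single feature value."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree  # variable or numeric constant

def mutate(tree, rng):
    """Replace a randomly chosen subtree with a fresh random one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))

def accuracy(tree, xs, ys):
    """Accuracy of the frozen classifier on features transformed by the tree."""
    hits = sum(frozen_classifier(evaluate(tree, x)) == y for x, y in zip(xs, ys))
    return hits / len(xs)

def evolve(xs, ys, generations=30, pop_size=40, seed=0):
    """Search for a transformation that maximizes the frozen classifier's accuracy."""
    rng = random.Random(seed)
    # Seeding with the identity "x" and keeping the elite means the result
    # is never worse than applying no repair at all.
    population = ["x"] + [random_tree(rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda t: accuracy(t, xs, ys), reverse=True)
        elite = population[: pop_size // 4]
        children = [mutate(rng.choice(elite), rng) for _ in range(pop_size - len(elite))]
        population = elite + children
    return max(population, key=lambda t: accuracy(t, xs, ys))

# Synthetic lab-B data: a latent value z decides the label, but lab B observes 2*z + 3.
zs = [i / 10 for i in range(-10, 11) if i != 0]
xs_b = [2 * z + 3 for z in zs]
ys_b = [1 if z > 0 else 0 for z in zs]

raw_accuracy = accuracy("x", xs_b, ys_b)       # classifier applied without repair
best = evolve(xs_b, ys_b)
repaired_accuracy = accuracy(best, xs_b, ys_b)  # classifier applied after the evolved transform
print(raw_accuracy, repaired_accuracy, best)
```

Because the evolved transformation is an explicit expression, inspecting it can hint at how the two data distributions differ; in this synthetic setup, an expression equivalent to `x - 3` would undo the shift.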
Keywords :
Genetic programming, Feature extraction, Fractures between data, Cancer diagnosis, Different laboratories, Biological data
Journal title :
Information Sciences