  • DocumentCode
    1705677
  • Title
    A noise filtering method using neural networks
  • Author
    Zeng, Xinchuan ; Martinez, Tony
  • Author_Institution
    Dept. of Comput. Sci., Brigham Young Univ., Provo, UT, USA
  • fYear
    2003
  • fDate
    5/17/2003 12:00:00 AM
  • Firstpage
    26
  • Lastpage
    31
  • Abstract
    Noise can be introduced into a data set during data collection and labeling. As a result, the quality of the data set degrades, and experiments and inferences derived from it become less reliable. In this paper we present an algorithm, called ANR (automatic noise reduction), as a filtering mechanism to identify and remove noisy data items whose classes have been mislabeled. ANR is built on a framework of multi-layer artificial neural networks. It assigns each data item a soft class label in the form of a class probability vector, which is initialized to the original class label and can be modified during training. When the noise level is reasonably small (< 30%), the non-noisy data dominates in determining the network architecture and its output, so mislabeled data can be corrected by aligning the class probability vector with the network output. With a learning procedure that updates the class probability vector based on its difference from the network output, the probability of a mislabeled class gradually becomes smaller while that of the correct class becomes larger, which eventually corrects the mislabeled data after sufficient training. After training, data items whose classes have been relabeled are treated as noisy and removed from the data set. We evaluate the performance of ANR on 12 data sets drawn from the UCI data repository. The results show that ANR is capable of identifying a significant portion of noisy data. At a noise level of 25%, using ANR as a training data filter for a nearest neighbor classifier yields an average accuracy increase of 24.5% compared to the same classifier without ANR.
  • Keywords
    digital filters; feedforward neural nets; formal verification; learning (artificial intelligence); multilayer perceptrons; pattern classification; probability; ANR algorithm; UCI data repository; artificial neural network; automatic noise reduction; class probability vector; data collection; data labeling; data noise; data set reliability; learning procedure; multilayer neural network; nearest neighbor classifier; network architecture; network output; noise filtering; noise identification; noise level; noise removal; performance evaluation; Artificial neural networks; Degradation; Filtering algorithms; Filters; Inference algorithms; Labeling; Neural networks; Noise level; Noise reduction; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Soft Computing Techniques in Instrumentation, Measurement and Related Applications, 2003. SCIMA 2003. IEEE International Workshop on
  • Print_ISBN
    0-7803-7711-7
  • Type
    conf
  • DOI
    10.1109/SCIMA.2003.1215926
  • Filename
    1215926
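
The soft-label mechanism described in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual learning rule, so the linear step with renormalization below is an assumption, and the names one_hot, update_soft_label, argmax and keep_indices are hypothetical helpers, not the authors' code. The network output y is passed in as a given probability vector; in ANR it would come from a multi-layer network trained alongside the labels.

```python
from typing import List, Sequence


def one_hot(label: int, num_classes: int) -> List[float]:
    """Initial class probability vector: all mass on the original label."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def update_soft_label(p: Sequence[float], y: Sequence[float], rate: float) -> List[float]:
    """Move the class probability vector p one step toward the network output y.

    ASSUMPTION: a linear step followed by renormalization. The rule used in
    the paper is not given in the abstract.
    """
    moved = [max(pk + rate * (yk - pk), 0.0) for pk, yk in zip(p, y)]
    total = sum(moved)
    if total == 0.0:
        return list(p)  # degenerate step: leave the label unchanged
    return [m / total for m in moved]


def argmax(v: Sequence[float]) -> int:
    """Index of the largest entry (first one on ties)."""
    return max(range(len(v)), key=lambda k: v[k])


def keep_indices(original_labels: Sequence[int],
                 soft_labels: Sequence[Sequence[float]]) -> List[int]:
    """Indices of items whose most probable class still equals the original label.

    Items whose label has flipped are treated as noisy and filtered out.
    """
    return [i for i, (lab, p) in enumerate(zip(original_labels, soft_labels))
            if argmax(p) == lab]


if __name__ == "__main__":
    # Item labeled class 0, while the (assumed, fixed) network output favors class 1.
    p = one_hot(0, 2)
    for _ in range(10):
        p = update_soft_label(p, [0.1, 0.9], rate=0.2)
    print(argmax(p))                  # most probable class after the updates
    print(keep_indices([0], [p]))     # indices that survive the filter
```

Holding the network output fixed keeps the sketch short; it shows only how a label that disagrees with the network drifts toward the network's class and is then dropped by the filter.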