• DocumentCode
    2488442
  • Title
    Preliminary approach on synthetic data sets generation based on class separability measure
  • Author
    Macià, Núria; Bernadó-Mansilla, Ester; Orriols-Puig, Albert
  • Author_Institution
    Arquitectura La Salle, Univ. Ramon Llull, Barcelona
  • fYear
    2008
  • fDate
    8-11 Dec. 2008
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Classifier performance is usually evaluated on real-world problems drawn mainly from public repositories. However, we do not know the inherent properties of these data or how they affect classifier behavior. In addition, the high cost or difficulty of experiments hinders data collection, leading to complex data sets characterized by few instances, missing values, and imprecise data. Generating synthetic data sets addresses both issues: it allows us to build problems at low cost and with predefined characteristics, which is useful for testing system limitations in a controlled framework. This paper proposes generating synthetic data sets based on data complexity. We rely on the length of the class boundary to build the data sets, obtaining a preliminary set of benchmarks to assess classifier accuracy. The study can be developed further to identify regions of competence for classifiers.
  • Keywords
    computational complexity; pattern classification; class separability measure; data classifier; data complexity; synthetic data set generation; Algorithm design and analysis; Benchmark testing; Character generation; Classification algorithms; Computational efficiency; Control systems; Costs; Data privacy; Guidelines; System testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
  • Conference_Location
    Tampa, FL
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4244-2174-9
  • Electronic_ISBN
    1051-4651
  • Type
    conf
  • DOI
    10.1109/ICPR.2008.4761770
  • Filename
    4761770