• DocumentCode
    416104
  • Title
    A frequency-based approach for mining coverage statistics in data integration
  • Author
    Nie, Zaiqing ; Kambhampati, Subbarao
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Arizona State Univ., Tempe, AZ, USA
  • fYear
    2004
  • fDate
    30 March-2 April 2004
  • Firstpage
    387
  • Lastpage
    398
  • Abstract
    Query optimization in data integration requires source coverage and overlap statistics. Gathering and storing the required statistics presents many challenges, not the least of which is controlling the amount of statistics learned. We introduce StatMiner, a novel statistics mining approach that automatically generates attribute value hierarchies, efficiently discovers frequently accessed query classes based on the learned attribute value hierarchies, and learns statistics only with respect to these classes. We describe the details of our method and present experimental results demonstrating the efficiency and effectiveness of our approach. Our experiments were conducted in the context of BibFinder, a publicly fielded bibliography mediator.
  • Keywords
    bibliographic systems; data integrity; data mining; query processing; statistical databases; user interfaces; BibFinder; StatMiner; bibliography mediator system; data integration; frequency-based data mining; query optimization; statistical database; user interface; Data engineering; Frequency; Statistics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2004. Proceedings. 20th International Conference on
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-2065-0
  • Type
    conf
  • DOI
    10.1109/ICDE.2004.1320013
  • Filename
    1320013