• DocumentCode
    1160205
  • Title
    Synthesizing high-frequency rules from different data sources
  • Author
    Wu, Xindong ; Zhang, Shichao
  • Author_Institution
    Dept. of Comput. Sci., Vermont Univ., Burlington, VT, USA
  • Volume
    15
  • Issue
    2
  • fYear
    2003
  • Firstpage
    353
  • Lastpage
    367
  • Abstract
    Many large organizations have multiple data sources, such as different branches of an interstate company. While putting all data together from different sources might amass a huge database for centralized processing, mining association rules at different data sources and forwarding the rules (rather than the original raw data) to the centralized company headquarters provides a feasible way to deal with multiple data source problems. Meanwhile, the association rules at each data source may be required for that data source in the first instance, so association analysis at each data source is also important and useful. However, the rules forwarded from different data sources may be too numerous for the centralized company headquarters to use. This paper presents a weighting model for synthesizing high-frequency association rules from different data sources. There are two reasons to focus on high-frequency rules. First, a centralized company headquarters is interested in high-frequency rules because they are supported by most of its branches for corporate profitability. Second, high-frequency rules are more likely to become valid rules in the union of all data sources. To extract high-frequency rules efficiently, a rule selection procedure is also constructed to enhance the weighting model by coping with low-frequency rules. Experimental results show that the proposed weighting model is efficient and effective.
  • Keywords
    data mining; learning (artificial intelligence); very large databases; association rules; large databases; mining association rules; multiple data sources; rule selection; weighting model; Association rules; Data analysis; Data mining; Distributed databases; Information analysis; Pattern analysis; Profitability; Transaction databases;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2003.1185839
  • Filename
    1185839