• DocumentCode
    769267
  • Title

    Discovering Frequent Closed Partial Orders from Strings

  • Author

    Pei, Jian ; Wang, Haixun ; Liu, Jian ; Wang, Ke ; Wang, Jianyong ; Yu, Philip S.

  • Author_Institution
    Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC
  • Volume
    18
  • Issue
    11
  • fYear
    2006
  • Firstpage
    1467
  • Lastpage
    1481
  • Abstract
    Mining knowledge about ordering from sequence data is an important problem with many applications, such as bioinformatics, Web mining, network management, and intrusion detection. For example, if many customers follow a partial order in their purchases of a series of products, the partial order can be used to predict other related customers' future purchases and develop marketing campaigns. Moreover, some biological sequences (e.g., microarray data) can be clustered based on the partial orders shared by the sequences. Given a set of items, a total order of a subset of items can be represented as a string. A string database is a multiset of strings. In this paper, we identify a novel problem of mining frequent closed partial orders from strings. Frequent closed partial orders capture the nonredundant and interesting ordering information from string databases. Importantly, mining frequent closed partial orders can discover meaningful knowledge that cannot be disclosed by previous data mining techniques. However, the problem of mining frequent closed partial orders is challenging. To tackle the problem, we develop Frecpo (for frequent closed partial order), a practically efficient algorithm for mining the complete set of frequent closed partial orders from large string databases. Several interesting pruning techniques are devised to speed up the search. We report an extensive performance study on both real data sets and synthetic data sets to illustrate the effectiveness and the efficiency of our approach.
  • Keywords
    data mining; pattern clustering; string matching; very large databases; data mining techniques; frequent closed partial order mining; knowledge discovery; large string database; pruning techniques; Application software; Bioinformatics; Computer Society; Computer network management; Data mining; Databases; Intrusion detection; Knowledge management; Loans and mortgages; Web mining; Frequent patterns; closed patterns; data mining; partial orders; strings
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2006.172
  • Filename
    1704800