• DocumentCode
    2399083
  • Title

    WSpan: Weighted Sequential pattern mining in large sequence databases

  • Author

    Yun, Unil ; Leggett, John J.

  • fYear
    2006
  • fDate
    Sept. 2006
  • Firstpage
    512
  • Lastpage
    517
  • Abstract
    Sequential pattern mining algorithms mine the set of frequent subsequences that satisfy a minimum support constraint in a sequence database. However, previous sequential mining algorithms treat all sequential patterns uniformly, even though sequential patterns differ in importance. Another main problem with most sequence mining algorithms is that they still generate an exponentially large number of sequential patterns when the minimum support is lowered, and they provide no way to adjust the number of sequential patterns other than increasing the minimum support. In this paper, we propose a weighted sequential pattern mining algorithm called WSpan. Our main approach is to push weight constraints into the sequential pattern growth approach while maintaining the downward closure property. A weight range is defined to maintain the downward closure property, and items are given different weights within that range. When scanning a sequence database, the maximum weight in the sequence database is used to prune weighted infrequent sequential patterns; in the mining step, the maximum weights of the projected sequence databases are used. By doing so, the downward closure property can be maintained. By adjusting the weight range, WSpan generates fewer but important weighted sequential patterns in large databases, particularly dense databases with a low minimum support.
  • Keywords
    data mining; very large databases; WSpan; downward closure property; large sequence databases; sequential mining algorithms; sequential pattern mining algorithms; weighted infrequent sequential patterns; weighted sequential pattern mining; DNA; Data analysis; Deductive databases; Diseases; Feedback; Intelligent systems; Runtime; Sequences; Web sites; sequential pattern mining; weight constraints
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems, 2006 3rd International IEEE Conference on
  • Conference_Location
    London
  • Print_ISBN
    1-4244-01996-8
  • Electronic_ISBN
    1-4244-01996-8
  • Type

    conf

  • DOI
    10.1109/IS.2006.348472
  • Filename
    4155479
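
The pruning idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact pruning condition, so the test used here (keep an item only if its support multiplied by the maximum weight in the database reaches the minimum support) is an assumption, as are the function names, the toy database, and the weights. Sequences are simplified to lists of single items. Because the maximum weight is an upper bound on the weight of any pattern, and support never increases as a pattern grows, a bound of this form cannot discard an extension that would pass while its prefix fails, which is the downward closure property the abstract refers to.

```python
from collections import Counter


def item_supports(db):
    """Count, for each item, the number of sequences that contain it."""
    counts = Counter()
    for seq in db:
        counts.update(set(seq))  # each sequence counts an item at most once
    return counts


def prune_by_max_weight(db, weights, min_sup):
    """Keep items whose support * max weight reaches min_sup.

    ASSUMPTION: this pruning test is an illustrative guess at the kind of
    condition the abstract describes; it is not taken from the paper.
    """
    # Maximum weight over all items that occur in the database.
    max_w = max(weights[item] for seq in db for item in seq)
    supports = item_supports(db)
    return {item: sup for item, sup in supports.items() if sup * max_w >= min_sup}


# Hypothetical toy database and weights, for illustration only.
db = [["a", "b", "c"], ["a", "c"], ["b", "d"], ["a", "d", "c"]]
weights = {"a": 0.9, "b": 0.6, "c": 0.8, "d": 0.5}

kept = prune_by_max_weight(db, weights, min_sup=2.0)
print(kept)
```

In this sketch the same function could be applied to a projected database during the mining step, in which case `max_w` would be the maximum weight within that projection rather than within the whole database.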