• DocumentCode
    124205
  • Title

    Using Extended Random Set to Find Specific Patterns

  • Author

    Albathan, Mubarak ; Yuefeng Li ; Yue Xu

  • Author_Institution
    Sch. of Electr. Eng. & Comput. Sci., Queensland Univ. of Technol., Brisbane, QLD, Australia
  • Volume
    2
  • fYear
    2014
  • fDate
    11-14 Aug. 2014
  • Firstpage
    30
  • Lastpage
    37
  • Abstract
    With the overwhelming increase in the amount of data on the web and in databases, many text mining techniques have been proposed for mining useful patterns in text documents. Extracting closed sequential patterns using the Pattern Taxonomy Model (PTM) is one of the pruning methods used to remove noisy, inconsistent, and redundant patterns. However, PTM treats each extracted pattern as a whole without considering the terms it contains, which could affect the quality of the extracted patterns. This paper proposes an innovative and effective method that extends the random set to accurately weigh patterns based on their distribution in the documents and the distribution of their terms in the patterns. The proposed approach then finds the specific closed sequential patterns (SCSP) based on the newly calculated weight. Experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms other state-of-the-art methods on several popular measures.
  • Keywords
    data mining; text analysis; PTM; RCV1 data collection; Reuters Corpus Volume 1 data collection; SCSP; TREC topics; World Wide Web; closed sequential pattern extraction; data bases; extended random set; inconsistent pattern removal; noisy pattern removal; pattern taxonomy model; pruning methods; redundant pattern removal; specific closed sequential patterns; text documents; text mining techniques; useful pattern mining; Feature extraction; Mathematical model; Noise measurement; Probability; Taxonomy; Text mining; Extended Random Set; Information retrieval; Select top-k Patterns; Specific Closed Sequential Patterns
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
  • Conference_Location
    Warsaw
  • Type

    conf

  • DOI
    10.1109/WI-IAT.2014.77
  • Filename
    6927604