Regional Information Center for Science and Technology - Mining Probabilistically Frequent Sequential Patterns in Large Uncertain Databases

DocumentCode :

60982

Title :

Mining Probabilistically Frequent Sequential Patterns in Large Uncertain Databases

Author :

Zhou Zhao; Da Yan; Wilfred Ng

Author_Institution :

Dept. of Comput. Sci. & Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong, China

Volume :

Issue :

fYear :

2014

fDate :

May 2014

Firstpage :

1171

Lastpage :

1184

Abstract :

Data uncertainty is inherent in many real-world applications such as environmental surveillance and mobile tracking. Mining sequential patterns from inaccurate data, such as sensor readings and GPS trajectories, is important for discovering hidden knowledge in these applications. In this paper, we propose to measure pattern frequentness based on the possible world semantics. We establish two uncertain sequence data models abstracted from many real-life applications involving uncertain sequence data, and formulate the problem of mining probabilistically frequent sequential patterns (p-FSPs) from data that conform to our models. However, the number of possible worlds is extremely large, which makes the mining prohibitively expensive. Inspired by the well-known PrefixSpan algorithm, we develop two new algorithms, collectively called U-PrefixSpan, for p-FSP mining. U-PrefixSpan effectively avoids the problem of “possible worlds explosion” and, when combined with our four pruning and validating methods, achieves even better performance. We also propose a fast validating method to further speed up U-PrefixSpan. The efficiency and effectiveness of U-PrefixSpan are verified through extensive experiments on both real and synthetic datasets.
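
The notion of probabilistic frequentness mentioned in the abstract can be illustrated with a small sketch. This is not the U-PrefixSpan algorithm itself, and it does not reproduce either of the paper's data models. It assumes, purely for illustration, that each sequence in the database contains a given pattern independently with a known probability, and that a pattern counts as probabilistically frequent when the probability of its support reaching a minimum support is at least a probability threshold. The names probs, min_sup and min_prob are hypothetical. The dynamic program computes the support distribution in quadratic time, while the brute-force function enumerates every possible world to show why direct enumeration does not scale.

```python
from itertools import product


def support_distribution(probs):
    """Return dist where dist[k] = P(exactly k sequences contain the pattern).

    Assumption (not from the source record): sequences are independent and
    probs[i] is the probability that sequence i contains the pattern.
    """
    dist = [1.0]  # with zero sequences, the support is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # this sequence does not contain the pattern
            nxt[k + 1] += mass * p      # this sequence contains the pattern
        dist = nxt
    return dist


def frequentness_probability(probs, min_sup):
    """P(support >= min_sup) under the independence assumption above."""
    return sum(support_distribution(probs)[min_sup:])


def is_probabilistically_frequent(probs, min_sup, min_prob):
    """Hypothetical threshold test: frequent if P(support >= min_sup) >= min_prob."""
    return frequentness_probability(probs, min_sup) >= min_prob


def brute_force_frequentness(probs, min_sup):
    """Same probability by enumerating all 2^n possible worlds (tiny inputs only)."""
    total = 0.0
    for world in product([0, 1], repeat=len(probs)):
        weight = 1.0
        for present, p in zip(world, probs):
            weight *= p if present else 1.0 - p
        if sum(world) >= min_sup:
            total += weight
    return total
```

For three sequences the brute-force version visits 8 worlds, and for 40 sequences it would visit about 10^12, which is the "possible worlds explosion" the abstract refers to. The dynamic program avoids the enumeration but relies entirely on the independence assumption stated above.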

Keywords :

data mining; database management systems; probability; PrefixSpan algorithm; U-PrefixSpan; data uncertainty; hidden knowledge discovery; large uncertain databases; p-FSP mining; pattern frequentness measurement; probabilistically frequent sequential pattern mining; pruning method; uncertain sequence data models; validating method; possible world semantics; possible worlds explosion avoidance; Context; Data models; Databases; Equations; Probabilistic logic; Temperature sensors; Frequent patterns; Mining methods and algorithms; approximate algorithm; uncertain databases

fLanguage :

English

Journal_Title :

Knowledge and Data Engineering, IEEE Transactions on

Publisher :

IEEE

ISSN :

1041-4347

Type :

jour

DOI :

10.1109/TKDE.2013.124

Filename :

6570722

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=60982