DocumentCode :
945705
Title :
Mining Impact-Targeted Activity Patterns in Imbalanced Data
Author :
Cao, Longbing ; Zhao, Yanchang ; Zhang, Chengqi
Author_Institution :
Dept. of Software Eng., Univ. of Technol., Sydney, NSW
Volume :
20
Issue :
8
fYear :
2008
Firstpage :
1053
Lastpage :
1066
Abstract :
Impact-targeted activities are rare but can have a significant impact on society; e.g., isolated terrorist activities may lead to a disastrous event that threatens national security. Similar issues arise in many other areas. It is therefore important to identify such activities before they have a significant impact on the world. However, mining impact-targeted activity patterns is challenging because of the imbalanced structure of the data. This paper develops techniques for discovering such activity patterns. First, the complexities of mining imbalanced impact-targeted activities are analyzed. We then discuss strategies for constructing impact-targeted activity sequences. Algorithms are developed to mine frequent positive-impact (P → T) and negative-impact (P → T̄) oriented activity patterns, sequential impact-contrasted activity patterns (P is frequently associated with both P → T and P → T̄ in separate data sets), and sequential impact-reversed activity patterns (both P → T and PQ → T̄ are frequent). Activity impact modelling is also studied to quantify pattern impact on business outcomes. Social security debt-related activity data is used to test the proposed approaches. The outcomes show that the approaches are promising for ISI applications that need to identify impact-targeted activity patterns in imbalanced data.
Keywords :
data mining; national security; security of data; imbalanced data; impact-targeted activity pattern mining; information security; pattern discovery; social security; clustering, classification, and association rules
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2007.190635
Filename :
4358938