Title :
MR-TRIAGE: Scalable multi-criteria clustering for big data security intelligence applications
Author :
Shen, Yun; Thonnard, Olivier
Author_Institution :
Symantec Res. Labs., Dublin, Ireland
Abstract :
Security companies have recently realised that mining massive amounts of security data can help generate actionable intelligence and improve their understanding of Internet attacks. In particular, attack attribution and situational understanding are considered critical to dealing effectively with emerging, increasingly sophisticated Internet attacks. This requires highly scalable analysis tools that help analysts classify, correlate and prioritise security events according to their likely impact and threat level. However, this security data mining process typically involves a considerable number of features interacting in non-obvious ways, which makes it inherently complex. To deal with this challenge, we introduce MR-TRIAGE, a set of distributed algorithms built on MapReduce that can perform scalable multi-criteria data clustering on large security datasets and identify complex relationships hidden in them. The MR-TRIAGE workflow consists of a scalable data summarisation step, followed by scalable graph clustering algorithms in which we integrate multi-criteria evaluation techniques. The theoretical computational complexity of the proposed parallel algorithms is discussed and analysed. The experimental results demonstrate that the algorithms scale well and can efficiently process large security datasets on commodity hardware. Our approach can effectively cluster any type of security event (e.g., spam emails, spear-phishing attacks, etc.) that shares at least some commonalities with others across a number of predefined features.
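The clustering idea in the abstract can be illustrated with a minimal single-process sketch. It is not the MR-TRIAGE implementation and does not use MapReduce. It assumes set-valued features compared with Jaccard similarity, a plain weighted mean as a stand-in for the multi-criteria evaluation step, and connected components of a thresholded similarity graph as the clustering step. The function names (`jaccard`, `aggregate`, `cluster`), the feature names, the weights and the threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def aggregate(scores, weights):
    """Weighted mean of per-feature similarities (assumed aggregation)."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def cluster(events, features, weights, threshold):
    """Group events whose aggregated similarity reaches the threshold.

    Events are linked pairwise and clusters are the connected
    components of the resulting graph, tracked with union-find.
    """
    n = len(events)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        scores = [jaccard(events[i][f], events[j][f]) for f in features]
        if aggregate(scores, weights) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical spam-like events described by two features.
events = [
    {"subject": ["invoice", "urgent"], "sender_domain": ["a.example"]},
    {"subject": ["invoice", "urgent", "pay"], "sender_domain": ["a.example"]},
    {"subject": ["holiday", "photos"], "sender_domain": ["b.example"]},
]
clusters = cluster(events, ["subject", "sender_domain"], [0.5, 0.5], 0.6)
```

The all-pairs loop is quadratic in the number of events, which is the kind of cost that motivates a summarisation step and a distributed formulation for large datasets.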
Keywords :
Big Data; computer crime; data mining; graph theory; parallel algorithms; pattern clustering; Big Data security intelligence applications; Internet attacks; MR-TRIAGE workflow; MapReduce; attack attribution; commodity hardware; computational complexity; distributed algorithms; large security data sets; large security datasets; multicriteria evaluation techniques; parallel algorithms; scalable data summarisation; scalable graph clustering algorithms; scalable multicriteria data clustering; security companies; security data mining; security events; situational understanding; threat level; Algorithm design and analysis; Clustering algorithms; Data mining; Electronic mail; Open wireless architecture; Prototypes; Security;
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/BigData.2014.7004285