Title :
Mining sequences for patterns with non-repeating symbols
Author :
Walicki, Michal ; Ferreira, Diogo R.
Author_Institution :
Inst. of Inf., Univ. of Bergen, Bergen, Norway
Abstract :
Finding the case id in unlabeled event logs is arguably one of the hardest challenges in process mining research. While this problem can be addressed with greedy approaches, these usually converge to sub-optimal solutions. In this paper, we describe an approach that performs a complete search of the search space. We formulate the problem as finding the minimal set of patterns contained in a sequence, where patterns can be interleaved but do not contain repeating symbols. We show that, for practical purposes, the search space can be reduced to maximal disjoint occurrences of these patterns. Experimental results suggest that whenever this approach finds a solution, it usually finds a minimal one.
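The problem formulation in the abstract can be illustrated with a small brute-force sketch. This is not the authors' algorithm. It does not implement their reduction to maximal disjoint occurrences. It assumes that "minimal" means the fewest distinct patterns, that every symbol of the sequence must belong to exactly one completed pattern occurrence, and that a list of candidate patterns is given up front. The function names `can_decompose` and `minimal_pattern_set` are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def has_no_repeats(pattern: str) -> bool:
    """A pattern is admissible only if no symbol occurs in it twice."""
    return len(set(pattern)) == len(pattern)


def can_decompose(seq: str, patterns: Sequence[str]) -> bool:
    """Backtracking check: can every symbol of seq be assigned to some
    occurrence of a pattern, with occurrences allowed to interleave and
    all occurrences completed by the end of seq? (Exponential; toy sizes.)"""
    patterns = [p for p in patterns if p]  # ignore empty patterns

    def search(i: int, open_occ: Tuple[Tuple[int, int], ...]) -> bool:
        # open_occ holds (pattern index, symbols matched so far) per partial occurrence
        if i == len(seq):
            return not open_occ  # success only if nothing is left half-matched
        sym = seq[i]
        # Option 1: extend an open occurrence whose next expected symbol is sym
        for k, (p, m) in enumerate(open_occ):
            if patterns[p][m] == sym:
                rest = open_occ[:k] + open_occ[k + 1:]
                if m + 1 < len(patterns[p]):
                    rest = rest + ((p, m + 1),)
                if search(i + 1, rest):
                    return True
        # Option 2: start a new occurrence of a pattern that begins with sym
        for p, pat in enumerate(patterns):
            if pat[0] == sym:
                nxt = open_occ if len(pat) == 1 else open_occ + ((p, 1),)
                if search(i + 1, nxt):
                    return True
        return False

    return search(0, ())


def minimal_pattern_set(seq: str, candidates: Sequence[str]) -> Optional[List[str]]:
    """Complete search over subsets of the candidates, smallest subsets first,
    so the first decomposition found uses the fewest distinct patterns."""
    if not seq:
        return []
    valid = [c for c in candidates if c and has_no_repeats(c)]
    for size in range(1, len(valid) + 1):
        for subset in combinations(valid, size):
            if can_decompose(seq, subset):
                return list(subset)
    return None  # no subset of the candidates explains seq


print(minimal_pattern_set("aabbcc", ["ab", "c", "abc"]))  # two interleaved "abc"
print(minimal_pattern_set("abcab", ["abc", "ab"]))
```

Searching subsets in order of increasing size makes the first hit minimal in this assumed sense. The cost is exponential in both the number of candidates and the number of concurrently open occurrences, which is the blow-up the paper's reduction of the search space is meant to address.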
Keywords :
data mining; pattern recognition; sequences; system monitoring; complete search; greedy approach; nonrepeating symbols; process mining; search space; sequence pattern mining; unlabeled event logs;
Conference_Title :
Evolutionary Computation (CEC), 2010 IEEE Congress on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-6909-3
DOI :
10.1109/CEC.2010.5585995