Title :
About softness for inductive querying on sequence databases
Author :
Mitasiunaite, Ieva ; Boulicaut, Jean-François
Author_Institution :
INSA Lyon, LIRIS CNRS UMR, Villeurbanne
Abstract :
In many application domains (e.g., WWW usage mining, telecommunication data analysis, molecular biology), large sequence databases are available and yet under-exploited. The inductive database framework assumes that both such databases and the various patterns holding within them might be queryable. In this setting, queries which return patterns are called inductive queries, and solving them is one of the main topics in database mining research. Indeed, constraint-based mining techniques on sequence databases have been studied extensively over the last few years, and efficient algorithms make it possible to compute complete collections of patterns (e.g., sequences) which satisfy conjunctions of monotonic and/or anti-monotonic constraints (e.g., minimal and maximal frequency constraints) in potentially large sequence databases. Studying new applications of these techniques, we consider that fault-tolerance and softness are extremely important issues for tackling real-life data analysis. In this paper, we address some of the open problems when computing soft occurrences of patterns within database sequences instead of the classical exact matching ones. Such an extension is not trivial since it prevents the clever use of monotonicity for pruning the search space. We describe our proposal and we provide an experimental validation on real-life clickstream data which confirms the added value of this approach.
Keywords :
constraint handling; data analysis; data mining; fault tolerant computing; query processing; very large databases; antimonotonic constraints; clickstream data; constraint-based mining; data analysis; database mining; fault-tolerance; inductive database; inductive querying; large sequence databases; monotonic constraint; pattern soft occurrences computation; search space pruning; Data analysis; Data mining; Databases; Electronic commerce; Fault tolerance; Frequency; Pattern matching; Proposals; Sequences; World Wide Web;
Conference_Title :
Databases and Information Systems, 2006 7th International Baltic Conference on
Conference_Location :
Vilnius
Print_ISBN :
1-4244-0345-6
DOI :
10.1109/DBIS.2006.1678478