Sparsity Learning Formulations for Mining Time-Varying Data

Author

Rongjian Li; Wenlu Zhang; Yao Zhao; Zhenfeng Zhu; Shuiwang Ji

Author_Institution

Dept. of Comput. Sci., Old Dominion Univ., Norfolk, VA, USA

Volume

27

Issue

5

fYear

2015

fDate

May 1 2015

Firstpage

1411

Lastpage

1423

Abstract

Traditional clustering and feature selection methods treat the data matrix as static. In many applications, however, data matrices evolve smoothly over time. A simple approach to learning from such time-evolving data matrices is to analyze each one separately, but this strategy ignores the time-dependent nature of the underlying data. In this paper, we propose two formulations based on fused Lasso regularization: one for evolutionary co-clustering and one for evolutionary feature selection. The evolutionary co-clustering formulation can identify smoothly varying hidden block structures embedded in the matrices along the temporal dimension. This formulation is flexible and allows smoothness constraints to be imposed over only one dimension of the data matrices. The evolutionary feature selection formulation can uncover features that are shared in clustering across time-evolving data matrices. We show that the resulting optimization problems are non-convex, non-smooth, and non-separable. To compute the solutions efficiently, we develop a two-step procedure that optimizes the objective function iteratively. We evaluate the proposed formulations on the Allen Developing Mouse Brain Atlas data. Results show that our formulations consistently outperform prior methods.

Keywords

data mining; evolutionary computation; feature selection; learning (artificial intelligence); matrix algebra; optimisation; pattern clustering; Allen Developing Mouse Brain Atlas data; evolutionary coclustering formulation; evolutionary feature selection formulation; fused Lasso regularization; optimization problems; shared features; smoothly varying hidden block structures; smoothness constraints; sparsity learning formulation; temporal dimension; time-evolving data matrices; time-varying data mining; Approximation methods; Data mining; Gene expression; Linear programming; Optimization; Sparse matrices; Vectors; Sparsity learning; bioinformatics; co-clustering; feature selection; neuroinformatics; optimization; time-varying data;

fLanguage

English

Journal_Title

Knowledge and Data Engineering, IEEE Transactions on

Publisher

IEEE

ISSN

1041-4347

Type

jour

DOI

10.1109/TKDE.2014.2373411

Filename

6963408