DocumentCode
3717473
Title
A scalable solution for group feature selection
Author
Priya Govindan;Ruobing Chen;Katya Scheinberg;Soundararajan Srinivasan
Author_Institution
Rutgers University
fYear
2015
Firstpage
2846
Lastpage
2848
Abstract
In many applications, we may want to build a classifier with high confidence while reducing the number of features. We consider the case where features are assigned to predefined groups and cannot be removed individually. An additional and important constraint is that the datasets may be very large and may not fit in memory. We use logistic regression with a group penalty, which yields solutions that are sparse at the group level. In our implementation, we apply L-BFGS to build a quadratic approximation of the logistic regression loss function and use block coordinate descent to solve for each group of coefficients. Our contributions can be summarized as follows: (1) we discuss different scalable approaches depending on characteristics of the dataset, such as a large number of data points, a large number of features, or a large number of groups; (2) for datasets with a large number of data points and few groups of features, we identify the bottlenecks for scalability; (3) we present Spark solutions in Python and discuss the advantages of our solution over alternative solutions; (4) we present experiments and results on synthetic data and on real data from manufacturing applications.
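A minimal single-machine sketch of group-penalized logistic regression solved by block coordinate descent, assuming labels in {-1, +1}, no intercept, and the common sqrt(p_g) group weighting (all assumptions, not taken from the paper). It is not the authors' Spark implementation: in place of the L-BFGS quadratic approximation, each block update uses a simpler quadratic upper bound with curvature L_g = ||X_g||_2^2 / (4n), followed by group soft-thresholding, so a group's coefficients are either shrunk together or set to zero together. The function name group_lasso_logreg and the synthetic data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function: sigma(z) = (1 + tanh(z / 2)) / 2
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def group_lasso_logreg(X, y, groups, lam, max_sweeps=200, tol=1e-6):
    """Hypothetical sketch: block coordinate descent for group-lasso logistic regression.

    Minimizes (1/n) * sum_i log(1 + exp(-y_i * x_i^T w)) + lam * sum_g sqrt(p_g) * ||w_g||_2,
    assuming labels y_i in {-1, +1} and no intercept. `groups` is a list of
    column-index arrays, one per predefined feature group.
    """
    n, d = X.shape
    w = np.zeros(d)
    margin = np.zeros(n)  # cached X @ w, updated incrementally after each block
    # Per-block curvature bound (Lipschitz constant of the block gradient):
    # the logistic Hessian block (1/n) X_g^T D X_g is bounded by ||X_g||_2^2 / (4n) * I.
    # This simple bound stands in for the paper's L-BFGS quadratic model.
    lips = [max(np.linalg.norm(X[:, g], 2) ** 2 / (4.0 * n), 1e-12) for g in groups]
    for _ in range(max_sweeps):
        max_change = 0.0
        for g, L in zip(groups, lips):
            Xg = X[:, g]
            # Gradient of the average logistic loss with respect to block g only
            grad = -(Xg.T @ (y * sigmoid(-y * margin))) / n
            z = w[g] - grad / L
            # Group soft-thresholding: shrink the whole block, or zero it entirely
            thresh = lam * np.sqrt(len(g)) / L
            norm_z = np.linalg.norm(z)
            new = np.zeros_like(z) if norm_z <= thresh else (1.0 - thresh / norm_z) * z
            delta = new - w[g]
            if np.any(delta != 0):
                margin += Xg @ delta
                w[g] = new
                max_change = max(max_change, float(np.max(np.abs(delta))))
        if max_change < tol:
            break
    return w


# Usage on hypothetical synthetic data: only the first of three groups carries signal.
rng = np.random.default_rng(0)
n = 400
X = rng.standard_normal((n, 9))
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
w_true = np.zeros(9)
w_true[:3] = [2.0, -2.0, 1.5]
y = np.where(X @ w_true > 0, 1.0, -1.0)

w = group_lasso_logreg(X, y, groups, lam=0.1)
print("group norms:", [round(float(np.linalg.norm(w[g])), 3) for g in groups])
print("training accuracy:", float(np.mean(np.sign(X @ w) == y)))
```

Caching the margin Xw and updating it with X_g times the change in w_g keeps each block update proportional to n times the group size, rather than n times the total number of features; with many data points and few groups, these per-block passes over the data are where the cost concentrates.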
Keywords
"Sparks","Logistics","Runtime","Sparse matrices","Approximation methods","Machine learning algorithms","Big data"
Publisher
ieee
Conference_Titel
Big Data (Big Data), 2015 IEEE International Conference on
Type
conf
DOI
10.1109/BigData.2015.7364098
Filename
7364098
Link To Document