DocumentCode
3756622
Title
Bloom Features
Author
Ashok Cutkosky;Kwabena Boahen
Author_Institution
Comput. Sci. Dept., Stanford Univ., Stanford, CA, USA
fYear
2015
Firstpage
547
Lastpage
552
Abstract
We introduce a method for function-fitting that achieves high accuracy with a low memory footprint. For d-dimensional data and any user-specified m, we define a feature map from d to m dimensional Euclidean space with memory footprint O(m) that scales as follows: As m increases, the space of linear functions on our m-dimensional features approximates any MAX (or boolean OR) function on the d-dimensional inputs with expected error inversely proportional to m. Our method is the only one in existence with this scaling that can simultaneously run in O(m) time, process real-value inputs, and approximate non-linear functions, properties respectively not achieved by random Fourier features, b-bit Minwise Hashing, and Vowpal Wabbit, three competing methods. We achieve all three properties by using hashing (O(m) space) to implement a sparse-matrix multiply (O(m) time) with addition replaced by MAX (non-linear approximation). As these techniques are inspired by the Bloom filter, we call the vectors produced by our mapping Bloom features. We demonstrate that the scaling prefactors are reasonable by testing our method on simulated (Dirichlet distributions) and real (MNIST and webspam) datasets.
Keywords
"Sparse matrices","Computer science","Testing","Memory management","Loss measurement","Linear regression","Bars"
Publisher
ieee
Conference_Titel
Computational Science and Computational Intelligence (CSCI), 2015 International Conference on
Type
conf
DOI
10.1109/CSCI.2015.144
Filename
7424153
Link To Document