Bloom Features

Author

Ashok Cutkosky;Kwabena Boahen

Author_Institution

Comput. Sci. Dept., Stanford Univ., Stanford, CA, USA

Year

2015

Firstpage

547

Lastpage

552

Abstract

We introduce a function-fitting method that achieves high accuracy with a low memory footprint. For d-dimensional data and any user-specified m, we define a feature map from d-dimensional to m-dimensional Euclidean space with memory footprint O(m) that scales as follows: as m increases, the space of linear functions on our m-dimensional features approximates any MAX (or Boolean OR) function on the d-dimensional inputs with expected error inversely proportional to m. Our method is the only one in existence with this scaling that can simultaneously run in O(m) time, process real-valued inputs, and approximate non-linear functions, properties respectively not achieved by random Fourier features, b-bit minwise hashing, and Vowpal Wabbit, three competing methods. We achieve all three properties by using hashing (O(m) space) to implement a sparse-matrix multiply (O(m) time) with addition replaced by MAX (non-linear approximation). Because these techniques are inspired by the Bloom filter, we call the vectors produced by our mapping Bloom features. We demonstrate that the scaling prefactors are reasonable by testing our method on simulated (Dirichlet distributions) and real (MNIST and webspam) datasets.
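
The core idea in the abstract (hashing in place of a stored sparse matrix, with addition replaced by MAX) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the hash family, the number of hashes per coordinate, or how the O(m) running time is achieved. The names `bloom_features` and `_bucket`, the parameter `k`, and the use of BLAKE2b are assumptions made only for illustration, and the sketch assumes non-negative inputs.

```python
import hashlib


def _bucket(index, seed, m):
    """Hypothetical helper: hash input coordinate `index` to one of m buckets."""
    digest = hashlib.blake2b(f"{seed}:{index}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % m


def bloom_features(x, m, k=3):
    """Map a non-negative d-dimensional vector x to an m-dimensional vector.

    Each input coordinate is hashed to k output buckets (k is an assumed
    parameter). A bucket stores the MAX of the values hashed into it, where
    an ordinary sparse-matrix multiply would store their sum. No projection
    matrix is kept in memory; only the m output values are stored.
    """
    features = [0.0] * m
    for i, value in enumerate(x):
        if value == 0:
            continue  # zeros cannot raise a max over non-negative values
        for seed in range(k):
            j = _bucket(i, seed, m)
            features[j] = max(features[j], value)
    return features


if __name__ == "__main__":
    x = [0.2, 0.0, 0.9, 0.5]
    print(bloom_features(x, m=8, k=2))
```

For binary inputs each bucket reduces to a Boolean OR of the coordinates hashed into it, which is the Bloom-filter analogy the abstract draws. A linear model would then be fit on the output of `bloom_features`.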

Keywords

"Sparse matrices","Computer science","Testing","Memory management","Loss measurement","Linear regression","Bars"

Publisher

IEEE

Conference_Title

Computational Science and Computational Intelligence (CSCI), 2015 International Conference on

Type

conf

DOI

10.1109/CSCI.2015.144

Filename

7424153