DocumentCode :
579972
Title :
Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas
Author :
Valiant, Gregory
Author_Institution :
UC Berkeley, Berkeley, CA, USA
fYear :
2012
fDate :
20-23 Oct. 2012
Firstpage :
11
Lastpage :
20
Abstract :
Given a set of n d-dimensional Boolean vectors with the promise that the vectors are chosen uniformly at random with the exception of two vectors that have Pearson correlation ρ (Hamming distance d·(1−ρ)/2), how quickly can one find the two correlated vectors? We present an algorithm which, for any constants ε, ρ > 0 and d ≫ (log n)/ρ², finds the correlated pair with high probability, and runs in time O(n^(3ω/4 + ε)) < O(n^1.8), where ω < 2.38 is the exponent of matrix multiplication. Provided that d is sufficiently large, this runtime can be further reduced. These are the first subquadratic-time algorithms for this problem in which ρ does not appear in the exponent of n, improving upon the O(n^(2−O(ρ))) runtimes given by Paturi et al. [15], Locality Sensitive Hashing (LSH) [11], and the Bucketing Codes approach [6]. Applications and extensions of this basic algorithm yield improved algorithms for several other problems:
Approximate Closest Pair: For any sufficiently small constant ε > 0, given n vectors in R^d, our algorithm returns a pair of vectors whose Euclidean distance differs from that of the closest pair by a factor of at most 1+ε, and runs in time O(n^(2−Θ(√ε))). The best previous algorithms (including LSH) have runtime O(n^(2−O(ε))).
Learning Sparse Parity with Noise: Given samples from an instance of the learning parity with noise problem where each example has length n, the true parity set has size at most k ≪ n, and the noise rate is η, our algorithm identifies the set of k indices in time n^((ω+ε)k/3) · poly(1/(1−2η)) < n^(0.8k) · poly(1/(1−2η)). Aside from the trivial brute-force algorithm, this is the first algorithm with no dependence on η in the exponent of n.
Learning k-Juntas with Noise: Given uniformly random length-n Boolean vectors, together with a label which is some function of just k ≪ n of the bits, perturbed by noise rate η, return the set of relevant indices. Leveraging the reduction of Feldman et al. [7], our result for learning k-parities implies an algorithm for this problem with runtime n^((ω+ε)k/3) · poly(1/(1−2η)) < n^(0.8k) · poly(1/(1−2η)), which improves on the previous best of n^(k(1−2/2^k)) · poly(1/(1−2η)), from [8].
Learning k-Juntas without Noise: Our results for learning sparse parities with noise imply an algorithm for learning juntas without noise with runtime n^((ω+ε)k/4) · poly(n) < n^(0.6k) · poly(n), which improves on the runtime n^(ωk/(ω+1)) · poly(n) ≈ n^(0.7k) · poly(n) of Mossel et al. [13].
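To make the planted-correlation problem concrete, the following is a minimal sketch of the problem setup from the abstract, solved by the quadratic baseline that the paper's subquadratic algorithm improves upon: compute all pairwise inner products with one matrix product and take the largest off-diagonal entry. The parameter values (n, d, ρ) are illustrative choices, not from the paper, and this is not the paper's algorithm, only the naive detector it accelerates.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, rho = 200, 4000, 0.4  # illustrative sizes; the abstract needs d >> (log n)/rho^2

# n uniformly random +/-1 vectors (the +/-1 encoding of Boolean vectors).
X = rng.choice([-1.0, 1.0], size=(n, d))

# Plant a correlated pair: vector 1 copies vector 0, then each coordinate is
# flipped independently with probability (1 - rho)/2, giving expected Pearson
# correlation rho (Hamming distance about d*(1-rho)/2 in the 0/1 encoding).
flips = rng.random(d) < (1 - rho) / 2
X[1] = np.where(flips, -X[0], X[0])

# All pairwise inner products via a single matrix product. Done naively this
# is O(n^2 d); the paper's aggregation plus fast matrix multiplication is what
# pushes the overall search below n^2.
G = X @ X.T
np.fill_diagonal(G, -np.inf)  # ignore each vector's correlation with itself

# The planted pair has expected inner product rho*d ~ 1600, while a random
# pair's inner product has standard deviation sqrt(d) ~ 63, so the maximum
# off-diagonal entry identifies the pair with high probability.
i, j = np.unravel_index(np.argmax(G), G.shape)
print(sorted((int(i), int(j))))  # the planted pair, [0, 1], with high probability
```

The same maximum-inner-product search is what the learning-parity and junta applications reduce to, after mapping examples to appropriately aggregated vectors.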
Keywords :
Boolean algebra; computational complexity; computational geometry; cryptography; learning (artificial intelligence); matrix multiplication; vectors; Euclidean distance; Hamming distance; LSH; Pearson-correlation; approximate closest pair; bucketing codes; correlated vectors; correlation finding; d-dimensional Boolean vectors; learning k-juntas; learning k-parities; learning sparse parity; locality sensitive hashing; matrix multiplication; noise problem; noise rate; subquadratic-time algorithms; Approximation algorithms; Chebyshev approximation; Correlation; Noise; Noise measurement; Runtime; Vectors; Correlation; closest pair; learning juntas; learning parity with noise; locality sensitive hashing; metric embedding; nearest neighbor;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on
Conference_Location :
New Brunswick, NJ
ISSN :
0272-5428
Print_ISBN :
978-1-4673-4383-1
Type :
conf
DOI :
10.1109/FOCS.2012.27
Filename :
6375277