DocumentCode
180808
Title
Preventing False Discovery in Interactive Data Analysis Is Hard
Author
Hardt, Moritz ; Ullman, Jonathan
fYear
2014
fDate
18-21 Oct. 2014
Firstpage
454
Lastpage
463
Abstract
We show that, under a standard hardness assumption, there is no computationally efficient algorithm that, given n samples from an unknown distribution, can give valid answers to n^{3+o(1)} adaptively chosen statistical queries. A statistical query asks for the expectation of a predicate over the underlying distribution, and an answer to a statistical query is valid if it is "close" to the correct expectation over the distribution. Our result stands in stark contrast to the well-known fact that exponentially many statistical queries can be answered validly and efficiently if the queries are chosen non-adaptively (no query may depend on the answers to previous queries). Moreover, Dwork et al. [1] showed how to accurately answer exponentially many adaptively chosen statistical queries via a computationally inefficient algorithm. They also gave an efficient algorithm that can answer nearly n^2 adaptively chosen queries, which shows that our result is almost quantitatively tight. Conceptually, our result demonstrates that achieving statistical validity alone can be a source of computational intractability in adaptive settings. For example, in the modern large collaborative research environment, data analysts typically choose a particular approach based on previous findings. False discovery occurs if a research finding is supported by the data but not by the underlying distribution. While the study of preventing false discovery in statistics is decades old, to the best of our knowledge our result is the first to demonstrate a computational barrier. In particular, our result suggests that the perceived difficulty of preventing false discovery in today's collaborative research environment may be inherent.
Keywords
data analysis; query processing; statistics; computational intractability; false discovery prevention; interactive data analysis; statistical queries; Accuracy; Adaptation models; Algorithm design and analysis; Encryption; Polynomials; Privacy; Standards;
fLanguage
English
Publisher
ieee
Conference_Titel
Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on
Conference_Location
Philadelphia, PA
ISSN
0272-5428
Type
conf
DOI
10.1109/FOCS.2014.55
Filename
6979030