Abstract
Suppose Alice has collected a small number of samples from an unknown distribution, and would like to learn about the distribution. Bob, an untrusted data analyst, claims that he ran a sophisticated data analysis on the distribution, and makes assertions about its properties. Can Alice efficiently verify Bob's claims using fewer resources (say in terms of samples and computation) than would be needed to run the analysis herself? We construct an interactive proof system for any distribution property for which the distance from the property can be computed by (log-space) uniform polynomial size circuits of depth D, where the circuit gets a complete description of the distribution. Taking N to be an upper bound on the size of the distribution's support, the verifier's sample complexity, the running time, and the communication complexity are all sublinear in N: they are bounded by O(N^{1-a} + D) for a constant a > 0. The honest prover runs in poly(N) time and has quasi-linear sample complexity. Moreover, the proof system is tolerant: it can be used to approximate the distribution's distance from the property. We show similar results for any distribution property for which the distance from the property can be approximated by a bounded-space Turing machine (that gets as input a complete description of the distribution). We remark that even for simple properties, deciding the property without a prover requires quasi-linear sample complexity and running time. Prior work [Herman and Rothblum, FOCS 2023] demonstrated sublinear interactive proof systems, but only for the much more restricted class of label-invariant distribution properties.