
Fingerprinting codes are a crucial tool for proving lower bounds in differential privacy. They have been used to prove tight lower bounds for several fundamental questions, especially in the "low-accuracy" regime. Unlike reconstruction/discrepancy approaches, however, they are more suited for proving worst-case lower bounds, for query sets that arise naturally from the fingerprinting-code construction. In this work, we propose a general framework for proving fingerprinting-type lower bounds that allows us to tailor the technique to the geometry of the query set. Our approach allows us to prove several new results.

First, we show that any (sample- and population-)accurate algorithm for answering $Q$ arbitrary adaptive counting queries over a universe $\mathcal{X}$ to accuracy $\alpha$ needs $\Omega\left(\frac{\sqrt{\log |\mathcal{X}|}\cdot \log Q}{\alpha^3}\right)$ samples. This shows that the approaches based on differential privacy are optimal for this question, and improves significantly on the previously known lower bounds of $\frac{\log Q}{\alpha^2}$ and $\min(\sqrt{Q}, \sqrt{\log |\mathcal{X}|})/\alpha^2$. Second, we show that any $(\varepsilon,\delta)$-DP algorithm for answering $Q$ counting queries to accuracy $\alpha$ needs $\Omega\left( \frac{\sqrt{d \log(1/\delta)} \cdot \log Q}{\varepsilon \alpha^2} \right)$ samples. Our framework allows for directly proving this bound, and improves by a $\sqrt{\log(1/\delta)}$ factor the bound proved by Bun, Ullman and Vadhan (2013) using composition. Third, we characterize the sample complexity of answering a set of random 0-1 queries under approximate differential privacy. To achieve this, we give new upper and lower bounds that, combined with existing bounds, allow us to complete the picture.
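For a sense of scale, here is a back-of-the-envelope comparison (our own arithmetic, not a claim from the abstract), writing $d$ as shorthand for $\log|\mathcal{X}|$. In the regime $\sqrt{d} \le \sqrt{Q}$, the new adaptive-data-analysis bound exceeds the two prior bounds by factors of $\sqrt{d}/\alpha$ and $\log Q/\alpha$ respectively, so it improves on their maximum by $\min(\sqrt{d}, \log Q)/\alpha$:

$$
\underbrace{\frac{\sqrt{d}\,\log Q}{\alpha^{3}}}_{\text{new bound}}
\;=\;
\underbrace{\frac{\log Q}{\alpha^{2}}}_{\text{prior}}\cdot\frac{\sqrt{d}}{\alpha}
\;=\;
\underbrace{\frac{\sqrt{d}}{\alpha^{2}}}_{\text{prior, when }\sqrt{d}\,\le\,\sqrt{Q}}\cdot\;\frac{\log Q}{\alpha}.
$$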

Figure 1: Behavior of the sample complexity vs. error trade-off for $d$ random linear queries (left) and worst-case queries (right) over a universe $\mathcal{X}$ ($\log$-$\log$ scale). The sample complexity for random queries is discontinuous at $\alpha \approx \frac{\sqrt{\log |\mathcal{X}|}}{\sqrt{d}}$. The dependence on the privacy parameters and on $\log d$ terms is suppressed for clarity.

Related readings and updates.

Private Stochastic Convex Optimization: Optimal Rates in ℓ1 Geometry

Stochastic convex optimization over an $\ell_1$-bounded domain is ubiquitous in machine learning applications such as LASSO, but remains poorly understood when learning with differential privacy. We show that, up to logarithmic factors, the optimal excess population loss of any $(\varepsilon, \delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$. The upper bound is based on…
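As a quick sanity check on this rate (a back-of-the-envelope calculation, not stated in the excerpt), the two terms balance at $n \approx d/(\varepsilon^{2}\log d)$, so the privacy term $\sqrt{d}/(\varepsilon n)$ dominates only below that sample size:

$$
\sqrt{\frac{\log d}{n}} \;\ge\; \frac{\sqrt{d}}{\varepsilon n}
\quad\Longleftrightarrow\quad
\frac{\log d}{n} \;\ge\; \frac{d}{\varepsilon^{2} n^{2}}
\quad\Longleftrightarrow\quad
n \;\ge\; \frac{d}{\varepsilon^{2}\log d}.
$$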

Lower Bounds for Locally Private Estimation via Communication Complexity

We develop lower bounds for estimation under local privacy constraints—including differential privacy and its relaxations to approximate or Rényi differential privacy—by showing an equivalence between private estimation and communication-restricted estimation problems. Our results apply to arbitrarily interactive privacy mechanisms, and they also give sharp lower bounds for all levels of differential privacy protections, that is, privacy…