The study of differentially private PAC learning runs all the way from
its introduction in 2008 [KLNRS08] to a best paper award at the
Symposium on Foundations of Computer Science (FOCS) this year [BLM20].
In this post, we’ll recap the history of this line of work, aiming for
enough detail for a rough understanding of the results and methods.
Before we get to the “what” and “how” of private PAC learning, it’s
worth thinking about the “why”. One motivation for this line of work is
that it neatly captures a fundamental question: does privacy in machine
learning come at a price? Machine learning is now sufficiently
successful and widespread for this question to have real import. But to
even start to address this question, we need a formalization of machine
learning that allows us to reason about possible tradeoffs in a
rigorous way. Statistical learning theory, and its computational
formalization as PAC learning, provide one such clean and wellstudied
model. We can therefore use PAC learning as a testbed whose insights we
might carry to other less idealized forms of learning.
With this motivation in mind, the rest of this post is structured as
follows. The first section covers the basics of the PAC model, and
subsequent sections gradually build up a chronology of results. When
possible, we give short sketches of the accompanying techniques.
PAC Learning
We’ll start with a brief overview of PAC learning absent any privacy
restrictions. Readers familiar with PAC learning can probably skip this
section while noting that

(the cardinality version of) Occam’s razor is a baseline learner
using \(O(\log\mathcal{H})\) samples,

VC dimension characterizes nonprivate PAC learning,

we’ll focus on the sample complexity of realizable PAC learning,

we’ll usually omit dependencies on accuracy and success probability
parameters, and

we’ll usually ignore computational efficiency.
For readers needing a refresher on PAC learning, the basic element of
the “probably approximately correct” (PAC) framework [Val84] is a
hypothesis. Each hypothesis is a function
\(h \colon \mathcal{X}\to \{1,1\}\) mapping examples from some space
\(\mathcal{X}\) to binary labels. A collection of hypotheses is a
hypothesis class \(\mathcal{H}\), e.g., thresholds (a.k.a. perceptrons),
rectangles, conjunctions, and so on. In the realizable setting, a
learner receives examples drawn from some unknown distribution and
labeled by an unknown \(h^\ast \in \mathcal{H}\). The learner’s goal is to
with high probability (“probably”) output a hypothesis that mostly
matches the labels of \(h^\ast\) on future examples from the unknown example
distribution (“approximately correct”). In the agnostic setting,
examples are not necessarily labeled by any \(h
\in \mathcal{H}\), and the goal is only to output a hypothesis that
approximates the best error of any hypothesis from \(\mathcal{H}\). As
mentioned above, we focus on the realizable setting unless otherwise
specified. In the proper setting, the learner must output a hypothesis
from \(\mathcal{H}\) itself. In the improper setting, this requirement
is removed.
In general, we say an algorithm \((\alpha,\beta)\)PAC learns
\(\mathcal{H}\) with sample complexity \(n\) if \(n\) samples are sufficient
to with probability at least \(1\beta\) obtain error at most \(\alpha\)
over new examples from the distribution. For the purposes of this post,
we generally omit these dependencies on \(\alpha\) and \(\beta\), as they
typically vary little or not at all when switching between nonprivate
and private PAC learning.
Fortunately, we always have a simple baseline learner based on empirical
risk minimization: given a set of labeled examples, iterate over all
hypotheses \(h \in \mathcal{H}\), check how many of the labeled examples
each \(h\) mislabels, and output a hypothesis that mislabels the fewest
examples. Using this learner, which is sometimes called “Occam’s razor,”
\(O(\log\mathcal{H})\) samples suffice to PAC learn \(\mathcal{H}\).
At the same time, \(\mathcal{H}\) is a pretty coarse measure of
hypothesis class complexity, as it would immediately rule out learning
any infinite hypothesis class (of which there are many). Thus, as you
might expect, we can do better. We do so using VC dimension.
\(\mathsf{VCD}\left(\mathcal{H}\right)\) is the size of the largest
possible collection of examples such that, for every labeling of the
examples, \(\mathcal{H}\) contains a hypothesis with that labeling. With
VC dimension, we can essentially swap \(\log\mathcal{H}\) with
\(\mathsf{VCD}\left(\mathcal{H}\right)\) in the Occam’s razor bound and
PAC learn with \(O(\mathsf{VCD}\left(\mathcal{H}\right))\) samples. In
fact, the “Fundamental Theorem of Statistical Learning” says that PAC
learnability (realizable or agnostic) is equivalent to finite VC
dimension. In this sense, \(\mathsf{VCD}\left(\mathcal{H}\right)\) is a
good measure of how hard it is to PAC learn \(\mathcal{H}\). As a
motivating example that will reappear later, note that for the
hypothesis class of 1dimensional thresholds over \(T\) points,
\(\log \mathcal{H} = \log T\), while
\(\mathsf{VCD}\left(\mathcal{H}\right)\) is only 1.
An illustration of 1dimensional thresholds. A given threshold is determined by some point \(x^\ast \in [T]\): any example \(x \leq x^\ast\) receives label \(1\), and any example \(x > x^\ast\) receives label 1.
A Simple Private PAC Learner
It is straightforward to add a differential privacy constraint to the
PAC framework: the hypothesis output by the learner must be a
differentially private function of the labeled examples
\((x_1, y_1), \ldots, (x_n, y_n)\). That is, changing any one of the
examples — even to one with an inconsistent label — must not affect
the distribution over hypotheses output by the learner by too much.
Since we haven’t talked about any other PAC learner, we may as well
start with the empirical risk minimizationstyle Occam’s razor discussed
in the previous section, which simply selects a hypothesis that
minimizes empirical error. A private version becomes easy if we view
this algorithm in the right light. All it is doing is assigning a score
to each possible output (the hypothesis’ empirical error) and outputting
one with the best (lowest) score. This makes it a good candidate for
privatization by the exponential mechanism [MT07].
Recall that the exponential mechanism uses a scoring function over
outputs to release better outputs with higher probability, subject to
the privacy constraint. More formally, the exponential mechanism
requires a scoring function \(u(X,h)\) mapping (database, output) pairs to
realvalued scores and then selects a given output \(h\) with probability
proportional to \(\exp\left(\tfrac{\varepsilon
u(X,h)}{2\Delta(u)}\right)\). Thus a lower \(\varepsilon\) (stricter
privacy requirement) and larger \(\Delta(u) := \sup_h \sup_{X \sim X’} u(X,h)  u(X’,h) \) (scoring function more sensitive to changing one element in the database \(X\) to make \(X’\)) both lead to a more uniform (more
private) output distribution.
Fortunately for our PAC learning setting, empirical error is not a very
sensitive scoring function: changing one sample only changes empirical
error by 1. We can therefore use (negative) empirical error as our
scoring function \(u(X,h)\), apply the exponential mechanism, and get a
“private Occam’s razor.” This was exactly what Kasiviswanathan, Lee,
Nissim, Raskhodnikova, and Smith [KLNRS08] did when they introduced
differentially private PAC learning in 2008. The resulting sample
complexity bounds differ from the generic Occam’s razor only by an
\(\varepsilon\) factor in the denominator, and
\(O(\log\mathcal{H}/\varepsilon)\) samples suffice to privately PAC
learn \(\mathcal{H}\).
Of course, our experience with nonprivate PAC learning suggests that we
shouldn’t be satisfied with this \(\log
\mathcal{H}\) dependence. Maybe VC dimension characterizes private PAC
learning, too?
Characterizing Pure Private PAC Learning
As it turns out, answering this question will take some time. We start
with a partial negative answer. Specifically, we’ll see a class with VC
dimension 1 and (a restricted form of) private sample complexity
arbitrarily larger than 1. We’ll also cover the first in a line of
characterization results for private PAC learning.
We first consider learners that satisfy pure privacy. Recall that pure
\((\varepsilon,0)\)differential privacy forces output distributions that
may only differ by a certain \(e^\varepsilon\) multiplicative factor (like
the exponential mechanism above). The strictly weaker notion of
approximate \((\varepsilon,\delta)\)differential privacy also allows a
small additive \(\delta\) factor. Second, we restrict ourselves to
proper learners, which may only output hypotheses from the learned
class \(\mathcal{H}\).
With these assumptions in place, in 2010, Beimel, Kasiviswanathan, and
Nissim [BKN10] studied a hypothesis class called \(\mathsf{Point}_d\).
\(\mathsf{Point}_d\) consists of \(2^d\) hypotheses, one for each vector in
\(\{0,1\}^d\). Taking the set of examples \(\mathcal{X}\) to be \(\{0,1\}^d\)
as well, we define each hypothesis in \(\mathsf{Point}_d\) to label only
its associated vector as 1, and the remaining \(2^d1\) examples as
1. [BKN10] showed that the hypothesis class \(\mathsf{Point}_d\) requires
\(\Omega(d)\) samples for proper pure private PAC learning. In contrast,
\(\mathsf{VCD}\left(\mathsf{Point}_d\right) = 1\), so this \(\Omega(d)\)
lower bound shows us that VC dimension does not characterize proper
pure private PAC learning.
This result uses the classic “packing” lower bound method, which powers
many lower bounds for pure differential privacy. The general packing
method is to first construct a large collection of databases which are
all “close enough” to each other but nonetheless all have different
“good” outputs. Once we have such a collection, we use group privacy.
Group privacy is a corollary of differential privacy that requires
databases differing in \(k\) elements to have \(k\varepsilon\)close output
distributions. Because of group privacy, if we start with a collection
of databases that are close together, then the output distributions for
any two databases in the collection cannot be too different. This
creates a tension: utility forces the algorithm to produce different
output distributions for different databases, but privacy forces
similarity. The packing argument comes down to arguing that, unless the
databases are large, privacy wins out, and when privacy wins out then
there is some database where the algorithm probably produces a bad
output.
For \(\mathsf{Point}_d\), we sketch the resulting argument as follows.
Suppose we have an \(\varepsilon\)private PAC learner that uses \(m\)
samples. Then we can define a collection of different databases of size
\(m\), one for each hypothesis in \(\mathsf{Point}_d\). By group privacy,
the output distribution for our private PAC learner changes by at most
\(e^{m\varepsilon}\) between any two of the databases in this collection.
Thus we can pick any \(h \in \mathsf{Point}_d\) and know that the
probability of outputting the wrong hypothesis is at least roughly
\(2^d \cdot e^{m\varepsilon}\). Since we need this probability to be
small, rearranging implies \(m =
\Omega(d/\varepsilon)\).
[BKN10] then contrasted this result with an improper pure private PAC
learner. This learner applies the exponential mechanism to a class
\(\mathsf{Point}_d’\) of hypotheses derived from \(\mathsf{Point}_d\) —
but not necessarily a subset of \(\mathsf{Point}_d\) — gives an
improper pure private PAC learner with sample complexity \(O(\log
d)\). Since this learner is improper, it circumvents the “one database
per hypothesis” step of the packing lower bound. Moreover, [BKN10] gave a
still more involved improper pure private PAC learner requiring only
\(O(1)\) samples. This separates proper pure private PAC learning from
improper pure private PAC learning. In contrast, the sample complexities
of proper and improper PAC learning absent privacy are the same up to
logarithmic factors in \(\alpha\) and \(\beta\).
In 2013, Beimel, Nissim, and Stemmer [BNS13] proved a more general
result. They gave the first characterization of pure (improper) private
PAC learning by defining a new hypothesis class measure called the
representation dimension, \(\mathsf{REPD}\left(\mathcal{H}\right)\).
Roughly, the representation dimension considers the collection of all
distributions \(\mathcal{D}\) over sets of hypotheses, not necessarily
from \(\mathcal{H}\), that “cover” \(\mathcal{H}\). By “cover,” we mean that
for any \(h
\in \mathcal{H}\), with high probability a set drawn from covering
distribution \(\mathcal{D}\) includes a hypothesis that mostly produces
labels that agree with \(h\). With this collection of distributions
defined, \(\mathsf{REPD}\left(\mathcal{H}\right)\) is the minimum over all
such covering distributions of the logarithm of the size of the largest
set in its support. Thus a hypothesis class that can be covered by a
distribution over small sets of hypotheses will have a small
representation dimension. With the notion of representation dimension in
hand, [BNS13] gave the following result:
Theorem 1 ([BNS13]). The sample complexity to pure private PAC learn \(\mathcal{H}\) is \(\Theta(\mathsf{REPD}\left(\mathcal{H}\right))\).
Representation dimension may seem like a strange definition, but a
sketch of the proof of this result helps illustrate the connection to
private learning. Recall from our private Occam’s razor, and the
improper pure private PAC learner above, that if we can find a good and
relatively small set of hypotheses to choose from, then we can apply the
exponential mechanism and call it a day. It is exactly this kind of
“good set of hypotheses” that representation dimension aims to capture.
A little more formally, given an upper bound on
\(\mathsf{REPD}\left(\mathcal{H}\right)\), we know there is some covering
distribution whose largest hypothesis set is not too big. That means we
can construct a learner that draws a hypothesis set from this covering
distribution and applies the exponential mechanism to it. Just as we
picked up a \(\log\mathcal{H}\) sample complexity dependence using
private Occam’s razor, since \(\mathsf{REPD}\left(\mathcal{H}\right)\)
measures the logarithm of the size of the largest hypothesis set in the
support, this pure private learner picks up a
\(\mathsf{REPD}\left(\mathcal{H}\right)\) sample complexity dependence
here. This gives us one direction of
Theorem 1.
This logic works in the other direction as well. To go from a pure
private PAC learner with sample complexity \(m\) to an upper bound on
\(\mathsf{REPD}\left(\mathcal{H}\right)\), we return to the group privacy
trick used by [BKN10]. Suppose we fix a database of size \(m\) and pass it
to the learner. By group privacy and the learner’s accuracy guarantee,
if we fix some concept \(c\), the learner has probability at least roughly
\(e^{m}\) of outputting a hypothesis that mostly agrees with \(c\). Thus if
we repeat this process roughly \(e^{m}\) times, we probably get at least
one hypothesis that mostly agrees with \(c\). In other words, this
repeated calling of the learner on the arbitrary database yields a
covering distribution for \(\mathcal{H}\). Since we called the learner
approximately \(e^m\) times, the logarithm of this is \(m\), and we get our
upper bound on \(\mathsf{REPD}\left(\mathcal{H}\right)\).
To recap, we now know that proper pure private PAC learning is strictly
harder than improper pure private PAC learning, which is characterized
by representation dimension. A picture sums it up. Note the dotted line,
since we don’t yet have any evidence separating finite representation
dimension and finite VC dimension.
Separating Pure and Approximate Private PAC Learning
So far, we’ve focused only on pure privacy. In this section, we move on
to the first separations between pure and approximate private PAC
learning, as well as the first connection between private learning and
online learning.
Our source is a pair of interconnected papers from around 2014. Among
other things, Feldman and Xiao [FX14] introduced Littlestone
dimension to private PAC learning. By connecting representation
dimension to results from communication complexity to Littlestone
dimension, they proved the following:
Theorem 2 ([FX14]). The sample complexity to pure private PAC learn \(\mathcal{H}\) is \(\Omega(\mathsf{LD}\left(\mathcal{H}\right))\).
Littlestone dimension \(\mathsf{LD}\left(\mathcal{H}\right)\) is, roughly,
the maximum number of mistakes an adversary can force an online
PAClearning algorithm to make [Lit88]. We always have
\(\mathsf{VCD}\left(\mathcal{H}\right) \leq \mathsf{LD}\left(\mathcal{H}\right) \leq \log\mathcal{H}\),
but these inequalities can be strict. For example, denoting by
\(\mathsf{Thresh_T}\) the class of thresholds over \(\{1, 2, \ldots,
T\}\), since an adversary can force \(\Theta(\log T)\) wrong answers from
an online learner binary searching over \(\{1,2, \ldots, T\}\),
\(\mathsf{LD}\left(\mathsf{Thresh_T}\right) = \Omega(\log T)\). In
contrast, \(\mathsf{VCD}\left(\mathsf{Thresh_T}\right) = 1\).
At first glance it’s not obvious what
Theorem 2 adds over
Theorem 1. After all,
Theorem 1 gives an equivalence, not just a lower bound. One
advantage of
Theorem 2 is that Littlestone dimension is a known
quantity that has already been studied in its own right. We can now
import results like the lower bound on
\(\mathsf{LD}\left(\mathsf{Thresh_T}\right)\), whereas bounds on
\(\mathsf{REPD}\left(\cdot\right)\) are not common. A second advantage is
that Littlestone dimension conceptually connects private learning and
online learning: we now know that pure private PAC learning is no easier
than online PAC learning.
A second paper by Beimel, Nissim, and Stemmer [BNS13b] contrasted this
\(\Omega(\log T)\) lower bound for pure private learning of thresholds
with a \(2^{O(\log^\ast T)}\) upper bound for approximate private PAC
learning \(\mathsf{Thresh_T}\). Here \(\log^\ast\) denotes the very
slowgrowing iterated logarithm, the number of times we must take the
logarithm of the argument to bring it \(\leq 1\). (We’re not kidding about
“very slowgrowing” either:
\(\log^\ast(\text{number of atoms in universe}) \approx
4\).) With Feldman and Xiao’s result, this separates pure private PAC
learning from approximate private PAC learning. It also shows that
representation dimension does not characterize approximate private PAC
learning.
At the same time, Feldman and Xiao observed that the connection between
pure private PAC learning and Littlestone dimension is imperfect. Again
borrowing results from communication complexity, they observed that the
hypothesis class \(\mathsf{Line_p}\) (which we won’t define here) has
\(\mathsf{LD}\left(\mathsf{Line_p}\right) = 2\) but
\(\mathsf{REPD}\left(\mathsf{Line_p}\right)
= \Theta(\log(p))\). In contrast, they showed that an approximate
private PAC learner can learn \(\mathsf{Line_p}\) using
\(O\left(\tfrac{\log(1/\beta)}{\alpha}\right)\) samples. Since this
entails no dependence on \(p\) at all, it improves the separation between
pure and approximate private PAC learning given by [BNS13b].
Let’s pause to recap what’s happened so far. We learned in the last
section that representation dimension characterizes pure private PAC
learning [BNS13]. We learned in this section that Littlestone dimension
gives lower bounds for pure private PAC learning but, as shown by
\(\mathsf{Line_p}\), these bounds are sometimes quite loose [FX14].
\(\mathsf{Thresh_T}\) shows that representation dimension does not
characterize approximate private PAC learning [FX14
; BNS13b], and we
still have no privacyspecific lower bounds for approximate private
learners. So the picture now looks like this:
In particular, we might still find that VC dimension characterizes
approximate private PAC learning!
Lower Bounds for Approximate Private PAC Learning
We now dash this hope. In 2015, Bun, Nissim, Stemmer, and
Vadhan [BNSV15] gave the first nontrivial lower bound for approximate
private PAC learning. They showed that learning \(\mathsf{Thresh_T}\) has
proper approximate private sample complexity \(\Omega(\log^\ast(T))\) and
\(O(2^{\log^\ast(T)})\).
We’ll at least try to give some intuition for the presence of \(\log^\ast\)
in the lower bound. Informally, the lower bound relies on an inductive
construction of a sequence of hard problems for databases of size
\(n=1, 2,
\ldots\). The \(k^{th}\) hard problem relies on a distribution over
databases of size \(k\) whose data universe is of of size exponential in
the size of the data universe for the \((k1)^{th}\) distribution. The
base case is the uniform distribution over the two singleton databases
\(\{0\}\) and \(\{1\}\), and they show how to inductively construct
successive problems such that a solution for the \(k^{th}\) problem
implies a solution for the \((k1)^{th}\) problem. Unraveling the
recursive relationship between the problem domain sizes implies a
general lower bound of roughly \(\log^\astX\) for domain \(X\).
The inclusion of \(\log^\ast\) makes this is an extremely mild lower bound.
However, \(\log^\ast(T)\) can still be arbitrarily larger than 1, so this is
the first definitive evidence that proper approximate privacy introduces
a cost over nonprivate PAC learning.
In 2018, Alon, Livni, Malliaris, and Moran [ALMM19] extended this
\(\Omega(\log^\ast T)\) lower bound for \(\mathsf{Thresh_T}\) to improper
approximate privacy. More generally, they gave concrete evidence for the
importance of thresholds, which have played a seemingly outsize role in
the work so far. They did so by relating a class’ Littlestone dimension
to its ability to “contain” thresholds. Here, we say \(\mathcal{H}\)
“contains” \(m\) thresholds if there exist \(m\) (unlabeled) examples
\(x_1,\ldots,x_m\) and hypotheses \(h_1, \ldots, h_m \in \mathcal{H}\) such
that the hypotheses “behave like” thresholds on the \(m\) examples, i.e.,
\(h_i(x_j) = 1 \Leftrightarrow j \geq
i\). With this language, they imported a result from model theory to show
that any hypothesis class \(\mathcal{H}\) contains
\(\log(\mathsf{LD}\left(\mathcal{H}\right))\) thresholds. This implies
that learning \(\mathcal{H}\) is at least as hard as learning
\(\mathsf{Thresh_T}\) with
\(T = \log(\mathsf{LD}\left(\mathcal{H}\right))\). Since
\(\log^\ast(\log(\mathsf{LD}\left(\mathcal{H}\right)))
= \Omega(\log^\ast(\mathsf{LD}\left(\mathcal{H}\right)))\), combining these
two results puts the following limit on private PAC learning:
Theorem 3 ([ALMM19]). The sample complexity to approximate private PAC learn \(\mathcal{H}\) is \(\Omega(\log^\ast(\mathsf{LD}\left(\mathcal{H}\right)))\).
Littlestone dimension characterizes online PAC learning, so we now know
that online PAC learnability is necessary for private PAC learnability.
Sufficiency, however, remains an open question. This produces the
following picture, where the dotted line captures the question of
sufficiency.
Characterizing Approximate Private PAC Learning
Spurred by this question, several advances in private PAC learning have
appeared in the last year. First, Gonen, Hazan, and Moran strengthened
Theorem 3 by giving a constructive method for converting
pure private learners to online learners [GHM19]. Their result
reaches back to the 2013 characterization of pure private learning in
terms of representation dimension by using the covering distribution to
generate a collection of “experts” for online learning. Again revisiting
\(\mathsf{Thresh_T}\), Kaplan, Ligett, Mansour, Naor, and
Stemmer [KLMNS20] significantly reduced the \(O(2^{\log^\ast(T)})\) upper
bound of [BNSV15] to just \(O((\log^\ast(T))^{1.5})\). And Alon, Beimel,
Moran, and Stemmer [ABMS20] justified this post’s focus on realizable
private PAC learning by giving a transformation from a realizable
approximate private PAC learner to an agnostic one at the cost of
slightly worse privacy and sample complexity. This built on an earlier
transformation that only applied to proper learners [BNS15].
Finally, Bun, Livni, and Moran [BLM20] answered the open question posed
by [ALMM19]:
Theorem 4 ([BLM20]). The sample complexity to approximate private PAC learn \(\mathcal{H}\) is \(2^{O({\mathsf{LD}\left(\mathcal{H}\right)})}\).
To prove this, [BLM20] introduced the notion of a globally stable
learner and showed how to convert an online learner to a globally stable
learner to a private learner. Thus, combined with the result of [ALMM19],
we now know that the sample complexity of private PAC learning any
\(\mathcal{H}\) is at least
\(\Omega(\log^\ast(\mathsf{LD}\left(\mathcal{H}\right)))\) and at most
\(2^{O({\mathsf{LD}\left(\mathcal{H}\right)})}\). In this sense, online
learnability characterizes private learnability.
Narrowing the gap between the lower and upper bounds above is an open
question. Note that we cannot hope to close the gap completely. For the
lower bound, the current \(\mathsf{Thresh_T}\) upper bound implies that no
general lower bound can be stronger than
\(\Omega((\log^\ast(\mathsf{LD}\left(\mathcal{H}\right)))^{1.5})\). For the
upper bound, there exist hypotheses classes \(\mathcal{H}\) with
\(\mathsf{VCD}\left(\mathcal{H}\right) = \mathsf{LD}\left(\mathcal{H}\right)\)
(e.g., \(\mathsf{VCD}\left(\mathsf{Point}_d\right) = \mathsf{LD}\left(\mathsf{Point}_d\right)= 1\)), so since nonprivate PAC learning requires
\(\Omega(\mathsf{VCD}\left(\mathcal{H}\right))\) samples, the best
possible private PAC learning upper bound is
\(O(\mathsf{LD}\left(\mathcal{H}\right))\). Nevertheless, proving either
bound remains open.
Conclusion
This concludes our post, and with it our discussion of this fundamental
question: the price of privacy in machine learning. We now know that in
the PAC model, proper pure private learning, improper pure private
learning, approximate private learning, and nonprivate learning are all
strongly separated. By the connection to Littlestone dimension, we also
know that approximate private learnability is equivalent to online
learnability. However, many questions about computational efficiency and
tight sample complexity bounds remain open.
As mentioned in the introduction, we focused on the clean yet widely
studied and influential model of PAC learning. Having characterized how
privacy enters the picture in PAC learning, we can hopefully convey this
understanding to other models of learning, and now approach these
questions from a rigorous and grounded point of view.
Congratulations to Mark Bun, Roi Livni, and Shay Moran on their best
paper award — and to the many individuals who paved the way before
them!
Acknowledgments
Thanks to Kareem Amin and Clément Canonne for helpful feedback while
writing this post.