Theory of Computing Blog Aggregator

Authors: Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian
Download: PDF
Abstract: We develop primal-dual coordinate methods for solving bilinear saddle-point problems of the form $\min_{x \in \mathcal{X}} \max_{y\in\mathcal{Y}} y^\top A x$ which contain linear programming, classification, and regression as special cases. Our methods push existing fully stochastic sublinear methods and variance-reduced methods towards their limits in terms of per-iteration complexity and sample complexity. We obtain nearly-constant per-iteration complexity by designing efficient data structures leveraging Taylor approximations to the exponential and a binomial heap. We improve sample complexity via low-variance gradient estimators using dynamic sampling distributions that depend on both the iterates and the magnitude of the matrix entries.

Our runtime bounds improve upon those of existing primal-dual methods by a factor depending on sparsity measures of the $m$ by $n$ matrix $A$. For example, when rows and columns have constant $\ell_1/\ell_2$ norm ratios, we offer improvements by a factor of $m+n$ in the fully stochastic setting and $\sqrt{m+n}$ in the variance-reduced setting. We apply our methods to computational geometry problems, i.e. minimum enclosing ball, maximum inscribed ball, and linear regression, and obtain improved complexity bounds. For linear regression with an elementwise nonnegative matrix, our guarantees improve on exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/(m+n)}$.

at September 18, 2020 12:00 AM UTC

Authors: Jakub Łącki, Yasamin Nazari
Download: PDF
Abstract: We provide new algorithms for maintaining approximate distances in a weighted undirected graph $G = (V, E)$ subject to edge deletions. Our first result is an algorithm that maintains $(1+\epsilon)$-approximate distances from a set of $s$ sources in $\tilde{O}(sm)$ total update time, assuming that $s= n^{\Omega(1)}$, $\epsilon = \Omega(1)$ and $|E|= n^{1+\Omega(1)}$.

This matches the best known static algorithm, up to polylogarithmic factors, for a wide range of settings.

The currently best known algorithm for the problem is obtained by running the single-source algorithm of [Henzinger, Krinninger and Nanongkai, FOCS'14] independently from each source. Our result improves over the update time bound of this solution by removing a $2^{\tilde{O}(\log^{3/4} n)}$ factor. Additionally, we can maintain a $(1+\epsilon)$-approximate single-source shortest paths with amortized update time of $2^{\tilde{O}(\sqrt{\log n})}$, when $0< \epsilon<1$ is a constant and $|E|= n2^{\tilde{\Omega}(\sqrt{\log n})}$. This improves over the best known update time of $2^{\tilde{O}(\log^{3/4} n)}$ by [Henzinger, Krinninger and Nanongkai, FOCS'14].

Furthermore, for any integer $k \geq 1$ we give an algorithm for maintaining $(2k-1)(1+\epsilon)$-approximate all-pairs-shortest-paths, in $\tilde{O}(mn^{1/k})$ total update time and $O(k)$ query time\footnote{Throughout this paper we use the notation $\tilde{O}(f(n))$ to hide factors of $O(\text{polylog } (f(n)))$.}.

This improves over the result of [Chechik, FOCS'18] in a twofold way. Namely, we improve the total update time bound by removing an $n^{o(1)}$ factor and reduce the query time from $O(\log \log (nW))$ to $O(k)$.

Our results are based on a new decremental hopset construction that may be of independent interest.

at September 18, 2020 12:00 AM UTC

Authors: Adam Izdebski, Ronald de Wolf
Download: PDF
Abstract: Boosting is a general method to convert a weak learner (which generates hypotheses that are just slightly better than random) into a strong learner (which generates hypotheses that are much better than random). Recently, Arunachalam and Maity gave the first quantum improvement for boosting, by combining Freund and Schapire's AdaBoost algorithm with a quantum algorithm for approximate counting. Their booster is faster than classical boosting as a function of the VC-dimension of the weak learner's hypothesis class, but worse as a function of the quality of the weak learner. In this paper we give a substantially faster and simpler quantum boosting algorithm, based on Servedio's SmoothBoost algorithm.

at September 18, 2020 12:00 AM UTC

Authors: Hubie Chen, Bart M. P. Jansen, Karolina Okrasa, Astrid Pieterse, Paweł Rzążewski
Download: PDF
Abstract: We investigate the List $H$-Coloring problem, the generalization of graph coloring that asks whether an input graph $G$ admits a homomorphism to the undirected graph $H$ (possibly with loops), such that each vertex $v \in V(G)$ is mapped to a vertex on its list $L(v) \subseteq V(H)$. An important result by Feder, Hell, and Huang [JGT 2003] states that List $H$-Coloring is polynomial-time solvable if $H$ is a so-called bi-arc graph, and NP-complete otherwise. We investigate the NP-complete cases of the problem from the perspective of polynomial-time sparsification: can an $n$-vertex instance be efficiently reduced to an equivalent instance of bitsize $O(n^{2-\varepsilon})$ for some $\varepsilon > 0$? We prove that if $H$ is not a bi-arc graph, then List $H$-Coloring does not admit such a sparsification algorithm unless $NP \subseteq coNP/poly$. Our proofs combine techniques from kernelization lower bounds with a study of the structure of graphs $H$ which are not bi-arc graphs.

at September 18, 2020 01:20 AM UTC

Authors: Sjoerd Dirksen, Alexander Stollenwerk
Download: PDF
Abstract: We consider the problem of encoding a set of vectors into a minimal number of bits while preserving information on their Euclidean geometry. We show that this task can be accomplished by applying a Johnson-Lindenstrauss embedding and subsequently binarizing each vector by comparing each entry of the vector to a uniformly random threshold. Using this simple construction we produce two encodings of a dataset such that one can query Euclidean information for a pair of points using a small number of bit operations up to a desired additive error - Euclidean distances in the first case and inner products and squared Euclidean distances in the second. In the latter case, each point is encoded in near-linear time. The number of bits required for these encodings is quantified in terms of two natural complexity parameters of the dataset - its covering numbers and localized Gaussian complexity - and shown to be near-optimal.
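A minimal numpy sketch of the construction described above (a Gaussian Johnson-Lindenstrauss projection followed by comparison with uniformly random thresholds) might look like the following; the dimensions, threshold range, and scaling here are illustrative choices, not the paper's tuned parameters.

```python
import numpy as np

def binarize(X, m, R, rng):
    """Sketch of the two-step encoding: a Gaussian Johnson-Lindenstrauss
    projection to m dimensions, followed by one-bit quantization of each
    coordinate against a shared uniform random threshold in [-R, R]."""
    A = rng.standard_normal((m, X.shape[1])) / np.sqrt(m)  # JL embedding
    tau = rng.uniform(-R, R, size=m)                       # random dithers
    return (X @ A.T > tau).astype(np.int8)                 # one bit/coord

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 50))          # 5 points in R^50
B = binarize(X, m=2000, R=10.0, rng=rng)  # 2000 bits per point

def hamming(i, j):
    """Bit disagreements; in expectation this grows with ||x_i - x_j||."""
    return int(np.count_nonzero(B[i] != B[j]))
```

Because the dither is shared across points, two embedded coordinates land on opposite sides of the threshold with probability proportional to their gap, so the Hamming distance between the bit strings tracks the Euclidean distance up to additive error.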

at September 18, 2020 12:00 AM UTC

Authors: Gary P. T. Choi
Download: PDF
Abstract: Conformal mapping, a classical topic in complex analysis and differential geometry, has become a subject of great interest in the area of surface parameterization in recent decades with various applications in science and engineering. However, most of the existing conformal parameterization algorithms only focus on simply-connected surfaces and cannot be directly applied to surfaces with holes. In this work, we propose two novel algorithms for computing the conformal parameterization of multiply-connected surfaces. We first develop an efficient method for conformally parameterizing an open surface with one hole to an annulus on the plane. Based on this method, we then develop an efficient method for conformally parameterizing an open surface with $k$ holes onto a unit disk with $k$ circular holes. The conformality and bijectivity of the mappings are ensured by quasi-conformal theory. Numerical experiments and applications are presented to demonstrate the effectiveness of the proposed methods.

at September 18, 2020 12:00 AM UTC

Authors: Sébastien Bubeck, Niv Buchbinder, Christian Coester, Mark Sellke
Download: PDF
Abstract: We consider a generalization of the fundamental online metrical service systems (MSS) problem where the feasible region can be transformed between requests. In this problem, which we call T-MSS, an algorithm maintains a point in a metric space and has to serve a sequence of requests. Each request is a map (transformation) $f_t\colon A_t\to B_t$ between subsets $A_t$ and $B_t$ of the metric space. To serve it, the algorithm has to go to a point $a_t\in A_t$, paying the distance from its previous position. Then, the transformation is applied, modifying the algorithm's state to $f_t(a_t)$. Such transformations can model, e.g., changes to the environment that are outside of an algorithm's control, and we therefore do not charge any additional cost to the algorithm when the transformation is applied. The transformations also make it possible to model the requests occurring in the $k$-taxi problem.

We show that for $\alpha$-Lipschitz transformations, the competitive ratio is $\Theta(\alpha)^{n-2}$ on $n$-point metrics. Here, the upper bound is achieved by a deterministic algorithm and the lower bound holds even for randomized algorithms. For the $k$-taxi problem, we prove a competitive ratio of $\tilde O((n\log k)^2)$. For chasing convex bodies, we show that even with contracting transformations no competitive algorithm exists.

The problem T-MSS has a striking connection to the following deep mathematical question: Given a finite metric space $M$, what is the required cardinality of an extension $\hat M\supseteq M$ where each partial isometry on $M$ extends to an automorphism? We give partial answers for special cases.

at September 18, 2020 12:00 AM UTC

Authors: Fernando G. S. L. Brandão, Richard Kueng, Daniel Stilck França
Download: PDF
Abstract: Quantum state tomography is a powerful, but resource-intensive, general solution for numerous quantum information processing tasks. This motivates the design of robust tomography procedures that use relevant resources as sparingly as possible. Important cost factors include the number of state copies and measurement settings, as well as classical postprocessing time and memory. In this work, we present and analyze an online tomography algorithm that is designed to optimize all the aforementioned resources at the cost of a worse dependence on accuracy. The protocol is the first to give optimal performance in terms of rank and dimension for state copies, measurement settings and memory. Classical runtime is also reduced substantially. Further improvements are possible by executing the algorithm on a quantum computer, giving a quantum speedup for quantum state tomography.

at September 18, 2020 12:00 AM UTC

Authors: Keren Censor-Hillel, Victor I. Kolobov, Gregory Schwartzman
Download: PDF
Abstract: In this paper we consider the fundamental problem of finding subgraphs in highly dynamic distributed networks - networks which allow an arbitrary number of links to be inserted / deleted per round. We show that the problems of $k$-clique membership listing (for any $k\geq 3$), 4-cycle listing and 5-cycle listing can be deterministically solved in $O(1)$-amortized round complexity, even with limited logarithmic-sized messages.

To achieve $k$-clique membership listing we introduce a very useful combinatorial structure which we name the robust $2$-hop neighborhood. This is a subset of the 2-hop neighborhood of a node, and we prove that it can be maintained in highly dynamic networks in $O(1)$-amortized rounds. We also show that maintaining the actual 2-hop neighborhood of a node requires near linear amortized time, showing the necessity of our definition. For $4$-cycle and $5$-cycle listing, we need edges within hop distance 3, for which we similarly define the robust $3$-hop neighborhood and prove it can be maintained in highly dynamic networks in $O(1)$-amortized rounds.

We complement the above with several impossibility results. We show that membership listing of any other graph on $k\geq 3$ nodes except $k$-clique requires an almost linear number of amortized communication rounds. We also show that $k$-cycle listing for $k\geq 6$ requires $\Omega(\sqrt{n} / \log n)$ amortized rounds. This, combined with our upper bounds, paints a detailed picture of the complexity landscape for ultra fast graph finding algorithms in this highly dynamic environment.

at September 18, 2020 12:00 AM UTC

Authors: Dorit S. Hochbaum, Asaf Levin, Xu Rao
Download: PDF
Abstract: We study several variants of the covariates fine balance problem, generalizing some of these problems and introducing a number of others. We present a comprehensive complexity study of the covariates problems, providing either polynomial-time algorithms or proofs of NP-hardness. The polynomial-time algorithms described are mostly combinatorial and rely on network flow techniques. In addition, we present several fixed-parameter tractable results for problems where the number of covariates and the number of levels of each covariate are taken as parameters.

at September 18, 2020 12:00 AM UTC

Authors: Carl Einarson, Gregory Gutin, Bart M. P. Jansen, Diptapriyo Majumdar, Magnus Wahlstrom
Download: PDF
Abstract: We introduce and study two natural generalizations of the Connected Vertex Cover (VC) problem: the $p$-Edge-Connected and $p$-Vertex-Connected VC problems (where $p \geq 2$ is a fixed integer). Like Connected VC, both new VC problems are FPT, but do not admit a polynomial kernel unless $NP \subseteq coNP/poly$, which is highly unlikely. We prove, however, that both problems admit time-efficient polynomial-sized approximate kernelization schemes. We obtain an $O(2^{O(pk)}n^{O(1)})$-time algorithm for $p$-Edge-Connected VC and an $O(2^{O(k^2)}n^{O(1)})$-time algorithm for $p$-Vertex-Connected VC. Finally, we describe a $2(p+1)$-approximation algorithm for $p$-Edge-Connected VC. The proofs for the new VC problems require more sophisticated arguments than for Connected VC. In particular, for the approximation algorithm we use Gomory-Hu trees, and for the approximate kernels we use a result on small-size spanning $p$-vertex/edge-connected subgraphs of a $p$-vertex/edge-connected graph obtained independently by Nishizeki and Poljak (1994) and Nagamochi and Ibaraki (1992).

at September 18, 2020 12:00 AM UTC

Authors: Jackson Abascal, Venkatesan Guruswami, Pravesh K. Kothari
Download: PDF
Abstract: We give an efficient algorithm to strongly refute \emph{semi-random} instances of all Boolean constraint satisfaction problems. The number of constraints required by our algorithm matches (up to polylogarithmic factors) the best-known bounds for efficient refutation of fully random instances. Our main technical contribution is an algorithm to strongly refute semi-random instances of the Boolean $k$-XOR problem on $n$ variables that have $\widetilde{O}(n^{k/2})$ constraints. (In a semi-random $k$-XOR instance, the equations can be arbitrary and only the right-hand sides are random.)

One of our key insights is to identify a simple combinatorial property of random XOR instances that makes spectral refutation work. Our approach involves taking an instance that does not satisfy this property (i.e., is \emph{not} pseudorandom) and reducing it to a partitioned collection of $2$-XOR instances. We analyze these subinstances using a carefully chosen quadratic form as a proxy, which in turn is bounded via a combination of spectral methods and semidefinite programming. The analysis of our spectral bounds relies only on an off-the-shelf matrix Bernstein inequality. Even for the purely random case, this leads to a shorter proof compared to the ones in the literature that rely on problem-specific trace-moment computations.

at September 18, 2020 12:00 AM UTC

Authors: Aaron Lowe, Svend C. Svendsen, Pankaj K. Agarwal, Lars Arge (Duke University, Aarhus University)
Download: PDF
Abstract: An important problem in terrain analysis is modeling how water flows across a terrain creating floods by forming channels and filling depressions. In this paper we study a number of \emph{flow-query} related problems: Given a terrain $\Sigma$, represented as a triangulated $xy$-monotone surface with $n$ vertices, a rain distribution $R$ which may vary over time, determine how much water is flowing over a given edge as a function of time. We develop internal-memory as well as I/O-efficient algorithms for flow queries.

This paper contains four main results:

(i) We present an internal-memory algorithm that preprocesses $\Sigma$ into a linear-size data structure that for a (possibly time varying) rain distribution $R$ can return the flow-rate functions of all edges of $\Sigma$ in $O(\rho k+|\phi| \log n)$ time, where $\rho$ is the number of sinks in $\Sigma$, $k$ is the number of times the rain distribution changes, and $|\phi|$ is the total complexity of the flow-rate functions that have non-zero values;

(ii) We also present an I/O-efficient algorithm for preprocessing $\Sigma$ into a linear-size data structure so that for a rain distribution $R$, it can compute the flow-rate function of all edges using $O(\text{Sort}(|\phi|))$ I/Os and $O(|\phi| \log |\phi|)$ internal computation time.

(iii) $\Sigma$ can be preprocessed into a linear-size data structure so that for a given rain distribution $R$, the flow-rate function of an edge $(q,r) \in \Sigma$ under the single-flow direction (SFD) model can be computed more efficiently.

(iv) We present an algorithm for computing the two-dimensional channel along which water flows using Manning's equation, a widely used empirical equation that relates the flow-rate of water in an open channel to the geometry of the channel along with the height of water in the channel.

at September 18, 2020 12:00 AM UTC

Authors: Albert Cheu, Jonathan Ullman
Download: PDF
Abstract: There has been a recent wave of interest in intermediate trust models for differential privacy that eliminate the need for a fully trusted central data collector, but overcome the limitations of local differential privacy. This interest has led to the introduction of the shuffle model (Cheu et al., EUROCRYPT 2019; Erlingsson et al., SODA 2019) and revisiting the pan-private model (Dwork et al., ITCS 2010). The message of this line of work is that, for a variety of low-dimensional problems---such as counts, means, and histograms---these intermediate models offer nearly as much power as central differential privacy. However, there has been considerably less success using these models for high-dimensional learning and estimation problems.

In this work, we show that, for a variety of high-dimensional learning and estimation problems, both the shuffle model and the pan-private model inherently incur an exponential price in sample complexity relative to the central model. For example, we show that private agnostic learning of parity functions over $d$ bits requires $\Omega(2^{d/2})$ samples in these models, and privately selecting the most common attribute from a set of $d$ choices requires $\Omega(d^{1/2})$ samples, both of which are exponential separations from the central model. Our work gives the first non-trivial lower bounds for these problems for both the pan-private model and the general multi-message shuffle model.

at September 18, 2020 12:00 AM UTC

Authors: Meghna Lowalekar, Pradeep Varakantham, Patrick Jaillet
Download: PDF
Abstract: In multi-capacity ridesharing, multiple requests (e.g., customers, food items, parcels) with different origin and destination pairs travel in one resource. In recent years, online multi-capacity ridesharing services (i.e., where assignments are made online) like Uber-pool, foodpanda, and on-demand shuttles have become hugely popular in transportation, food delivery, logistics and other domains. This is because multi-capacity ridesharing services benefit all parties involved: the customers (due to lower costs), the drivers (due to higher revenues) and the matching platforms (due to higher revenues per vehicle/resource). Most importantly, these services can also help reduce carbon emissions (due to fewer vehicles on roads).

Online multi-capacity ridesharing is extremely challenging as the underlying matching graph is no longer bipartite (as in the unit-capacity case) but a tripartite graph with resources (e.g., taxis, cars), requests and request groups (combinations of requests that can travel together). The desired matching between resources and request groups is constrained by the edges between requests and request groups in this tripartite graph (i.e., a request can be part of at most one request group in the final assignment). While there have been myopic heuristic approaches employed for solving the online multi-capacity ridesharing problem, they do not provide any guarantees on the solution quality. To that end, this paper presents the first approach with bounds on the competitive ratio for online multi-capacity ridesharing (when resources rejoin the system at their initial location/depot after serving a group of requests).

at September 18, 2020 12:00 AM UTC

Authors: Basile Herzog, Giuseppe Di Molfetta
Download: PDF
Abstract: We provide first evidence that some families of nonlinear quantum systems, rephrased in terms of coined quantum walks with effective nonlinear phase, display a strong computational advantage for search algorithms over graphs having sets of vertices of constant size, e.g. the infinite square grid. The numerical simulations show that the walker finds the marked vertex in $O(N^{1/4} \log^{3/4} N)$ steps with probability $O(1/\log N)$, for an overall complexity of $O(N^{1/4}\log^{7/4}N)$. We also confirm this result analytically. Moreover, the algorithm is implemented without the need for a specific oracle step, but rather by a topological hole defect in the grid. From a quantum computing perspective, this hints at novel applications of quantum walk search: instead of using them to look for "good" solutions within the configuration space of a problem, we could use them to look for topological properties of the entire configuration space.

at September 17, 2020 11:21 PM UTC

Authors: Boro Sofranac, Ambros Gleixner, Sebastian Pokutta
Download: PDF
Abstract: Fast domain propagation of linear constraints has become a crucial component of today's best algorithms and solvers for mixed integer programming and pseudo-boolean optimization to achieve peak solving performance. Irregularities in the form of dynamic algorithmic behaviour, dependency structures, and sparsity patterns in the input data make efficient implementations of domain propagation on GPUs and, more generally, on parallel architectures challenging. This is one of the main reasons why domain propagation in state-of-the-art solvers is single thread only. In this paper, we present a new algorithm for domain propagation which (a) avoids these problems and allows for an efficient implementation on GPUs, and is (b) capable of running propagation rounds entirely on the GPU, without any need for synchronization or communication with the CPU. We present extensive computational results which demonstrate the effectiveness of our approach and show that ample speedups are possible on practically relevant problems: on state-of-the-art GPUs, our geometric mean speed-up for reasonably-large instances is around 10x to 20x and can be as high as 195x on favorably-large instances.
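For orientation, the per-constraint kernel that such a solver parallelizes is standard activity-based bound tightening for a linear constraint. A minimal single-constraint Python sketch (our illustration of the textbook rule, not the paper's GPU algorithm) is:

```python
def propagate(a, lb, ub, lhs, rhs):
    """One round of activity-based bound tightening for the constraint
    lhs <= sum_i a[i]*x[i] <= rhs under variable bounds lb/ub.
    Returns the (possibly) tightened bound lists."""
    n = len(a)
    # minimum and maximum activity of the constraint under current bounds
    amin = sum(a[i] * (lb[i] if a[i] > 0 else ub[i]) for i in range(n))
    amax = sum(a[i] * (ub[i] if a[i] > 0 else lb[i]) for i in range(n))
    lb, ub = list(lb), list(ub)
    for i in range(n):
        if a[i] == 0:
            continue
        # residual activities with variable i's contribution removed
        rmin = amin - a[i] * (lb[i] if a[i] > 0 else ub[i])
        rmax = amax - a[i] * (ub[i] if a[i] > 0 else lb[i])
        if a[i] > 0:
            ub[i] = min(ub[i], (rhs - rmin) / a[i])
            lb[i] = max(lb[i], (lhs - rmax) / a[i])
        else:  # dividing by a negative coefficient flips the bounds
            lb[i] = max(lb[i], (rhs - rmin) / a[i])
            ub[i] = min(ub[i], (lhs - rmax) / a[i])
    return lb, ub
```

For example, for $2x + 3y \le 12$ with $x, y \in [0, 10]$, one round tightens $x \le 6$ and $y \le 4$; the paper's contribution is running many such rounds for all constraints on the GPU despite the irregular sparsity.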

at September 17, 2020 11:22 PM UTC

Authors: Isolde Adler, Polly Fahey
Download: PDF
Abstract: Property testing algorithms are highly efficient algorithms that come with probabilistic accuracy guarantees. For a property P, the goal is to correctly distinguish, with high probability, inputs that have P from those that are far from having P, by querying only a small number of local parts of the input. In property testing on graphs, the distance is measured by the number of edge modifications (additions or deletions) that are necessary to transform a graph into one with property P. Much research has focussed on the query complexity of such algorithms, i.e. the number of queries the algorithm makes to the input, but in view of applications, the running time of the algorithm is equally relevant.

In (Adler, Harwath STACS 2018), a natural extension of the bounded degree graph model of property testing to relational databases of bounded degree was introduced, and it was shown that on databases of bounded degree and bounded tree-width, every property that is expressible in monadic second-order logic with counting (CMSO) is testable with constant query complexity and sublinear running time. It remains open whether this can be improved to constant running time.

In this paper we introduce a new model, which is based on the bounded degree model, but the distance measure allows both edge (tuple) modifications and vertex (element) modifications. Our main theorem shows that on databases of bounded degree and bounded tree-width, every property that is expressible in CMSO is testable with constant query complexity and constant running time in the new model. We also show that every property that is testable in the classical model is testable in our model with the same query complexity and running time, but the converse is not true.

We argue that our model is natural and our meta-theorem showing constant-time CMSO testability supports this.

at September 17, 2020 11:23 PM UTC

Authors: Abdurrahman Yaşar, Muhammed Fatih Balin, Xiaojing An, Kaan Sancak, Ümit V. Çatalyürek
Download: PDF
Abstract: Even distribution of irregular workload to processing units is crucial for efficient parallelization in many applications. In this work, we are concerned with a spatial partitioning called rectilinear partitioning (also known as generalized block distribution) of sparse matrices. More specifically, we address the problem of symmetric rectilinear partitioning of a square matrix. By symmetric, we mean the rows and columns of the matrix are identically partitioned, yielding a tiling where the diagonal tiles (blocks) are squares. We first show that finding an optimal solution to this problem is NP-hard, and we propose four heuristics to solve two different variants of this problem. We present a thorough analysis of the computational complexities of the proposed heuristics. To make the proposed techniques more applicable in real-life application scenarios, we further reduce their computational complexities by utilizing effective sparsification strategies together with an efficient sparse prefix-sum data structure. We experimentally show the proposed algorithms are efficient and effective on more than six hundred test matrices. With sparsification, our methods take less than 3 seconds on the Twitter graph on a modern 24-core system and output a solution whose load imbalance is no worse than 1%.
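To make the objective concrete, here is a small sketch (our illustration, not one of the paper's heuristics) that evaluates a symmetric rectilinear partition: the same cut points are applied to rows and columns, and quality is measured by load imbalance across tiles.

```python
from bisect import bisect_right

def tile_loads(nnz, cuts):
    """Nonzero count per tile under a symmetric rectilinear partition.
    `cuts` are the shared row/column cut points, e.g. [0, 2, 4] for a
    4x4 matrix split into a 2x2 grid (diagonal tiles are square)."""
    P = len(cuts) - 1
    load = [[0] * P for _ in range(P)]
    for r, c in nnz:
        load[bisect_right(cuts, r) - 1][bisect_right(cuts, c) - 1] += 1
    return load

def imbalance(load):
    """Maximum tile load divided by average tile load (1.0 is perfect)."""
    flat = [x for row in load for x in row]
    return max(flat) * len(flat) / sum(flat)
```

The heuristics in the paper search over the shared cut points to minimize this imbalance; the sketch only scores a given choice.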

at September 17, 2020 11:22 PM UTC

Authors: Dan Reznik, Ronaldo Garcia
Download: PDF
Abstract: Previously we showed the family of 3-periodics in the elliptic billiard (confocal pair) is the image under a variable similarity transform of poristic triangles (those with non-concentric, fixed incircle and circumcircle). Both families conserve the ratio of inradius to circumradius and therefore also the sum of cosines. This is consistent with the fact that a similarity preserves angles. Here we study two new Poncelet 3-periodic families also tied to each other via a variable similarity: (i) a first one interscribed in a pair of concentric, homothetic ellipses, and (ii) a second non-concentric one known as the Brocard porism: fixed circumcircle and Brocard inellipse. The Brocard points of this family are stationary at the foci of the inellipse. A key common invariant is the Brocard angle, and therefore the sum of cotangents. This raises an interesting question: given a non-concentric Poncelet family (limited or not to the outer conic being a circle), can a similar doppelgänger always be found interscribed in a concentric, axis-aligned ellipse and/or conic pair?

at September 17, 2020 11:23 PM UTC

Authors: Stasys Jukna, Hannes Seiwert
Download: PDF
Abstract: Given an assignment of real weights to the ground elements of a matroid, the min-max weight of a ground element $e$ is the minimum, over all circuits containing $e$, of the maximum weight of an element in that circuit with the element $e$ removed. We use this concept to answer the following structural questions for the minimum weight basis problem. Which elements are persistent under a given weighting (belong to all or to none of the optimal bases)? What changes of the weights are allowed while preserving optimality of optimal bases? How does the minimum weight of a basis change when the weight of a single ground element is changed, or when a ground element is contracted or deleted? Our answer to this latter question gives the tropical (min,+,-) analogue of Kirchhoff's arithmetic (+,x,/) effective conductance formula for electrical networks.

at September 17, 2020 11:20 PM UTC

Authors: Vahid R. Asadi, Igor Shinkar
Download: PDF
Abstract: Locally decodable codes (LDCs) are error-correcting codes $C : \Sigma^k \to \Sigma^n$ that admit a local decoding algorithm that recovers each individual bit of the message by querying only a few bits from a noisy codeword. An important question in this line of research is to understand the optimal trade-off between the query complexity of LDCs and their block length. Despite the importance of these objects, the best known constructions of constant-query LDCs have super-polynomial length, and there is a significant gap between the best constructions and the known lower bounds in terms of the block length.

For many applications it suffices to consider the weaker notion of relaxed LDCs (RLDCs), which allows the local decoding algorithm to abort if by querying a few bits it detects that the input is not a codeword. This relaxation turned out to allow decoding algorithms with constant query complexity for codes with almost linear length. Specifically, [BGH+06] constructed an $O(q)$-query RLDC that encodes a message of length $k$ using a codeword of block length $n = O(k^{1+1/\sqrt{q}})$.

In this work we improve the parameters of [BGH+06] by constructing an $O(q)$-query RLDC that encodes a message of length $k$ using a codeword of block length $O(k^{1+1/{q}})$. This construction matches (up to a multiplicative constant factor) the lower bounds of [KT00, Woo07] for constant query LDCs, thus making progress toward understanding the gap between LDCs and RLDCs in the constant query regime.

In fact, our construction extends to the stronger notion of relaxed locally correctable codes (RLCCs), introduced in [GRR18], where given a noisy codeword the correcting algorithm either recovers each individual bit of the codeword by only reading a small part of the input, or aborts if the input is detected to be corrupt.

at September 17, 2020 11:21 PM UTC

Authors: Enrique G. Alvarado, Bala Krishnamoorthy, Kevin R. Vixie
Download: PDF
Abstract: Given a compact $E \subset \mathbb{R}^n$ and $s > 0$, the maximum distance problem seeks a compact and connected subset of $\mathbb{R}^n$ of smallest one dimensional Hausdorff measure whose $s$-neighborhood covers $E$. For $E\subset \mathbb{R}^2$, we prove that minimizing over minimum spanning trees that connect the centers of balls of radius $s$, which cover $E$, solves the maximum distance problem.

The main difficulty in proving this result is overcome by the proof of Lemma 3.5, which states that one is able to cover the $s$-neighborhood of a Lipschitz curve $\Gamma$ in $\mathbb{R}^2$ with a finite number of balls of radius $s$, and connect their centers with another Lipschitz curve $\Gamma_\ast$, where $\mathcal{H}^1(\Gamma_\ast)$ is arbitrarily close to $\mathcal{H}^1(\Gamma)$.
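The theorem suggests a simple computational recipe: cover \(E\) with balls of radius \(s\) and take a minimum spanning tree of their centers. A pure-Python sketch (using a greedy cover and an exact Prim MST as illustrative choices; this is not the authors' package) might be:

```python
from math import dist

def greedy_cover(points, s):
    """Greedily pick centers so every point lies within distance s of one
    (a simple cover by balls of radius s; not necessarily minimal)."""
    centers = []
    for p in points:
        if all(dist(p, c) > s for c in centers):
            centers.append(p)
    return centers

def mst_length(centers):
    """Total edge length of a minimum spanning tree of the centers,
    via Prim's algorithm on the complete Euclidean graph."""
    if len(centers) <= 1:
        return 0.0
    # best[j] = cheapest connection from the current tree to center j
    best = {j: dist(centers[0], centers[j]) for j in range(1, len(centers))}
    total = 0.0
    while best:
        i = min(best, key=best.get)  # attach the nearest outside center
        total += best.pop(i)
        for j in best:
            best[j] = min(best[j], dist(centers[i], centers[j]))
    return total
```

The theorem concerns minimizing over all such covers; the sketch evaluates the tree length for one particular cover of a point sample of \(E\).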

We also present an open source package for computational exploration of the maximum distance problem using minimum spanning trees, available at

at September 17, 2020 11:23 PM UTC

The study of differentially private PAC learning runs all the way from its introduction in 2008 [KLNRS08] to a best paper award at the Symposium on Foundations of Computer Science (FOCS) this year [BLM20]. In this post, we’ll recap the history of this line of work, aiming for enough detail for a rough understanding of the results and methods.

Before we get to the “what” and “how” of private PAC learning, it’s worth thinking about the “why”. One motivation for this line of work is that it neatly captures a fundamental question: does privacy in machine learning come at a price? Machine learning is now sufficiently successful and widespread for this question to have real import. But to even start to address this question, we need a formalization of machine learning that allows us to reason about possible trade-offs in a rigorous way. Statistical learning theory, and its computational formalization as PAC learning, provide one such clean and well-studied model. We can therefore use PAC learning as a testbed whose insights we might carry to other less idealized forms of learning.

With this motivation in mind, the rest of this post is structured as follows. The first section covers the basics of the PAC model, and subsequent sections gradually build up a chronology of results. When possible, we give short sketches of the accompanying techniques.

PAC Learning

We’ll start with a brief overview of PAC learning absent any privacy restrictions. Readers familiar with PAC learning can probably skip this section while noting that

  1. (the cardinality version of) Occam’s razor is a baseline learner using \(O(\log|\mathcal{H}|)\) samples,

  2. VC dimension characterizes non-private PAC learning,

  3. we’ll focus on the sample complexity of realizable PAC learning,

  4. we’ll usually omit dependencies on accuracy and success probability parameters, and

  5. we’ll usually ignore computational efficiency.

For readers needing a refresher on PAC learning, the basic element of the “probably approximately correct” (PAC) framework [Val84] is a hypothesis. Each hypothesis is a function \(h \colon \mathcal{X}\to \{-1,1\}\) mapping examples from some space \(\mathcal{X}\) to binary labels. A collection of hypotheses is a hypothesis class \(\mathcal{H}\), e.g., thresholds (a.k.a. perceptrons), rectangles, conjunctions, and so on. In the realizable setting, a learner receives examples drawn from some unknown distribution and labeled by an unknown \(h^\ast \in \mathcal{H}\). The learner’s goal is to with high probability (“probably”) output a hypothesis that mostly matches the labels of \(h^\ast\) on future examples from the unknown example distribution (“approximately correct”). In the agnostic setting, examples are not necessarily labeled by any \(h \in \mathcal{H}\), and the goal is only to output a hypothesis that approximates the best error of any hypothesis from \(\mathcal{H}\). As mentioned above, we focus on the realizable setting unless otherwise specified. In the proper setting, the learner must output a hypothesis from \(\mathcal{H}\) itself. In the improper setting, this requirement is removed.

In general, we say an algorithm \((\alpha,\beta)\)-PAC learns \(\mathcal{H}\) with sample complexity \(n\) if \(n\) samples are sufficient to with probability at least \(1-\beta\) obtain error at most \(\alpha\) over new examples from the distribution. For the purposes of this post, we generally omit these dependencies on \(\alpha\) and \(\beta\), as they typically vary little or not at all when switching between non-private and private PAC learning.

Fortunately, we always have a simple baseline learner based on empirical risk minimization: given a set of labeled examples, iterate over all hypotheses \(h \in \mathcal{H}\), check how many of the labeled examples each \(h\) mislabels, and output a hypothesis that mislabels the fewest examples. Using this learner, which is sometimes called “Occam’s razor,” \(O(\log|\mathcal{H}|)\) samples suffice to PAC learn \(\mathcal{H}\).
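As a toy illustration (our own, not from the post), this Occam's razor baseline over a finite class is just a few lines of Python; the threshold class below is a hypothetical running example:

```python
def erm_learner(hypotheses, labeled_examples):
    """Occam's razor baseline: return a hypothesis minimizing empirical error."""
    def empirical_errors(h):
        # Count how many labeled examples h mislabels.
        return sum(1 for x, y in labeled_examples if h(x) != y)
    return min(hypotheses, key=empirical_errors)

# Toy run: 1-d thresholds over {1, ..., 10}, with data labeled by the
# threshold at 4 (x <= 4 gets -1, x > 4 gets 1).
thresholds = [lambda x, t=t: -1 if x <= t else 1 for t in range(1, 11)]
data = [(x, -1 if x <= 4 else 1) for x in range(1, 11)]
best = erm_learner(thresholds, data)  # recovers the threshold at 4
```

In the realizable setting the minimizer has zero empirical error, and the \(O(\log|\mathcal{H}|)\) bound says that few samples suffice for this empirical minimizer to generalize.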

At the same time, \(|\mathcal{H}|\) is a pretty coarse measure of hypothesis class complexity, as it would immediately rule out learning any infinite hypothesis class (of which there are many). Thus, as you might expect, we can do better. We do so using VC dimension. \(\mathsf{VCD}\left(\mathcal{H}\right)\) is the size of the largest possible collection of examples such that, for every labeling of the examples, \(\mathcal{H}\) contains a hypothesis with that labeling. With VC dimension, we can essentially swap \(\log|\mathcal{H}|\) with \(\mathsf{VCD}\left(\mathcal{H}\right)\) in the Occam’s razor bound and PAC learn with \(O(\mathsf{VCD}\left(\mathcal{H}\right))\) samples. In fact, the “Fundamental Theorem of Statistical Learning” says that PAC learnability (realizable or agnostic) is equivalent to finite VC dimension. In this sense, \(\mathsf{VCD}\left(\mathcal{H}\right)\) is a good measure of how hard it is to PAC learn \(\mathcal{H}\). As a motivating example that will re-appear later, note that for the hypothesis class of 1-dimensional thresholds over \(T\) points, \(\log |\mathcal{H}| = \log T\), while \(\mathsf{VCD}\left(\mathcal{H}\right)\) is only 1.
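For finite classes over finite domains, VC dimension can be computed by brute force straight from the definition; the sketch below (ours, exponential-time, purely illustrative) confirms the thresholds example:

```python
from itertools import combinations

def vc_dimension(hypotheses, domain):
    """Largest k such that some k points are shattered: every one of the
    2^k labelings of those points is realized by some hypothesis."""
    best = 0
    for k in range(1, len(domain) + 1):
        if any(len({tuple(h(x) for x in pts) for h in hypotheses}) == 2 ** k
               for pts in combinations(domain, k)):
            best = k          # some set of size k is shattered
        else:
            break             # shattering is monotone in k, so we can stop
    return best

# 1-d thresholds over T = 8 points: |H| = 8, so log|H| = 3, but
# vc_dimension(thresholds, list(range(1, T + 1))) evaluates to 1.
T = 8
thresholds = [lambda x, t=t: -1 if x <= t else 1 for t in range(1, T + 1)]
```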

Example: a one-dimensional threshold function. An illustration of 1-dimensional thresholds: a given threshold is determined by some point \(x^\ast \in [T]\): any example \(x \leq x^\ast\) receives label \(-1\), and any example \(x > x^\ast\) receives label 1.


A Simple Private PAC Learner

It is straightforward to add a differential privacy constraint to the PAC framework: the hypothesis output by the learner must be a differentially private function of the labeled examples \((x_1, y_1), \ldots, (x_n, y_n)\). That is, changing any one of the examples — even to one with an inconsistent label — must not affect the distribution over hypotheses output by the learner by too much.

Since we haven’t talked about any other PAC learner, we may as well start with the empirical risk minimization-style Occam’s razor discussed in the previous section, which simply selects a hypothesis that minimizes empirical error. A private version becomes easy if we view this algorithm in the right light. All it is doing is assigning a score to each possible output (the hypothesis’ empirical error) and outputting one with the best (lowest) score. This makes it a good candidate for privatization by the exponential mechanism [MT07].

Recall that the exponential mechanism uses a scoring function over outputs to release better outputs with higher probability, subject to the privacy constraint. More formally, the exponential mechanism requires a scoring function \(u(X,h)\) mapping (database, output) pairs to real-valued scores and then selects a given output \(h\) with probability proportional to \(\exp\left(\tfrac{\varepsilon u(X,h)}{2\Delta(u)}\right)\). Thus a lower \(\varepsilon\) (stricter privacy requirement) and larger \(\Delta(u) := \sup_h \sup_{X \sim X'} u(X,h) - u(X',h)\) (scoring function more sensitive to changing one element in the database \(X\) to make \(X'\)) both lead to a more uniform (more private) output distribution.

Fortunately for our PAC learning setting, empirical error is not a very sensitive scoring function: changing one sample only changes empirical error by 1. We can therefore use (negative) empirical error as our scoring function \(u(X,h)\), apply the exponential mechanism, and get a “private Occam’s razor.” This was exactly what Kasiviswanathan, Lee, Nissim, Raskhodnikova, and Smith [KLNRS08] did when they introduced differentially private PAC learning in 2008. The resulting sample complexity bounds differ from the generic Occam’s razor only by an \(\varepsilon\) factor in the denominator, and \(O(\log|\mathcal{H}|/\varepsilon)\) samples suffice to privately PAC learn \(\mathcal{H}\).
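A minimal sketch of this private Occam's razor (our own illustrative code, not from [KLNRS08]): score each hypothesis by negative empirical error, which has sensitivity 1, and sample via the exponential mechanism:

```python
import math
import random

def private_occam(hypotheses, labeled_examples, epsilon):
    """Exponential mechanism with u(X, h) = -(empirical error of h on X);
    changing one labeled example changes u by at most 1, so sensitivity is 1."""
    scores = [-sum(1 for x, y in labeled_examples if h(x) != y)
              for h in hypotheses]
    top = max(scores)  # shift scores for numerical stability (ratios unchanged)
    weights = [math.exp(epsilon * (s - top) / 2.0) for s in scores]
    return random.choices(hypotheses, weights=weights, k=1)[0]
```

As \(\varepsilon \to \infty\) this concentrates on an empirical minimizer, recovering the non-private Occam's razor; as \(\varepsilon \to 0\) the output distribution tends to uniform over \(\mathcal{H}\).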

Of course, our experience with non-private PAC learning suggests that we shouldn’t be satisfied with this \(\log |\mathcal{H}|\) dependence. Maybe VC dimension characterizes private PAC learning, too?

Characterizing Pure Private PAC Learning

As it turns out, answering this question will take some time. We start with a partial negative answer. Specifically, we’ll see a class with VC dimension 1 and (a restricted form of) private sample complexity arbitrarily larger than 1. We’ll also cover the first in a line of characterization results for private PAC learning.

First, we consider learners that satisfy pure privacy. Recall that pure \((\varepsilon,0)\)-differential privacy forces output distributions that may only differ by a certain \(e^\varepsilon\) multiplicative factor (like the exponential mechanism above). The strictly weaker notion of approximate \((\varepsilon,\delta)\)-differential privacy also allows a small additive \(\delta\) factor. Second, we restrict ourselves to proper learners, which may only output hypotheses from the learned class \(\mathcal{H}\).

With these assumptions in place, in 2010, Beimel, Kasiviswanathan, and Nissim [BKN10] studied a hypothesis class called \(\mathsf{Point}_d\). \(\mathsf{Point}_d\) consists of \(2^d\) hypotheses, one for each vector in \(\{0,1\}^d\). Taking the set of examples \(\mathcal{X}\) to be \(\{0,1\}^d\) as well, we define each hypothesis in \(\mathsf{Point}_d\) to label only its associated vector as 1, and the remaining \(2^d-1\) examples as -1. [BKN10] showed that the hypothesis class \(\mathsf{Point}_d\) requires \(\Omega(d)\) samples for proper pure private PAC learning. In contrast, \(\mathsf{VCD}\left(\mathsf{Point}_d\right) = 1\), so this \(\Omega(d)\) lower bound shows us that VC dimension does not characterize proper pure private PAC learning.

This result uses the classic “packing” lower bound method, which powers many lower bounds for pure differential privacy. The general packing method is to first construct a large collection of databases which are all “close enough” to each other but nonetheless all have different “good” outputs. Once we have such a collection, we use group privacy. Group privacy is a corollary of differential privacy that requires databases differing in \(k\) elements to have \(k\varepsilon\)-close output distributions. Because of group privacy, if we start with a collection of databases that are close together, then the output distributions for any two databases in the collection cannot be too different. This creates a tension: utility forces the algorithm to produce different output distributions for different databases, but privacy forces similarity. The packing argument comes down to arguing that, unless the databases are large, privacy wins out, and when privacy wins out then there is some database where the algorithm probably produces a bad output.

For \(\mathsf{Point}_d\), we sketch the resulting argument as follows. Suppose we have an \(\varepsilon\)-private PAC learner that uses \(m\) samples. Then we can define a collection of different databases of size \(m\), one for each hypothesis in \(\mathsf{Point}_d\). By group privacy, the output distribution for our private PAC learner changes by at most an \(e^{m\varepsilon}\) factor between any two of the databases in this collection. Thus, fixing any one database in the collection, each of the \(2^d - 1\) hypotheses that are wrong for it is still output with probability at least roughly \(e^{-m\varepsilon}\), so the probability of outputting a wrong hypothesis is at least roughly \(2^d \cdot e^{-m\varepsilon}\). Since we need this probability to be small, rearranging implies \(m = \Omega(d/\varepsilon)\).
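The final rearrangement can be spelled out explicitly (here \(\beta\) denotes the learner's allowed failure probability, as earlier):

```latex
\[
  2^d \cdot e^{-m\varepsilon} \le \beta
  \quad\Longleftrightarrow\quad
  m \ge \frac{d \ln 2 + \ln(1/\beta)}{\varepsilon}
  \;=\; \Omega\!\left(\frac{d}{\varepsilon}\right).
\]
```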

[BKN10] then contrasted this result with an improper pure private PAC learner. This learner applies the exponential mechanism to a class \(\mathsf{Point}_d'\) of hypotheses derived from \(\mathsf{Point}_d\) (but not necessarily a subset of \(\mathsf{Point}_d\)), which yields an improper pure private PAC learner with sample complexity \(O(\log d)\). Since this learner is improper, it circumvents the “one database per hypothesis” step of the packing lower bound. Moreover, [BKN10] gave a still more involved improper pure private PAC learner requiring only \(O(1)\) samples. This separates proper pure private PAC learning from improper pure private PAC learning. In contrast, the sample complexities of proper and improper PAC learning absent privacy are the same up to logarithmic factors in \(\alpha\) and \(\beta\).

In 2013, Beimel, Nissim, and Stemmer [BNS13] proved a more general result. They gave the first characterization of pure (improper) private PAC learning by defining a new hypothesis class measure called the representation dimension, \(\mathsf{REPD}\left(\mathcal{H}\right)\). Roughly, the representation dimension considers the collection of all distributions \(\mathcal{D}\) over sets of hypotheses, not necessarily from \(\mathcal{H}\), that “cover” \(\mathcal{H}\). By “cover,” we mean that for any \(h \in \mathcal{H}\), with high probability a set drawn from covering distribution \(\mathcal{D}\) includes a hypothesis that mostly produces labels that agree with \(h\). With this collection of distributions defined, \(\mathsf{REPD}\left(\mathcal{H}\right)\) is the minimum over all such covering distributions of the logarithm of the size of the largest set in its support. Thus a hypothesis class that can be covered by a distribution over small sets of hypotheses will have a small representation dimension. With the notion of representation dimension in hand, [BNS13] gave the following result:

Theorem 1 ([BNS13]). The sample complexity to pure private PAC learn \(\mathcal{H}\) is \(\Theta(\mathsf{REPD}\left(\mathcal{H}\right))\).

Representation dimension may seem like a strange definition, but a sketch of the proof of this result helps illustrate the connection to private learning. Recall from our private Occam’s razor, and the improper pure private PAC learner above, that if we can find a good and relatively small set of hypotheses to choose from, then we can apply the exponential mechanism and call it a day. It is exactly this kind of “good set of hypotheses” that representation dimension aims to capture. A little more formally, given an upper bound on \(\mathsf{REPD}\left(\mathcal{H}\right)\), we know there is some covering distribution whose largest hypothesis set is not too big. That means we can construct a learner that draws a hypothesis set from this covering distribution and applies the exponential mechanism to it. Just as we picked up a \(\log|\mathcal{H}|\) sample complexity dependence using private Occam’s razor, since \(\mathsf{REPD}\left(\mathcal{H}\right)\) measures the logarithm of the size of the largest hypothesis set in the support, this pure private learner picks up a \(\mathsf{REPD}\left(\mathcal{H}\right)\) sample complexity dependence here. This gives us one direction of Theorem 1.

This logic works in the other direction as well. To go from a pure private PAC learner with sample complexity \(m\) to an upper bound on \(\mathsf{REPD}\left(\mathcal{H}\right)\), we return to the group privacy trick used by [BKN10]. Suppose we fix a database of size \(m\) and pass it to the learner. By group privacy and the learner’s accuracy guarantee, if we fix some concept \(c\), the learner has probability at least roughly \(e^{-m}\) of outputting a hypothesis that mostly agrees with \(c\). Thus if we repeat this process roughly \(e^{m}\) times, we probably get at least one hypothesis that mostly agrees with \(c\). In other words, this repeated calling of the learner on the arbitrary database yields a covering distribution for \(\mathcal{H}\). Since we called the learner approximately \(e^m\) times, the logarithm of this is \(m\), and we get our upper bound on \(\mathsf{REPD}\left(\mathcal{H}\right)\).

To recap, we now know that proper pure private PAC learning is strictly harder than improper pure private PAC learning, which is characterized by representation dimension. A picture sums it up. Note the dotted line, since we don’t yet have any evidence separating finite representation dimension and finite VC dimension.

Landscape of Private PAC, take 1

Separating Pure and Approximate Private PAC Learning

So far, we’ve focused only on pure privacy. In this section, we move on to the first separations between pure and approximate private PAC learning, as well as the first connection between private learning and online learning.

Our source is a pair of interconnected papers from around 2014. Among other things, Feldman and Xiao [FX14] introduced Littlestone dimension to private PAC learning. By connecting representation dimension to results from communication complexity to Littlestone dimension, they proved the following:

Theorem 2 ([FX14]). The sample complexity to pure private PAC learn \(\mathcal{H}\) is \(\Omega(\mathsf{LD}\left(\mathcal{H}\right))\).

Littlestone dimension \(\mathsf{LD}\left(\mathcal{H}\right)\) is, roughly, the maximum number of mistakes an adversary can force an online PAC-learning algorithm to make [Lit88]. We always have \(\mathsf{VCD}\left(\mathcal{H}\right) \leq \mathsf{LD}\left(\mathcal{H}\right) \leq \log|\mathcal{H}|\), but these inequalities can be strict. For example, denoting by \(\mathsf{Thresh_T}\) the class of thresholds over \(\{1, 2, \ldots, T\}\), since an adversary can force \(\Theta(\log T)\) wrong answers from an online learner binary searching over \(\{1,2, \ldots, T\}\), \(\mathsf{LD}\left(\mathsf{Thresh_T}\right) = \Omega(\log T)\). In contrast, \(\mathsf{VCD}\left(\mathsf{Thresh_T}\right) = 1\).
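To make the binary-search adversary concrete, here is an illustrative sketch (ours, not from [Lit88]): the adversary keeps the interval of thresholds consistent with its answers and contradicts the learner at the midpoint, forcing roughly \(\log_2 T\) mistakes against any online learner:

```python
def adversary_mistakes(learner_predict, T):
    """Simulate the threshold adversary over {1, ..., T}.

    `learner_predict(x)` is the online learner's guess in {-1, 1} for query x.
    The adversary answers so the learner is always wrong, while keeping some
    threshold t with lo <= t <= hi consistent with all answers so far.
    """
    lo, hi = 0, T
    mistakes = 0
    while lo < hi:
        x = (lo + hi + 1) // 2       # query the midpoint
        pred = learner_predict(x)
        mistakes += 1                # the adversary declares -pred correct
        if -pred == -1:              # answer -1 means x <= t, so t >= x
            lo = x
        else:                        # answer 1 means x > t, so t <= x - 1
            hi = x - 1
    return mistakes

# Any learner suffers ~log2(T) mistakes; e.g. with T = 1023, a learner that
# always predicts 1 is forced into 10 mistakes.
```

Each round halves the consistent interval, so the game lasts (and the mistake count reaches) about \(\log_2 T\) rounds no matter how the learner plays.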

At first glance it’s not obvious what Theorem 2 adds over Theorem 1. After all, Theorem 1 gives an equivalence, not just a lower bound. One advantage of Theorem 2 is that Littlestone dimension is a known quantity that has already been studied in its own right. We can now import results like the lower bound on \(\mathsf{LD}\left(\mathsf{Thresh_T}\right)\), whereas bounds on \(\mathsf{REPD}\left(\cdot\right)\) are not common. A second advantage is that Littlestone dimension conceptually connects private learning and online learning: we now know that pure private PAC learning is no easier than online PAC learning.

A second paper by Beimel, Nissim, and Stemmer [BNS13b] contrasted this \(\Omega(\log T)\) lower bound for pure private learning of thresholds with a \(2^{O(\log^\ast T)}\) upper bound for approximate private PAC learning \(\mathsf{Thresh_T}\). Here \(\log^\ast\) denotes the very slow-growing iterated logarithm, the number of times we must take the logarithm of the argument to bring it \(\leq 1\). (We’re not kidding about “very slow-growing” either: \(\log^\ast(\text{number of atoms in universe}) \approx 4\).) With Feldman and Xiao’s result, this separates pure private PAC learning from approximate private PAC learning. It also shows that representation dimension does not characterize approximate private PAC learning.
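For concreteness, \(\log^\ast\) is straightforward to compute (a sketch of ours, using the natural logarithm, which matches the atoms-in-the-universe figure above):

```python
import math

def log_star(x):
    """Iterated logarithm: the number of times log must be applied
    before the value drops to at most 1."""
    count = 0
    while x > 1:
        x = math.log(x)
        count += 1
    return count

# log_star(10**80), with 10^80 roughly the number of atoms in the
# universe, evaluates to 4.
```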

At the same time, Feldman and Xiao observed that the connection between pure private PAC learning and Littlestone dimension is imperfect. Again borrowing results from communication complexity, they observed that the hypothesis class \(\mathsf{Line_p}\) (which we won’t define here) has \(\mathsf{LD}\left(\mathsf{Line_p}\right) = 2\) but \(\mathsf{REPD}\left(\mathsf{Line_p}\right) = \Theta(\log(p))\). In contrast, they showed that an approximate private PAC learner can learn \(\mathsf{Line_p}\) using \(O\left(\tfrac{\log(1/\beta)}{\alpha}\right)\) samples. Since this entails no dependence on \(p\) at all, it improves the separation between pure and approximate private PAC learning given by [BNS13b].

Let’s pause to recap what’s happened so far. We learned in the last section that representation dimension characterizes pure private PAC learning [BNS13]. We learned in this section that Littlestone dimension gives lower bounds for pure private PAC learning but, as shown by \(\mathsf{Line_p}\), these bounds are sometimes quite loose [FX14]. \(\mathsf{Thresh_T}\) shows that representation dimension does not characterize approximate private PAC learning [FX14 ; BNS13b], and we still have no privacy-specific lower bounds for approximate private learners. So the picture now looks like this:

Landscape of Private PAC, take 2

In particular, we might still find that VC dimension characterizes approximate private PAC learning!

Lower Bounds for Approximate Private PAC Learning

We now dash this hope. In 2015, Bun, Nissim, Stemmer, and Vadhan [BNSV15] gave the first nontrivial lower bound for approximate private PAC learning. They showed that learning \(\mathsf{Thresh_T}\) has proper approximate private sample complexity \(\Omega(\log^\ast(T))\) and \(O(2^{\log^\ast(T)})\).

We’ll at least try to give some intuition for the presence of \(\log^\ast\) in the lower bound. Informally, the lower bound relies on an inductive construction of a sequence of hard problems for databases of size \(n=1, 2, \ldots\). The \(k^{th}\) hard problem relies on a distribution over databases of size \(k\) whose data universe is of size exponential in the size of the data universe for the \((k-1)^{th}\) distribution. The base case is the uniform distribution over the two singleton databases \(\{0\}\) and \(\{1\}\), and they show how to inductively construct successive problems such that a solution for the \(k^{th}\) problem implies a solution for the \((k-1)^{th}\) problem. Unraveling the recursive relationship between the problem domain sizes implies a general lower bound of roughly \(\log^\ast|X|\) for domain \(X\).

The inclusion of \(\log^\ast\) makes this an extremely mild lower bound. However, \(\log^\ast(T)\) can still be arbitrarily larger than 1, so this is the first definitive evidence that proper approximate privacy introduces a cost over non-private PAC learning.

In 2018, Alon, Livni, Malliaris, and Moran [ALMM19] extended this \(\Omega(\log^\ast T)\) lower bound for \(\mathsf{Thresh_T}\) to improper approximate privacy. More generally, they gave concrete evidence for the importance of thresholds, which have played a seemingly outsize role in the work so far. They did so by relating a class’ Littlestone dimension to its ability to “contain” thresholds. Here, we say \(\mathcal{H}\) “contains” \(m\) thresholds if there exist \(m\) (unlabeled) examples \(x_1,\ldots,x_m\) and hypotheses \(h_1, \ldots, h_m \in \mathcal{H}\) such that the hypotheses “behave like” thresholds on the \(m\) examples, i.e., \(h_i(x_j) = 1 \Leftrightarrow j \geq i\). With this language, they imported a result from model theory to show that any hypothesis class \(\mathcal{H}\) contains \(\log(\mathsf{LD}\left(\mathcal{H}\right))\) thresholds. This implies that learning \(\mathcal{H}\) is at least as hard as learning \(\mathsf{Thresh_T}\) with \(T = \log(\mathsf{LD}\left(\mathcal{H}\right))\). Since \(\log^\ast(\log(\mathsf{LD}\left(\mathcal{H}\right))) = \Omega(\log^\ast(\mathsf{LD}\left(\mathcal{H}\right)))\), combining these two results puts the following limit on private PAC learning:

Theorem 3 ([ALMM19]). The sample complexity to approximate private PAC learn \(\mathcal{H}\) is \(\Omega(\log^\ast(\mathsf{LD}\left(\mathcal{H}\right)))\).

Littlestone dimension characterizes online PAC learning, so we now know that online PAC learnability is necessary for private PAC learnability. Sufficiency, however, remains an open question. This produces the following picture, where the dotted line captures the question of sufficiency.

Landscape of Private PAC, take 3

Characterizing Approximate Private PAC Learning

Spurred by this question, several advances in private PAC learning have appeared in the last year. First, Gonen, Hazan, and Moran strengthened Theorem 3 by giving a constructive method for converting pure private learners to online learners [GHM19]. Their result reaches back to the 2013 characterization of pure private learning in terms of representation dimension by using the covering distribution to generate a collection of “experts” for online learning. Again revisiting \(\mathsf{Thresh_T}\), Kaplan, Ligett, Mansour, Naor, and Stemmer [KLMNS20] significantly reduced the \(O(2^{\log^\ast(T)})\) upper bound of [BNSV15] to just \(O((\log^\ast(T))^{1.5})\). And Alon, Beimel, Moran, and Stemmer [ABMS20] justified this post’s focus on realizable private PAC learning by giving a transformation from a realizable approximate private PAC learner to an agnostic one at the cost of slightly worse privacy and sample complexity. This built on an earlier transformation that only applied to proper learners [BNS15].

Finally, Bun, Livni, and Moran [BLM20] answered the open question posed by [ALMM19]:

Theorem 4 ([BLM20]). The sample complexity to approximate private PAC learn \(\mathcal{H}\) is \(2^{O({\mathsf{LD}\left(\mathcal{H}\right)})}\).

To prove this, [BLM20] introduced the notion of a globally stable learner and showed how to convert an online learner to a globally stable learner to a private learner. Thus, combined with the result of [ALMM19], we now know that the sample complexity of private PAC learning any \(\mathcal{H}\) is at least \(\Omega(\log^\ast(\mathsf{LD}\left(\mathcal{H}\right)))\) and at most \(2^{O({\mathsf{LD}\left(\mathcal{H}\right)})}\). In this sense, online learnability characterizes private learnability.

Landscape of Private PAC, final take

Narrowing the gap between the lower and upper bounds above is an open question. Note that we cannot hope to close the gap completely. For the lower bound, the current \(\mathsf{Thresh_T}\) upper bound implies that no general lower bound can be stronger than \(\Omega((\log^\ast(\mathsf{LD}\left(\mathcal{H}\right)))^{1.5})\). For the upper bound, there exist hypothesis classes \(\mathcal{H}\) with \(\mathsf{VCD}\left(\mathcal{H}\right) = \mathsf{LD}\left(\mathcal{H}\right)\) (e.g., \(\mathsf{VCD}\left(\mathsf{Point}_d\right) = \mathsf{LD}\left(\mathsf{Point}_d\right)= 1\)), so since non-private PAC learning requires \(\Omega(\mathsf{VCD}\left(\mathcal{H}\right))\) samples, the best possible private PAC learning upper bound is \(O(\mathsf{LD}\left(\mathcal{H}\right))\). Nevertheless, proving either bound remains open.


This concludes our post, and with it our discussion of this fundamental question: the price of privacy in machine learning. We now know that in the PAC model, proper pure private learning, improper pure private learning, approximate private learning, and non-private learning are all strongly separated. By the connection to Littlestone dimension, we also know that approximate private learnability is equivalent to online learnability. However, many questions about computational efficiency and tight sample complexity bounds remain open.

As mentioned in the introduction, we focused on the clean yet widely studied and influential model of PAC learning. Having characterized how privacy enters the picture in PAC learning, we can hopefully convey this understanding to other models of learning, and now approach these questions from a rigorous and grounded point of view.

Congratulations to Mark Bun, Roi Livni, and Shay Moran on their best paper award — and to the many individuals who paved the way before them!


Thanks to Kareem Amin and Clément Canonne for helpful feedback while writing this post.

by Matthew Joseph at September 16, 2020 06:00 PM UTC

Tree codes are combinatorial structures introduced by Schulman (STOC 1993) as key ingredients in interactive coding schemes. Asymptotically-good tree codes are long known to exist, yet their explicit construction remains a notoriously hard open problem. Even proposing a plausible construction, without the burden of proof, is difficult and the defining tree code property requires structure that remains elusive. To the best of our knowledge, only one candidate appears in the literature, due to Moore and Schulman (ITCS 2014). We put forth a new candidate for an explicit asymptotically-good tree code. Our construction is an extension of the vanishing rate tree code by Cohen-Haeupler-Schulman (STOC 2018) combined with a vanishing distance tree code by Gelles et al. (SODA 2016). The correctness of our construction relies on a conjecture that we introduce on certain Pascal determinants indexed by the points of the Boolean hypercube. We furnish evidence supporting our conjecture through numerical computation, combinatorial arguments from planar path graphs and based on well-studied heuristics from arithmetic geometry.

at September 16, 2020 05:46 PM UTC

We study the problem of testing discrete distributions with a focus on the high probability regime. Specifically, given samples from one or more discrete distributions, a property $\mathcal{P}$, and parameters $0< \epsilon, \delta <1$, we want to distinguish {\em with probability at least $1-\delta$} whether these distributions satisfy $\mathcal{P}$ or are $\epsilon$-far from $\mathcal{P}$ in total variation distance. Most prior work in distribution testing studied the constant confidence case (corresponding to $\delta = \Omega(1)$), and provided sample-optimal testers for a range of properties. While one can always boost the confidence probability of any such tester by black-box amplification, this generic boosting method typically leads to sub-optimal sample bounds. Here we study the following broad question: For a given property $\mathcal{P}$, can we {\em characterize} the sample complexity of testing $\mathcal{P}$ as a function of all relevant problem parameters, including the error probability $\delta$? Prior to this work, uniformity testing was the only statistical task whose sample complexity had been characterized in this setting. As our main results, we provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors, as a function of all relevant parameters. We also show matching information-theoretic lower bounds on the sample complexity of these problems. Our techniques naturally extend to give optimal testers for related problems. To illustrate the generality of our methods, we give optimal algorithms for testing collections of distributions and testing closeness with unequal sized samples.

at September 16, 2020 05:20 PM UTC

Update (Sep. 17): Several people, here and elsewhere, wrote to tell me that while they completely agreed with my strategic and moral stance in this post, they think that it’s the ads of Republican Voters Against Trump, rather than the Lincoln Project, that have been most effective in changing Trump supporters’ minds. So please consider donating to RVAT instead or in addition! In fact, what the hell, I’ll match donations to RVAT up to $1000.

For the past few months, I’ve alternated between periods of debilitating depression and (thankfully) longer stretches when I’m more-or-less able to work. Triggers for my depressive episodes include reading social media, watching my 7-year-old daughter struggle with prolonged isolation, and (especially) contemplating the ongoing apocalypse in the American West, the hundreds of thousands of pointless covid deaths, and an election in 48 days that if I didn’t know such things were impossible in America would seem likely to produce a terrifying standoff as a despot and millions of his armed loyalists refuse to cede control. Meanwhile, catalysts for my relatively functional periods have included teaching my undergrad quantum information class, Zoom calls with my students, life on Venus?!? (my guess is no, but almost entirely due to priors), learning new math (fulfilling a decades-old goal, I’m finally working my way through Paul Cohen’s celebrated proof of the independence of the Continuum Hypothesis—more about that later!).

Of course, when you feel crushed by the weight of the world’s horribleness, it improves your mood to be able even just to prick the horribleness with a pin. So I was gratified that, in response to a previous post, Shtetl-Optimized readers contributed at least $3,000, the first $2,000 of which I matched, mostly to the Biden-Harris campaign but a little to the Lincoln Project.

Alas, a commenter was unhappy with the latter:

Lincoln Project? Really? … Pushing the Overton window rightward during a worldwide fascist dawn isn’t good. I have trouble understanding why even extremely smart people have trouble with this sort of thing.

Since this is actually important, I’d like to spend the rest of this post responding to it.

For me it’s simple.

What’s the goal right now? To defeat Trump. In the US right now, that’s the prerequisite to every other sane political goal.

What will it take to achieve that goal? Turnout, energizing the base, defending the election process … but also, if possible, persuading a sliver of Trump supporters in swing states to switch sides, or at least vote third party or abstain.

Who is actually effective at that goal? Well, no one knows for sure. But while I thought the Biden campaign had some semi-decent ads, the Lincoln Project’s best stuff seems better to me, just savagely good.

Why are they effective? The answer seems obvious: for the same reason why a jilted ex is a more dangerous foe than a stranger. If anyone understood how to deprogram a Republican from the Trump cult, who would it be: Alexandria Ocasio-Cortez, or a fellow Republican who successfully broke from the cult?

Do I agree with the Lincoln Republicans about most of the “normal” issues that Americans once argued about? Not at all. Do I hold them, in part, morally responsible for creating the preconditions to the current nightmare? Certainly.

And should any of that cause me to boycott them? Not in my moral universe. If Churchill and FDR could team up with Stalin, then surely we in the Resistance can temporarily ally ourselves with the rare Republicans who chose their stated principles over power when tested—their very rarity attesting to the nontriviality of their choice.

To my mind, turning one’s back on would-be allies, in a conflict whose stakes obviously overshadow what’s bad about those allies, is simultaneously one of the dumbest and the ugliest things that human beings can do. It abandons reason for moral purity and ends up achieving neither.

by Scott at September 16, 2020 07:51 AM UTC

by David Eppstein at September 15, 2020 10:15 PM UTC

Happy birthday to Ken

Ken Regan is of course my partner on GLL. He is faculty in the computer science department at the University at Buffalo. His PhD, from Oxford University in 1986, was titled On the separation of complexity classes. His advisor was Dominic Welsh, who was a student of John Hammersley.

Today I would like to wish Ken a happy birthday.

He is now 61 years young. I hope you will join me and wish him many more birthdays. His age is special for many reasons:

  • It is a twin prime.
  • It is equal to {5^{2} + 6^{2}}.
  • It is the exponent of the ninth Mersenne prime: {2^{61} - 1 = 2,305,843,009,213,693,951}.
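For the skeptical reader, these claims are easy to check by machine. Here is a quick Python sketch (the helper name is mine; the fixed witness set below is known to make Miller-Rabin deterministic for every n below 3.3 × 10^24, which comfortably covers {2^{61} - 1}):

```python
def is_prime(n, witnesses=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Deterministic Miller-Rabin for n < 3.3e24 using a fixed witness set."""
    if n < 2:
        return False
    for p in witnesses:
        if n % p == 0:
            return n == p
    # write n - 1 as d * 2^r with d odd
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in witnesses:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True

# 59 and 61 are twin primes, 61 = 5^2 + 6^2, and 2^61 - 1 is prime.
```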

There are three big I’s in his life. Let’s talk about two of them.

Interest in Cricket

Ken loves sports in general and especially cricket. Last Sunday he told me he watched his Bills win their first NFL game while also watching a cricket match. I have no idea how cricket works, but here is Ken’s explanation: Are Cricket and Baseball sister games?

  • In a baseball game you see pitchers on the field.
    In a cricket match you see fielders on the pitch.
  • In baseball, a bad delivery is called a “Ball”.
    In cricket, it’s a “No Ball”.
  • In baseball, if a batter carries his bat, he’s out.
    In cricket, the batsmen always carry their bat, and an opening batsman who “carries his bat” is never out.
  • In baseball, an innings is called a half-inning.
    In cricket, an inning is called an innings.
  • In baseball, a batter hit by a pitched ball gets a free pass to First Base.
    In cricket, such a batter can be Out Leg Before Wicket.
  • In baseball, if a ball is caught over the boundary, “yer out!”
    In cricket, you score 6 runs.
  • In baseball, when a batter “walks”, he gets a free pass to first and is not out.
    In cricket, it means the batsman declares himself out before the umpire has a chance to make the call. This classic show of sportsmanship is considered unsportsmanlike in baseball.

Interest in Chess

When the chess world wants to know if someone has cheated, they call Ken. He is an international chess master, and he has worked on stopping cheating for years. This is important these days, since most tournaments are now online, and cheating is easier when no one is directly able to watch you. Ken is busy.

Let’s look at the cheating problem. Suppose that Alice and Bob are playing an online game of chess. Alice makes her own moves, but she wonders if Bob could be cheating. He could be using advice from another “player”, Sally. There are several points:

1. Sally is a stronger player than anyone—she can easily beat Alice and Bob.
2. Sally not only says “here is my move”—she will sometimes give several good moves.
3. Sally is a program that is deterministic—given a position she gives the same answer.

The issue for Ken is: When Alice played Bob, did Bob make the moves or did he consult Sally?

There are many complexities:

1. What if Bob agreed with all Sally’s moves? Then he certainly did cheat.
2. What if Bob was just lucky and played above his strength? Then he did not cheat.
3. What if Bob used Sally for some positions but not others? Then he did cheat, but it may be hard to be sure.
4. And so on.

What Ken has done is create both a theory and programs to determine whether Bob did indeed cheat. I find the general problem of telling whether someone cheats at online chess fascinating. See our previous posts for more details and also see this.
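Ken’s real model is of course far more refined (it calibrates to each player’s rating and aggregates evidence over whole games into z-scores), but the crudest version of the test can be sketched in a few lines. In this toy version, the baseline agreement rate and the function name are my own illustrative assumptions, not Ken’s method: count how often Bob’s moves match Sally’s first choice, and ask how many standard deviations that rate sits above what an honest player of Bob’s strength would be expected to match.

```python
import math

def match_rate_zscore(bob_moves, engine_moves, baseline=0.55):
    """Toy cheating test: z-score of Bob's agreement with the engine's
    first choice, against an assumed baseline rate for honest play,
    using the normal approximation to the binomial."""
    n = len(bob_moves)
    matches = sum(b == e for b, e in zip(bob_moves, engine_moves))
    rate = matches / n
    stderr = math.sqrt(baseline * (1 - baseline) / n)
    return rate, (rate - baseline) / stderr
```

A player who matches 58 of 60 engine choices against a 55% baseline comes out more than six standard deviations above expectation; of course the hard part, which Ken’s work addresses, is choosing the right baseline for each player and position.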

Open Problems

Ken is one of the nicest people I know. Hope he has many more birthdays and many more twin primes.


by rjlipton at September 15, 2020 04:24 PM UTC

A 2-year postdoc position in Theoretical Computer Science/Algorithmic Foundations of AI is available in the Algorithms and Data Structures group at Aarhus University, Denmark.

Candidates should have a recent PhD in Computer Science, Mathematics, or Economics on topics that fall within algorithmic game theory or computational social choice, and a strong publication record.


by shacharlovett at September 14, 2020 05:10 PM UTC

Last week I blogged about two math problems of interest to me here.

For one of them, two people posted answers, which was great since I didn't know how to solve it and now I do. Yeah! I blogged about that here.

The other problem got no comments, so I suppose it was of interest to me but not to others. I was interested in it because both the story behind it and the answer are interesting.

It is from the paper 

An interesting and serendipitous number by John Ewing and Ciprian Foias, which is a chapter in the wonderful book 

Finite vs Infinite: Contributions to an eternal dilemma

Here is the story; I paraphrase the article (I'll give pointers later).

In the mid 1970's a student asked Ciprian about the following math-competition problem:

x(1) > 0,   x(n+1) = (1 + 1/x(n))^n.   For which x(1) does x(n) --> infinity?

It turned out this was a misprint. The actual problem was

x(1) > 0,   x(n+1) = (1 + 1/x(n))^{x(n)}.   For which x(1) does x(n) --> infinity?

The actual math-competition problem (with exponent x(n)) is fairly easy (I leave it to you). But this left the misprinted problem (with exponent n). Ciprian proved that there is exactly ONE x(1) such that x(n) --> infinity.

It's approximately 1.187... and may be transcendental.
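For the curious, the misprinted recurrence is easy to play with numerically. Here is a small Python sketch (the function name is mine). For almost every starting value, the iterates end up oscillating between values near 1 and astronomically large values; only the critical starting value gives steady divergence, and in floating point you can track its orbit only for a handful of steps before roundoff knocks you off it.

```python
def iterate(x1, steps):
    """Iterates of the misprinted recurrence x(n+1) = (1 + 1/x(n))^n.

    Returns [x(1), x(2), ..., x(steps)].
    """
    xs = [x1]
    x = x1
    for n in range(1, steps):
        x = (1 + 1 / x) ** n
        xs.append(x)
    return xs

# Typical behavior: once x(n) is much larger than n, the next iterate is
# close to 1, and from near 1 the one after that is roughly 2^n -- so the
# sequence oscillates instead of tending to infinity.
```

Starting from x(1) = 2, for example, the first twenty iterates already swing between values close to 1 and values above 100.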

I find the story and the result interesting, but the proof is too long for a blog post.

I tried to find the article online and could not. A colleague found the following:

A preview of the start of the article here

Wikipedia page on that number, called the Foias constant, here

Mathworld page on that number here

Most of the article (it skips two pages) here

by gasarch at September 14, 2020 04:16 PM UTC

In this post, we highlight the connection between Broadcast and Agreement in the synchronous model. Broadcast and Agreement: How can you implement one from the other? We defined Agreement and Broadcast in a previous post, here is a recap: Agreement A set of $n$ nodes where each node $i$ has...

at September 14, 2020 02:07 PM UTC

Consider the problem of computing the majority of a stream of $n$ i.i.d. uniformly random bits. This problem, known as the {\it coin problem}, is central to a number of counting problems in different data stream models. We show that any streaming algorithm for solving this problem with large constant advantage must use $\Omega(\log n)$ bits of space. We extend our lower bound to proving tight lower bounds for solving multiple, randomly interleaved copies of the coin problem, as well as for solving the OR of multiple copies of a variant of the coin problem. Our proofs involve new measures of information complexity that are well-suited for data streams. We use these lower bounds to obtain a number of new results for data streams. In each case there is an underlying $d$-dimensional vector $x$ with additive updates to its coordinates given in a stream of length $m$. The input streams arising from our coin lower bound have nice distributional properties, and consequently for many problems for which we only had lower bounds in general turnstile streams, we now obtain the same lower bounds in more natural models, such as the bounded deletion model, in which $\|x\|_2$ never drops by a constant fraction of what it was earlier, or in the random order model, in which the updates are ordered randomly. In particular, in the bounded deletion model, we obtain nearly tight lower bounds for approximating $\|x\|_{\infty}$ up to additive error $\frac{1}{\sqrt{k}} \|x\|_2$, approximating $\|x\|_2$ up to a multiplicative $(1 + \epsilon)$ factor (resolving a question of Jayaram and Woodruff in PODS 2018), and solving the Point Query and $\ell_2$-Heavy Hitters Problems. In the random order model, we also obtain new lower bounds for the Point Query and $\ell_2$-Heavy Hitters Problems. 
We also give new algorithms complementing our lower bounds and illustrating the tightness of the models we consider, including an algorithm for approximating $\|x\|_{\infty}$ up to additive error $\frac{1}{\sqrt{k}} \|x\|_2$ in turnstile streams (resolving a question of Cormode in a 2006 IITK Workshop), and an algorithm for finding $\ell_2$-heavy hitters in randomly ordered insertion streams (which for random order streams, resolves a question of Nelson in a 2018 Warwick Workshop).

at September 14, 2020 02:02 AM UTC

We continue our series of posts on State Machine Replication (SMR). In this post, we move from consensus under crash failures to consensus under omission failures. We still keep the synchrony assumption. Let’s begin with a quick overview of what we covered in previous posts: Upper bound: We can tolerate...

at September 13, 2020 07:09 PM UTC

Continuous can beat discrete

Nisheeth Vishnoi is a professor at Yale University in the computer science department. The faculty there is impressive and includes many of the top researchers in the world. The CS faculty is pretty good too. As Nisheeth’s PhD advisor, years ago, I am proud that he is at Yale.

Today I wish to discuss a new book by Nisheeth.

The title is Algorithms for Convex Optimization. Let me jump ahead and say that I like the book and especially this insight:

One way to solve discrete problems is to apply continuous methods.

This is not a new insight, but is an important one. Continuous math is older than discrete and often is more powerful. Some examples of this are:

{\bullet} Analytic number theory is based on the behavior of continuous functions. Some of the deepest theorems on prime numbers use such methods. Think of the Riemann zeta function

\displaystyle  \zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} = \frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} + \cdots

as a function of complex numbers {s}.

{\bullet} Additive number theory is based on the behavior of continuous functions. Think of generating functions and Fourier methods.

The power of continuous methods is one that I sometimes forget. Nisheeth’s book is a testament to the power of this idea.


Nisheeth’s book uses another fundamental idea from complexity theory. This is: restrict problems in some way. Allowing too large a class usually makes complexity high. For example, trees are easier in general than planar graphs, and sparse graphs are easier than general graphs. Of course “in general” must be controlled, but restricting the problem types does often reduce complexity.

Convexity adds to this tradition since convex generalizes the notion of linear. And convex problems of all kinds are abundant in practice, abundant in theory, and are important.

The MW dictionary says convex means:

: being a continuous function or part of a continuous function with the property that a line joining any two points on its graph lies on or above the graph.

Here is a passage by Roman Dwilewicz on the history of the convexity concept:

It was known to the ancient Greeks that there are only five regular convex polyhedra.

It seems that the first more rigorous definition of convexity was given by Archimedes of Syracuse, (ca 287 – ca 212 B.C.) in his treatise: On the sphere and cylinder.

These definitions and postulates of Archimedes were dormant for about two thousand years!

I say it’s lucky that Archimedes was not up for tenure.

Nisheeth’s Book

Nisheeth’s book is now available at this site. I have just started to examine it and must say I like the book. Okay, I am not an expert on convex algorithms, nor am I an expert on this type of geometric theory. But I definitely like his viewpoint. Let me explain in a moment.

First I cannot resist adding some statistics about his book created here:

No way I can read the book in nine hours. But I like seeing how many characters and so on the book has. I will have to calculate the same for other books.

Discrete vs Continuous Methods

Nisheeth in his introduction explains how continuous methods help in many combinatorial problems, like finding flows on graphs. He uses the {s}{t} flow problem as his example. The {s}{t}-maximum flow problem arises in real-world scheduling problems, but is also a fundamental combinatorial problem that can be used to find a maximum matching in a bipartite graph, for example.

Combinatorial algorithms for the maximum flow problem. He points out that by building on the Ford-Fulkerson method, various polynomial-time results were proved and other bounds were improved. But he states that the improvements stopped in 1998. Discrete methods seem to be unable to improve complexity for flow problems.
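To fix ideas, the Ford-Fulkerson idea is: while the residual graph has an {s}{t} path with spare capacity, push flow along it. Below is a minimal sketch of the Edmonds-Karp variant (augmenting along shortest residual paths, which guarantees polynomial time); this is only an illustration of the combinatorial approach, not an algorithm from Nisheeth's book.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: repeatedly augment along a shortest residual path.
    capacity is an n x n matrix of edge capacities."""
    n = len(capacity)
    res = [row[:] for row in capacity]  # residual capacities; input untouched
    flow = 0
    while True:
        # BFS for a shortest s-t path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if res[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:          # no augmenting path left: flow is maximum
            return flow
        # find the bottleneck capacity along the path
        bottleneck, v = float('inf'), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, res[u][v])
            v = u
        # push the bottleneck amount, updating residual capacities
        v = t
        while v != s:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck
```

On a four-node example with capacities 0→1:3, 0→2:2, 1→2:1, 1→3:2, 2→3:3, this returns the maximum flow 5.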

Convex programming-based algorithms. He adds:

Starting with the paper by Paul Christiano, Jonathan Kelner, Aleksander Mądry, Daniel Spielman, Shang-Hua Teng
the last decade has seen striking progress on the {s}{t} maximum flow problem. One of the keys to this success has been to abandon combinatorial approaches and view the {s}{t} maximum flow problem through the lens of continuous optimization.

Thus, at this point it may seem like we are heading in the wrong direction. We started off with a combinatorial problem that is a special type of linear programming problem, and here we are with a nonlinear optimization formulation for it. Thus the questions arise: which formulation should we choose? and, why should this convex optimization approach lead us to faster algorithms?


Open Problems

Take a look at Nisheeth’s site for the answers.

I wish I were better informed about continuous methods in general. They are powerful and pretty. Maybe I could solve an open problem that I have thought about if I knew this material better. Hmmm. Maybe it will help you solve some open problem of your own. Take a look at his book.


by rjlipton at September 13, 2020 06:27 PM UTC

We prove that the Impagliazzo-Nisan-Wigderson (STOC 1994) pseudorandom generator (PRG) fools ordered (read-once) permutation branching programs of unbounded width with a seed length of $\widetilde{O}(\log d + \log n \cdot \log(1/\varepsilon))$, assuming the program has only one accepting vertex in the final layer. Here, $n$ is the length of the program, $d$ is the degree (equivalently, the alphabet size), and $\varepsilon$ is the error of the PRG. In contrast, we show that a randomly chosen generator requires seed length $\Omega(n \log d)$ to fool such unbounded-width programs. Thus, this is an unusual case where an explicit construction is "better than random." Except when the program's width $w$ is very small, this is an improvement over prior work. For example, when $w = \text{poly}(n)$ and d = 2, the best prior PRG for permutation branching programs was simply Nisan's PRG (Combinatorica 1992), which fools general ordered branching programs with seed length $O(\log(wn/\varepsilon) \log n)$. We prove a seed length lower bound of $\widetilde{\Omega}(\log d + \log n \cdot \log(1/\varepsilon))$ for fooling these unbounded-width programs, showing that our seed length is near-optimal. In fact, when $\varepsilon \leq 1 / \log n$, our seed length is within a constant factor of optimal. Our analysis of the INW generator uses the connection between the PRG and the derandomized square of Rozenman and Vadhan (RANDOM 2005) and the recent analysis of the latter in terms of unit-circle approximation by Ahmadinejad et al. (FOCS 2020).

at September 13, 2020 03:39 AM UTC

The field of Interactive Coding studies how an interactive protocol can be made resilient to channel errors. Even though this field has received abundant attention since Schulman's seminal paper (FOCS 92), constructing interactive coding schemes that are both deterministic and efficient, and at the same time resilient to adversarial errors (with constant information and error rates), remains an elusive open problem. An appealing approach towards resolving this problem is to show efficiently encodable and decodable constructions of a combinatorial object called tree codes (Schulman, STOC 93). After a lot of effort in this direction, the current state of the art has deterministic constructions of tree codes that are efficiently encodable but require a logarithmic (instead of constant) alphabet (Cohen, Haeupler, and Schulman, STOC 18). However, we still lack (even heuristic) candidate constructions that are efficiently decodable. In this work, we show that tree codes that are efficiently encodable, {\em but not efficiently decodable}, also imply deterministic and efficient interactive coding schemes that are resilient to adversarial errors. Our result immediately implies a deterministic and efficient interactive coding scheme with a logarithmic alphabet (i.e., $1/\log \log$ rate). We show this result using a novel implementation of hashing through deterministic tree codes that is powerful enough to yield interactive coding schemes.

at September 11, 2020 04:55 PM UTC

We construct an explicit family of 3XOR instances which is hard for Omega(sqrt(log n)) levels of the Sum-of-Squares hierarchy. In contrast to earlier constructions, which involve a random component, our systems can be constructed explicitly in deterministic polynomial time. Our construction is based on the high-dimensional expanders devised by Lubotzky, Samuels and Vishne, known as LSV complexes or Ramanujan complexes, and our analysis is based on two notions of expansion for these complexes: cosystolic expansion, and a local isoperimetric inequality due to Gromov. Our construction offers an interesting contrast to the recent work of Alev, Jeronimo and the last author (FOCS 2019). They showed that 3XOR instances in which the variables correspond to vertices in a high-dimensional expander are easy to solve. In contrast, in our instances the variables correspond to the edges of the complex.

at September 11, 2020 04:58 AM UTC

Which is best for students?

Cropped from Wikipedia src

Moshe Vardi holds multiple professorships at Rice University. He is also the Senior Editor of Communications of the ACM. His is therefore a voice to be reckoned with in the current debate over how best to teach during the pandemic. Much of the debate is over whether all should hear his voice the same way, or some hear it in the classroom while others hear it remotely.

Today we note his recent column for Medium advocating the former. Then I (Ken) give some of my own impressions.

His September 5 column followed an August 8 opinion given to the Rice student newspaper. Both begin with concern over the conflict between safety and value for students. Much of the value of college—most according to statistics he cites—comes from being collegial: outside the classroom. But many such activities, not only evening parties but informal games and gatherings, are the most unsafe.

We will focus however on what Moshe says about the nature of instruction for lecture courses. Certainly for laboratory courses there is a sharp trade-off between safety and in-person interaction. But we focus here on what he says about the nature of teaching in the lecture hall, where one can take safety as a given requirement.

An In-Person Remoteness Paradox

I have just returned from sabbatical at the University at Buffalo (UB) and am teaching this fall a small elective 4xx/5xx theory course. It has 15 students, smaller than the 25 in the hypothetical class Moshe describes but of the same order of magnitude. In the spring I will be teaching a larger undergraduate course, which is also on target for his concerns. I have taught such a class every spring for a decade. While this assignment is not new to me, the issue of safety raises tough choices about the delivery options. My options are:

  1. remote-only;

  2. in-person only;

  3. hybrid use of 1 and 2 for designated course components;

  4. hybrid-flexible, meaning 1 and 2 are conducted simultaneously with students free to choose either option, even on a per-lecture basis.

I have committed to hybrid-flexible. For my current fall course, I made this commitment in early summer when there was uncertainty over in-person instruction requirements for student visas and registration. I believe that my larger course will be implemented as safely in a large room as my current course. The question is quality.

Moshe notes right away a paradox for his hypothetical class that could apply to any of modes 2–4; to include the last expressly, I’ve inserted the word “even”:

…I realized that [even] the students in the classroom will have to be communicating with me on Zoom, to be heard and recorded. All this, while both the students and I are wearing face masks. It dawned on me then that I will be conducting remote teaching in the classroom.

Business Insider source—yet another variation

In fact, I have one volunteer now in the room logging into Zoom to help with interaction from those attending remotely. This helps because my podium has less space to catch faces and detect questions right away. I do repeat questions so they are picked up in the recording and often redirect them to the class. Still, the mere fact of my not seeing faces alongside the notes and interactive drawings I am sharing makes me feel Moshe’s paradox all the time. This is even though my room allows denser spacing than at Rice, so a class of 25 could sit closer. Let me, however, say why I love stand-up teaching before addressing his paramount question of what is best for the students at this time.

From Whiteboards to Tutorials

Dick once wrote a post, “In Praise of Chalk Talks.” First, with reference to talks pre-made using PowerPoint or LaTeX slides, Dick wrote:

Such talks can be informative and easy to follow, yet sometimes PowerPoint is not well suited to giving a proof. The slides do not hold enough state for us to easily follow the argument.

Moreover, when I contributed to the open-problems session of the workshop at IAS in Princeton, NJ that we covered two years ago, Avi Wigderson insisted that everyone use chalk, not slides. I’ve used slides for UB’s data-structures and programming languages courses, but I think students benefit from seeing proofs and problem-solving ideas grow.

I find furthermore that the feel of immersion in a process of discovery is enhanced by an in-person presence. I had this in mind when I followed Dick’s post with a long one imagining Kurt Gödel expounding the distinctive points of his set theory (joint with Paul Bernays and John von Neumann), all on one chalkboard. My classes are not as interactive as in that post, but I prepare junctures in lectures for posing questions and doing a little bit of Socratic method. And I try to lead this with body language as well as voice inflection, whether at a whiteboard or drawing on hard paper via a document camera.

Still, it is exactly this “extra” that gets diminished for those who are remote. When I share my screen for notes or a drawing (both in MathCha), they see my movements only in a small second window if at all. They do hear my voice—but I do not hear theirs even if they unmute themselves. Nor can I read their state of following as I do in the room. Without reiterating the safety factor as Moshe does, I can reformulate his key question as:

Does the non-uniformity and inequality of hybrid delivery outweigh the benefits of making in-person instruction available to some?

I must quickly add that in-person teaching is perceived as a collective need at UB. The web form I filled out for Spring 2021 stated that some in-person classes must be available at all levels, 1xx through 7xx. I am happy to oblige. But the fact that I chose a flexible structure, especially in a small class, does allow the students to give their opinions on this question, as well as on something Moshe says next:

“Remote teaching” actually can do a better job of reproducing the intimacy that we take for granted in small classes.

Toward this end, I am implementing a remote version of the tutorial system I was part of for two eight-week terms at Oxford while a junior fellow of Merton College. When Cambridge University declared, as early as last May, that there would be no in-person lectures all the way through summer 2021, it could do so because most lectures there are formally optional anyway. The heart of required teaching is via weekly tutorial hours in groups of one-to-three students. They are organized separately by each of thirty-plus constituent colleges rather than by department-centered staff, so the numbers are divided to be manageable. In my math-course tutorials the expectation was for each student to present a solved problem and participate in discussions that build on the methods.

I am doing this every other week this fall, alternating with weeks of problem-set review that will be strictly optional and classed as enhanced office hours. All UB office hours must be remote anyway. The tutorial requirement was agreed by student voice-vote in a tradeoff with lowering the material in timed exams to compensate for differences in home situations. After a few weeks of this, the class will take stock for opinions on which delivery options work best. UB has already committed to being remote-only after Thanksgiving, and it is possible that the on-campus medical situation will trigger an earlier conversion anyway.

Open Problems

We would like to throw the floor open for comment on Moshe’s matters that we’ve highlighted and on his other opinions about the university mission amid the current crisis more generally.

[edited to reflect that at Rice too, the hypothetical class could be in any of modes 2–4, and that spacing is further than in my UB room. “Princeton IAS” -> “IAS in Princeton, NJ”]

by KWRegan at September 10, 2020 10:26 PM UTC

TL;DR: Know of a great recent paper that should be highlighted to the theory community and beyond? Email a nomination to by October 19th.

The goal of the SIGACT Research Highlights Committee is to help promote
top computer science theory research by identifying results that are of
high quality and broad appeal to the general computer science audience.
These results would then be recommended for consideration for the CACM Research Highlights section as well as other general-audience computer science research outlets.

Nomination and Selection Process:

The committee solicits two types of nominations:

1) Conference nominations. Each year, the committee will ask the PC
chairs of theoretical computer science conferences to send a selection
of up to three top papers from these conferences (selected based on both
their technical merit and breadth of interest to a non-theory audience),
forwarding them to the committee for consideration.

2) Community nominations. The committee will accept nominations from the members of the community. Each such nomination should summarize the contribution of the nominated paper and also argue why this paper is
suitable for broader outreach. The nomination should be no more than a
page in length and can be submitted at any time by emailing it to Self-nominations are

The nomination deadline is Monday, October 19, 2020.


The SIGACT Research Highlights Committee currently comprises the
following members:

Boaz Barak, Harvard University
Irit Dinur, Weizmann Institute of Science
Aleksander Mądry, Massachusetts Institute of Technology (chair)
Jelani Nelson, University of California, Berkeley

by Boaz Barak at September 10, 2020 02:17 PM UTC