Theory of Computing Blog Aggregator

J Scott Provan (site)

The following post was kindly contributed by Karim Adiprasito. (Here is the link to Karim’s paper.)

So, Gil asked me to say a little bit about my proof of the g-conjecture (and some other conjectures in discrete geometry) on his blog, and since he bought me many  coffees to explain it to him (or if he is to be believed, the department paid), I am happy to oblige.

So, I want to explain a special, but critical case to the proof. It contains the some shadow of the core ideas necessary, but needs some more tricks I will remark on afterwards.

Also, I wanted to take this opportunity to mention something marvelous that I learned from Leonid Gurvits recently that increased my own understanding of one of the key tricks used indefinitely. That trick is the following cool lemma.

Leonid Gurvits

Perturbation lemma

PERTURBATION LEMMA: Consider two linear maps

\alpha, \beta: X\ \longrightarrow\ Y

of two real vector spaces X and Y. Assume that

\beta (\ker \alpha ) \cap \rm{im}~ \alpha =\{0\} \subset Y.

Then a generic linear combination \alpha ``+"\beta of \alpha and \beta  has kernel
\ker (\alpha  ``+" \beta )= \ker \alpha \cap \ker \beta.

Cool, no? Proof, then: Find a subspace A of X such that

\alpha A\ =\ \alpha X\quad \text{and}\ \quad X\ \cong\ A \oplus \ker\alpha

so that in particular \alpha is injective on A. Then, for \epsilon \ge 0 small enough, the image of

\alpha\ +\ \epsilon \beta{:}\ X\ \longrightarrow\ Y


(\alpha\ +\ \epsilon \beta)(A)\ +\ \beta\ker\alpha\ \subset\ Y.

But if we norm Y in any way, then (\alpha+\epsilon \beta)(A) approximates \alpha A as \epsilon tends to zero, which is linearly independent from \beta\, \ker\alpha by assumption. WALLA

Now, how is this used.

Face rings

Let me set up some of the basic objects.

If \Delta is an abstract simplicial complex on ground-set [n]:= \{1,\cdots,n\}, let I_\Delta := \langle \textbf{x}^{\textbf{a}}: supp (\textbf{a})\notin\Delta\rangle denote the nonface ideal in \mathbb{R}[\mathbf{x}], where \mathbb{R}[\mathbf{x}]=\mathbb{R}[x_1,\cdots,x_n].

Let \mathbb{R}^\ast[\Delta]:= \mathbb{R}[\mathbf{x}]/I_\Delta denote the face ring of \Delta. A collection of linear forms \Theta=(\theta_1,\cdots,\theta_l) in the polynomial ring \mathbb{R}[\textbf{x}] is a partial linear system of parameters if

\dim_{\rm{Krull}} {\mathbb{R}^\ast[\Delta]} {\Theta \mathbb{R}^\ast[\Delta]} =\dim_{\rm{Krull}} \mathbb{R}^\ast[\Delta]-l,

for \dim_{\rm{Krull}} the Krull dimension. If l=\dim_{\rm{Krull}} \mathbb{R}^\ast[\Delta] = \dim \Delta +1, then \Theta is simply a linear system of parameters, and the corresponding quotient A(\Delta)={\mathbb{R}^\ast[\Delta]}/{\Theta \mathbb{R}^\ast[\Delta]} is called an Artinian reduction of \mathbb{R}^\ast[\Delta].

The g-conjecture

The g-conjecture (as described earlier  in Gil’s blog) is implied by the following property:

(HL) For every sphere S of even dimension d-1=2k, there is an Artinian reduction A(S) and a degree one element \ell such that the map

A^k(S) \ \xrightarrow{\ \cdot \ell\ }\ A^{k+1}(S)

is an isomorphism.

This is quite a reasonable demand. Indeed, Graebe proved that A^d(S) \cong \mathbb{R} and that the resulting pairing

A^k(S) \times A^{k+1}(S)\rightarrow \mathbb{R}

is perfect, so A^k(S) and A^{k+1}(S) are isomorphic as vector spaces. We shall call this property (PD), because it is a special case of Poincaré pairing.

(HL) is a special case of the Hard Lefschetz Theorem I prove in my paper, and we will prove it for a subset of all triangulated spheres here. Proving it for all spheres implies the g-conjecture (and other conjectures, such as the Grünbaum conjecture), and proving the hard Lefschetz theorem in full generality is not much harder.

Lou Billera

Vertex-decomposable spheres

Lets recall a cool notion due to Provan and Billera: A pure simplicial d-complex is vertex decomposable if it is a simplex, or there exists a vertex whose link is vertex decomposable of dimension d-1 and its deletion is vertex decomposable of dimension d.

We restrict our attention to vertex decomposable spheres and disks and assume the boundary of the link is vertex decomposable as well in every step.

THEOREM: Vertex decomposable spheres satisfy (HL).

We prove this theorem by induction on dimension, the base case of zero-dimensional spheres (k=0) being clear.

Lets label the vertices of S in order of their vertex decomposition, from 1 to n. Now, \ell will be a linear combination of indeterminates, so lets assume we have constructed an element \ell_i that uses just the first i of them, and such that \ell_i itself is as close to a Lefschetz element as possible for its kind, that is, the kernel of

A^k(S) \ \xrightarrow{\ \cdot \ell_i\ }\ A^{k+1}(S)

is the intersection of kernels of the maps

A^k(S) \ \xrightarrow{\ \cdot x_j\ }\ A^{k+1}(S)

where j ranges from 1 to i.

We want to construct a map \ell_{i+1} with this property (which I call the transversal prime property}. To this effect, we want to apply the perturbation lemma to the maps \beta x_{i+1}, \alpha=\ell_i, and with respect to the spaces X=A^k(S) and Y=A^{k+1}(S). Let us denote by D the ball given as the union of neighborhoods of the first i vertices.

For this, we have to find out the kernel of \ell_i. But this is the the ideal in A(S) generated by the monomials which are not divisible by any of the first i indeterminates. Lets call it K, and its restriction to A(st_{i+1} \Sigma) by K_{i+1}. Lets also look at the image I of \ell_i, which by Graebe’s theorem is exactly the span of the images of the maps the maps

A^k(S) \ \xrightarrow{\ \cdot x_j\ }\ A^{k+1}(S)

where j ranges from 1 to i.

But then, x_{i+1}K \cap I is 0 in degree k+1 if and only if A(st_{i+1} \partial D) is 0 in degree k. Why is that? Because with respect to the Poincaré pairing, x_{i+1}K \cap I (in degree k+1) and A(st_{i+1} \partial D) (in degree k) are dual.
The ring A(st_{i+1} \partial D) is obtained by taking \mathbb{R}[st_{i+1} \partial D], seen as a quotient of \mathbb{R}[S] and modding out by the ideal generated by the linear system \Theta. But that is of length d, even though st_{i+1} \partial D is only of dimension d-2. We can remove the vertex i+1 for the price of removing one of the linear forms, but then we have the same issue, having a (d-3)-sphere st_{i+1} \partial D and a system \Theta' of length d-1. Still, one too many! Taking a subsystem of length d-2, we obtain an Artinian reduction for \mathbb{R}[lk_{i+1} \partial D] via a linear system \Theta'', but what happens to the additional linear form of \Theta' not in \Theta''? It has to act as a Lefschetz element on \mathbb{R}[lk_{i+1} \partial D]/\Theta''\mathbb{R}[lk_{i+1} \partial D] if we want

A(st_{i+1} \partial D)\ \cong\ \mathbb{R}[lk_{i+1} \partial D]/\Theta'\mathbb{R}[lk_{i+1} \partial D]

to be trivial in degree k. But we may assume so by induction! Hence, we can choose \ell_{i+1} as the generic sum of \ell_i and x_{i+1} by the perturbation lemma.

So, ultimately, we can construct a map \ell_n with the transversal prime property. But then its kernel is the intersection of the kernels of

A^k(S) \ \xrightarrow{\ \cdot x_j\ }\ A^{k+1}(S) ,

where j ranges from 1 to n. But that is 0SABABA.

And beyond?

Now, we have the Lefschetz theorem for a special class, but that is less than what we want in the end, since vertex decomposable spheres are few and in between (do you see a reason why? there are many). So, what do we do? For a long time, I tried to extend the perturbation lemma to combine more than two maps.
Recently (depending on when Gil puts this post on the blog), I met Leonid Gurvits for the first time on a marvelous little conference at the Simons Institute. I knew that the problem is related to Hall’s Marriage Theorem for operators (I explain this connection a bit further in my paper), but Leonid enlightened this further by pointing me towards several nice papers, starting with his work on Quantum Matching Theory. Indeed, finding a good extension to three and more maps would essentially mean that we could also find Hall Marriage Type Theorems for 3-regular hypergraphs, which we know for complexity reasons to be unlikely.

So what can we do instead? Well, it turns out that I only really needed to look at the k-skeleton of S above, and there is no need to be vertex decomposable. It is enough to find another nicely decomposable d-1-manifold that contains it the k-skeleton of S, and then use some technical topological tricks to connect the local picture to global homological properties.




by Gil Kalai at February 23, 2019 04:19 PM UTC

Finding a set of nearly independent objects

Wikipedia bio source

Giuseppe Vitali was the mathematician who famously used the Axiom of Choice, in 1905, to give the first example of a non-measurable subset of the real numbers.

Today I want to discuss another of his results that is a powerful tool.

The existence of a set that cannot properly be assigned a measure was a surprise at the time, and still is a surprise. It is a wonderful example of the power of the Axiom of Choice. See this for details.

We are interested in another of his results that is more a theorem about coverings. It is the Vitali covering theorem–see this. The theorem shows that a certain type of covering—ah, we will explain the theorem in a moment.

The power of this theorem is that it can be used to construct various objects in analysis. There are now many applications of this theorem. It is a powerful tool that can be used to prove many nice results. I do not know of any—many?—applications of the existence of a non-measurable set. Do you know any?

Making A Mapping Injective

Let’s look at an application of the Vitali theorem that may be new. But in any case it may help explain what the Vitali theorem is all about.

Suppose that {F:X \rightarrow Y}. We can make the map surjective if we restrict {Y} to be equal to {F(X)}. It is not so simple to make the map injective, but we can in general do that also.

Theorem 1 Let {F} be a surjective function from {X} to {Y}. Then there is a subset {S} of {X} so that {F} is injective from {S} to {Y}.

Proof: For each {y} in {Y} select one {x} from the set {F^{-1}(y)} and place it into {S}. Recall {F^{-1}(y)} is the set of {z} so that {F(z)=y}.This of course uses the Axiom of Choice to make the choices of which {x} to choose. Then clearly {S} is the required set. \Box

The difficulty with this trivial theorem is that {S} cannot be controlled easily if it is constructed via the Axiom of Choice. It could be a very complicated set. Our goal is to see how well we can control {S} if we assume that the mapping {F} is smooth.

How can we do better? The answer is quite a bit better if we assume that {F} is a “nice” function. We give up surjectivity onto {Y} but only by a null set.

Theorem 2 Suppose that {F} is a surjective smooth map from {X} to {Y} where {X} and {Y} are open subsets of {{\mathbb R}}. Also suppose that {F} locally is invertible. Then there is a subset {S} of {X} so that

  1. The complement of {F(S)} is a null set.

  2. The map {F} is injective from {S} to {F(S)}.

That is that for all distinct points {\boldsymbol{a}} and {\boldsymbol{b}} in {S}, {F(\boldsymbol{a}) \neq F(\boldsymbol{b})}. Moreover the map from {F(S)} to {S} is smooth.

Set Coverings

How can we prove this theorem? An obvious idea is to do the following. Pick an open interval {U=[a,b]} in {X} so that {F(U) = V} for an open set in {V} and so that {F} is injective from {U} to {V}. Setting {S} to {U} clearly works: the map {F} is injective on {S}. This is far from the large set that we wish to have, but it is a start. The intuition is to select another open interval {U'} that is disjoint from {U} so that again {F} is injective from {U'} to {V'}. We can then add {U'} to our {S}.

We can continue in this way and collect many open sets that we add to {S}. Can we arrange that the union of these sets yield a {S} so that {F(S)} is most of {Y}? In general the answer is no. Suppose that the intervals are the following:

\displaystyle  [k,k+1.1]

for {k=0,1,2,\dots} Roughly we can only get about half of the space that the intervals cover and keep the chosen intervals disjoint. If we select { [k,k+1.1] } then we cannot select {[k+1,k+1+1.1]} since

\displaystyle  [k,k+1.1] \cap [k+1,k+1+1.1] \neq \emptyset.

Vitali’s theorem comes to the rescue. It allows us to avoid his problem, by insisting that intervals have an additional property.

The Vitali Covering Theorem

The trick is to use a refinement of a set cover that allows a disjoint cover to exist for almost all of the target set. The next definition is critical to the Vitali covering theorem.

Definition 3 Let {U} be a subset of {{\mathbb R}}. Let {[a_{\lambda},b_{\lambda}]} be intervals over {\lambda} in some index set {I}. We say these intervals are a cover of {U} proved {U} is a subset of the union of all the intervals. Say the intervals also are a Vitali cover of {U} provided for all points {x} in {U} and all {\epsilon > 0}, there is an interval {[c,d]} that contains {x} and {0 < d-c < \epsilon}.

The Vitali theorem is the following:

Theorem 4 Let {U} be a subset of {{\mathbb R}}. Let {[a_{\lambda},b_{\lambda}]} be intervals for {\lambda} in some index set {I}. Assume that the family is a Vitali cover of {U}. Then there is a countable subfamily of disjoints intervals in the family so that they cover all of {U} except for possibly a null set.

The Vitali theorem can be extended to any finite dimensional space {{\mathbb R}^{n}}. Then intervals become disks and so on.

Open Problems

Do you see how to prove Theorem 2 from Vitali’s theorem? The insight is now one can set up a Vitali covering of the space {Y}.

by RJLipton+KWRegan at February 23, 2019 04:58 AM UTC

The Minimum Circuit Size Problem (MCSP) asks if a given truth table of a Boolean function $f$ can be computed by a Boolean circuit of size at most $\theta$, for a given parameter $\theta$. We improve several circuit lower bounds for MCSP, using pseudorandom generators (PRGs) that are local; a PRG is called local if its output bit strings, when viewed as the truth table of a Boolean function, can be computed by a Boolean circuit of small size. We get new and improved lower bounds for MCSP that almost match the best-known lower bounds against several circuit models. Specifically, we show that computing MCSP, on functions with a truth table of length $N$, requires $N^{3-o(1)}$-size de Morgan formulas, improving the recent $N^{2-o(1)}$ lower bound by Hirahara and Santhanam (CCC, 2017), $N^{2-o(1)}$-size formulas over an arbitrary basis or general branching programs (no non-trivial lower bound was known for MCSP against these models), and $2^{\Omega\left(N^{1/(d+2.01)}\right)}$-size depth-$d$ $AC^0$ circuits, improving the superpolynomial lower bound by Allender et al. (SICOMP, 2006). The $AC^0$ lower bound stated above matches the best-known $AC^0$ lower bound (for PARITY) up to a small additive constant in the depth. Also, for the special case of depth-$2$ circuits (i.e., CNFs or DNFs), we get an almost optimal lower bound of \(2^{N^{1-o(1)}}\) for MCSP.

at February 23, 2019 01:22 AM UTC

Authors: Holden Lee, Oren Mangoubi, Nisheeth K. Vishnoi
Download: PDF
Abstract: Given a sequence of convex functions $f_0, f_1, \ldots, f_T$, we study the problem of sampling from the Gibbs distribution $\pi_t \propto e^{-\sum_{k=0}^t f_k}$ for each epoch $t$ in an online manner. This problem occurs in applications to machine learning, Bayesian statistics, and optimization where one constantly acquires new data, and must continuously update the distribution. Our main result is an algorithm that generates independent samples from a distribution that is a fixed $\varepsilon$ TV-distance from $\pi_t$ for every $t$ and, under mild assumptions on the functions, makes poly$\log(T)$ gradient evaluations per epoch. All previous results for this problem imply a bound on the number of gradient or function evaluations which is at least linear in $T$. While we assume the functions have bounded second moment, we do not assume strong convexity. In particular, we show that our assumptions hold for online Bayesian logistic regression, when the data satisfy natural regularity properties. In simulations, our algorithm achieves accuracy comparable to that of a Markov chain specialized to logistic regression. Our main result also implies the first algorithm to sample from a $d$-dimensional log-concave distribution $\pi_T \propto e^{-\sum_{k=0}^T f_k}$ where the $f_k$'s are not assumed to be strongly convex and the total number of gradient evaluations is roughly $T\log(T)+\mathrm{poly}(d),$ as opposed to $T\cdot \mathrm{poly}(d)$ implied by prior works. Key to our algorithm is a novel stochastic gradient Langevin dynamics Markov chain that has a carefully designed variance reduction step built-in with fixed constant batch size. Technically, lack of strong convexity is a significant barrier to the analysis, and, here, our main contribution is a martingale exit time argument showing the chain is constrained to a ball of radius roughly poly$\log(T)$ for the duration of the algorithm.

at February 23, 2019 11:23 PM UTC

Authors: Alexander Cowtan, Silas Dilkes, Ross Duncan, Alexandre Krajenbrink, Will Simmons, Seyon Sivarajah
Download: PDF
Abstract: We introduce a new architecture-agnostic methodology for mapping abstract quantum circuits to realistic quantum computing devices with restricted qubit connectivity, as implemented by Cambridge Quantum Computing's tket compiler. We present empirical results showing the effectiveness of this method in terms of reducing two-qubit gate depth and two-qubit gate count, compared to other implementations.

at February 23, 2019 11:22 PM UTC

Authors: Talya Eden, Dana Ron, Will Rosenbaum
Download: PDF
Abstract: In this paper, we revisit the problem of sampling edges in an unknown graph $G = (V, E)$ from a distribution that is (pointwise) almost uniform over $E$. We consider the case where there is some a priori upper bound on the arboriciy of $G$. Given query access to a graph $G$ over $n$ vertices and of average degree $d$ and arboricity at most $\alpha$, we design an algorithm that performs $O\!\left(\frac{\alpha}{d} \cdot \frac{\log^3 n}{\varepsilon}\right)$ queries in expectation and returns an edge in the graph such that every edge $e \in E$ is sampled with probability $(1 \pm \varepsilon)/m$. The algorithm performs two types of queries: degree queries and neighbor queries. We show that the upper bound is tight (up to poly-logarithmic factors and the dependence in $\varepsilon$), as $\Omega\!\left(\frac{\alpha}{d} \right)$ queries are necessary for the easier task of sampling edges from any distribution over $E$ that is close to uniform in total variational distance. We also prove that even if $G$ is a tree (i.e., $\alpha = 1$ so that $\frac{\alpha}{d}=\Theta(1)$), $\Omega\left(\frac{\log n}{\log\log n}\right)$ queries are necessary to sample an edge from any distribution that is pointwise close to uniform, thus establishing that a $\mathrm{poly}(\log n)$ factor is necessary for constant $\alpha$. Finally we show how our algorithm can be applied to obtain a new result on approximately counting subgraphs, based on the recent work of Assadi, Kapralov, and Khanna (ITCS, 2019).

at February 23, 2019 12:00 AM UTC

Authors: Avery Miller, Boaz Patt-Shamir, Will Rosenbaum
Download: PDF
Abstract: We consider the Adversarial Queuing Theory (AQT) model, where packet arrivals are subject to a maximum average rate $0\le\rho\le1$ and burstiness $\sigma\ge0$. In this model, we analyze the size of buffers required to avoid overflows in the basic case of a path. Our main results characterize the space required by the average rate and the number of distinct destinations: we show that $O(k d^{1/k})$ space suffice, where $d$ is the number of distinct destinations and $k=\lfloor 1/\rho \rfloor$; and we show that $\Omega(\frac 1 k d^{1/k})$ space is necessary. For directed trees, we describe an algorithm whose buffer space requirement is at most $1 + d' + \sigma$ where $d'$ is the maximum number of destinations on any root-leaf path.

at February 23, 2019 11:26 PM UTC

Authors: Kevin Buchin, Anne Driemel, Martijn Struijs
Download: PDF
Abstract: We study the complexity of clustering curves under $k$-median and $k$-center objectives in the metric space of the Fr\'echet distance and related distance measures. The $k$-center problem has recently been shown to be NP-hard, even in the case where $k=1$, i.e. the minimum enclosing ball under the Fr\'echet distance. We extend these results by showing that also the $k$-median problem is NP-hard for $k=1$. Furthermore, we show that the $1$-median problem is W[1]-hard with the number of curves as parameter. We show this under the discrete and continuous Fr\'echet and Dynamic Time Warping (DTW) distance. Our result generalizes an earlier result by Bulteau et al. from 2018 for a variant of DTW that uses squared distances. Moreover, closing some gaps in the literature, we show positive results for a variant where the center curve may have complexity at most $\ell$ under the discrete Fr\'echet distance. In particular, for fixed $k,\ell$ and $\varepsilon$, we give $(1+\varepsilon)$-approximation algorithms for the $(k,\ell)$-median and $(k,\ell)$-center objectives and a polynomial-time exact algorithm for the $(k,\ell)$-center objective.

at February 23, 2019 11:30 PM UTC

Authors: Johannes Bund, Christoph Lenzen, Will Rosenbaum
Download: PDF
Abstract: Synchronizing clocks in distributed systems is well-understood, both in terms of fault-tolerance in fully connected systems and the dependence of local and global worst-case skews (i.e., maximum clock difference between neighbors and arbitrary pairs of nodes, respectively) on the diameter of fault-free systems. However, so far nothing non-trivial is known about the local skew that can be achieved in topologies that are not fully connected even under a single Byzantine fault. Put simply, in this work we show that the most powerful known techniques for fault-tolerant and gradient clock synchronization are compatible, in the sense that the best of both worlds can be achieved simultaneously.

Concretely, we combine the Lynch-Welch algorithm [Welch1988] for synchronizing a clique of $n$ nodes despite up to $f<n/3$ Byzantine faults with the gradient clock synchronization (GCS) algorithm by Lenzen et al. [Lenzen2010] in order to render the latter resilient to faults. As this is not possible on general graphs, we augment an input graph $\mathcal{G}$ by replacing each node by $3f+1$ fully connected copies, which execute an instance of the Lynch-Welch algorithm. We then interpret these clusters as supernodes executing the GCS algorithm, where for each cluster its correct nodes' Lynch-Welch clocks provide estimates of the logical clock of the supernode in the GCS algorithm. By connecting clusters corresponding to neighbors in $\mathcal{G}$ in a fully bipartite manner, supernodes can inform each other about (estimates of) their logical clock values. This way, we achieve asymptotically optimal local skew, granted that no cluster contains more than $f$ faulty nodes, at factor $O(f)$ and $O(f^2)$ overheads in terms of nodes and edges, respectively. Note that tolerating $f$ faulty neighbors trivially requires degree larger than $f$, so this is asymptotically optimal as well.

at February 23, 2019 11:26 PM UTC

Authors: John Iacono, Varunkumar Jayapaul, Ben Karsin
Download: PDF
Abstract: The performance of modern computation is characterized by locality of reference, that is, it is cheaper to access data that has been accessed recently than a random piece of data. This is due to many architectural features including caches, lookahead, address translation and the physical properties of a hard disk drive; attempting to model all the components that constitute the performance of a modern machine is impossible, especially for general algorithm design purposes. What if one could prove an algorithm is asymptotically optimal on all systems that reward locality of reference, no matter how it manifests itself within reasonable limits? We show that this is possible, and that algorithms that are asymptotically optimal in the cache-oblivious model are asymptotically optimal in any reasonable locality-of-reference rewarding setting. This is surprising as the cache-oblivious model envisions a particular architectural model involving blocked memory transfer into a multi-level hierarchy of caches of varying sizes, and was not designed to directly model locality-of-reference correlated performance.

at February 23, 2019 11:23 PM UTC

Authors: Anupam Gupta, Haotian Jiang, Ziv Scully, Sahil Singla
Download: PDF
Abstract: Suppose there are $n$ Markov chains and we need to pay a per-step \emph{price} to advance them. The "destination" states of the Markov chains contain rewards; however, we can only get rewards for a subset of them that satisfy a combinatorial constraint, e.g., at most $k$ of them, or they are acyclic in an underlying graph. What strategy should we choose to advance the Markov chains if our goal is to maximize the total reward \emph{minus} the total price that we pay?

In this paper we introduce a Markovian price of information model to capture settings such as the above, where the input parameters of a combinatorial optimization problem are given via Markov chains. We design optimal/approximation algorithms that jointly optimize the value of the combinatorial problem and the total paid price. We also study \emph{robustness} of our algorithms to the distribution parameters and how to handle the \emph{commitment} constraint.

Our work brings together two classical lines of investigation: getting optimal strategies for Markovian multi-armed bandits, and getting exact and approximation algorithms for discrete optimization problems using combinatorial as well as linear-programming relaxation ideas.

at February 23, 2019 11:28 PM UTC

Authors: Lingxiao Huang, Nisheeth K. Vishnoi
Download: PDF
Abstract: Fair classification has been a topic of intense study in machine learning, and several algorithms have been proposed towards this important task. However, in a recent study, Friedler et al. observed that fair classification algorithms may not be stable with respect to variations in the training dataset -- a crucial consideration in several real-world applications. Motivated by their work, we study the problem of designing classification algorithms that are both fair and stable. We propose an extended framework based on fair classification algorithms that are formulated as optimization problems, by introducing a stability-focused regularization term. Theoretically, we prove a stability guarantee, that was lacking in fair classification algorithms, and also provide an accuracy guarantee for our extended framework. Our accuracy guarantee can be used to inform the selection of the regularization parameter in our framework. To the best of our knowledge, this is the first work that combines stability and fairness in automated decision-making tasks. We assess the benefits of our approach empirically by extending several fair classification algorithms that are shown to achieve the best balance between fairness and accuracy over the Adult dataset. Our empirical results show that our framework indeed improves the stability at only a slight sacrifice in accuracy.

at February 23, 2019 11:22 PM UTC

Authors: Therese Biedl
Download: PDF
Abstract: It is well-known that every $n$-vertex planar graph with minimum degree 3 has a matching of size at least $\frac{n}{3}$. But all proofs of this use the Tutte-Berge-formula for the size of a maximum matching. Hence these proofs are not directly algorithmic, and to find such a matching one must apply a general-purposes maximum matching algorithm, which has run-time $O(n^{1.5}\alpha(n))$ for planar graphs. In contrast to this, this paper gives a linear-time algorithm that finds a matching of size at least $\frac{n}{3}$ in any planar graph with minimum degree 3.

at February 23, 2019 11:27 PM UTC

Authors: Ashish Dwivedi, Rajat Mittal, Nitin Saxena
Download: PDF
Abstract: Finding an irreducible factor, of a polynomial $f(x)$ modulo a prime $p$, is not known to be in deterministic polynomial time. Though there is such a classical algorithm that {\em counts} the number of irreducible factors of $f\bmod p$. We can ask the same question modulo prime-powers $p^k$. The irreducible factors of $f\bmod p^k$ blow up exponentially in number; making it hard to describe them. Can we count those irreducible factors $\bmod~p^k$ that remain irreducible mod $p$? These are called {\em basic-irreducible}. A simple example is in $f=x^2+px \bmod p^2$; it has $p$ many basic-irreducible factors. Also note that, $x^2+p \bmod p^2$ is irreducible but not basic-irreducible!

We give an algorithm to count the number of basic-irreducible factors of $f\bmod p^k$ in deterministic poly(deg$(f),k\log p$)-time. This solves the open questions posed in (Cheng et al, ANTS'18 \& Kopp et al, Math.Comp.'19). In particular, we are counting roots $\bmod\ p^k$; which gives the first deterministic poly-time algorithm to compute Igusa zeta function of $f$. Also, our algorithm efficiently partitions the set of all basic-irreducible factors (possibly exponential) into merely deg$(f)$-many disjoint sets, using a compact tree data structure and {\em split} ideals.

at February 23, 2019 11:22 PM UTC

Authors: P. A. M. Casares, M. A. Martin-Delgado
Download: PDF
Abstract: We introduce a new quantum optimization algorithm for Linear Programming (LP) problems based on Interior Point (IP) Predictor-Corrector (PC) methods whose (worst case) time complexity is $O(\sqrt{n}Ls^3 k \epsilon^{-1}\epsilon_s^{-1}) $. This represents a quantum speed-up in the number $n$ of variables in the cost function with respect to the comparable classical Interior Point (IP) algorithms that behave as $O((n+m)\sqrt{nk}L s^3\log(\epsilon^{-1})\epsilon_s^{-1})$ or $O(\sqrt{n}(n+m)L)$ depending on the technique employed, where $m$ is the number of constraints and the rest of the variables are defined in the introduction. The average time complexity of our algorithm is $O(\sqrt{n}s^3 k \epsilon^{-1}\epsilon_s^{-1})$, which equals the behaviour on $n$ of quantum Semidefinite Programming (SDP) algorithms based on multiplicative weight methods when restricted to LP problems and heavily improves on the precision $\epsilon^{-1}$ of the algorithm. Unlike the quantum SDP algorithm, the quantum PC algorithm does not depend on size parameters of the primal and dual LP problems ($R,r$), and outputs a feasible and optimal solution whenever it exists.

at February 23, 2019 11:27 PM UTC

In the 1990s I published a series of papers on data structures for closest pairs. As long as you already know how to maintain dynamic sets of objects of some type, and answer nearest-neighbor queries among them, you can also keep track of the closest pair, and this can be used as a subroutine in many other computational geometry algorithms. But it turns out that many of those algorithms can now be simplified and sped up by using mutual nearest neighbors (pairs of objects that are each other’s nearest neighbors) instead of closest pairs.

My original motivation for studying these types of problems was to maintain minimum spanning trees of dynamic point sets, using closest red-blue pairs of Euclidean points,1 2 and I later found more applications in hierarchical clustering, greedy matching, traveling salesperson heuristics,3 4 and (with Jeff Erickson) motorcycle graphs and straight skeletons.5 But to use these closest pair data structures, you have to pay two logarithmic factors in time complexity over the time for the underlying nearest-neighbor data structure. So they’re not competitive with (uncolored) Euclidean closest pair data structures, which take only logarithmic time in any fixed dimension. Instead they make more sense to use with other distances than Euclidean, with objects more complicated than single points, or with variations like the red-blue closest pair for which the logarithmic-time solution doesn’t work.

For several variations of hierarchical clustering, an alternative and simpler technique has been known for quite a bit longer, based on finding mutual nearest neighbors (pairs of objects that are nearer to each other than to anything else) rather than closest pairs.6 7 It’s called the nearest neighbor chain algorithm, but really it’s a data structure rather than an algorithm, one that allows you to maintain a dynamic point set and find pairs of mutual nearest neighbors, again based on calls to an underlying nearest neighbor data structure. The idea is to maintain a stack of shorter and shorter pairs of nearest neighbors, until the two objects whose distance is on the top of the stack have nothing nearer – they are mutual nearest neighbors. Whenever you want a pair of neighbors, you look at the top pair, an object and its nearest neighbor , and ask whether ’s nearest neighbor is . If so, you have found a mutual nearest neighbor pair, and if not you have a new shorter distance to push onto the stack.

One can this in a hierarchical clustering algorithm that repeatedly finds and merges the nearest two clusters, whenever the distance between clusters has a special property: a merged cluster is never closer to other clusters than the closer of the two clusters that was merged. This property implies both that the stack of distances remains valid after the merge, and that mutual nearest neighbors are always safe to merge. If two clusters are mutual nearest neighbors, then the closest-pair clustering algorithm will eventually merge them, because none of its actions can cause them to stop being mutual nearest neighbors. So we might as well merge them immediately once we discover them to be mutual nearest neighbors. (One way to formulate this mathematically is that the set of mutual nearest neighbor pairs merged by the clustering algorithm forms an antimatroid). When this works, you get a clustering algorithm that uses a linear number of nearest neighbor queries, instead of the queries that you would get using my closest-pair data structures.

In more recent research with UCI student Nil Mamano (finishing his doctorate this year; hire him for a postdoc, he’s good!) we noticed that the nearest neighbor chain algorithm can also be applied to certain stable marriage problems with preferences coming from geometric distances.8 Our latest preprint, “Euclidean TSP, Motorcycle Graphs, and Other New Applications of Nearest-Neighbor Chains” (with Efrat, Frishberg, Goodrich, Kobourov, Mamano, Matias, and Polishchuk, arXiv:1902.06875) extends this to a much broader set of applications. As well as simplifying and speeding up my previous work on motorcycle graphs and TSP heuristics, we also use nearest neighbor chains in a bigger class of stable matching problems and in an approximate geometric set cover problem. In each case, we need to show either that the problem has an antimatroid-like property (so using mutual nearest neighbors produces the same solution as closest pairs) or that, even when varying from the same solution, it achieves the same quality. It’s not quite true that anything closest pairs can do, mutual nearest neighbors can do better, but it’s close.

Another idea in the paper is that to find (exact!) mutual nearest neighbor pairs one can sometimes get away with using approximate near neighbor structures. This is important if you’re using Euclidean distance, because the time bounds for exact nearest neighbors have the form for constants that get very small as gets large, while approximate nearest neighbors are logarithmic in all dimensions. The idea is to build the stack of shorter distances by asking for a constant number of approximate near neighbors, the th of which is within a constant factor of the distance to the actual th nearest neighbor. By a packing argument for points in Euclidean space, either some two of these points are closer to each other than the distance on the current stack top (in which case you can build the stack one more level) or these approximate neighbors are guaranteed to contain the actual nearest neighbor (in which case you can either detect a mutual nearest neighbor pair or again build the stack). This idea leads, for instance, to an algorithm for the multi-fragment TSP heuristic that takes time in Euclidean spaces of any bounded dimension; the best previous time appears to be an -time algorithm (valid in any metric space) from one of my previous papers.3

  1. Agarwal, P. K., Eppstein, D., and Matoušek, J., “Dynamic algorithms for half-space reporting, proximity problems, and geometric minimum spanning trees”, FOCS, 1992, pp. 80–89. 

  2. Eppstein, D., “Dynamic Euclidean minimum spanning trees and extrema of binary functions”, Discrete Comput. Geom. 13: 111–122, 1995. 

  3. Eppstein, D., “Fast hierarchical clustering and other applications of dynamic closest pairs”, SODA, 1998, pp. 619–628, arXiv:cs.DS/9912014, J. Experimental Algorithmics 5 (1): 1–23, 2000.  2

  4. Cardinal, J., and Eppstein, D., “Lazy algorithms for dynamic closest pair with arbitrary distance measures”, ALENEX, 2004, pp. 112–119. 

  5. Eppstein, D., and Erickson, J., “Raising roofs, crashing cycles, and playing pool: applications of a data structure for finding pairwise interactions”, SoCG, 1998, pp. 58–67, Discrete Comput. Geom. 22 (4): 569–592, 1999. 

  6. Benzécri, J.-P. (1982), “Construction d’une classification ascendante hiérarchique par la recherche en chaîne des voisins réciproques”, Les Cahiers de l’Analyse des Données, 7 (2): 209–218. 

  7. Juan, J. (1982), “Programme de classification hiérarchique par l’algorithme de la recherche en chaîne des voisins réciproques”, Les Cahiers de l’Analyse des Données, 7 (2): 219–225. 

  8. Eppstein, D., Goodrich, M. T., and Mamano, N., “Algorithms for stable matching and clustering in a grid”, arXiv:1704.02303, IWCIA 2017, LNCS 10256 (2017), pp. 117–131. 

(Discuss on Mastodon)

by David Eppstein at February 21, 2019 09:06 PM UTC

Last weekend I saw the documentary Joseph Pulitzer: Voice of the People. Pulitzer, as you probably know from the prize named after him, was a major newspaper publisher in the late 19th and early 20th century. He ran two papers, the St. Louis Post-Dispatch and The New York World. The World at one point took on massive proportions, including sheet music of the latest tunes and dress patterns of new fashion that one could make at home. The World was the Internet of the turn of the 20th century.

The movie mentioned the many editions of the paper during the day, including the extra edition. An extra edition came out because of some major breaking news story. Back then newspapers would drum up minor stories to sell extra editions but they tended to disappear over time.

Which brings me to Monday, August 19, 1991. Hard-line members of the communist party of the USSR attempted a coup to take over the government from Mikhail Gorbachev. To us in the US, this seemed like the cold war which appeared to be coming to an end might rekindle. At the time I lived in Chicago and on that Monday the Chicago Tribune ran an extra afternoon edition talking about the coup. The return to the cold war didn't happen. Within a couple of days the coup failed and if anything hastened the dissolution of the Soviet Republic.

That was probably the last of the extra editions. By the time of the next major historical event, ten years and twenty-three days later, we had the Internet and cell phones and one no longer needed a newspaper to tell you the world has changed.

by Lance Fortnow ( at February 21, 2019 12:42 PM UTC

The Minimum Circuit Size Problem (MCSP) asks whether a (given) Boolean function has a circuit of at most a (given) size. Despite over a half-century of study, we know relatively little about the computational complexity of MCSP. We do know that questions about the complexity of MCSP have significant ramifications on longstanding open problems. In a recent development, Golovnev et al. [11] improves the status of unconditional lower bounds for MCSP, showing that MCSP is not in $AC^0[p]$ for any prime $p$. While their results generalize to most "typical" circuit classes, it fails to generalize to the circuit minimization problem for depth-d formulas, denoted ($AC^0_d$)-MCSP. In particular, their result relies on a Lipchitz hypothesis that is unknown (and possibly false) in the case of ($AC^0_d$)-MCSP. Despite this, we show that ($AC^0_d$)-MCSP is not in $AC^0[p]$ by proving even the failure of the Lipchitzness for $AC^0_d$ formulas implies that MAJORITY reduces to ($AC^0_d$)-MCSP under $AC^0$ truth table reductions. Somewhat remarkably, our proof (in the case of non-Lipchitzness) uses completely different techniques than [11]. To our knowledge, this is the first MCSP reduction that uses modular properties of a function's circuit complexity. We also define MOCSP, an oracle version of MCSP that takes as input a Boolean function $f$, a size threshold $s$, and oracle Boolean functions $f_1, ..., f_t$, and determines whether there is an oracle circuit of size at most $s$ that computes $f$ when given access to $f_1, ... , f_t$. We prove that MOCSP is $NP$-complete under non-uniform $AC^0$ many-one reductions as well as (uniform) $ZPP$ truth table reductions. We also observe that improving this $ZPP$ reduction to a deterministic polynomial-time reduction requires showing $EXP \neq ZPP$ (using theorems of Hitchcock and Pavan [17] and Murray and Williams [22]). Optimistically, these MOCSP results could be a first step towards $NP$-hardness results for MCSP. At the very least, we believe MOCSP clarifies the barriers towards proving hardness for MCSP and provides a useful "testing ground" for questions about MCSP.

at February 19, 2019 06:24 PM UTC

We show that any nondeterministic read-once branching program that computes a satisfiable Tseitin formula based on an $n\times n$ grid graph has size at least $2^{\Omega(n)}$. Then using the Excluded Grid Theorem by Robertson and Seymour we show that for arbitrary graph $G(V,E)$ any nondeterministic read-once branching program that computes a satisfiable Tseitin formula based on $G$ has size at least $2^{\Omega(tw(G)^\delta)}$ for all $\delta <1/36$, where $tw(G)$ is the treewidth of $G$ (for planar graphs and some other classes of graphs the statement holds for $\delta=1$). We also show an upper bound $O(|E| 2^{pw(G)})$, where $pw(G)$ is the pathwidth of $G$. We apply the mentioned results in the analysis of the complexity of derivation in the proof system $OBDD(\land, reordering)$ and show that any $OBDD(\land, reordering)$-refutation of an unsatisfiable Tseitin formula based on a graph $G$ has size at least $2^{\Omega(tw(G)^\delta)}$.

at February 19, 2019 06:23 PM UTC

All of twitter is .... atwitter?... over the OpenAI announcement and partial non-release of code/documentation for a language model that purports to generate realistic-sounding text from simple prompts. The system actually addresses many NLP tasks, but the one that's drawing the most attention is the deepfakes-like generation of plausible news copy (here's one sample).

Most consternation is over the rapid PR buzz around the announcement, including somewhat breathless headlines (that OpenAI is not responsible for) like

OpenAI built a text generator so good, it’s considered too dangerous to release
Researchers, scared by their own work, hold back “deepfakes for text” AI
There are concerns that OpenAI is overhyping solid but incremental work, that they're disingenuously allowing for overhyped coverage in the way they released the information, or worse that they're deliberately controlling hype as a publicity stunt.

I have nothing useful to add to the discussion above: indeed, see posts by Anima Anandkumar, Rob MunroZachary Lipton  and Ryan Lowe for a comprehensive discussion of the issues relating to OpenAI.  Jack Clark from OpenAI has been engaging in a lot of twitter discussion on this as well.

But what I do want to talk about is the larger issues around responsible science that this kerfuffle brings up. Caveat, as Margaret Mitchell puts it in this searing thread.

To understand the kind of "norm-building" that needs to happen here, let's look at two related domains.

In computer security, there's a fairly well-established model for finding weaknesses in systems. An exploit is discovered, the vulnerable entity is given a chance to fix it, and then the exploit is revealed , often simultaneously with patches that rectify it. Sometimes the vulnerability isn't easily fixed (see Meltdown and Spectre). But it's still announced.

A defining characteristic of security exploits is that they are targeted, specific and usually suggest a direct patch. The harms might be theoretical, but are still considered with as much seriousness as the exploit warrants.

Let's switch to a different domain: biology. Starting from the sequencing of the human genome through the million-person precision medicine project to CRISPR and cloning babies, genetic manipulation has provided both invaluable technology for curing disease as well as grave ethical concerns about misuse of the technology. And professional organizations as well as the NIH have (sometimes slowly) risen to the challenge of articulating norms around the use and misuse of such technology.

Here, the harms are often more diffuse, and the harms are harder to separate from the benefits. But the harm articulation is often focused on the individual patient, especially given the shadow of abuse that darkens the history of medicine.

The harms with various forms of AI/ML technology are myriad and diffuse. They can cause structural damage to society - in the concerns over bias, the ways in which automation affects labor, the way in which fake news can erode trust and a common frame of truth, and so many others - and they can cause direct harm to individuals. And the scale at which these harms can happen is immense.

So where are the professional groups, the experts in thinking about the risks of democratization of ML, and all the folks concerned about the harms associated with AI tech? Why don't we have the equivalent of the Asilomar conference on recombinant DNA?

I appreciate that OpenAI has at least raised the issue of thinking through the ethical ramifications of releasing technology. But as the furore over their decision has shown, no single imperfect actor can really claim to be setting the guidelines for ethical technology release, and "starting the conversation" doesn't count when (again as Margaret Mitchell points out) these kinds of discussions have been going on in different settings for many years already.

Ryan Lowe suggests workshops at major machine learning conferences. That's not a bad idea. But it will attract the people who go to machine learning conferences. It won't bring in the journalists, the people getting SWAT'd (and one case killed) by fake news, the women being harassed by trolls online with deep-fake porn images. 

News is driven by news cycles. Maybe OpenAI's announcement will lead to us thinking more about issues of responsible data science. But let's not pretend these are new, or haven't been studied for a long time, or need to have a discussion "started".

by Suresh Venkatasubramanian ( at February 19, 2019 04:00 PM UTC

We show that any $n$-variate polynomial computable by a syntactically multilinear circuit of size $\mathop{poly}(n)$ can be computed by a depth-$4$ syntactically multilinear ($\Sigma\Pi\Sigma\Pi$) circuit of size at most $\exp\left({O\left(\sqrt{n\log n}\right)}\right)$. For degree $d = \omega(n/\log n)$, this improves upon the upper bound of $\exp\left({O(\sqrt{d}\log n)}\right)$ obtained by Tavenas (MFCS 2015) for general circuits, and is known to be asymptotically optimal in the exponent when $d < n^{\epsilon}$ for a small enough constant $\epsilon$. Our upper bound matches the lower bound of $\exp\left({\Omega\left(\sqrt{n\log n}\right)}\right)$ proved by Raz and Yehudayoff (CC 2009), and thus cannot be improved further in the exponent. Our results hold over all fields and also generalize to circuits of small individual degree. More generally, we show that an $n$-variate polynomial computable by a syntactically multilinear circuit of size $\mathop{poly}(n)$ can be computed by a syntactically multilinear circuit of product-depth $\Delta$ of size at most $\exp\left(O\left(\Delta \cdot (n/\log n)^{1/\Delta} \cdot \log n\right)\right)$. It follows from the lower bounds of Raz and Yehudayoff (CC 2009) that in general, for constant $\Delta$, the exponent in this upper bound is tight and cannot be improved to $o\left(\left(n/\log n\right)^{1/\Delta}\cdot \log n\right)$.

at February 19, 2019 02:27 PM UTC

I recently read an absurd article that speculated on who the Democratic VICE prez nominees will be. Yes, you read that right, VICE Prez. Gee, wouldn't knowing who the Prez nominees was first help. But leave it to Nate Silver and company to have a fascinating yet... premature article on this. Here is the link: article on who will be Dem VP nominee.  Its hard enough to pick the VP when the Prez is known! I did a blog on how badly intrade would do on predicting VP, here. I didn't predict that intrade would go out of business (`intrade' was short hand for `betting markets' just like `Uber' is shorthand for `hailing a car with your phone'- I wonder if the term `Uber' will still be used if they go out of business?)

After reading I thought

Wow- there are so many If-Then clauses in this that I could make half a lecture about it for my Discrete Math class where we are talking about Prop. Logic.

I emailed the students to read it, and we had a lively discussion. I made sure it was a non-partisan discussion. Realize that a statement like:

If the Prez is a female then the VP will likely be a male because it is thought, righly or wrongly, that all-female ticket can't win

is not partisan or sexist, it is stating an opinion about how voters would go. It may be incorrect, but its not offensive.

Here is what the class thought the article was saying (and I think they are right) expressed in Logic.

Note one other thing-- the Prez nominees  might not be the choice of the party (Trump wasn't the choice of the Rep party in 2016, though not clear who was, see here) but the VP really will be since (I think) the party has more control over that.

If the Prez Nominees is female then the Vice Prez nominee will be male. (I agree)

If the Prez Nominees is white mail then the Vice Prez nominee will be either female or non-white but not both. (Great question to express that as symbols. Not sure what I think- the thought is that two white males will be odd since so many of the candidates are female or non-white. Having said that, in recent years the VP has NOT been someone else who ran:  Clinton-Keane, Keane had not been a candidate, Trump-Pence, Pence had not been a candidate, Romney-Ryan, Ryan had not been a candidate, Obama-Biden, Biden had been a candidate briefly in 2008, but this was not one of these `lets unite the party by picking by OPPONENT to be VP', McCain-Palin, Palin had not been a candidate. Kerry-Edwards. YES- Edwards had been a serious candidate in 2004, though some think he was running for VP all along since he never said an unkind word about his opponents)

Prez from one of the coasts IFF VP from the center. (I won't go over the list again, but this has been less of a thing since Clinton-Gore were both from flyover states.)

Ideology- If Prez is establishment then VP is leftists. (Not sure these terms are so well defined to say this, and their are other ideologies as well, not sure how they fit in.)

If Prez lacks Fed Experience then VP should have Fed Exp. (Quite common: Obama-Biden, Romney-Ryan, Clinton-Gore, Bush-Cheney)

If Prez has fed experience than nothing can be deduced about VP Fed Exp.

If Prez lacks gravitas then VP should have gravitas (tricky! Don't want the VP to outshine the Prez!)

Absolute statement: VP should not outshine prez. A Kangaroo ticket is when the people prefer the VP to the Prez (Dukakis-Bentsen had this problem. Kerry-Edwards might have).

If Prez is boring then VP should be exciting. But again tricky! (I think that was why McCain picked Palin. And she was exciting, but not really in a good way. Couldn't he have picked someone exciting, and perhaps female who was, you know, Qualified?)

VP's like doctors: do no harm.

VP from Swing state helpful but not necc.

Bad to have a VP who is a senator from a state where the Governor is republican and picks the replacement senator.

VP should be someone who the country can picture being president without laughing. I am not talking ideology I am talking about seriousness and experience. Palin was the only one in recent memory who even voters of her party worried that she was not up to being president. One pundit defending the choice talked about how McCain was healthy and hence Palin wouldn't become prez so don't worry about it. NOT reassuring! Quayle was also not seen as serious, though not as much as Palin.

If Prez is old then VP mattes more(?)

Many of the above depend on who the Prez is. And that is one of the points: one can write down a long prop formula with many IF-THEN's to determine who properties the VP will have, and then

1) When the Prez is picked many of the variables are set and hence the formula becomes much easeier and

2) Could STILL be wrong!

MY prediction: my last two predictions have been wrong so I am reluctant go predict anything. But I will tell you my WRONG predictions and then my current one

I predicted Paul Ryan would be the Prez Nominee in 2016. He didn't even run.

I predicted that Al Franken would be the Prez Nominess in 2020- he understands TV and was an entertainer, so he could match Trump. Whoops.

So with that sterling record I predict: Prez: Cory Booker, VP: Beto

I am NOT a pundit- so what I predict is not what I hope happens.  What do I hope? In the interest of full disclosure (gee, shouldn't that have come at the beginning) I admit that I want to see Trump lose but I have no idea what makes someone `electable' nowadays.

by GASARCH ( at February 19, 2019 01:27 PM UTC

Minimum Circuit Size Problem (MCSP) asks to decide if a given truth table of an $n$-variate boolean function has circuit complexity less than a given parameter $s$. We prove that MCSP is hard for constant-depth circuits with mod $p$ gates, for any prime $p\geq 2$ (the circuit class $AC^0[p])$. Namely, we show that MCSP requires $d$-depth $AC^0[p]$ circuits of size at least $exp(N^{0.49/d})$, where $N=2^n$ is the size of an input truth table of an $n$-variate boolean function. Our circuit lower bound proof shows that MCSP can solve the coin problem: distinguish uniformly random $N$-bit strings from those generated using independent samples from a biased random coin which is $1$ with probability $1/2+N^{-0.49}$, and $0$ otherwise. Solving the coin problem with such parameters is known to require exponentially large $AC^0[p]$ circuits. Moreover, this also implies that MAJORITY is computable by a non-uniform $AC^0$ circuit of polynomial size that also has MCSP-oracle gates. The latter has a few other consequences for the complexity of MCSP, e.g., we get that any boolean function in $NC^1$ (i.e., computable by a polynomial-size formula) can also be computed by a non-uniform polynomial-size $AC^0$ circuit with MCSP-oracle gates.

at February 18, 2019 09:30 PM UTC

To solving the big questions, that is

Cropped from Device Plus source

Tetsuya Miyamoto is a mathematics teacher who divides his time between Tokyo and Manhattan. He is known for creating in 2004 the popular KenKen puzzle, which the New York Times started running ten years ago. As with its sister puzzles Sudoku and Kakuro, unlimited-size versions of it are {\mathsf{NP}}-complete.

Today we observe the 10th anniversary of this blog and ask what progress has been made on the {\mathsf{P=NP}} question.

The {\mathsf{P=NP}} is a question about asymptotic complexity. From time to time we have tried to raise corresponding questions about concrete complexity that might yield more progress. What catches our eye about the KenKen puzzles is that their generation is a full-blown application within concrete complexity. The NYT’s KenKen puzzles are all generated using software by David Levy that can tailor their hardness. Quoting the NYT anniversary article by Will Shortz:

[Levy’s] program knows every possible method for solving a KenKen, which he has rated in difficulty from easy to hard. Thus, when a KenKen has been made, the computer knows exactly how hard it is.

This seems to say there is a hardness measure that is objective—quite apart from the idea of having human testers try the puzzle and say how hard they found it to be. We surmise that it is lower for instances that have more forced plays at the start. We wonder whether Levy’s criteria can be generalized.

Incidentally, this is the same Levy who won a challenge chess match in 1978 against the computer Chess 4.7 to complete his win of a famous ten-year $1,000+ bet. He lost $1,000 back when he was defeated by Deep Thought in 1989. He later became president of the International Computer Games Association (ICGA), whose Journal published a nice paper on the {\mathsf{NP}}-completess of the aforementioned puzzles and many others.

GLL’s Tenth Anniversary

GLL’s first post, on 12 Feb. 2009, featured Stephen Rudich and his work on the “Natural Proofs.” Two other posts that day covered other aspects of why the {\mathsf{P=NP}} question is hard. Our question, dear readers, is:

Has anything happened in the past ten years to make any part of those posts out-of-date in the slightest way?

We won’t claim any such progress, though we have tried to stir ideas. In the meantime, we have written 806 other posts:

  • Some have featured our own work and ideas (considering Ken’s chess research in a separate vein);

  • some have featured others’ direct attempts at breakthrough lower and upper bounds (with a few successes);

  • many have featured other kinds of results by others;

  • many have pulled “idea nuggets” from the past;

  • many have been humor and social commentary.

To date, we’ve had 18,575 comments plus trackbacks on these posts and just over 2.1 million views. We are less able to quantify impacts, beyond occasionally seeing citations of articles on the blog as sources. We try for precision as well as readability and are grateful for reader comments with fixes when we slip up on the former.

P vs NP

There continue to be claims of proofs that {\mathsf{P \neq NP}} and some that {\mathsf{P=NP}} While these proofs do not seem to be correct, there is something that we wish to remark about them. Many argue as follows:

There is some problem say {S} that seems to require a search of exponentially many objects. Then the proof states that any algorithm for {S} must actually look at all or most of the exponentially many objects. This of course is where the proof is not complete.

There is some sense to these proofs. They seem related to the oracle proofs that for example show that for some oracle set {A} it is the case that

\displaystyle  \mathsf{P}^{A} \neq \mathsf{NP}^{A}.

we have discussed these types of proofs before—we even said that we did not like them.

The trouble with these results that are rigorous is that they change {\mathsf{P}} vs {\mathsf{NP}} in a central manner, and this seems to make the results much less interesting. Roughly here is how they argue: Imagine that for each {n} we either put a string of length {n} into {A} or we do not. The point is that if we do this in a unpredictable manner then a polynomial time machine will not be able to decide whether for {n} there is or is not a string of length {n} in {A}. But a nondeterministic machine with just use its power and guess. This shows, essentially, that

\displaystyle  \mathsf{P}^{A} \neq \mathsf{NP}^{A}

is true.

There is some relationship to many attempts to show {\mathsf{P} \neq \mathsf{NP}}. The proofs often argue that one must look at all the objects. The counterpart here is that a polynomial time machine will not have enough time to check the {2^{n}} strings of length {n} to see if they are in {A}. But this works in the oracle case because we allow the rule that decides whether or not a string is in {A} to be very complicated. In the real world, in the world where we study the real {\mathsf{P} \neq \mathsf{NP}} question, we cannot assume that {NP}-complete problems use a complicated rule. That is precisely what we are trying to prove.

State of the Game

What can we say? Mostly the big open questions remain. We still have no non-linear lower bounds on circuit complexity and no progress of any definite kind on {\mathsf{P}=\mathsf{NP}}. What do you think?

What is commonly hailed as one of the two biggest results in our field last year was a positive solution to what is intuitively a slightly weaker form of the Unique Games Conjecture (UGC). For UGC we can refer you to Wikipedia’s article:

The note [2] is in turn a reference to a 2010 post here. The new paper proves hardness for the relaxed situation where, roughly speaking, a trial assignment to a node in a constraint graph limits the other node on any connecting edge to at most two possible values, rather than a unique value as in UGC. This relaxation retains many properties that had caused disbelief in the original UGC, yet it was proved—in that sense a big deal.

Nevertheless we note that UGC, at its core, is just asserting that for arbitrarily small {\epsilon}, with our power to make other parameter(s) as large as desired, we can execute an {\mathsf{NP}}-hardness proof. We have been executing {\mathsf{NP}}-hardness proofs for almost fifty years. That is something we in the field have proven good at. True, these hardness results becomes lower bound proofs if and when {\mathsf{NP \neq P}} is proved, and true, we have been as vocal as any on the standpoint that significant lower bounds will come from constructions that are usually thought of as being for upper bounds. But the new proof from a year ago doesn’t feel like that. We invite readers to tell us connections from UGC to the possibility of actually constructing lower bounds.

Open Problems

We at GLL thank you all for your help and support these ten years. Ken and I plan to continue doing what we have done in the past. Plan on a visit from our special friend on St. Patrick’s day, for example. Thanks again and let us know how we are doing.

[fixed date of first GLL post]

by RJLipton+KWRegan at February 18, 2019 09:14 PM UTC

In this post  I would like to tell you about three papers and three theorems. I am thankful to Moshe White and Imre Barany for helpful discussions.

a) Universality of vector sequences and universality of Tverberg partitions, by Attila Por;

Theorem (Por’s universality result, 2018): Every long enough sequence of points in general position in \mathbb R^d  contains a subsequence of length n whose Tverberg partitions are exactly the so called rainbow partitions.

b) Classifying unavoidable Tverberg partitions, by Boris BukhPo-Shen LohGabriel Nivasch

Theorem (Bukh, Loh, and Nivasch, 2017): Let H be a tree-like r-uniform simple hypergraph with d+1 edges and n=(d+1)(r-1)+1 edges. It is possible to associate to the vertices of each such hypergraph H a set X of n points in \mathbb R^d so that the Tverberg partitions of X correspond precisely to rainbow coloring of the hypergraph H. Moreover, the number of rainbow coloring is (r-1)!^d. (Here, we consider two colorings as the same if they differ by a permutation of the colors.)

c) On Tverberg partitions, by Moshe White

Theorem (White, 2015): For any partition a_1,a_2,\dots ,a_r: 1 \le a_i\le d+1 of n, there exists a set X \subset \mathbb R^d of n  points, such that every Tverberg partition of X  induces the same partition on n given by the parts a_1,\dots,a_r. Moreover, the number of Tverberg’s partitions of X is (r-1)!^d

See the original abstracts for the papers at the end of the post.

Radon’s and Tverberg’s theorems and Sierksma’s conjecture

Recall the beautiful theorem of Tverberg: (We devoted two posts (I, II) to its background and proof.)

Tverberg Theorem (1965): Let x_1,x_2,\dots, x_m be points in R^d, m \ge (r-1)(d+1)+1. Then there is a partition S_1,S_2,\dots, S_r of \{1,2,\dots,m\} such that \cap _{j=1}^rconv (x_i: i \in S_j) \ne \emptyset.

The (much easier) case r=2 of Tverberg’s theorem is Radon’s theorem.

We devoted a post to seven open problems related to Tverberg’s theorem, and one of them was:

Sierksma Conjecture: The number of Tverberg’s r-partitions of a set of (r-1)(d+1)+1 points in R^d is at least ((r-1)!)^d.

Gerard Sierksma’s construction with (r-1)!^d Tverberg’s partition is obtained by taking (r-1) copies of each vertex of a simplex containing the origin in its interior, and adding the origin itself. A configuration of (r-1)(d+1)+1 points in R^d with precisely ((r-1)!)^d Tverberg partitions to r parts is called a Sierksma Configuration.

White’s Theorem

In 2015 Moshe White proved the following theorem which was an open problem for many years. White’s construction was surprisingly simple.

Theorem 1 (White, 2015):  For any partition a_1,a_2,\dots ,a_r: 1 \le a_i\le d+1 of n, there exists a set X \subset \mathbb R^d of n  points, such that every Tverberg partition of X  induces the same partition on n given by the parts a_1,\dots,a_r. Moreover, the number of Tverberg’s partitions of X is (r-1)!^d

Bukh, Loh, and Nivasch’s  examples via staircase convexity.

Five tree-like simple hypergraphs that correspond to configurations of 11 points in 4-dimensional space.

Start with a tree-like hypergraph H of d+1 blocks of size r like the five examples in the Figure above. The intersection of every two blocks has at most one element. The union of all blocks has n=(d+1)(r-1)+1 elements.

A rainbow coloring of a r-uniform hypergraph H is a coloring of the vertices of H with r colors so that the vertices of every edge is colored by all r colors.

Theorem 2 (Bukh, Loh, and Nivasch): It is possible to associate to the vertices of each such hypergraph H a set X of n points in \mathbb R^d so that the Tverberg partitions of X correspond precisely to rainbow coloring of the hypergraph H. Moreover, the number of rainbow coloring is (r-1)!^d. (Here, we consider two colorings as the same if they differ by a permutation of the colors.)

For a star-like hypergraph where all blocks have a vertex in common we get the original Sierksma’s example. (Example (d) above.) White’s examples are obtained by considering such hypergraphs where there exists an edge A such that all edges have non empty intersection with A. (Examples (c), (d), and (e) above).

Rainbow colorings of our five examples 

Tverberg’s partitions for stretched points on the moment curve

It is natural to consider $n$ points on the moment curve x(t)=(t,t^2,\dots, t^d). It turns out that the set of Tverberg’s partitions for points on the moment curve depend on the precise location of the points. By stretched points on the moment curve I mean that  you take the points x(t_1), x(t_2), \dots x(t_n) where t_1 << t_2 << \dots t_n, namely $t_2$ is much much larger than t_1 and t_3 is much much much much larger than t_2, etc. etc. In this case, the configuration corresponds to a path H: you let the vertices be \{1,2,\dots,n\} and the edges are sets of the form \{(k-1)(r-1)+1, (k-1)(r-1)+2,\dots , k(r-1)+1\}. A stretched configuration of points on the moment curve has the property that every subset is also a stretched configuration of points on the moment curve.

The importance of Tverberg’s partitions for stretched points on the moment curve was realized by Barany and Por, by Bukh, Loh, and Nivasch, and by Perles and Sidron (See their paper Tverberg Partitions of Points on the Moment Curve), and perhaps by others as well.

Por’s universality result

Por’s universality theorem asserts that in terms of Tverberg partitions every large enough configuration of points in general position in R^d contains a configuration whose Tverberg partitions are those of a stretched configuration of n points on the moment curve! Por’s  universality result was conjectured independently by Bukh, Loh, and Nivasch, (and they gave some partial results) and by Por himself.

Theorem 3 (Por’s universality result, 2018): Every long enough sequence of points in \mathbb R^d in general position contains a subsequence  of length n whose Tverberg partitions are exactly the so called rainbow partitions.

Por actually proved an apparently stronger statement: We can find a subsequence Y so the conclusion holds not only for Y but also for every subsequence Z of Y

Staircase Convexity

The work of Bukh, Loh, and Nivasch relied on an important method of “staircase convexity”. An earlier 2001 application of the method (where it was introduced) was for lower bounds on weak epsilon nets by Bukh, Matousek, and Nivasch (Here are links to the paper, and to slides from a talk by Boris Bukh. See also this post and this one of the weak epsilon net problem.) Roughly the idea is this: consider a stretched grid where the sequence of coordinates are very very fast growing. When you choose configuration of points in such a grid, questions regarding their convex hulls translate to purely combinatorial problems.

Stairconvex sets explained by Boris Bukh

Erdos Szekeres in the plane and higher dimensions

The planar case

Let ES(n) be the smallest integer such that any set of ES(n) points in the plane in general position contains n points in convex position. In their seminal 1935 paper, Erdős and Szekeres showed that ES(n) is finite.

The finiteness of ES(n) can be stated as follows: Given a sequence of N points in general position in the plane x_1,x_2, \dots , x_N  there is a subsequence 1_i,x_2, \dots , x_n such that the line segments [x_i,x_k] and [x_j,x_\ell ] intersect. With this statement, the Erdős and Szekeres’ theorem can be seen as identifying a universal set of points in term of its Radon partitions (or equivalently in terms of its order type).

In high dimensions

In higher dimensions we can define ES_d(n) and replace “in convex position” by “in cyclic position”. The finiteness of ES_d(n) (with terrible bounds) follows easily from various Ramsey results. In a series of papers very good lower and upper bounds where obtained:  Imre BaranyJiri MatousekAttila Por: Curves in R^d intersecting every hyperplane at most d+1 timesMarek EliášJiří MatoušekEdgardo Roldán-PensadoZuzana Safernová: Lower bounds on geometric Ramsey functions; Marek Elias, Jiri Matousek: Higher-order Erdos–Szekeres theorems .

Por’s result

Por’s result can be seen as a far-reaching strengthening of the finiteness of ES_d(n).

Further Discussion:

High order order types?

Can you base a higher-order notion of “order types” on Tverberg partitions?

The order type of a sequence of n points affinely spanning R^d, is the described by the vector of signs (0, 1 or -1) of volume of simplices described by subsequences of length d+1. Equivalently the order type can be described by the minimal Radon partitions of the points.

  1. We can first ask if we can base a notion of higher order types on Tverberg’s partitions to r parts where r>2.
  2. Next we can ask for an associated notion of “higher order oriented matroids.” (Oriented matroids in the usual sense are abstract order types which coincide with Euclidean order types for every subsequence of d+3 points.)
  3. A natural question regarding these “higher order types is: If a sequence of points in strong general position is Tverberg-equivalent to stretched points on the moment curve, does it apply to all of its subsequences?

Another way to consider “higher” order types is to enlarge the family by to start with a family of points add to it all Radon points of minimal Radon’s partition and consider the order type of the new configuration. (This operation can be repeated r times.) See this paper of Michael Kallay on point sets which contain their Radon points.

Staircase convexity order types

Understanding order types of configuration of points on stretched grids of Bukh et al. is a very interesting problem. It is interesting to understand such configurations that are not in general position as well. (In particular, which matroids are supported on the stretched grid?) Of course, the method may well have many more applications.

Fantastically strong forms of Sierksma’s conjecture

Is the following true: For every sequence T of n=(r-1)(d+1)+1 points in R^d there is a Sierksma’s configuration S of $n$ points so that every Tverberg’s partition of  S is a Tverberg’s partition of T?

An even stronger version is:

Does every sequence T of (r-1)(d+1)+1 points in R^d there is a tree-like simple hypergraph so that all the rainbow coloring of H correspond to Tverberg partitions of the sequence? If true this will be a fantastically strong version of Sierksma’s conjecture.

Is the Erdős-Szekeres’ conjecture outrageous?

Erdős and Szekeres proved in 1935 that ES(n)\le {{2n-4}\choose{n-2}}+1=4^{n-o(n)}, and in 1960, they showed that ES(n) \ge 2^{n-2}+1, and conjectured this to be optimal. Despite the efforts of many researchers, until recently no improvement in the order of magnitude has ever been made on the upper bound over  81 years. A  recent breakthrough result by Andrew Suk (Here are links to the paper, and to our post discussing the result) asserts that ES(n)=2^{n+o(n)}. Sometime ago I asked over MO a question on outrageous mathematical conjectures and perhaps the conjecture that  ES(n) = 2^{n-2}+1 on the nose is an example.

Original Abstracts

Universality of vector sequences and universality of Tverberg partitions, by Attila Por;

Classifying unavoidable Tverberg partitions, by Boris Bukh, Po-Shen Loh, Gabriel Nivasch

On Tverberg partitions, by Moshe White


by Gil Kalai at February 16, 2019 04:06 PM UTC

Beware the Ides of February.

by David Eppstein at February 15, 2019 10:45 AM UTC

Henry Cohn

A follow up paper on the tight bounds for sphere packings in eight and 24 dimensions. (Thanks, again, Steve, for letting me know.)

For the 2016 breakthroughs see this post, this post of John Baez, this article by Erica Klarreich on Quanta Magazine, and a Notices AMS article by  Henry Cohn   A conceptual breakthrough in sphere packing. See also, Henry Cohn’s 2010 paper Order and disorder in energy minimization, and Maryna Viazovska’s ICM 2018 videotaped lecture.

Henry Cohn, Abhinav Kumar, Stephen D. Miller, Danylo Radchenko, and Maryna Viazovska: Universal optimality of the E8 and Leech lattices and interpolation formulas

Abstract: We prove that the E_8 root lattice and the Leech lattice are universally optimal among point configurations in Euclidean spaces of dimensions 8 and 24, respectively. In other words, they minimize energy for every potential function that is a completely monotonic function of squared distance (for example, inverse power laws or Gaussians), which is a strong form of robustness not previously known for any configuration in more than one dimension. This theorem implies their recently shown optimality as sphere packings, and broadly generalizes it to allow for long-range interactions.

The proof uses sharp linear programming bounds for energy. To construct the optimal auxiliary functions used to attain these bounds, we prove a new interpolation theorem, which is of independent interest. It reconstructs a radial Schwartz function f from the values and radial derivatives of f and its Fourier transform \hat f at the radii √2π for integers n ≥ 1 in R^8 and n ≥ 2 in R^{24}. To prove this theorem, we construct an interpolation basis using integral transforms of quasimodular forms, generalizing Viazovska’s work on sphere packing and placing it in the context of a more conceptual theory.

by Gil Kalai at February 15, 2019 07:17 AM UTC

The next TCS+ talk will take place this coming Wednesday, February 20th at
1:00 PM Eastern Time (10:00 AM Pacific Time, 19:00 Central European
Time, 18:00 UTC). Sepehr Assadi from Princeton University will speak about “A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling” (abstract below).

Please make sure you reserve a spot for your group to join us live by signing up on the online form. As usual, for more information about the TCS+ online seminar series and the upcoming talks, or to suggest a possible topic or speaker, please see the website.

Abstract: In the subgraph counting problem, we are given a (large) graph G(V, E) and a (small) graph H (e.g., a triangle), and the goal is to estimate the number of occurrences of H in G. Our focus in this talk is on designing sublinear-time algorithms for approximately computing number of occurrences of H in G in the setting where the algorithm is given query access to G. This problem has been studied in several recent work which primarily focused on specific families of graphs H such as triangles, cliques, and stars. However, not much is known about approximate counting of arbitrary graphs H in the literature. This is in sharp contrast to the closely related subgraph enumeration problem in which the goal is to list all copies of the subgraph H in G. The AGM bound shows that the maximum number of occurrences of any arbitrary subgraph H in a graph G with m edges is O(m^{p(H)}), where p(H) is the fractional edge cover number of H, and enumeration algorithms with matching runtime are known for every H.

In this talk, we bridge this gap between the subgraph counting and subgraph enumeration problems and present a simple sublinear-time algorithm that estimates the number of occurrences of any arbitrary graph H in G, denoted by \#H, to within a (1 \pm \varepsilon)-approximation factor with high probability in O(m^{p(H)} /\#H)\cdot \text{poly}(\log n,1/\varepsilon) time. Our algorithm is allowed the standard set of queries for general graphs, namely degree queries, pair queries and neighbor queries, plus an additional edge-sample query that returns an edge chosen uniformly at random. The performance of our algorithm matches those of Eden et al. [FOCS 2015, STOC 2018] for counting triangles and cliques and extend them to all choices of subgraph H under the additional assumption of edge-sample queries. Our results are also applicable to the more general problem of database join size estimation problem and for this slightly more general problem achieve optimal bounds for every choice of H.

Joint work with Michael Kapralov and Sanjeev Khanna.

by plustcs at February 15, 2019 12:03 AM UTC

So you've got an iPhone XS in Space Grey. Congrats, so do 20 million other people. Maybe you have different cases but otherwise the hardware in all these phones are virtually identical. Yet you can tell with a glance that this is your phone. You can personalize apps and other elements of the home screen. It's your calendar and email and music.

What? You've dropped your phone over Niagara falls. Luckily you've backed up your data. So you go back to Apple and buy another Space Grey iPhone XS and restore your data. Physically it's a completely different phone but for all practical purposes it's though you still had the original phone. Your phone is not defined by the device but the data that resides on it.

It's not just phones. I can log into Google on anyone's Chrome browser and it will feel like my machine.

Now we've all heard about a future world where nobody owns cars and we get driven around in self-driving Ubers, Lyfts and Waymos. One argument against this world is that people feel connected to their cars and unwilling to commute in some generic vehicle. But one can also imagine the car knows who you are, knows how you like your music, your lighting, how you adjust your seats even how your car drives. It becomes your car. Maybe even has electronic bumper stickers that change to support your political party.

You can imagine the same for hotel rooms, your office, maybe even your apartment. It won't replicate your dog (or will it?) but as we get define more by our data than our things, do our things matter at all?

by Lance Fortnow ( at February 14, 2019 12:52 PM UTC

A few weeks ago, I was at QIP’2019 in Boulder, CO. This week I was at SQuInT’2019 in Albuquerque, NM. There were lots of amazing talks—feel free to ask in the comments section.

There’s an interview with me at the website “GigaOm,” conducted by Byron Reese and entitled Quantum Computing: Capabilities and Limits. I didn’t proofread the transcript and it has some errors in it, but hopefully the meaning comes through. In other interview news, if you were interested in my podcast with Adam Ford in Melbourne but don’t like YouTube, Adam has helpfully prepared transcripts of the two longest segments: The Ghost in the Quantum Turing Machine and The Winding Road to Quantum Supremacy.

The New York Times ran an article entitled The Hard Part of Computer Science? Getting Into Class, about the surge in computer science majors all over the US, and the shortage of professors to teach them. The article’s go-to example of a university where this is happening is UT Austin, and there’s extensive commentary from my department chair, Don Fussell.

The STOC’2019 accepted papers list is finally out. Lots of cool stuff!

by Scott at February 13, 2019 04:43 AM UTC

You've probably heard the following:

               At first I didn't want to get an X but now that I have it, I can't imagine life without one.

X could be telegraph, radio, TV, color TV, VCR, CD player, streaming, Netflix, Amazon prime, an uber account, Washer and Dryer, Car Phones (remember those), Cell Phones. If you go back in history  wrist watches or sun dials (or wrist-sun-dials!).

This has happened to me recently though not with an object. I read an article someplace saying that ze can be used instead of he or she. It was referring to nonbinaries (using `they' never quite sounded right) but actually it would be great if this was a general genderless pronoun. I am not making a political statement here (although I doubt I have any readers who are against genderless pronouns).

Once I came across the term ze I found places to use it and now I can't imagine not using it.

In a recent article I wrote I needed to say that someone was probably confused, but I did not know their gender. I used

                                                         Ze was probably confused

which is much better than

                                                         S/he was probably confused

                                                         He or she was probably confused

                                                         The student was probably confused

                                                         They were probably confused.

Note that the first two leave out nonbinaries.

0) In the article I put in a footnote saying what ze meant. In the future I may not have to.

1) Will ze catch on? This blog post is an attempt to hasten the practice.

2) Is there a term for his/her that is non-gendered? If not then maybe zer.

3) Will there be political pushback on this usage? If its phrased as a way to include nonbinaries than unfortunately yes. If its phrased as above as when you don't know the gender, what do you do, then no.

4) Is  nonbinary the correct term? If not then please politely correct me in the comments.

5) Has Ms replaced Miss and Mrs?

I have used the term ze several times since then- often when I get email from a student such that I can't tell from the first name what their gender is, and I need to forward the email, such as

                   Ze wants to take honors discrete math but does not have the prerequisite, but
                   since ze placed in the top five in the math olympiad, we'll let zer take it.

by GASARCH ( at February 11, 2019 07:43 PM UTC

July 8, 2019 Patras, Greece Submission deadline: April 26, 2019 Registration deadline: April 30, 2019 The 2019 edition of the Parameterized Approximation Algorithms Workshop (PAAW) will take place as a satellite workshop of ICALP 2019 in Patras, Greece, on Monday July 8th 2019. — Topics of interest — – Parameterized approximation algorithms – Lossy … Continue reading Parameterized Approximation Algorithms Workshop (PAAW) 2019

by shacharlovett at February 11, 2019 03:31 PM UTC

The other day I was derailed by this tweet from Fermat’s Library:

Inverse factorial tweet

The moment I saw it, I had to stop in my tracks, grab a scratch pad, and check out the formula. The result made sense in a rough-and-ready sort of way. Since the multiplicative version of \(n!\) goes to infinity as \(n\) increases, the “divisive” version should go to zero. And \(\frac{n^2}{n!}\) does exactly that; the polynomial function \(n^2\) grows slower than the exponential function \(n!\) for large enough \(n\):

\[\frac{1}{1}, \frac{4}{2}, \frac{9}{6}, \frac{16}{24}, \frac{25}{120}, \frac{36}{720}, \frac{49}{5040}, \frac{64}{40320}, \frac{81}{362880}, \frac{100}{3628800}.\]

But why does the quotient take the particular form \(\frac{n^2}{n!}\)? Where does the \(n^2\) come from?

To answer that question, I had to revisit the long-ago trauma of learning to divide fractions, but I pushed through the pain. Proceeding from left to right through the formula in the tweet, we first get \(\frac{n}{n-1}\). Then, dividing that quantity by \(n-2\) yields

\[\cfrac{\frac{n}{n-1}}{n-2} = \frac{n}{(n-1)(n-2)}.\]

Continuing in the same way, we ultimately arrive at:

\[n \mathbin{/} (n-1) \mathbin{/} (n-2) \mathbin{/} (n-3) \mathbin{/} \cdots \mathbin{/} 1 = \frac{n}{(n-1) (n-2) (n-3) \cdots 1} = \frac{n}{(n-1)!}\]

To recover the tweet’s stated result of \(\frac{n^2}{n!}\), just multiply numerator and denominator by \(n\). (To my taste, however, \(\frac{n}{(n-1)!}\) is the more perspicuous expression.)

I am a card-carrying factorial fanboy. You can keep your fancy Fibonaccis; this is my favorite function. Every time I try out a new programming language, my first exercise is to write a few routines for calculating factorials. Over the years I have pondered several variations on the theme, such as replacing \(\times\) with \(+\) in the definition (which produces triangular numbers). But I don’t think I’ve ever before considered substituting \(\mathbin{/}\) for \(\times\). It’s messy. Because multiplication is commutative and associative, you can define \(n!\) simply as the product of all the integers from \(1\) through \(n\), without worrying about the order of the operations. With division, order can’t be ignored. In general, \(x \mathbin{/} y \ne y \mathbin{/}x\), and \((x \mathbin{/} y) \mathbin{/} z \ne x \mathbin{/} (y \mathbin{/} z)\).

The Fermat’s Library tweet puts the factors in descending order: \(n, n-1, n-2, \ldots, 1\). The most obvious alternative is the ascending sequence \(1, 2, 3, \ldots, n\). What happens if we define the divisive factorial as \(1 \mathbin{/} 2 \mathbin{/} 3 \mathbin{/} \cdots \mathbin{/} n\)? Another visit to the schoolroom algorithm for dividing fractions yields this simple answer:

\[1 \mathbin{/} 2 \mathbin{/} 3 \mathbin{/} \cdots \mathbin{/} n = \frac{1}{2 \times 3 \times 4 \times \cdots \times n} = \frac{1}{n!}.\]

In other words, when we repeatedly divide while counting up from \(1\) to \(n\), the final quotient is the reciprocal of \(n!\). (I wish I could put an exclamation point at the end of that sentence!) If you’re looking for a canonical answer to the question, “What do you get if you divide instead of multiplying in \(n!\)?” I would argue that \(\frac{1}{n!}\) is a better candidate than \(\frac{n}{(n - 1)!}\). Why not embrace the symmetry between \(n!\) and its inverse?

Of course there are many other ways to arrange the n integers in the set \(\{1 \ldots n\}\). How many ways? As it happens, \(n!\) of them! Thus it would seem there are \(n!\) distinct ways to define the divisive \(n!\) function. However, looking at the answers for the two permutations discussed above suggests there’s a simpler pattern at work. Whatever element of the sequence happens to come first winds up in the numerator of a big fraction, and the denominator is the product of all the other elements. As a result, there are really only \(n\) different outcomes—assuming we stick to performing the division operations from left to right. For any integer \(k\) between \(1\) and \(n\), putting \(k\) at the head of the queue creates a divisive \(n!\) equal to \(k\) divided by all the other factors. We can write this out as:

\[\cfrac{k}{\frac{n!}{k}}, \text{ which can be rearranged as } \frac{k^2}{n!}.\]

And thus we also solve the minor mystery of how \(\frac{n}{(n-1)!}\) became \(\frac{n^2}{n!}\) in the tweet.

It’s worth noting that all of these functions converge to zero as \(n\) goes to infinity. Asymptotically speaking, \(\frac{1^2}{n!}, \frac{2^2}{n!}, \ldots, \frac{n^2}{n!}\) are all alike.

Ta dah! Mission accomplished. Problem solved. Done and dusted. Now we know everything there is to know about divisive factorials, right?

Well, maybe there’s one more question. What does the computer say? If you take your favorite factorial algorithm, and do as the tweet suggests, replacing any appearance of the \(\times\) (or *) operator with /, what happens? Which of the \(n\) variants of divisive \(n!\) does the program produce?

Here’s my favorite algorithm for computing factorials, in the form of a Julia program:

function mul!(n)
    if n == 1
        return 1
        return n * mul!(n - 1)

This is the algorithm that has introduced generations of nerds to the concept of recursion. In narrative form it says: If \(n\) is \(1\), then \(mul!(n)\) is \(1\). Otherwise, evaluate the function \(mul!(n-1)\), then multiply the result by \(n\). You might ask what happens if \(n\) is zero or negative. You might ask, but please don’t. For present purposes, \(n \in \mathbb{N}\).Starting with any positive \(n\), the sequence of recursive calls must eventually bottom out with \(n = 1\).

The function can be written more tersely using Julia’s one-liner style of definition:.

mul!(n)  =  n == 1 ? 1 : n * mul!(n - 1)

The right side of the assignment statement is a conditional expression, or ternary operator, which has the form a ? b : c. Here a is a boolean test clause, which must return a value of either true or false. If a is true, clause b is evaluated, and the result becomes the value of the entire expression. Otherwise clause c is evaluated.

Just to be sure I’ve got this right, here are the first 10 factorials, as calculated by this program:

[mul!(n) for n in 1:10]
10-element Array{Int64,1}:

Now let’s edit that definition and convert the single occurence of * to a /, leaving everything else (except the name of the function) unchanged.

div!(n)  =  n == 1 ? 1 : n / div!(n - 1)

And here’s what comes back when we run the program for values of \(n\) from \(1\) through \(20\):

[div!(n) for n in 1:20]
20-element Array{Real,1}:

Huh? That sure doesn’t look like it’s converging to zero—not as \(\frac{1}{n!}\) or as \(\frac{n}{n - 1}\). As a matter of fact, it doesn’t look like it’s going to converge at all. The graph below suggests the sequence is made up of two alternating components, both of which appear to be slowly growing toward infinity as well as diverging from one another.


In trying to make sense of what we’re seeing here, it helps to change the output type of the div! function. Instead of applying the division operator /, which returns the quotient as a floating-point number, we can substitute the // operator, which returns an exact rational quotient, reduced to lowest terms.

div!(n)  =  n == 1 ? 1 : n // div!(n - 1)

Here’s the sequence of values for n in 1:20:

20-element Array{Real,1}:

The list is full of curious patterns. It’s a double helix, with even numbers and odd numbers zigzagging in complementary strands. The even numbers are not just even; they are all powers of \(2\). Also, they appear in pairs—first in the numerator, then in the denominator—and their sequence is nondecreasing. But there are gaps; not all powers of \(2\) are present. The odd strand looks even more complicated, with various small prime factors flitting in and out of the numbers. (The primes have to be small—smaller than \(n\), anyway.)

This outcome took me by surprise. I had really expected to see a much tamer sequence, like those I worked out with pencil and paper. All those jagged, jitterbuggy ups and downs made no sense. Nor did the overall trend of unbounded growth in the ratio. How could you keep dividing and dividing, and wind up with bigger and bigger numbers?

At this point you may want to pause before reading on, and try to work out your own theory of where these zigzag numbers are coming from. If you need a hint, you can get a strong one—almost a spoiler—by looking up the sequence of numerators or the sequence of denominators in the Online Encyclopedia of Integer Sequences.

Here’s another hint. A small edit to the div! program completely transforms the output. Just flip the final clause, changing n // div!(n - 1) into div!(n - 1) // n.

div!(n)  =  n == 1 ? 1 : div!(n - 1) // n

Now the results look like this:

10-element Array{Real,1}:

This is the inverse factorial function we’ve already seen, the series of quotients generated when you march left to right through an ascending sequence of divisors \(1 \mathbin{/} 2 \mathbin{/} 3 \mathbin{/} \cdots \mathbin{/} n\).

It’s no surprise that flipping the final clause in the procedure alters the outcome. After all, we know that division is not commutative or associative. What’s not so easy to see is why the sequence of quotients generated by the original program takes that weird zigzag form. What mechanism is giving rise to those paired powers of 2 and the alternation of odd and even?

I have found that it’s easier to explain what’s going on in the zigzag sequence when I describe an iterative version of the procedure, rather than the recursive one. (This is an embarrassing admission for someone who has argued that recursive definitions are easier to reason about, but there you have it.) Here’s the program:

function div!_iter(n)
    q = 1
    for i in 1:n
        q = i // q
    return q

I submit that this looping procedure is operationally identical to the recursive function, in the sense that if div!(n) and div!_iter(n) both return a result for some positive integer n, it will always be the same result. Here’s my evidence:

[div!(n) for n in 1:20]    [div!_iter(n) for n in 1:20]
            1                         1//1    
           2//1                       2//1    
           3//2                       3//2    
           8//3                       8//3    
          15//8                      15//8    
          16//5                      16//5    
          35//16                     35//16   
         128//35                    128//35   
         315//128                   315//128  
         256//63                    256//63   
         693//256                   693//256  
        1024//231                  1024//231  
        3003//1024                 3003//1024 
        2048//429                  2048//429  
        6435//2048                 6435//2048 
       32768//6435                32768//6435 
      109395//32768              109395//32768
       65536//12155               65536//12155
      230945//65536              230945//65536
      262144//46189              262144//46189

To understand the process that gives rise to these numbers, consider the successive values of the variables \(i\) and \(q\) each time the loop is executed. Initially, \(i\) and \(q\) are both set to \(1\); hence, after the first passage through the loop, the statement q = i // q gives \(q\) the value \(\frac{1}{1}\). Next time around, \(i = 2\) and \(q = \frac{1}{1}\), so \(q\)’s new value is \(\frac{2}{1}\). On the third iteration, \(i = 3\) and \(q = \frac{2}{1}\), yielding \(\frac{i}{q} \rightarrow \frac{3}{2}\). If this is still confusing, try thinking of \(\frac{i}{q}\) as \(i \times \frac{1}{q}\). The crucial observation is that on every passage through the loop, \(q\) is inverted, becoming \(\frac{1}{q}\).

If you unwind these operations, and look at the multiplications and divisions that go into each element of the series, a pattern emerges:

\[\frac{1}{1}, \quad \frac{2}{1}, \quad \frac{1 \cdot 3}{2}, \quad \frac{2 \cdot 4}{1 \cdot 3}, \quad \frac{1 \cdot 3 \cdot 5}{2 \cdot 4} \quad \frac{2 \cdot 4 \cdot 6}{1 \cdot 3 \cdot 5}\]

The general form is:

\[\frac{1 \cdot 3 \cdot 5 \cdot \cdots \cdot n}{2 \cdot 4 \cdot \cdots \cdot (n-1)} \quad (\text{odd } n) \qquad \frac{2 \cdot 4 \cdot 6 \cdot \cdots \cdot n}{1 \cdot 3 \cdot 5 \cdot \cdots \cdot (n-1)} \quad (\text{even } n).

The functions \(1 \cdot 3 \cdot 5 \cdot \cdots \cdot n\) for odd \(n\) and \(2 \cdot 4 \cdot 6 \cdot \cdots \cdot n\) for even \(n\) have a name! They are known as double factorials, with the notation \(n!!\). Terrible terminology, no? Better to have named them “semi-factorials.” And if I didn’t know better, I would read \(n!!\) as “the factorial of the factorial.” The double factorial of n is defined as the product of n and all smaller positive integers of the same parity. Thus our peculiar sequence of zigzag quotients is simply \(\frac{n!!}{(n-1)!!}\).

A 2012 article by Henry W. Gould and Jocelyn Quaintance (behind a paywall, regrettably) surveys the applications of double factorials. They turn up more often than you might guess. In the middle of the 17th century John Wallis came up with this identity:

\[\frac{\pi}{2} = \frac{2 \cdot 2 \cdot 4 \cdot 4 \cdot 6 \cdot 6 \cdots}{1 \cdot 3 \cdot 3 \cdot 5 \cdot 5 \cdot 7 \cdots} = \lim_{n \rightarrow \infty} \frac{((2n)!!)^2}{(2n + 1)!!(2n - 1)!!}\]

An even weirder series, involving the cube of a quotient of double factorials, sums to \(\frac{2}{\pi}\). That one was discovered by (who else?) Srinivasa Ramanujan.

Gould and Quaintance also discuss the double factorial counterpart of binomial coefficients. The standard binomial coefficient is defined as:

\[\binom{n}{k} = \frac{n!}{k! (n-k)!}.\]

The double version is:

\[\left(\!\binom{n}{k}\!\right) = \frac{n!!}{k!! (n-k)!!}.\]

Note that our zigzag numbers fit this description and therefore qualify as double factorial binomial coefficients. Specifically, they are the numbers:

\[\left(\!\binom{n}{1}\!\right) = \left(\!\binom{n}{n - 1}\!\right) = \frac{n!!}{1!! (n-1)!!}.\]

The regular binomial \(\binom{n}{1}\) is not very interesting; it is simply equal to \(n\). But the doubled version \(\left(\!\binom{n}{1}\!\right)\), as we’ve seen, dances a livelier jig. And, unlike the single binomial, it is not always an integer. (The only integer values are \(1\) and \(2\).)

Seeing the zigzag numbers as ratios of double factorials explains quite a few of their properties, starting with the alternation of evens and odds. We can also see why all the even numbers in the sequence are powers of 2. Consider the case of \(n = 6\). The numerator of this fraction is \(2 \cdot 4 \cdot 6 = 48\), which acquires a factor of \(3\) from the \(6\). But the denominator is \(1 \cdot 3 \cdot 5 = 15\). The \(3\)s above and below cancel, leaving \(\frac{16}{5}\). Such cancelations will happen in every case. Whenever an odd factor \(m\) enters the even sequence, it must do so in the form \(2 \cdot m\), but at that point \(m\) itself must already be present in the odd sequence.

Is the sequence of zigzag numbers a reasonable answer to the question, “What happens when you divide instead of multiply in \(n!\)?” Or is the computer program that generates them just a buggy algorithm? My personal judgment is that \(\frac{1}{n!}\) is a more intuitive answer, but \(\frac{n!!}{(n - 1)!!}\) is more interesting.

Furthermore, the mere existence of the zigzag sequence broadens our horizons. As noted above, if you insist that the division algorithm must always chug along the list of \(n\) factors in order, at each stop dividing the number on the left by the number on the right, then there are only \(n\) possible outcomes, and they all look much alike. But the zigzag solution suggests wilder possibilities. We can formulate the task as follows. Take the set of factors \(\{1 \dots n\}\), select a subset, and invert all the elements of that subset; now multiply all the factors, both the inverted and the upright ones. If the inverted subset is empty, the result is the ordinary factorial \(n!\). If all of the factors are inverted, we get the inverse \(\frac{1}{n!}\). And if every second factor is inverted, starting with \(n - 1\), the result is an element of the zigzag sequence.

These are only a few among the many possible choices; in total there are \(2^n\) subsets of \(n\) items. For example, you might invert every number that is prime or a power of a prime \((2, 3, 4, 5, 7, 8, 9, 11, \dots)\). For small \(n\), the result jumps around but remains consistently less than \(1\):

Prime powers

If I were to continue this plot to larger \(n\), however, it would take off for the stratosphere. Prime powers get sparse farther out on the number line.

Here’s a question. We’ve seen factorial variants that go to zero as \(n\) goes to infinity, such as \(1/n!\). We’ve seen other variants grow without bound as \(n\) increases, including \(n!\) itself, and the zigzag numbers. Are there any versions of the factorial process that converge to a finite bound other than zero?

My first thought was this algorithm:

function greedy_balance(n)
    q = 1
    while n > 0
        q = q > 1 ? q /= n : q *= n
        n -= 1
    return q

We loop through the integers from \(n\) down to \(1\), calculating the running product/quotient \(q\) as we go. At each step, if the current value of \(q\) is greater than \(1\), we divide by the next factor; otherwise, we multiply. This scheme implements a kind of feedback control or target-seeking behavior. If \(q\) gets too large, we reduce it; too small and we increase it. I conjectured that as \(n\) goes to infinity, \(q\) would settle into an ever-narrower range of values near \(1\).

Running the experiment gave me another surprise:

Greedy balance linear

That sawtooth wave is not quite what I expected. One minor peculiarity is that the curve is not symmetric around \(1\); the excursions above have higher amplitude than those below. But this distortion is more visual than mathematical. Because \(q\) is a ratio, the distance from \(1\) to \(10\) is the same as the distance from \(1\) to \(\frac{1}{10}\), but it doesn’t look that way on a linear scale. The remedy is to plot the log of the ratio:

Greedy balance

Now the graph is symmetric, or at least approximately so, centered on \(0\), which is the logarithm of \(1\). But a larger mystery remains. The sawtooth waveform is very regular, with a period of \(4\), and it shows no obvious signs of shrinking toward the expected limiting value of \(\log q = 0\). Numerical evidence suggests that as \(n\) goes to infinity the peaks of this curve converge on a value just above \(q = \frac{5}{3}\), and the troughs approach a value just below \(q = \frac{3}{5}\). (The corresponding base-\(10\) logarithms are roughly \(\pm0.222\). I have not worked out why this should be so. Perhaps someone will explain it to me.

The failure of this greedy algorithm doesn’t mean we can’t find a divisive factorial that converges to \(q = 1\). If we work with the logarithms of the factors, this procedure becomes an instance of a well-known compu­tational problem called the number partitioning problem. You are given a set of real numbers and asked to divide it into two sets whose sums are equal, or as close to equal as possible. It’s a certifiably hard problem, but it has also been called (PDF) “the easiest hard problem.”For any given \(n\), we might find that inverting some other subset of the factors gives a better approximation to \(n! = 1\). For small \(n\), we can solve the problem by brute force: Just look at all \(2^n\) subsets and pick the best one.

I have computed the optimal partitionings up to \(n = 30\), where there are a billion possibilities to choose from.

Optimum balance graph

The graph is clearly flatlining. You could use the same method to force convergence to any other value between \(0\) and \(n!\).

And thus we have yet another answer to the question in the tweet that launched this adventure. What happens when you divide instead of multiply in n!? Anything you want.

by Brian Hayes at February 10, 2019 08:48 AM UTC

I recently wrote here about big convex polygons in grids, a problem for which we know very precise answers. This naturally raises the question: what about higher dimensions? How many vertices can be part of a convex polyhedron in an grid, or more generally a convex polytope in a -dimensional grid of side length ? Here we do still know some pretty good answers, at least up to constant factors in spaces of constant dimension.

The problem is included in a 2008 survey by Imre Bárány,1 according to whom the maximum number of vertices is

For instance, in three dimensional grids the maximum number of vertices is .

One way to find polyhedra with this many vertices is to take the convex hull of the points in a ball,2 3 or in scaled copies of any fixed smooth convex body. Another way, which should generate polyhedra with a somewhat less irregular appearance and (up to constant factors) the same number of vertices, is to take the Minkowski sum of all line segments (up to scaling and translation) that will fit into a smaller grid, of side length . For instance, the truncated rhombicuboctahedron below4 is the Minkowski sum of all the line segments that fit into a unit cube. Its 96 vertices lie in a grid. In general, this method produces a zonohedron whose complexity can be analyzed in terms of a -dimensional arrangement of hyperplanes. As long as this arrangement is not too degenerate (which it appears not to be, but I haven’t worked out the details carefully) this should give a number of vertices within a constant factor of the number coming from the convex hull construction.

Truncated rhombicuboctahedron

A matching upper bound comes from a 1963 paper by G. K. Andrews,5 and Bárány writes that although several more proofs have been published none of them is easy. I’m not sure whether the difficulty is in getting the exact bound or in the fact that Andrews and the later proofs allow more general shapes with volume that don’t fit into a grid, but it’s not hard to get close to the right bound simply by counting the number of possible facets of a given volume . By using lattice basis reduction the integer vectors in the hyperplane through any facet have a nearly-orthogonal basis whose product of lengths is proportional to . By considering how this product of lengths can be broken down into factors of different scales, and counting how many integer vectors of those lengths exist, it follows that the number of possible facets of volume is . Combining this with the surface area of a grid polytope gives the correct upper bound on the number of vertices up to a polylog factor.

What about when the dimension is not constant? An easy construction for high dimensions is to take all points with a fixed distance from the grid center. There are possible values for the distance, so this construction produces a convex polytope with vertices. It comes from a 1946 paper by Behrend, who uses this idea to find dense sets of integers with no arithmetic progressions.6 It is never worse to use the convex hull of the ball than the points on a sphere, and a celebrated paper by Elkin from 2011 (in the appendix of the published version) gives another proof of the bound for convex hulls of balls (for ) in which the constant factor of the is universal, not depending on . So when is singly exponential in , becomes constant and the convex hull technique produces vertices, improving Behrend’s construction for progression-free sets by the same factor.7

  1. Bárány, Imre (2008), “Extremal problems for convex lattice polytopes: a survey”, Surveys on Discrete and Computational Geometry, Contemporary Mathematics 453, Amer. Math. Soc., pp. 87–103, doi:10.1090/conm/453/08796, MR2405678

  2. Bárány, Imre and Larman, David (1998), “The convex hull of the integer points in a large ball”, Math. Ann. 312 (1), pp. 167–181, doi:10.1007/s002080050217, MR1645957

  3. Balog, Antal and Deshouillers, Jean-Marc (1999), “On some convex lattice polytopes”. Number Theory in Progress, Vol. 2 (Zakopane-Kościelisko, 1997), de Gruyter, pp. 591–606, MR1689533

  4. Ruen, Tom (2014), “Truncated rhombicuboctahedron”, CC-BY-SA 4.0, File:Truncated rhombicuboctahedron2.png on Wikimedia commons. 

  5. Andrews, George E. (1963), “A lower bound for the volume of strictly convex bodies with many boundary lattice points”, Trans. Amer. Math. Soc. 106, pp. 270–279, doi:10.2307/1993769, MR0143105

  6. Behrend, F. A. (1946), “On sets of integers which contain no three terms in arithmetical progression”, Proc. Nat. Acad. Sci. 32 (12), pp. 331–332, doi:10.1073/pnas.32.12.331, MR0018694

  7. Elkin, Michael (2011), “An improved construction of progression-free sets”, Israel J. Math. 184, pp. 93–128, arXiv:0801.4310, doi:10.1007/s11856-011-0061-1, MR2823971. The paragraph describing Elkin’s results was updated from an earlier more tentative version in the original post. 

(Discuss on Mastodon)

by David Eppstein at February 09, 2019 03:07 PM UTC

Minimax Testing of Identity to a Reference Ergodic Markov Chain, by Geoffrey Wolfer and Aryeh Kontorovich (arXiv). This work studies distributional identity testing on Markov chains from a single trajectory, as recently introduced by Daskalakis, Dikkala, and Gravin: we wish to test whether a Markov chain is equal to some reference chain, or far from it. This improves on previous work by considering a stronger distance measure than before, and showing that the sample complexity only depends on properties of the reference chain (which we are trying to test identity to). It additionally proves instance-by-instance bounds (where the sample complexity depends on properties of the specific chain we wish to test identity to).

Almost Optimal Distribution-free Junta Testing, by Nader H. Bshouty (arXiv). This paper provides a \(\tilde O(k/\varepsilon)\)-query algorithm with two-sided error for testing if a Boolean function is a \(k\)-junta (that is, its value depends only on \(k\) of its variables) in the distribution-free model (where distance is measured with respect to an unknown distribution from which we can sample). This complexity is a quadratic improvement over the \(\tilde O(k^2)/\varepsilon\)-query algorithm of Chen, Liu, Servedio, Sheng, and Xie. This complexity is also near-optimal, as shown in a lower bound by Saglam (which we covered back in August).

Exponentially Faster Massively Parallel Maximal Matching, by Soheil Behnezhad, MohammadTaghi Hajiaghayi, and David G. Harris (arXiv). The authors consider maximal matching in the Massively Parallel Computation (MPC) model. They show that one can compute a maximal matching in \(O(\log \log \Delta)\)-rounds, with \(O(n)\) space per machine. This is an exponential improvement over the previous works, which required either \(\Omega(\log n)\) rounds or \(n^{1 + \Omega(1)}\) space per machine. Corollaries of their result include approximation algorithms for vertex cover, maximum matching, and weighted maximum matching.

by Gautam Kamath at February 08, 2019 06:30 PM UTC

The following message just went out to ACM SIGecom members:

———- Forwarded message ———
From: Monique Chang <>
Date: Mon, Feb 4, 2019 at 2:20 PM
Subject: 2019 ACM SIGecom Election: Candidate Slate Announcement
To: <>

Dear ACM SIGecom Member,

The ACM SIGecom Nominating Committee has proposed the following candidates for the 2019 ACM SIGecom election.

(Running Unopposed)

Nicole Immorlica


Scott Kominers

Ariel Procaccia


Hu Fu

Katrina Ligett

In accordance with the ACM SIG Bylaws, additional candidates may be placed on the ballot by petition. All candidates must be ACM Professional Members, as well as members of the SIG. Anyone interested in petitioning must inform ACM Headquarters, Pat Ryan (, and SIGecom’s Secretary-Treasurer, Jenn Wortman Vaughan (, of their intent to petition by 15 March 2019. Petitions must be submitted to ACM Headquarters for verification by 2 April 2019.

Monique Chang

ACM SIG Elections Coordinator

Office of Policy and Administration

Three things for members of our community to note:

  1. It’s important vote (once the link goes out; note that the current email is just an announcement and an invitation for additional candidates to petition to be included on the ballot). The SIG leadership is very important for the ongoing direction of our organization. Your vote makes a difference, because our elections are often decided by small margins.
  1. If you didn’t get this email, you’re likely not registered as a member of our SIG. Membership costs only $5 for students and $10 for others; AFAIK, you don’t have to be an ACM member to be a SIG member. Our number of members is an important signal to the ACM about the strength of our community (which is why we have set our fees so low). Votes like this one are also restricted to members! If your membership has lapsed, or if you’ve never taken the plunge, this might be a good occasion to do so, by clicking on the link below:

  1. Thanks to our nominations chair, David Parkes, who put together the slate of candidates just listed, and also to all of the candidates who agreed to serve. Our community is really lucky to have such a strong and deep pool of volunteers, and this is one more example. Indeed, in advance, I’d particularly like to thank those candidates who *don’t* win, whoever they turn out to be: it’s thankless to stick one’s neck out for an election only to see someone else get chosen (often by a small margin; see #1), but your willingness to serve is much appreciated.

by Kevin Leyton-Brown at February 08, 2019 02:44 AM UTC

We study the Fourier spectrum of functions $f\colon \{0,1\}^{mk} \to \{-1,0,1\}$ which can be written as a product of $k$ Boolean functions $f_i$ on disjoint $m$-bit inputs. We prove that for every positive integer $d$, \[ \sum_{S \subseteq [mk]: |S|=d} |\hat{f_S}| = O(m)^d . \] Our upper bound is tight up to a constant factor in the $O(\cdot)$. Our proof builds on a new "level-$d$ inequality" that bounds above $\sum_{|S|=d} \hat{f_S}^2$ for any $[0,1]$-valued function $f$ in terms of its expectation, which may be of independent interest. As a result, we construct pseudorandom generators for such functions with seed length $\tilde O(m + \log(k/\varepsilon))$, which is optimal up to polynomial factors in $\log m$, $\log\log k$ and $\log\log(1/\varepsilon)$. Our generator in particular works for the well-studied class of combinatorial rectangles, where in addition we allow the bits to be read in any order. Even for this special case, previous generators have an extra $\tilde O(\log(1/\varepsilon))$ factor in their seed lengths. Using Schur-convexity, we also extend our results to functions $f_i$ whose range is $[-1,1]$.

at February 07, 2019 02:59 PM UTC

We study the approximation of halfspaces $h:\{0,1\}^n\to\{0,1\}$ in the infinity norm by polynomials and rational functions of any given degree. Our main result is an explicit construction of the "hardest" halfspace, for which we prove polynomial and rational approximation lower bounds that match the trivial upper bounds achievable for all halfspaces. This completes a lengthy line of work started by Myhill and Kautz (1961). As an application, we construct a communication problem with essentially the largest possible gap, of $n$ versus $2^{-\Omega(n)},$ between the sign-rank and discrepancy. Equivalently, our problem exhibits a gap of $\log n$ versus $\Omega(n)$ between the communication complexity with unbounded versus weakly unbounded error, improving quadratically on previous constructions and completing a line of work started by Babai, Frankl, and Simon (FOCS 1986). Our results further generalize to the $k$-party number-on-the-forehead model, where we obtain an explicit separation of $\log n$ versus $\Omega(n/4^{n})$ for communication with unbounded versus weakly unbounded error. This gap is a quadratic improvement on previous work and matches the state of the art for number-on-the-forehead lower bounds.

at February 07, 2019 02:57 PM UTC

As a grad student in the late 80's I had the opportunity to witness many great and often surprising theorems in computational complexity. Let me tell you about one of them, the Immerman-Szelepcsényi result that nondeterministic space is closed under complement. I wish I had the original emails for this story but instead I'm working from memory and apologies if I get some of the details wrong. I'm expanding from a short version from the early days of this blog.

I started my graduate work at UC Berkeley in 1985 and then moved to MIT in the summer of '86, following my advisor Michael Sipser. In the summer of 1987, Neil Immerman, then at Yale, proved his famous result building on his work in descriptive complexity In those days you didn't email papers, he made copies and sent them by US postal mail to several major researchers in complexity including Sipser. But Sipser was away for the summer, I believe in Russia, and the paper sat in his office.

Immerman also sent the paper to a Berkeley professor, probably Manuel Blum, who gave it to one of his students who decided to speak about the result in a student-led seminar. I forgot who was the student, maybe Moni Naor. I was still on the Berkeley email list so I got the talk announcement and went into complexity ecstasy over the news. I asked Moni (or whomever was giving the talk) if he could tell me details and he sent me a nice write-up of the proof. Given the importance of the result, I sent the proof write-up out to the MIT theory email list.

Guess who was on the MIT theory list? Neil Immerman. Neil wrote back with his own explanation of the proof. Neil explained how it came out of descriptive complexity but as a pure write-up of a proof of the theorem, Moni did an excellent job.

We found out about Robert Szelepcsényi when his paper showed up a few months later in the Bulletin of the European Association for Theoretical Computer Science. Szelepcsényi came to the problem from formal languages, whether context-sensitive languages (nondeterministic linear space) was closed under complement. Szelepcsényi, an undergrad in Slovakia at the time, heard about the problem in a class he took. Szelepcsényi's proof was very similar to Immerman. Szelepcsényi's paper took longer to get to US researchers but likely was proven and written about the same time as Immerman.

Even though both papers were published separately we refer to the result as Immerman-Szelepcsényi and is now just some old important theorem you see in introductory theory classes.

by Lance Fortnow ( at February 07, 2019 12:47 PM UTC

We prove a query complexity lower bound for $QMA$ protocols that solve approximate counting: estimating the size of a set given a membership oracle. This gives rise to an oracle $A$ such that $SBP^A \not\subset QMA^A$, resolving an open problem of Aaronson [2]. Our proof uses the polynomial method to derive a lower bound for the $SBQP$ query complexity of the $AND$ of two approximate counting instances. We use Laurent polynomials as a tool in our proof, showing that the "Laurent polynomial method" can be useful even for problems involving ordinary polynomials.

at February 07, 2019 08:16 AM UTC