Theory of Computing Blog Aggregator

I originally planned this summer to finish the work on my Introduction to Theoretical Computer Science book, and in particular write the two missing chapters on space complexity and interactive proof systems. Needless to say, this summer did not go as planned and I won’t be able to write these chapters. However, I still intend to go over the existing chapters, fixing typos, adding examples, exercises, and generally making it friendlier to beginning undergraduate students.

Toward this end, I would be grateful for people posting bugs, typos, and suggestions as GitHub issues (I currently have 267 closed and 14 open issues which I hope to get to soon). Of course, if you are technically inclined and there’s a simple local fix, you can also make a pull request.

Aside from these fixes, I am making two more “global” changes to the book. First, I am adding a “non mathy overview” for each chapter. While some students got a lot from reading the book prior to lectures, others were intimidated by the mathematical notation, and so I hope this more gentle introduction will be helpful. I am also adding more examples & solved exercises toward this end.

Another change is that I now follow the more traditional way of presenting deterministic finite automata before Turing machines – DFAs are still optional and can be skipped without missing anything, but some instructors find them as a good introduction to Turing Machines. Thus the order of presentation of materials in the book is roughly as follows:

  1. Introduction, representing objects as strings – Representing numbers, lists, etc. Specifying computational tasks as functions mapping binary strings to binary strings, Cantor’s theorem.
  2. Finite functions and Boolean circuits – Every function can be computed by some circuit, circuits as straightline programs, representing circuits as strings, universal circuit evaluator, counting lower bound.
  3. Computing on unbounded inputs – DFAs (optional), Turing Machines, equivalence between Turing machines, RAM machines and programming languages, λ calculus (optional), cellular automata (optional)
  4. Uncomputability – Universal Turing machine, Halting problem, reductions, Rice’s Theorem. Optional: Gödel’s incompleteness theorem, uncomputability of quantified arithmetic statements, context free grammars.
  5. Efficient computation – Modeling running time, time hierarchy theorem, P and EXP
  6. NP and NP completeness – Polynomial-time reductions, Cook-Levin Theorem (using circuits), definition of NP using “proof system”/”verifying algorithms” (no non-deterministic TMs), NP completeness, consequences of P=NP: search to decision, optimal machine learning, etc..
  7. Randomized computation: Worst-case randomized computation, defining BPP, Sipser-Gács, does BPP=P? (a little on derandomization)
  8. Cryptography: One time pad, necessity of long keys for information theoretic crypto, pseudorandom generators and stream ciphers, taste of public key and “magic” (ZKP, FHE, MPC)
  9. Quantum computing: Some quantum mechanics background – double slit experiment, Bell’s inequality. Modeling quantum computation. Bird’s eye view of Shor’s algorithm and quantum Fourier transform.

by Boaz Barak at July 10, 2020 05:29 PM UTC

Ron Graham passed away, but he lives on…

Cropped from tribute by Tom Leighton

Ron Graham just passed away Monday at the age of {84} in La Jolla near UCSD.

Today Ken and I wish to say a few words about Ron.

Tributes are being written as we write, including this from the Simons Foundation. Here is the American Mathematical Society announcement, which we saw first:

Ron Graham, a leader in discrete mathematics and a former president of both the AMS (1993-1994) and the MAA (2003-2004), died on July 6. He was 84. Graham published more than 350 papers and books with many collaborators, including more than 90 with his wife, Fan Chung, and more than 30 with Paul Erdős. He was known for his infectious enthusiasm, his originality, and his accessibility to anyone who had a mathematics question.

A tribute by Brady Haran embeds several short videos of Ron and his work. Fan’s own page for Ron has much more. We have made a collage of images from his life:

Ron was special and will be greatly missed by all. We at GLL send our thoughts to his dear wife, Fan. Ken and I knew Ron for many years. Ken knew Ron since a visit to Bell Labs in the 1980s and meeting Fan too at STOC 1990. I knew Ron since I was at Yale in the 1970’s—a long time ago. I recall fondly meeting him for the first time when he was at Bell Labs.

Some Stories

Ken and I thought we would give some personal stories about Graham.

{\bullet } Ken’s story is told here. In breaking a confidence by telling Erdős the secret about Bobby Fischer recounted there, Ken hoped that it would spread behind the scenes to enough people that Fischer would be less blamed for failing to play Anatoly Karpov in 1975. Since Erdős was staying with the Grahams, presumably it would have emerged there. The social excursion during STOC 1990 was a dinner cruise in Baltimore’s harbor. Ron and Fan and Ken found each other right away, and some questions to Ken about chess quickly went to the Fischer topic. At least Ken knows the secret was told at least once.

{\bullet } Ron told me once that he was the accountant for Erdős. One of Ron’s jobs was to keep track of the prize money that Erdős owed. Ron would send out the checks to whoever solved the next problem. One of the brilliant insights of Erdős was to make the problems hard, but at least some where solvable. Ron told me that for years no one would actually cash the checks. They would frame them and proudly display them.

Ron said that he liked this for the obvious reason—less cash for Erdős to have to pay. But the advent of color xerox machines in the 1970’s changed this. He told me that people began cashing the checks and displaying the color copy. Bummer.

{\bullet } My first talk at Bell Labs was on my work on the planar separator theorem—joint work with Bob Tarjan. At the beginning of the talk I saw that Ron had a pile of papers on his desk. He was a manager and I guessed he had some paper work to do. I gave my talk. At the end I when up to Ron in the back and he said:

I did not get any work done.

I still fondly remember that as high praise.

{\bullet } Graham loved to do hand stands. I recall walking around Bell Labs one day when out of the blue Ron did a full handstand. He said that he liked to do these on the hand rail of the stairs. The trick he said was: “To not fall down.”

I searched for him doing handstands and found out he and Fan lived in a modern beautiful house.

When two mathematicians found a circular home designed by architect Kendrick Bangs Kellogg in La Jolla, they treasured their unique discovery.

Fun and Games

Ron kept a simply organized page of all his papers. They are not sorted by subject or kind, but the titles are so descriptive that you can tell at a glance where the fun is. A number of them are expositions in the popular magazines of the AMS and MAA.

Among them, we’ll mention this note from 2016, titled “Inserting Plus Signs and Adding.” It is joint with Steve Butler, who penned his own reminiscence for Lance and Bill’s blog, and Richard Strong.

Say that a number {w} is “reducible” to a number {v} in one step (in base {b}) if there is a way to insert one or more {+} signs into the base-{b} representation of {w} so that the resulting numbers add up to {v}. For example, 1935 is reducible to 99 via {1 + 93 + 5}. The number 99 reduces only to 18 via {9+9}, and 18 reduces only to 9, which cannot be reduced further. Thus Ron’s birth year took {3} reduction steps to become a single digit. However, doing {1+9+3+5} gives 18 straightaway and thus saves a step. The paper gives cases where inserting {+} everywhere is not a quickest way to reduce to a single digit.

Definition 1 For any base {b} and number {n \geq 1} denoting an input length, not magnitude, define {f_b(n)} to be the least integer {m} such that all base-{b} numbers of length {n} can be reduced to a single digit within {m} steps.

The question—of a complexity theoretic nature—is:

Given {b}, what is the growth rate of {f_b(n)} as {n \rightarrow \infty}?

Here are some possible answers—which would you expect to be correct in the case where {b} is base 10?

  • {f_{10}(n) = \Theta(\sqrt{n})}.

  • {f_{10}(n) = \Theta(n^{1/10})}.

  • {f_{10}(n) = O(\log n)}.

  • {f_{10}(n) = O(\log\log n)}.

  • {f_{10}(n) = O(\alpha(n))}, where {\alpha} is the inverse Ackermann function.

Your expectation might be wrong—see the paper for the answer and its nifty proof. For a warmup, if you want to answer without looking at the paper, prove that the final reduced digit is the same regardless of the sequence of reductions.

Ron is also known for very big integers, including one that held the record for largest to appear in a published mathematical proof. You can find it among the above tributes and also on a T-shirt. We could also mention his role in the largest proof known to date—at 200 terabytes it almost doubles the size of the tables for proving results of seven-piece chess endgames.

If you desire serious fun, look also to Ron’s books. He wrote several, including co-authoring the nonpareil textbook Concrete Mathematics with Don Knuth and Oren Patashnik.

Some Prizes

Ron in the tradition famously followed by Erdős like to put money on problems. A $10 dollar problem was much easier than a $100 one. A $1,000 one is extremely hard, and so on. In Ron’s paper on his favorite problems he stated this one:

Let {H_{n} = \sum_{j=1}^{n} \frac{1}{j}}. Challenge: prove the inequality for all {n \ge 1},

\displaystyle  \sum_{d | n} d \le H_{n} + \exp(H_{n})\log(H_{n}).

And he put the prize at $1,000,000. He added:

Why is this reward so outrageous? Because this conjecture is equivalent to the Riemann Hypothesis! A single {n} violating would imply there are infinitely many zeroes of the Riemann zeta function off the critical line {R(z) = 1}. Of course, the $1,000,000 prize is not from me but rather is offered by the Clay Mathematics Institute since the Riemann Hypothesis is one of their six remaining Millennium Prize Problems. We hope to live to see progress in the Challenges and Conjectures mentioned in this note, especially the last one!

Alas Ron did not get to see this resolved. Nor of course did Erdős, nor may any of us. But Ron is prominently mentioned on another Simons page where Erdős lives on, and so may Ron.

Open Problems

Ron died at age {84}. Perhaps he liked that it is the sum of a twin prime {41 + 43}, and also three times a perfect number. We will always remember {84} because of Ron.

by RJLipton+KWRegan at July 10, 2020 04:08 PM UTC

Authors: Zeev Nutov, Elad Shoham
Download: PDF
Abstract: We consider the Budgeted Submodular Maximization problem, that seeks to maximize an increasing submodular function subject to budget constraints. Extending a result of Khuller, Moss, and Naor for the Budgeted Coverage problem, Sviridenko showed that the greedy algorithm combined with guessing 3 most profitable elements of an optimal solution has approximation ratio $\alpha=1-\frac{1}{e} \approx 0.632$. We show that just $2$ guesses suffice to achieve ratio $\alpha$, 1 guess suffices to achieve ratio $0.899 \alpha \approx 0.568$, while ratio $0.68\alpha \approx 0.43$ can be achieved without any guessing. We note that ratio $\alpha-\epsilon$ can be achieved using ${(1/\epsilon)}^{O(1/\epsilon^4)}n\log n$ value oracle calls, but this algorithm is impractical already for large values of $\epsilon$. Among practical algorithms, our is the currently best known one.

at July 10, 2020 12:00 AM UTC

Authors: Jim Mainprice, Nathan Ratliff, Marc Toussaint, Stefan Schaal
Download: PDF
Abstract: Algorithmic solutions for the motion planning problem have been investigated for five decades. Since the development of A* in 1969 many approaches have been investigated, traditionally classified as either grid decomposition, potential fields or sampling-based. In this work, we focus on using numerical optimization, which is understudied for solving motion planning problems. This lack of interest in the favor of sampling-based methods is largely due to the non-convexity introduced by narrow passages. We address this shortcoming by grounding the motion planning problem in differential geometry. We demonstrate through a series of experiments on 3 Dofs and 6 Dofs narrow passage problems, how modeling explicitly the underlying Riemannian manifold leads to an efficient interior-point non-linear programming solution.

at July 10, 2020 01:33 AM UTC

Authors: Massimo Cairo, Shahbaz Khan, Romeo Rizzi, Sebastian Schmidt, Alexandru I. Tomescu
Download: PDF
Abstract: Given a directed graph $G$ and a pair of nodes $s$ and $t$, an \emph{$s$-$t$ bridge} of $G$ is an edge whose removal breaks all $s$-$t$ paths of $G$ (and thus appears in all $s$-$t$ paths). Computing all $s$-$t$ bridges of $G$ is a basic graph problem, solvable in linear time.

In this paper, we consider a natural generalisation of this problem, with the notion of "safety" from bioinformatics. We say that a walk $W$ is \emph{safe} with respect to a set $\mathcal{W}$ of $s$-$t$ walks, if $W$ is a subwalk of all walks in $\mathcal{W}$. We start by considering the maximal safe walks when $\mathcal{W}$ consists of: all $s$-$t$ paths, all $s$-$t$ trails, or all $s$-$t$ walks of $G$. We show that the first two problems are immediate linear-time generalisations of finding all $s$-$t$ bridges, while the third problem is more involved. In particular, we show that there exists a compact representation computable in linear time, that allows outputting all maximal safe walks in time linear in their length.

We further generalise these problems, by assuming that safety is defined only with respect to a subset of \emph{visible} edges. Here we prove a dichotomy between the $s$-$t$ paths and $s$-$t$ trails cases, and the $s$-$t$ walks case: the former two are NP-hard, while the latter is solvable with the same complexity as when all edges are visible. We also show that the same complexity results hold for the analogous generalisations of \emph{$s$-$t$ articulation points} (nodes appearing in all $s$-$t$ paths).

We thus obtain the best possible results for natural "safety"-generalisations of these two fundamental graph problems. Moreover, our algorithms are simple and do not employ any complex data structures, making them ideal for use in practice.

at July 10, 2020 01:24 AM UTC

Authors: Markus Hecher, Jorge Fandinno
Download: PDF
Abstract: It is well-know that deciding consistency for normal answer set programs (ASP) is NP-complete, thus, as hard as the satisfaction problem for classical propositional logic (SAT). The best algorithms to solve these problems take exponential time in the worst case. The exponential time hypothesis (ETH) implies that this result is tight for SAT, that is, SAT cannot be solved in subexponential time. This immediately establishes that the result is also tight for the consistency problem for ASP. However, accounting for the treewidth of the problem, the consistency problem for ASP is slightly harder than SAT: while SAT can be solved by an algorithm that runs in exponential time in the treewidth k, it was recently shown that ASP requires exponential time in k \cdot log(k). This extra cost is due checking that there are no self-supported true atoms due to positive cycles in the program. In this paper, we refine the above result and show that the consistency problem for ASP can be solved in exponential time in k \cdot log({\lambda}) where {\lambda} is the minimum between the treewidth and the size of the largest strongly-connected component in the positive dependency graph of the program. We provide a dynamic programming algorithm that solves the problem and a treewidth-aware reduction from ASP to SAT that adhere to the above limit.

at July 10, 2020 12:00 AM UTC

Authors: Gabriel L. Duarte, Hiroshi Eto, Tesshu Hanaka, Yasuaki Kobayashi, Yusuke Kobayashi, Daniel Lokshtanov, Lehilton L. C. Pedrosa, Rafael C. S. Schouery, Uéverton S. Souza
Download: PDF
Abstract: The cut-set $\partial(S)$ of a graph $G=(V,E)$ is the set of edges that have one endpoint in $S\subset V$ and the other endpoint in $V\setminus S$, and whenever $G[S]$ is connected, the cut $[S,V\setminus S]$ of $G$ is called a connected cut. A bond of a graph $G$ is an inclusion-wise minimal disconnecting set of $G$, i.e., bonds are cut-sets that determine cuts $[S,V\setminus S]$ of $G$ such that $G[S]$ and $G[V\setminus S]$ are both connected. Contrasting with a large number of studies related to maximum cuts, there exist very few results regarding the largest bond of general graphs. In this paper, we aim to reduce this gap on the complexity of computing the largest bond, and the maximum connected cut of a graph. Although cuts and bonds are similar, we remark that computing the largest bond and the maximum connected cut of a graph tends to be harder than computing its maximum cut. We show that it does not exist a constant-factor approximation algorithm to compute the largest bond, unless P = NP. Also, we show that {\sc Largest Bond} and {\sc Maximum Connected Cut} are NP-hard even for planar bipartite graphs, whereas \textsc{Maximum Cut} is trivial on bipartite graphs and polynomial-time solvable on planar graphs. In addition, we show that {\sc Largest Bond} and {\sc Maximum Connected Cut} are NP-hard on split graphs, and restricted to graphs of clique-width $w$ they can not be solved in time $f(w)\times n^{o(w)}$ unless the Exponential Time Hypothesis fails, but they can be solved in time $f(w)\times n^{O(w)}$. Finally, we show that both problems are fixed-parameter tractable when parameterized by the size of the solution, the treewidth, and the twin-cover number.

at July 10, 2020 12:00 AM UTC

Authors: Guilherme C. M. Gomes, Vinicius F. dos Santos, Murilo V. G. da Silva, Jayme L. Szwarcfiter
Download: PDF
Abstract: The three-in-a-tree problem asks for an induced tree of the input graph containing three mandatory vertices. In 2006, Chudnovsky and Seymour [Combinatorica, 2010] presented the first polynomial time algorithm for this problem, which has become a critical subroutine in many algorithms for detecting induced subgraphs, such as beetles, pyramids, thetas, and even and odd-holes. In 2007, Derhy and Picouleau [Discrete Applied Mathematics, 2009] considered the natural generalization to $k$ mandatory vertices, proving that, when $k$ is part of the input, the problem is $\mathsf{NP}$-complete, and ask what is the complexity of four-in-a-tree. Motivated by this question and the relevance of the original problem, we study the parameterized complexity of $k$-in-a-tree. We begin by showing that the problem is $\mathsf{W[1]}$-hard when jointly parameterized by the size of the solution and minimum clique cover and, under the Exponential Time Hypothesis, does not admit an $n^{o(k)}$ time algorithm. Afterwards, we use Courcelle's Theorem to prove fixed-parameter tractability under cliquewidth, which prompts our investigation into which parameterizations admit single exponential algorithms; we show that such algorithms exist for the unrelated parameterizations treewidth, distance to cluster, and distance to co-cluster. In terms of kernelization, we present a linear kernel under feedback edge set, and show that no polynomial kernel exists under vertex cover nor distance to clique unless $\mathsf{NP} \subseteq \mathsf{coNP}/\mathsf{poly}$. Along with other remarks and previous work, our tractability and kernelization results cover many of the most commonly employed parameters in the graph parameter hierarchy.

at July 10, 2020 12:00 AM UTC

Authors: Suman Banerjee, Bithika Pal
Download: PDF
Abstract: Given a temporal network $\mathcal{G}(\mathcal{V}, \mathcal{E}, \mathcal{T})$, $(\mathcal{X},[t_a,t_b])$ (where $\mathcal{X} \subseteq \mathcal{V}(\mathcal{G})$ and $[t_a,t_b] \subseteq \mathcal{T}$) is said to be a $(\Delta, \gamma)$\mbox{-}clique of $\mathcal{G}$, if for every pair of vertices in $\mathcal{X}$, there must exist at least $\gamma$ links in each $\Delta$ duration within the time interval $[t_a,t_b]$. Enumerating such maximal cliques is an important problem in temporal network analysis, as it reveals contact pattern among the nodes of $\mathcal{G}$. In this paper, we study the maximal $(\Delta, \gamma)$\mbox{-}clique enumeration problem in online setting; i.e.; the entire link set of the network is not known in advance, and the links are coming as a batch in an iterative manner. Suppose, the link set till time stamp $T_{1}$ (i.e., $\mathcal{E}^{T_{1}}$), and its corresponding $(\Delta, \gamma)$-clique set are known. In the next batch (till time $T_{2}$), a new set of links (denoted as $\mathcal{E}^{(T_1,T_2]}$) is arrived.

at July 10, 2020 01:26 AM UTC

The problem of constructing hazard-free Boolean circuits (those avoiding electronic glitches) dates back to the 1940s and is an important problem in circuit design. Recently, Ikenmeyer et al. [J. ACM, 66:4 (2019), Article 25] have shown that the hazard-free circuit complexity of any Boolean function $f(x)$ is lower-bounded by the monotone circuit complexity of the monotone Boolean function which accepts an input $x$ iff $f(z)=1$ for some vector $z\leq x$. We give a short and amazingly simple proof of this interesting result. We also show that a circuit is hazard-free if and only if the circuit and its dual produce (purely syntactically) all prime implicants of the functions they compute. This extends a classical result of Eichelberger [IBM J. Res. Develop., 9 (1965)] showing this property for depth-two circuits producing no terms containing a variable together with its negation. Finally, we give a very simple non-monotone Boolean function whose hazard-free circuit complexity is super-polynomially larger than its unrestricted circuit complexity.

at July 09, 2020 07:01 PM UTC

Ronald Graham passed away on July 6 at the age of 84. We present reflections on Ronald Graham by Steve Butler.

Getting to work with Ron Graham

Ron Graham has helped transform the mathematics community and in particular been a leader in discrete mathematics for more than 50 years. It is impossible to fully appreciate the breadth of his work in one sitting, and I will not try to do so here. Ron has put his papers online and made them freely available, a valuable treasure; and there are still many a hidden gem inside of these papers that are waiting to be picked up, polished, and pushed further.

I want to share about how I got to know and work with Ron. To be fair I knew about Ron long before I ever knew Ron. He was that rare pop-star mathematician who had managed to reach out and become visible outside of the mathematical community. And so as a teenager I read about Ron in a book about Erdos. I thought to myself that this guy sounds really cool and someday I might even get to see him give a talk (if I was lucky).

I went to UC San Diego for graduate school and after a series of near-misses ended up studying under Fan Chung. I passed Ron in the stairwell once, and then also helped them move some furniture between their two adjoining homes (graduate students are great for manual labor). But I became determined to try and find a way to start a conversation with Ron and maybe work up to working on a problem. So I took the usual route: I erased the chalkboards for him.

Before his class on discrete mathematics would start, I would come in and clean the chalkboards making them pristine. It also gave me time to occasionally engage in some idle chat, and he mentioned that his papers list was far from complete. I jumped on it and got to work right away and put his papers online and have been maintaining that list for the last fifteen years. This turned out to be no small feat and required about six months of work.  Many papers had no previous online version, and there were even a few papers that Ron had written that he had forgotten about! But this gave me a reason to come to Ron and talk with him about his various papers and then he would mention some problems he was working on with others and where they were stuck and thought I might give them a try.

So I started to work on these problems and started to make progress. And Ron saw what I was able to do and would send me more problems that fit my abilities and interests, and I would come back and show him partial solutions, or computations, and then he would often times fill in the gaps. He was fun to work with, because we almost always made progress; even when we didn't make progress we still understood things more fully. Little by little our publications (and friendship) grew and we now have 25+ joint publications, and one more book that will be coming out in the next few years about the enumerating juggling patterns.

After all of that though, I discovered something. I could have just gone to Ron's door and knocked and he would have talked to me, and given me problems (though our friendship would not become so deep if I had chosen the forthright method). But almost no graduate students in math were brave enough to do it; they were scared off by his reputation. As a consequence, Ron had far fewer math graduate students than you would expect. (To any math graduate student out there, don't let fear stop you from talking with professors; many of them are much nicer than you think, and the ones that are not nice are probably not that great to work with.)

So one of the most important lessons I learned from Ron was the importance of kindness. Ron was generous and kind to everyone (and I really stress the word everyone) that he met. It didn't matter what walk of life you were in, what age you were, or what level of math (if any) that you knew, he was kind and willing to share his time and talents. He always had something in reach in his bag or pocket that he could pull out and show someone and give them an unexpected sense of wonder.

Richard Hamming once said "you can be a nice guy or you can be a great scientist", the implication being that you cannot do both. Ron showed that you can be a nice guy and a great scientist. And I believe that a significant portion of his success is owed to his being kind; all of us should learn from his examples and show more kindness towards others.

This is only one of many lessons I learned from Ron. Another thing I learned from Ron is the importance of data. I have seen multiple times when we would work on a problem and generate data resulting in what I thought were hopeless numbers to understand. But Ron looked at that same data and with a short bit of trial and error was able to make a guess of what the general form was. And almost inevitably he would be right! One way that Ron could do this was to start by factoring the values, and if all the prime factors were small he could guess that the expression was some combination of factorials and powers and then start to play with expressions until things worked out. Even when I knew what he did, I still am amazed that he was able to do it.

I will miss Ron, I will never have a collaboration as deep, as meaningful, and as personal. I am better for having worked with him, and learning from him about how to be a better mathematician and a better person.

Thank you, Ron.

by gasarch ( at July 09, 2020 03:57 PM UTC

While we at PTReview always look through the posted papers, we do not check for correctness. We make a serious attempt to make sure the paper is reasonable. In a few instances, we have decided not to post a (topically relevant) paper, because it looks absolutely wrong. Our position is: the benefit of doubt goes to the author, and a borderline paper should be posted. We are only curating relevant tech reports, not passing judgment on results.

In some borderline cases, readers familiar with the subject complained to us that the paper should be not be considered a scientific contribution (because of, say, unspecified algorithms, blatantly incorrect or unverifiable central claims). These are cases where we were also unsure of the paper. We have usually removed/not posted such papers.

If the paper author(s) feels that his/her paper should nonetheless be posted, then they should email us at As long as the paper is not complete nonsense and appears to cite relevant history, we will defer to the authors’ wishes.

by Seshadhri at July 09, 2020 12:38 AM UTC

Authors: Philip Bille, Inge Li Gørtz, Max Rishøj Pedersen, Eva Rotenberg, Teresa Anna Steiner
Download: PDF
Abstract: The classic string indexing problem is to preprocess a string $S$ into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string $P$, report all occurrences of $P$ within $S$. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-$k$ close consecutive occurrences problem (SITCCO). Here, a consecutive occurrence is a pair $(i,j)$, $i < j$, such that $P$ occurs at positions $i$ and $j$ in $S$ and there is no occurrence of $P$ between $i$ and $j$, and their distance is defined as $j-i$. Given a pattern $P$ and a parameter $k$, the goal is to report the top-$k$ consecutive occurrences of $P$ in $S$ of minimal distance. The challenge is to compactly represent $S$ while supporting queries in time close to length of $P$ and $k$. We give two new time-space trade-offs for the problem. Our first result achieves near-linear space and optimal query time, and our second result achieves linear space and near optimal query time. Along the way, we develop several techniques of independent interest, including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees.

at July 09, 2020 11:30 PM UTC

Authors: Micha Sharir, Noam Solomon
Download: PDF
Abstract: We study incidence problems involving points and curves in $R^3$. The current (and in fact only viable) approach to such problems, pioneered by Guth and Katz, requires a variety of tools from algebraic geometry, most notably (i) the polynomial partitioning technique, and (ii) the study of algebraic surfaces that are ruled by lines or, in more recent studies, by algebraic curves of some constant degree. By exploiting and refining these tools, we obtain new and improved bounds for point-curve incidence problems in $R^3$.

Incidences of this kind have been considered in several previous studies, starting with Guth and Katz's work on points and lines. Our results, which are based on the work of Guth and Zahl concerning surfaces that are doubly ruled by curves, provide a grand generalization of most of the previous results. We reconstruct the bound for points and lines, and improve, in certain significant ways, recent bounds involving points and circles (in Sharir, Sheffer and Zahl), and points and arbitrary constant-degree algebraic curves (in Sharir, Sheffer and Solomon). While in these latter instances the bounds are not known (and are strongly suspected not) to be tight, our bounds are, in a certain sense, the best that can be obtained with this approach, given the current state of knowledge.

As an application of our point-curve incidence bound, we show that the number of triangles spanned by a set of $n$ points in $R^3$ and similar to a given triangle is $O(n^{15/7})$, which improves the bound of Agarwal et al. Our results are also related to a study by Guth et al.~(work in progress), and have been recently applied in Sharir, Solomon and Zlydenko to related incidence problems in three dimensions.

at July 09, 2020 11:39 PM UTC

Authors: Xinrui Jia, Kshiteej Sheth, Ola Svensson
Download: PDF
Abstract: An instance of colorful k-center consists of points in a metric space that are colored red or blue, along with an integer k and a coverage requirement for each color. The goal is to find the smallest radius \r{ho} such that there exist balls of radius \r{ho} around k of the points that meet the coverage requirements. The motivation behind this problem is twofold. First, from fairness considerations: each color/group should receive a similar service guarantee, and second, from the algorithmic challenges it poses: this problem combines the difficulties of clustering along with the subset-sum problem. In particular, we show that this combination results in strong integrality gap lower bounds for several natural linear programming relaxations. Our main result is an efficient approximation algorithm that overcomes these difficulties to achieve an approximation guarantee of 3, nearly matching the tight approximation guarantee of 2 for the classical k-center problem which this problem generalizes.

at July 09, 2020 12:00 AM UTC

Authors: Šimon Schierreich, Ondřej Suchý
Download: PDF
Abstract: In the \textsc{Waypoint Routing Problem} one is given an undirected capacitated and weighted graph $G$, a source-destination pair $s,t\in V(G)$ and a set $W\subseteq V(G)$, of \emph{waypoints}. The task is to find a walk which starts at the source vertex $s$, visits, in any order, all waypoints, ends at the destination vertex $t$, respects edge capacities, that is, traverses each edge at most as many times as is its capacity, and minimizes the cost computed as the sum of costs of traversed edges with multiplicities. We study the problem for graphs of bounded treewidth and present a new algorithm for the problem working in $2^{O(\mathrm{tw})}\cdot n$ time, significantly improving upon the previously known algorithms. We also show that this running time is optimal for the problem under Exponential Time Hypothesis.

at July 09, 2020 12:00 AM UTC

Authors: Amandeep Singh Bhatia, Shenggen Zheng
Download: PDF
Abstract: In recent years, the modeling interest has increased significantly from the molecular level to the atomic and quantum scale. The field of computational chemistry plays a significant role in designing computational models for the operation and simulation of systems ranging from atoms and molecules to industrial-scale processes. It is influenced by a tremendous increase in computing power and the efficiency of algorithms. The representation of chemical reactions using classical automata theory in thermodynamic terms had a great influence on computer science. The study of chemical information processing with quantum computational models is a natural goal. In this paper, we have modeled chemical reactions using two-way quantum finite automata, which are halted in linear time. Additionally, classical pushdown automata can be designed for such chemical reactions with multiple stacks. It has been proven that computational versatility can be increased by combining chemical accept/reject signatures and quantum automata models.

at July 09, 2020 12:00 AM UTC

Authors: Polina Rozenshtein, Giulia Preti, Aristides Gionis, Yannis Velegrakis
Download: PDF
Abstract: When searching for interesting structures in graphs, it is often important to take into account not only the graph connectivity, but also the metadata available, such as node and edge labels, or temporal information. In this paper we are interested in settings where such metadata is used to define a similarity between edges. We consider the problem of finding subgraphs that are dense and whose edges are similar to each other with respect to a given similarity function. Depending on the application, this function can be, for example, the Jaccard similarity between the edge label sets, or the temporal correlation of the edge occurrences in a temporal graph. We formulate a Lagrangian relaxation-based optimization problem to search for dense subgraphs with high pairwise edge similarity. We design a novel algorithm to solve the problem through parametric MinCut, and provide an efficient search scheme to iterate through the values of the Lagrangian multipliers. Our study is complemented by an evaluation on real-world datasets, which demonstrates the usefulness and efficiency of the proposed approach.

at July 09, 2020 11:22 PM UTC

Authors: Georg Anegg, Haris Angelidakis, Adam Kurpisz, Rico Zenklusen
Download: PDF
Abstract: There has been a recent surge of interest in incorporating fairness aspects into classical clustering problems. Two recently introduced variants of the $k$-Center problem in this spirit are Colorful $k$-Center, introduced by Bandyapadhyay, Inamdar, Pai, and Varadarajan, and lottery models, such as the Fair Robust $k$-Center problem introduced by Harris, Pensyl, Srinivasan, and Trinh. To address fairness aspects, these models, compared to traditional $k$-Center, include additional covering constraints. Prior approximation results for these models require to relax some of the normally hard constraints, like the number of centers to be opened or the involved covering constraints, and therefore, only obtain constant-factor pseudo-approximations. In this paper, we introduce a new approach to deal with such covering constraints that leads to (true) approximations, including a $4$-approximation for Colorful $k$-Center with constantly many colors---settling an open question raised by Bandyapadhyay, Inamdar, Pai, and Varadarajan---and a $4$-approximation for Fair Robust $k$-Center, for which the existence of a (true) constant-factor approximation was also open. We complement our results by showing that if one allows an unbounded number of colors, then Colorful $k$-Center admits no approximation algorithm with finite approximation guarantee, assuming that $\mathrm{P} \neq \mathrm{NP}$. Moreover, under the Exponential Time Hypothesis, the problem is inapproximable if the number of colors grows faster than logarithmic in the size of the ground set.

at July 09, 2020 11:29 PM UTC

Authors: Loukas Georgiadis, Evangelos Kosinas
Download: PDF
Abstract: A directed graph $G=(V,E)$ is twinless strongly connected if it contains a strongly connected subgraph without any pair of antiparallel (or twin) edges. The twinless strongly connected components (TSCCs) of a directed graph $G$ are its maximal twinless strongly connected subgraphs. These concepts have several diverse applications, such as the design of telecommunication networks and the structural stability of buildings. A vertex $v \in V$ is a twinless strong articulation point of $G$, if the deletion of $v$ increases the number of TSCCs of $G$. Here, we present the first linear-time algorithm that finds all the twinless strong articulation points of a directed graph. We show that the computation of twinless strong articulation points reduces to the following problem in undirected graphs, which may be of independent interest: Given a $2$-vertex-connected (biconnected) undirected graph $H$, find all vertices $v$ that belong to a vertex-edge cut pair, i.e., for which there exists an edge $e$ such that $H \setminus \{v,e\}$ is not connected. We develop a linear-time algorithm that not only finds all such vertices $v$, but also computes the number of edges $e$ such that $H \setminus \{v,e\}$ is not connected. This also implies that for each twinless strong articulation point $v$, that is not a strong articulation point in a strongly connected digraph $G$, we can compute the number of TSCCs in $G \setminus v$. We note that the problem of computing all vertices that belong to a vertex-edge cut pair can be solved in linear-time by exploiting the structure of $3$-vertex connected (triconnected) components of $H$, represented by an SPQR tree of $H$. Our approach, however, is conceptually simple and thus likely to be more amenable to practical implementations.

at July 09, 2020 12:00 AM UTC

Authors: David P. Woodruff, Amir Zandieh
Download: PDF
Abstract: To accelerate kernel methods, we propose a near input sparsity time algorithm for sampling the high-dimensional feature space implicitly defined by a kernel transformation. Our main contribution is an importance sampling method for subsampling the feature space of a degree $q$ tensoring of data points in almost input sparsity time, improving the recent oblivious sketching method of (Ahle et al., 2020) by a factor of $q^{5/2}/\epsilon^2$. This leads to a subspace embedding for the polynomial kernel, as well as the Gaussian kernel, with a target dimension that is only linearly dependent on the statistical dimension of the kernel and in time which is only linearly dependent on the sparsity of the input dataset. We show how our subspace embedding bounds imply new statistical guarantees for kernel ridge regression. Furthermore, we empirically show that in large-scale regression tasks, our algorithm outperforms state-of-the-art kernel approximation methods.

at July 09, 2020 11:36 PM UTC

Authors: Miika Hannula, Juha Kontinen, Martin Lück, Jonni Virtema
Download: PDF
Abstract: Second-order Boolean logic is a generalization of QBF, whose constant alternation fragments are known to be complete for the levels of the exponential time hierarchy. We consider two types of restriction of this logic: 1) restrictions to term constructions, 2) restrictions to the form of the Boolean matrix. Of the first sort, we consider two kinds of restrictions: firstly, disallowing nested use of proper function variables, and secondly stipulating that each function variable must appear with a fixed sequence of arguments. Of the second sort, we consider Horn, Krom, and core fragments of the Boolean matrix. We classify the complexity of logics obtained by combining these two types of restrictions. We show that, in most cases, logics with k alternating blocks of function quantifiers are complete for the kth or (k-1)th level of the exponential time hierarchy. Furthermore, we establish NL-completeness for the Krom and core fragments, when k=1 and both restrictions of the first sort are in effect.

at July 09, 2020 12:00 AM UTC

Authors: Toshiki Saitoh, Ryo Yoshinaka, Hans L. Bodlaender
Download: PDF
Abstract: For a graph class $\mathcal{C}$, the $\mathcal{C}$-\textsc{Edge-Deletion} problem asks for a given graph $G$ to delete the minimum number of edges from $G$ in order to obtain a graph in $\mathcal{C}$. We study the $\mathcal{C}$-\textsc{Edge-Deletion} problem for $\mathcal{C}$ the permutation graphs, interval graphs, and other related graph classes. It follows from Courcelle's Theorem that these problems are fixed parameter tractable when parameterized by treewidth. In this paper, we present concrete FPT algorithms for these problems. By giving explicit algorithms and analyzing these in detail, we obtain algorithms that are significantly faster than the algorithms obtained by using Courcelle's theorem.

at July 09, 2020 11:26 PM UTC

Authors: Huairui Chu, Mingyu Xiao, Zhe Zhang
Download: PDF
Abstract: We show that the CNF satisfiability problem can be solved $O^*(1.2226^m)$ time, where $m$ is the number of clauses in the formula, improving the known upper bounds $O^*(1.234^m)$ given by Yamamoto 15 years ago and $O^*(1.239^m)$ given by Hirsch 22 years ago. By using an amortized technique and careful case analysis, we successfully avoid the bottlenecks in previous algorithms and get the improvement.

at July 09, 2020 12:00 AM UTC

To put it mildly, 2020 is not shaping up to be a great year, so it is worthwhile to emphasize the good news, wherever we may find them.

Karlin, Klein, and Oveis Gharan have just posted a paper in which, at long last, they improve over the 1.5 approximation ratio for metric TSP which was achieved, in 1974, by Christofides. For a long time, it was suspected that the Held-Karp relaxation of metric TSP had an approximation ratio better than 1.5, but there was no viable approach to prove such a result. In 2011, two different approaches were developed to improve 1.5 in the case of shortest-path metrics on unweighted graphs: one by Oveis Gharan, Saberi and Singh and one by Momke and Svensson. The algorithm of Karlin, Klein and Oveis Gharan (which does not establish that the Held-Karp relaxation has an integrality gap better than 1.5) takes as a starting point ideas from the work of Oveis Gharan, Saberi and Singh.

Yesterday, Bloom and Sisask posted a paper in which they show that there is a constant c>0 such that, for every sufficiently large N, if A \subseteq \{1,\ldots, N \} has cardinality at least N / (\log N)^{1+c}, then A contains a non-trivial length-3 arithmetic progression. Without context, this may seem like a strange result to get excited about, but it sits at the nexus of a number of fundamental results and open questions in combinatorics. Gil Kalai has written an excellent post telling the story of this problem, so instead of writing a worse version of it I will refer the reader to Gil’s blog.

Back to bad news, the day after Harvard announced that it would deliver courses online only in 2020-21, the Trump administration announced that it would void student visas of students who are not attending in-person classes in 2020-21. Back to good news, Harvard and MIT announced that they will sue the federal government over this, and other universities, including the University of California system, are planning similar responses. Apart from the action, I was really heartened to read MIT’s President statement on the matter (thanks to Vinod Vaikuntanathan for bringing it my attention) which is worth reproducing:

To the members of the MIT community,

On Monday, in a surprising development, a division of Immigration and Customs Enforcement announced that it will not permit international students on F-1 visas to take a full online course load this fall while studying in the United States. As I wrote yesterday, this ruling has potentially serious implications for MIT’s international students and those enrolled at institutions across the country.

This morning, in response, MIT and Harvard jointly filed suit against ICE and the US Department of Homeland Security in federal court in Massachusetts. In the lawsuit, we ask the court to prevent ICE and DHS from enforcing the new guidance and to declare it unlawful.

The announcement disrupts our international students’ lives and jeopardizes their academic and research pursuits. ICE is unable to offer the most basic answers about how its policy will be interpreted or implemented. And the guidance comes after many US colleges and universities either released or are readying their final decisions for the fall – decisions designed to advance their educational mission and protect the health and safety of their communities.

Our international students now have many questions – about their visas, their health, their families and their ability to continue working toward an MIT degree. Unspoken, but unmistakable, is one more question: Am I welcome?

At MIT, the answer, unequivocally, is yes.

MIT’s strength is its people – no matter where they come from. I know firsthand the anxiety of arriving in this country as a student, excited to advance my education, but separated from my family by thousands of miles. I also know that welcoming the world’s brightest, most talented and motivated students is an essential American strength.

While we pursue legal protections for our international students, we will continue to stay in close touch with them through email and updates on the International Students Office’s website. If you have questions, you may write to the ISO at


L. Rafael Reif

This way of talking like a human being, and like you actually care about the matter at hand, is a big contrast with the robotic statements that usually come out of campus leadership. The corresponding message from UC Berkeley’s Chancellor is the way such statements usually are like:

Dear campus community,

Yesterday, the Department of Homeland Security issued new guidance to universities related to international students and fall instruction requirements. The guidance is deeply concerning: it could potentially force the return of many international students to their home countries if they are unable to find the appropriate balance of in-person and remote classes. These requirements run counter to our values of being an inclusive community and one that has a long tradition of welcoming international students from around the globe. International students enrich campus life immeasurably, through their participation in classes, research collaborations and extracurricular activities.

We will explore all of our options, legal and otherwise, to counter the deleterious effects of these policies that imp act the ability for international students to achieve their academic goals. It is not only important for UC Berkeley but for all of higher education across the U.S. to take every step possible to mitigate these policies that send a message of exclusion to our international community of scholars. We will partner with our professional associations to advocate for sound legislation that continues to support international educational exchange.

More immediately, we are working with colleagues across our campus to identify a path that will allow us to comply with these requirements while ensuring a healthy learning environment, and paying attention to the needs of our international students. We recognize the concern and anxiety these new rules have created, and we are moving quickly to ensure that we offer the proper balance of online and in-person classes so that our students can remain in the U.S. and satisfy their visa requirements, and that those students residing outside the U.S. can maintain their enrollment status.

We expect to announce more details soon. Should you have any questions, please contact the Berkeley International Office at


Carol Christ

Lisa Alvarez-Cohen
Vice Provost for Academic Planning and Senior International Officer

It is interesting to think about where this difference in tone is coming from. Carol Christ is a renown humanities scholar who, I suppose, writes well. She comes across as charismatic and caring, and she is definitely straight-talking in person. Probably, as for everything else, Berkeley has a byzantine process to create announcements and press releases, and if Stephen Colbert was the Chancellor of UC Berkeley, after a couple of weeks on the job he would sound just as deeply concerned and just as into exploring all options, while meanwhile working to identify a path and paying attention about something that is totally fucked up and needs action today.

Which brings me to all the statements in support of Black Lives Matter that have been coming out of every scholarly institution in the last few days. While their messages are generally unobjectionable, there is a certain sameness to their form (“we say their names…”, “we will do the work…”, “we see you…”) and they don’t sound at all like the way the people putting them out speak. This has complicated causes, including the fact that many such statements came out of letter-writing campaigns that demanded statements in a very specific way, without leaving a lot of room for individual expression. The association of American Poets, for example, put out a statement of solidarity with the Black community; in response, a letter with 1800 signatories claimed that it was too weak a statement and that it was, in fact, itself an act of violence against Black people; several resignations followed. The Board of the National Book Critics Circle was working on such a statement, and the work devolved into acrimony and several rounds of “I am outraged and I resign,” “no I am outraged at your outrage and I resign, “well then I am outraged that you are outraged at her outrage” until almost the whole board was gone in a “sequence of events [that] was bizarre and bloody in an end-of-a-Tarantino-movie way.”

Also, people in America talk about race the way UC Berkeley administrators talk about anything, that is extremely carefully and vacuously. But, back to the statements about foreign students, the difference between the administrative cultures at MIT and Berkeley is not the only difference between the statements of Reif and Christ: clearly a big difference is that Reif is an immigrant himself. When Trayvon Martin was killed, Obama talked about the killing in a way that was very different, and much more meaningful, than other politicians: if I had a son, Obama said, he would look a lot like Trayvon. If there were more people of color in positions of academic leadership, I think that we would have seen an academic response to Black Lives Matter that would have been less fearful, dogmatic and robotic and more meaningful and productive. Or perhaps we would have all ended up like the National Book Critic Circle, it’s hard to say.

by luca at July 08, 2020 09:11 PM UTC


Thomas Bloom and Olof Sisask: Breaking the logarithmic barrier in Roth’s theorem on arithmetic progressions,    arXiv:200703528


Once again Extraordinary news regarding Roth Theorem! (I thank Ryan Alweiss for telling me about it and Rahul Santhanam for telling me about Thomas and Olof’s earlier attempts.)

Suppose that R_n  is a subset of \{1,2,\dots, n \} of maximum cardinality not containing an arithmetic progression of length 3. Let r_3(n)=|R_n|. Roth proved that r_3(n)=o(n).

A few days ago Thomas Bloom and Olof Sisask proved that for some c>0

r_3(n) \le \frac {n}{\log^{1+c} n}

This is an extraordinary result!!! I will tell you a little more about it below.

Ron Graham

I just heard yesterday the sad news that Ron Graham passed away. Ron was an extraordinary mathematician and an extraordinary person. I first met Ron in Montreal in 1978 and we met many times since then. Ron will be dearly missed.

Back to the new bounds on Roth’s theorem

From an abstract of a lecture by Thomas and Olof: “This is the integer analogue of a result of Bateman and Katz for the model setting of vector spaces over a finite field, and the proof follows a similar structure.”

A catchy (weaker) formulation which goes back to Erdos and Turan is:

Let a_n be a sequence of integers so that \sum \frac{1}{a_n} = \infty, then the sequence contains an arithmetic progression of length three!!

Bloom and Sisask’s result implies, of course, Van der Korput’s result that the primes contain infinitely many 3-terms arithmetic progression as well as Green’s 2005 result asserting it for every  dense subset of primes.

Szemeredi’s celabrated result extended Roth’s theorem to arithmetic progression of any fixed size, and Green-Tao celebrated 2008 result asserts that the primes (or a dense subsets of primes) contain arithmetic progression of any length. (The case of 3-term AP is so far much simpler for all the results mentioned below.)


A little more about the history of the problem below the fold

Roth, Szemeredi, Heath-Brown, and Bourgain; Salem-Spencer and Behrend.

Let’s wrire $r_3(n)=n/g(n)$. Roth proved that g(n) \ge \log\log n. Szemeredi and Heath-Brown improved it to g(n) \ge \log^c n for some $latex c>0$ (Szemeredi’s argument gave c=1/4.) Jean Bourgain improved the bound in 1999 to c=1/2 and in 2008 to c=3/4 (up to lower order terms).

Erdös and Turan who posed the problem in 1936 described a set not containing an arithmetic progression of size n^c.  Salem and Spencer improved this bound to g(n) \le e^{logn/ loglogn}. Behrend’s upper bound from 1946 is of the form g(n) \le e^{C\sqrt {\log n}}. A small improvement was achieved  by Elkin and is discussed here.  (Look also at the remarks following that post.)


In 2010 Tom Sanders was able to refine Bourgain’s argument and proved that g(n) \ge (\log n)^{3/4}. A few month later  Tom have managed to reach the logarithmic barrier and to prove that

g(n) \ge (\log n)/(\log \log n)^{6}.

We reported about this outstanding achievement in this blog post and quoted from his paper: “There are two main new ingredients in the present work: the first is a way of transforming sumsets introduced by Nets Katz and Paul Koester in 2008, and the second is a result on the L_p-invariance of convolutions due to Ernie Croot and Olof Sisask (2010).”

The exponent 6 for the loglog term was improved to 4 by Thomas Bloom and recently to 3 by Thomas Schoen in his paper: Improved bound in Roth’s theorem on arithmetic progressions. Schoen uses ingredients from Bateman and Katz’s work (see below).

Cap sets  – Meshulam

A closely related problem  in \Gamma=\{0,1,2\}^n. It is called the cap set problem. A subset of \Gamma is called a cap set if it contains no arithmetic progression of size three or, alternatively, no three vectors that sum up to 0(modulo 3). If $latex A_n$  is a cap set of maximum size in \Gamma we can ask how the function f(n)=|A_n| behaves. In 1995 Roy Meshulam proved, using Roth’s argument, that f(n) \le 3^n/n . Edell found an example of a cap set of size 2.2^n.  Again the gap is exponential.  What is the truth? Improving Meshulam’s result may be closely related to crossing the \log n barrier for Roth’s theorem. In 2007 Tom Sanders  managed to achieve it , not for the cup problem, but for a related problem over Z/4Z.

Bateman and Katz

In 2011, Michael Bateman and Nets Katz improved, after many years of attempts by many, the Roth-Meshulam bound.  They proved using Fourier methods that f(n) \le 3^n/n^{1+c} for some c>0! This was very exciting.   See these two posts on Gowers’s blog (I,II). This raised the question if the new method allows breaking the  logarithmic barrier for Roth’s theorem.

Polymath 6

Tim Gowers proposed in 2011 polymath6  to try to break the logarithmic barrier for Roth based on the Bateman-Katz breakthrough. (Here is the wiki; and a related post by Sanders, and a document by Katz) This project did not get off the ground. We can regard the news as giving support that the polymath6 project was timely and of an appropriate level, and also as giving some support to an advantage of the conventional way of doing mathematics compared to the polymath way.

 Croot-Lev-Pach-Ellenberg-Gijswijt solution to the Cap set problem via the polynomial method

Next and quite recently came a startling development  – the Croot-Lev-Pach-Ellenberg-Gijswijt capset bound f(n) \le 2.756^n. (Croot, Lev, and Pach gave an exponential improvement for the Z/4Z case (see this post) and a few weeks later Ellenberg and Gijswijt used the method for the Z/3Z case (see this post).)

A natural question that many people asked was how this development relates to improving the bounds for Roth perhaps even towards the Behrend bound. We discussed it a little over here and in other places. This is still an interesting possibility.

The new result: Bloom and Sisask

However, just a few months few years after the Bateman-Katz result have become obsolete for the cap-set problem, the Bateman-Katz method prevailed in this wonderful breakthrough of Bloom and Sisask giving g(n) \ge \log^c n for c>1.


An old post and poll

In an old post we asked: “How does r_3(n) behave? Since we do not really know, will it help talking about it? Can we somehow look beyond the horizon and try to guess what the truth is? (I still don’t know if softly discussing this or other mathematical problems is a fruitful idea, but it can be enjoyable.)” We even had a poll collecting people’s predictions about r_3(n).  Somewhat surprisingly 18.18% of answerers predicted that r_3(n) behaves like \frac{1}{(\log n)^c} for some c<1.


by Gil Kalai at July 08, 2020 04:00 PM UTC

I want to dive into some classic results in robust control and try to relate them to our current data-driven mindset. I’m going to try to do this in a modern way, avoiding any frequency domain analyses.

Suppose you want to solve some optimal control problem: you spend time modeling the dynamics of your system, how it responds to stimuli, and which objectives you’d like to maximize and constraints you must adhere to. Each of these modeling decisions explicitly encodes both your beliefs about reality and your mental criteria of success and failure. Robustness aims to quantify the effects of oversight on your systems behavior. Perhaps your model wasn’t accurate enough, or perhaps you forgot to include some constraint in your objective. What are the downstream consequences?

In the seventies, it was believed that optimization-based frameworks for control had “natural robustness.” The solutions of optimal control problems were often robust to phenomena not explicitly modeled by the engineer. As a simple example, suppose you have an incorrect model of the dynamical system you are trying to steer. How accurate do you need to be in order for this policy to be reasonably successful?

To focus in on this, let’s study the continuous-time linear quadratic regulator (LQR). I know I’ve been arguing that we should be moving away from LQR in order to understand the broader challenges in learning and control, but the LQR baseline has so many lessons to teach us. Please humor me again for a few additional reasons: First, most of the history I want to tell arises from studying continuous-time LQR in the 1970s. It’s worth understanding that history with a modern perspective. Second, LQR does admit elegant closed form formulae that are helpful for pedagogy, and they are particularly nice in continuous time.

LQR in Continuous Time

Suppose we have a dynamical system that we model as an ODE:

Here, as always, $x_t$ is the state, $u_t$ is the control input signal, and $A$ and $B$ are matrices of appropriate dimensions. The goal of the continuous-time LQR problem is to minimize the cost functional

over all possible control inputs $u_t$. Let’s assume for simplicity that $Q$ is a positive semidefinite matrix and $R$ is positive definite.

The optimal LQR policy is static state feedback: there is some matrix $K$ such that

for all time. $K$ has a closed form solution that can be found by solving a continuous algebraic Riccatti equation (CARE) for a matrix $P$:

and then setting

Importantly, we take the solution of the CARE where $P$ that is positive definite. If a positive definite solution of the CARE exists, then it is optimal for continuous time LQR. There are a variety of ways to prove this condition is sufficient, including an appeal to dynamic programming in continuous time. A simple argument I like uses the quadratic structure of LQR to derive the necessity of the CARE solution. (I found this argument in Joao Hespansha’s book).

Regardless, showing a positive definite CARE solution exists takes considerably more work. It suffices to assume that the pair $(A,B)$ is controllable and the pair $(Q,A)$ is detectable. But proving these conditions are sufficient requires a lot of manipulation of linear algebra, and I don’t think I could cleanly distill a proof into a blog post. I mention this just to reiterate that while LQR is definitely the simplest problem to study, its analysis in continuous time on an infinite time horizon is nontrivial. LQR is not really “easy.” It’s merely the easiest problem in a space of rather hard problems.

Gain margins

Let’s now turn to robustness. Suppose there is a mismatch between our modeled dynamics and reality. For example, what if the actual system is

for some matrix $B_\star$. Such model mismatches occur all the time. For example, in robotics, we can send a signal “u” to the joint of some robot. This would be some voltage that would need to be linearly transformed into some torque by a motor. It requires a good deal of calibration to make sure that the output of the motor is precisely the force dictated by the voltage output from our controller. Is there a way to guarantee some leeway in the mapping from voltage to torque?

An attractive feature of LQR is that we can quantify precisely how much slack we have directly from the CARE solution. We can use the solution of the CARE to build a Lyapunov function to guarantee stability of the system. Recall that a Lyapunov function is a function $V$ that maps states to real numbers, is nonnegative everywhere, is equal to $0$ only when $x=0$, and whose value is strictly decreasing along any trajectory of a dynamical system. In equations:

If you have a Lyapunov function, then all trajectories must converge to $x=0$: if you are at any nonzero state, the value of $V$ will decrease. If you are at $0$, then you will be at a global minimum of $V$ and hence can’t move to any other state.

Let $P$ be the solution of the CARE and let’s posit that $V(x) = x^\top P x$ is a Lyapunov function. Since $P$ is positive definite, we have $V(x)\geq 0$ and $V(x)=0$ if and only if $x=0$. To prove that the derivative of the Lyapunov function is negative, we can first compute the derivative:

Note that it is sufficient to show that $(A-B_\star K)^\top P + P(A-B_\star K)$ is a negative definite matrix as this would prove that the derivative is negative for all nonzero $x_t$. To prove that this expression is negative definite, let’s apply a bit of algebra to generate some sufficient conditions. Using the definition of $K$ and the fact that $P$ solves the CARE gives the following chain of equalities:

Here, the first equality is simply expanding the matrix product. The second equation uses the fact that $P$ is a solution to the CARE. The third equality uses the definition of $K$. The final equation is an algebraic rearrangement.

With this final expression, we can cook up a huge number of conditions under which we get “robustness for free.” First, consider the base case where $B=B_\star$. Since $R$ is positive definite and $Q$ is positive semidefinite, the entire expression is negative definite, and hence we have proven the system is stable.

Second, there is a famous result that LQR has “large gain margins.” The gain margin of a control system is an interval $(t_0,t_1)$ such that for all $t$ in this interval, our control system is stable with the controller $tK$. Another way of thinking about the gain margin is to assume that $B_\star = tB$, and to find the largest interval such that the system $(A,B_\star)$ is stabilized by a control policy $K$. For LQR, there are very large margins: if we plug in the identity $B_\star=tB$, we find that $x^\top P x$ is a Lyapunov function provided that $t \in (\tfrac{1}{2},\infty)$. LQR control turns out to be robust to a wide range of perturbations to the matrix $B$. Intuitively, it makes sense that if we would like to drive a signal to zero and have more control authority than we anticipated then our policy will still drive the system to zero. This is the range of $t \in [1,\infty)$. The other part of the interval is perhaps more interesting: for the LQR problem, even if we only have half of the control authority we had planned for, we still will successfully stabilize our system from any initial condition.

In discrete time, you can derive similar formulae with essentially the same argument. Unfortunately, the expressions are not as elegant. Also, note that you cannot expect infinite gain margins in discrete time. In continuous time a differential equation $\dot{x}_t = M x_t$ is stable if all of the eigenvalues of $M$ have negative real parts. In discrete time, you need all of the eigenvalues to have magnitude less than $1$. For almost any random set triple $(A,B,K)$, $A-t B K$ is going to have large eigenvalues for $t$ large enough. Nonetheless, you can certainly derive analogous conditions as to which errors are tolerable.

There are a variety of other conditions that can be derived from our matrix expression. Most generally, the control system will be stable provided that

The LQR gain margins fall out naturally from this expression when we assume $B_\star = t B$. However, we can guarantee much more general robustness using this inequality. For example, if we assume that $B_\star = BM$ for some square matrix $M$, then $K$ stabilizes the pair $(A,B_\star)$ if all of the eigenvalues of $M+M^\top $ are greater than $1$.

Perhaps more in line with what we do in machine learning, suppose we are able to collect a lot of data, do some uncertainty quantification, and guarantee a bound $|B-B_\star|_2<\epsilon$. Then as long as

we will be guaranteed stable execution. This expression depends on the matrices $P$, $Q$, and $R$, so it has a different flavor of the infinite gain margin conditions which held irrespective of the dynamics or the cost. Moreover, if $P$ has large eigenvalues, then we are only able to guarantee safe execution for small perturbations to $B$. This foreshadows issues I’ll dive into in later posts. I want to flag here that these calculations reveal some fragilities of LQR: While the controller is always robust to perturbations along the direction of the matrix $B$, you can construct examples where the system is highly sensitive to tiny perturbations orthogonal to $B$. I’ll return in the next post to start to unpack how optimal control has some natural robustness, but it has natural fragility as well.

at July 08, 2020 12:00 AM UTC

The UCSD CS department created a new postdoc program, modeled after the CI fellows program. To apply, you need to identify a UCSD theory faculty as a mentor, contact them and see if they are interested. If so, both you and the mentor need to apply. The deadline for both applications is Aug 7, so time is of the essence.


by shacharlovett at July 07, 2020 09:36 PM UTC

The Forrelation problem, first introduced by Aaronson [AA10] and Aaronson and Ambainis [AA15], is a well studied computational problem in the context of separating quantum and classical computational models. Variants of this problem were used to give tight separations between quantum and classical query complexity [AA15]; the first separation between poly-logarithmic quantum query complexity and bounded-depth circuits of super-polynomial size, a result that also implied an oracle separation of the classes BQP and PH [RT19]; and improved separations between quantum and classical communication complexity [GRT19]. In all these separations, the lower bound for the classical model only holds when the advantage of the protocol (over a random guess) is more than $\approx 1/\sqrt{N}$, that is, the success probability is larger than $\approx 1/2 + 1/\sqrt{N}$. This is unavoidable as $\approx 1/\sqrt{N}$ is the correlation between two coordinates of an input that is sampled from the Forrelation distribution, and hence there are simple classical protocols that achieve advantage $\approx 1/\sqrt{N}$, in all these models. To achieve separations when the classical protocol has smaller advantage, we study in this work the XOR of $k$ independent copies of (a variant of) the Forrelation function (where $k\ll N$). We prove a very general result that shows that any family of Boolean functions that is closed under restrictions, whose Fourier mass at level $2k$ is bounded by $\alpha^k$ (that is, the sum of the absolute values of all Fourier coefficients at level $2k$ is bounded by $\alpha^k$), cannot compute the XOR of $k$ independent copies of the Forrelation function with advantage better than $O\left(\frac{\alpha^k}{{N^{k/2}}}\right)$. This is a strengthening of a result of [CHLT19], that gave a similar statement for $k=1$, using the technique of [RT19]. We give several applications of our result. In particular, we obtain the following separations: Quantum versus Classical Communication Complexity: We give the first example of a partial Boolean function that can be computed by a simultaneous-message quantum protocol with communication complexity $\mbox{polylog}(N)$ (where Alice and Bob also share $\mbox{polylog}(N)$ EPR pairs), and such that, any classical randomized protocol of communication complexity at most $\tilde{o}(N^{1/4})$, with any number of rounds, has quasipolynomially small advantage over a random guess. Previously, only separations where the classical protocol has polynomially small advantage were known between these models [G16, GRT19]. Quantum Query Complexity versus Bounded Depth Circuits: We give the first example of a partial Boolean function that has a quantum query algorithm with query complexity $\mbox{polylog}(N)$, and such that, any constant-depth circuit of quasipolynomial size has quasipolynomially small advantage over a random guess. Previously, only separations where the constant-depth circuit has polynomially small advantage were known [RT19].

at July 07, 2020 04:03 PM UTC

If there were ever a time for liberals and progressives to put aside their internal squabbles, you’d think it was now. The President of the United States is a racist gangster, who might not leave if he loses the coming election—all the more reason to ensure he loses in a landslide. Due in part to that gangster’s breathtaking incompetence, 130,000 Americans are now dead, and the economy tanked, from a pandemic that the rest of the world has under much better control. The gangster’s latest “response” to the pandemic has been to disrupt the lives of thousands of foreign scientists—including several of my students—by threatening to cancel their visas. (American universities will, of course, do whatever they legally can to work around this act of pure spite.)

So how is the left responding to this historic moment?

This weekend, 536 people did so by … trying to cancel Steven Pinker, stripping him of “distinguished fellow” and “media expert” status (whatever those are) in the Linguistics Society of America for ideological reasons.

Yes, Steven Pinker: the celebrated linguist and cognitive scientist, author of The Language Instinct and How the Mind Works (which had a massive impact on me as a teenager) and many other books, and academic torch-bearer for the Enlightenment in our time. For years, I’d dreaded the day they’d finally come for Steve, even while friends assured me my fears must be inflated since, after all, they hadn’t come for him yet.

I concede that the cancelers’ logic is impeccable. If they can get Pinker, everyone will quickly realize that there’s no longer any limit to who they can get—including me, including any writer or scientist who crosses them. If you’ve ever taken, or aspire to take, any public stand riskier than “waffles are tasty,” then don’t delude yourself that you’ll be magically spared—certainly not by your own progressive credentials.

I don’t know if the “charges” against Pinker merit a considered response (Pinker writes that some people wondered if they were satire). For those who care, though, here’s a detailed and excellent takedown by the biologist and blogger Jerry Coyne, and here’s another by Barbara Partee.

So, it seems Pinker once used the term “urban crime,” which can be a racist dogwhistle—except that in this case, it literally meant “urban crime.” Pinker once referred to Bernie Goetz, whose 1984 shooting of four robbers in the NYC subway polarized the US at the time, as a “mild-mannered engineer,” in a sentence whose purpose was to contrast that description with the ferocity of Goetz’s act. Pinker “appropriated” the work of a Black scholar, Harvard Dean Lawrence Bobo, which apparently meant approvingly citing him in a tweet. Etc. Ironically, it occurred to me that the would-be Red Guards could’ve built a much stronger case against Pinker had they seriously engaged with his decades of writing—writing that really does take direct aim at their whole worldview, they aren’t wrong about that—rather than superficially collecting a few tweets.

What Coyne calls the “Purity Posse” sleazily gaslights its readers as follows:

We want to note here that we have no desire to judge Dr. Pinker’s actions in moral terms, or claim to know what his aims are. Nor do we seek to “cancel” Dr. Pinker, or to bar him from participating in the linguistics and LSA communities (though many of our signatories may well believe that doing so would be the right course of action).

In other words: many of us “may well believe” that Pinker’s scientific career should be ended entirely. But magnanimously, for now, we’ll settle for a display of our power that leaves the condemned heretic still kicking. So don’t accuse us of wanting to “cancel” anyone!

In that same generous spirit:

Though no doubt related, we set aside questions of Dr. Pinker’s tendency to move in the proximity of what The Guardian called a revival of “scientific racism”, his public support for David Brooks (who has been argued to be a proponent of “gender essentialism”), his expert testimonial in favor of Jeffrey Epstein (which Dr. Pinker now regrets), or his dubious past stances on rape and feminism.

See, even while we make these charges, we disclaim all moral responsibility for making them. (For the record, Alan Dershowitz asked Pinker for a linguist’s opinion of a statute, so Pinker provided it; Pinker didn’t know at the time that the request had anything to do with Epstein.)

Again and again, spineless institutions have responded to these sorts of ultimatums by capitulating to them. So I confess that the news about Pinker depressed me all weekend. The more time passed, though, the more it looked like the Purity Posse might have actually overplayed its hand this time. Steven Pinker is not weak prey.

Let’s start with what’s missing from the petition: Noam Chomsky pointedly refused to sign. How that must’ve stung his comrades! For that matter, virtually all of the world’s well-known linguists refused to sign. Ray Jackendoff and Michel DeGraff were originally on the petition, but their names turned out to have been forged (were others?).

But despite the flimsiness of the petition, suppose the Linguistics Society of America caved. OK, I mused, how many people have even heard of the Linguistics Society of America, compared to the number who’ve heard of Pinker or read his books? If the LSA expelled Pinker, wouldn’t they be forever known to the world only as the organization that had done that?

I’m tired of the believers in the Enlightenment being constantly on the defensive. “No, I’m not a racist or a misogynist … on the contrary, I’ve spent decades advocating for … yes, I did say that, but you completely misunderstood my meaning, which in context was … please, I’m begging you, can’t we sit and discuss this like human beings?”

It’s time for more of us to stand up and say: yes, I am a center-left extremist. Yes, I’m an Enlightenment fanatic, a radical for liberal moderation and reason. If liberalism is the vanilla of worldviews, then I aspire to be the most intense vanilla anyone has ever tasted. I’m not a closeted fascist. I’m not a watered-down leftist. I’m something else. I consider myself ferociously anti-racist and anti-sexist and anti-homophobic and pro-downtrodden, but I don’t cede to any ideological faction the right to dictate what those terms mean. The world is too complicated, too full of ironies and surprises, for me to outsource my conscience in that way.

Enlightenment liberalism at least has the virtue that it’s not some utopian dream: on the contrary, it’s already led to most of the peace and prosperity that this sorry world has ever known, wherever and whenever it’s been allowed to operate. And while “the death of the Enlightenment” gets proclaimed every other day, liberal ideals have by now endured for centuries. They’ve outlasted kings and dictators, the Holocaust and the gulag. They certainly have it within them to outlast some online sneerers.

Yes, sometimes martyrdom (or at least career martyrdom) is the only honorable course, and yes, the childhood bullies did gift me with a sizeable persecution complex—I’ll grant the sneerers that. But on reflection, no, I don’t want to be a martyr for Enlightenment values. I want Enlightenment values to win, and not by vanquishing their opponents but by persuading them. As Pinker writes:

A final comment: I feel sorry for the signatories. Moralistic dudgeon is a shallow and corrosive indulgence, & policing the norms of your peer group a stunting of the intellect. Learning new ideas & rethinking conventional wisdom are deeper pleasures … and ultimately better for the world. Our natural state is ignorance, fallibility, & self-deception. Progress comes only from broaching & evaluating ideas, including those that feel unfamiliar and uncomfortable.

Spend a lot of time on Twitter and Reddit and news sites, and it feels like the believers in the above sentiment are wildly outnumbered by the self-certain ideologues of all sides. But just like the vanilla in a cake can be hard to taste, so there are more Enlightenment liberals than it seems, even in academia—especially if we include all those who never explicitly identified that way, because they were too busy building or fixing or discovering or teaching, and because they mistakenly imagined that if they just left the Purity Posse alone then the Posse would do likewise. If that’s you, then please ask yourself now: what is my personal break-point for speaking up?

by Scott at July 07, 2020 02:21 PM UTC

Sublinear algorithms in times of social distancing…always something exciting. This month we have a slew of results on sublinear algorithms for classic graph problems.

(Ed: We have removed a previously posted paper due to correctness concerns raised by our readers. Please look at the post on our paper policy.)

Palette Sparsification Beyond (∆ + 1) Vertex Coloring by Noga Alon and Sepehr Assadi (arXiv). A basic fact from graph theory is that any graph has a \((\Delta+1)\)-coloring, where \(\Delta\) is the maximum degree. Followers of property testing are likely familiar with a fantastic result of Assadi-Chen-Khanna (ACK) on sublinear algorithms, that gives a sublinear algorithm for \((\Delta+1)\)-coloring. (The running time is \(\widetilde{O}(n^{3/2})\), where \(n\) is the number of vertices.) The key tool is a palette sparsification theorem: suppose each vertex is given a “palette” of \((\Delta+1)\) colors. Each vertex randomly sparsifies its palette by sampling \(O(\log n)\) colors, and is constrained to only use these colors. Remarkably, whp the graph can still be properly colored. This tool is at the heart of sublinear time/space algorithms for coloring. This paper gives numerous extensions to this theorem, where one can tradeoff a larger initially palette for a smaller final sample. Another extension is for triangle-free graphs, where the initial palette is of size \(O(\Delta/\ln \Delta)\) and the sample is of size \(O(\Delta^\gamma + \sqrt{\ln n})\) (for parameter \(\gamma < 1\). This leads to an \(O(n^{3/2 + \gamma})\) time algorithm for \(O(\Delta/\ln \Delta)\) coloring of triangle-free graphs.

When Algorithms for Maximal Independent Set and Maximal Matching Run in Sublinear-Time by Sepehr Assadi and Shay Solomon (arXiv). Taking off from sublinear coloring algorithms, one can ask if there are sublinear time algorithms for Maximal Independent Set (MIS) and Maximal Matching (MM). Alas, ACK prove that this is impossible. This paper investigates when one can get a sublinear time algorithm for these problems. For graph \(G\), let \(\beta(G)\) be the “neighborhood independence number”, the size of the largest independent set contained in a vertex neighborhood. This papers shows that both problems can be solved in \(\widetilde{O}(n \beta(G))\) time. Examples of natural classes of graphs where \(\beta(G)\) is constant: line graphs and unit-disk graphs. An interesting aspect is that MIS algorithm is actually deterministic! It’s the simple marking algorithm that rules out neighborhoods of chosen vertices; the analysis shows that not much time is wasted in remarking the same vertex.

Sublinear Algorithms and Lower Bounds for Metric TSP Cost Estimation by Yu Chen, Sampath Kannan, and Sanjeev Khanna (arXiv). This paper studies sublinear algorithms for the metric TSP problem. The input is an \(n \times n\) distance matrix. One can 2-approximate the TSP by computing the MST, and a result of Czumaj-Sohler gives a \((1+\varepsilon)\)-approximation algorithm for the latter, running in \(O(n\varepsilon^{-O(1)})\) time. The main question is: can one beat the 2-factor approximation in sublinear time? This paper considers the graphic TSP setting, where the distance matrix corresponds to the shortest path metric of an unweighted graph. One result is a \((2-\varepsilon_0)\)-approximation algorithm (for an explicit constant \(\varepsilon_0\)) that runs in \(\widetilde{O}(n)\) time. For the important \((1,2)\) TSP setting (all distances are either 1 or 2), the paper gives a \(O(n^{1.5})\) time 1.63-approximation algorithm. Interestingly, there is a lower bound showing that \((1+\varepsilon)\)-approximations, for arbitrarily small \(\varepsilon\), cannot be achieved in \(o(n^2)\) time. One of the key tools is sublinear algorithms for estimating the maximum matching size, itself a well-studied problem in the community.

by Seshadhri at July 07, 2020 06:26 AM UTC

We study graph computations in an enhanced data streaming setting, where a space-bounded client reading the edge stream of a massive graph may delegate some of its work to a cloud service. We seek algorithms that allow the client to verify a purported proof sent by the cloud service that the work done in the cloud is correct. A line of work starting with Chakrabarti et al. (ICALP 2009) has provided such algorithms, which we call schemes, for several statistical and graph-theoretic problems, many of which exhibit a tradeoff between the length of the proof and the space used by the streaming verifier. This work designs new schemes for a number of basic graph problems---including triangle counting, maximum matching, topological sorting, and single-source shortest paths---where past work had either failed to obtain smooth tradeoffs between these two key complexity measures or only obtained suboptimal tradeoffs. Our key innovation is having the verifier compute certain nonlinear sketches of the input stream, leading to either new or improved tradeoffs. In many cases, our schemes in fact provide optimal tradeoffs up to logarithmic factors. Specifically, for most graph problems that we study, it is known that the product of the verifier's space cost $v$ and the proof length $h$ must be at least $\Omega(n^2)$ for $n$-vertex graphs. However, matching upper bounds are only known for a handful of settings of $h$ and $v$ on the curve $h \cdot v=\tilde{\Theta}(n^2)$. For example, for counting triangles and maximum matching, schemes with costs lying on this curve are only known for $(h=\tilde{O}(n^2), v=\tilde{O}(1))$, $(h=\tilde{O}(n), v=\tilde{O}(n))$, and the trivial $(h=\tilde{O}(1), v=\tilde{O}(n^2))$. A major message of this work is that by exploiting nonlinear sketches, a significant ``portion'' of costs on the tradeoff curve $h \cdot v = n^2$ can be achieved.

at July 06, 2020 05:16 PM UTC

One of the major open problems in complexity theory is proving super-logarithmic lower bounds on the depth of circuits (i.e., $\mathbf{P}\not\subseteq\mathbf{NC}^1$). Karchmer, Raz, and Wigderson (Computational Complexity 5(3/4), 1995) suggested to approach this problem by proving that depth complexity behaves “as expected” with respect to the composition of functions $f \diamond g$. They showed that the validity of this conjecture would imply that $\mathbf{P}\not\subseteq\mathbf{NC}^1$. Several works have made progress toward resolving this conjecture by proving special cases. In particular, these works proved the KRW conjecture for every outer function $f$, but only for few inner functions $g$. Thus, it is an important challenge to prove the KRW conjecture for a wider range of inner functions. In this work, we extend significantly the range of inner functions that can be handled. First, we consider the $\textit{monotone}$ version of the KRW conjecture. We prove it for every monotone inner function $g$ whose depth complexity can be lower bounded via a query-to-communication lifting theorem. This allows us to handle several new and well-studied functions such as the $s\textbf{-}t$-connectivity, clique, and generation functions. In order to carry this progress back to the $\textit{non-monotone}$ setting, we introduce a new notion of $\textit{semi-monotone}$ composition, which combines the non-monotone complexity of the outer function $f$ with the monotone complexity of the inner function $g$. In this setting, we prove the KRW conjecture for a similar selection of inner functions $g$, but only for a specific choice of the outer function $f$.

at July 06, 2020 11:22 AM UTC

GANs, originally discovered in the context of unsupervised learning, have had far reaching implications to science, engineering, and society. However, training GANs remains challenging (in part) due to the lack of convergent algorithms for nonconvex-nonconcave min-max optimization. In this post, we present a new first-order algorithm for min-max optimization which is particularly suited to GANs. This algorithm is guaranteed to converge to an equilibrium, is competitive in terms of time and memory with gradient descent-ascent and, most importantly, GANs trained using it seem to be stable.

GANs and min-max optimization

Starting with the work of Goodfellow et al., Generative Adversarial Nets (GANs) have become a critical component in various ML systems; for prior posts on GANs, see here for a post on GAN architecture, and here and here for posts which discuss some of the many difficulties arising when training GANs.

Mathematically, a GAN consists of a generator neural network $\mathcal{G}$ and a discriminator neural network $\mathcal{D}$ that are competing against each other in a way that, together, they learn the unknown distribution from which a given dataset arises. The generator takes a random “noise” vector as input and maps this vector to a sample; for instance, an image. The discriminator takes samples – “fake” ones produced by the generator and “real” ones from the given dataset – as inputs. The discriminator then tries to classify these samples as “real” or “fake”. As a designer, we would like the generated samples to be indistinguishable from those of the dataset. Thus, our goal is to choose weights $x$ for the generator network that allow it to generate samples which are difficult for any discriminator to tell apart from real samples. This leads to a min-max optimization problem where we look for weights $x$ which minimize the rate (measured by a loss function $f$) at which any discriminator correctly classifies the real and fake samples. And, we seek weights $y$ for the discriminator network which maximize this rate.

Min-max formulation of GANs

where $\zeta$ is a random sample from the dataset, and $\xi \sim N(0,I_d)$ is a noise vector which the generator maps to a “fake” sample. $f_{\zeta, \xi}$ measures how accurately the discriminator $\mathcal{D}(y;\cdot)$ distinguishes $\zeta$ from $\mathcal{G}(x;\xi)$ produced by the generator using the input noise $\xi$.

In this formulation, there are several choices that we have to make as a GAN designer, and an important one is that of a loss function. One concrete choice is from the paper of Goodfellow et al.: the cross-entropy loss function:

See here for a summary and comparison of different loss functions.

Once we fix the loss function (and the architecture of the generator and discriminator), we can compute unbiased estimates of the value of $f$ and its gradients $\nabla_x f$ and $\nabla_y f$ using batches consisting of random Gaussian noise vectors $\xi_1,\ldots, \xi_n \sim N(0,I_d)$ and random samples from the dataset $\zeta_1, \ldots, \zeta_n$. For example, the stochastic batch gradient

gives us an unbiased estimate for $\nabla_x f(x,y)$.

But how do we solve the min-max optimization problem above using such a first-order access to $f$?

Gradient descent-ascent and variants

Perhaps the simplest algorithm we can try for min-max optimization is gradient descent-ascent (GDA). As the generator wants to minimize with respect to $x$ and the discriminator wants to maximize with respect to $y$, the idea is to do descent steps for $x$ and ascent steps for $y$. How exactly to do this is not clear, and one strategy is to let the generator and discriminator alternate:

Other variants include, for instance, optimistic mirror descent (OMD) (see also here and here for applications of OMD to GANs, and here for an analysis of OMD and related methods)

The advantage of such algorithms is that they are quite practical. The problem, as we discuss next, is that they are not always guaranteed to converge. Most of these guarantees only hold for special classes of loss functions $f$ that satisfy properties such as concavity (see here and here) or monotonicity, or under the assumptions that these algorithms are provided with special starting points (see here, here).

Convergence problems with current algorithms

Unfortunately there are simple functions for which some min-max optimization algorithms may never converge to any point. For instance GDA may not converge on $f(x,y) = xy$ (see Figure 1, and our previous post for a more detailed discussion).

Figure 1. GDA on $f(x,y) = xy, \, \, \, \, x,y \in [-5,5]$ (the red line is the set of global min-max points). GDA is non-convergent from almost every initial point.

As for examples relevant to ML, when using GDA to train a GAN on a dataset consisting of points sampled from a mixture of four Gaussians in $\mathbb{R}^2$, we observe that GDA tends to cause the generator to cycle between different modes corresponding to the four Gaussians. We also used GDA to train a GAN on the subset of the MNIST digits which have “0” or “1” as their label, which we refer to as the 0-1 MNIST dataset. We observed a cycling behavior for this dataset as well: After learning how to generate images of $0$’s, the GAN trained by GDA then forgets how to generate $0$’s for a long time and only generates $1$’s.

Figure 2. Mode oscillation when GDA is used to train GANs on the four Guassian mixture dataset (left) and the 0-1 MNIST dataset (right).

In algorithms such as GDA where the discriminator only makes local updates, cycling can happen for the following reason: Once the discriminator learns to identify one of the modes (say mode “A”), the generator can update $x$ in a way that greatly decreases f, by (at least temporarily) ìfoolingî the discriminator. The generator does this by learning to generate samples from a different mode (say mode “B”) which the discriminator has not yet learned to identify, and stops generating samples from mode A. However, after many iterations, the discriminator ìcatches upî to the generator and learns how to identify mode B. Since the generator is no longer generating samples from mode A, the discriminator may then ìforgetî how to identify samples from this mode. And this can cause the generator to switch back to generating only mode A.

Our first-order algorithm

To solve the min-max optimization problem, at any point $(x,y)$, we should ideally allow the discriminator to find the global maximum, $\max_z f(x,z)$. However, this may be hard for nonconcave $f$. But we could still let the discriminator run a convergent algorithm (such as gradient ascent) until it reaches a first-order stationary point, allowing it to compute an approximation $h$ for the global max function. (Note that even though $\max_z f(x,z)$ is only a function of $x$, since $h$ is a “local’’ approximation it could also depend on the initial point $y$ where we start gradient ascent.) And we also empower the generator to simulate the discriminator’s update by running gradient ascent (see our paper for discriminators with access to a more general class of first-order algorithms).

Idea 1: Use a local approximation to global max

Starting at the point $(x,y)$, update $y$ by computing multiple gradient ascent steps for $y$ until a point $w$ is reached where is close to zero and define $h(x,y) := f(x,w)$.

We would like the generator to minimize $h(\cdot,y)$. To minimize $h$, we would ideally like to update $x$ in the direction $-\nabla_x h$. However, $h$ may be discontinuous in $x$ (see our previous post for why this can happen). Moreover, even at points where $h$ is differentiable, computing the gradient of $h$ can take a long time and requires a large amount of memory.

Thus, realistically, we only have access to the value of $h$. A naive approach to minimizing $h$ would be to propose a random update to $x$, for instance an update sampled from a standard Gaussian, and then only accept this update if it causes the value of $h$ to decrease. Unfortunately, this does not lead to fast algorithms as even at points where $h$ is differentiable, in high dimensions, a random Gaussian step will be almost orthogonal to the steepest descent direction $-\nabla_x h(x,y)$, making the progress slow.

Another idea is to have the generator propose at each iteration an update in the direction of the gradient $-\nabla_x f(x,y)$, and to then have the discriminator update $y$ using gradient ascent. To see why this may be a reasonable thing to do, notice that once the generator proposes an update $v$ to $x$, the discriminator will only make updates which increase the value of f or, $h(x+v,y) \geq f(x+v,y)$. And, since $y$ is a first-order stationary point for $f(x, \cdot)$ (because $y$ was computed using gradient ascent in the previous iteration), we also have that $h(x,y)=f(x,y)$. Hence,

This means that decreasing $h$ requires us to decrease $f$ (the converse is not true). So it indeed makes sense to move in the direction $-\nabla_x f(x,y)$!

While making updates using $-\nabla_x f(x,y)$ may allow the generator to decrease $h$ more quickly than updating in a random direction, it is not always the case that updating in the direction of $-\nabla_x f$ will lead to a decrease in $h$ (and doing so may even lead to an increase in $h$!). Instead, our algorithm has the generator perform a random search by proposing an update in the direction of a batch gradient with mean $-\nabla_x f$, and accepts this move only if the value of $h$ (the local approximation) decreases. The accept-reject step prevents our algorithm from cycling between modes, and using the batch gradient for the random search allows our algorithm to be competitive with prior first-order methods in terms of running time.

Idea 2: Use zeroth-order optimization with batch gradients

Sample a batch gradient $v$ with mean $-\nabla_x f(x,y)$.
If $h(x+ v, y) < h(x,y) $ accept the step $x+v$; otherwise reject it.

A final issue, that applies even in the special case of minimization, is that converging to a local minimum point does not mean that point is desirable from an application standpoint. The same is true for the more general setting of min-max optimization. To help our algorithm escape undesirable local min-max equilibria, we use a randomized accept-reject rule inspired by simulated annealing. Simulated annealing algorithms seek to minimize a function via a randomized search, while gradually decreasing the acceptance probability of this search; in some cases this allows one to reach the global minimum of a nonconvex function (see for instance this paper). These three ideas lead us to our algorithm.

Our algorithm

Input: Initial point $(x,y)$, $f: \mathbb{R}^d \times \mathbb{R}^d\rightarrow \mathbb{R}$
Output: A local min-max equilibrium $(x,y)$

For $i = 1,2, \ldots$

Step 1: Generate a batch gradient $v$ with mean $-\nabla_x f(x,y)$ and propose the generator update $x+v$.

Step 2: Compute $h(x+v, y) = f(x+v, w)$, by simulating a discriminator update $w$ via gradient ascent on $f(x+v, \cdot)$ starting at $y$.

Step 3: If $h(x+v, y)$ is less than $h(x,y) = f(x,y)$, accept both updates: $(x,y) = (x+v, w)$. Else, accept both updates with some small probability.

In our paper, we show that our algorithm is guaranteed to converge to a type of local min-max equilibrium in $\mathrm{poly}(\frac{1}{\varepsilon},d, b, L)$ time whenever $f$ is bounded by some $b>0$ and has $L$-Lipschitz gradients. Our algorithm does not require any special starting points, or any additional assumptions on $f$ such as convexity or monotonicity. (See Definition 3.2 and Theorem 3.3 in our paper.)

Figure 3. GDA (left) and a version of our algorithm (right) on $f(x,y) = xy, \, \, \, \, x,y \in [-5,5]$. While GDA is non-convergent from almost every initial point, our algorithm converges to the set of global min-max points (the red line). To ensure it converges to a (local) equilibrium, our algorithm's generator proposes multiple updates, simulates the discriminator's response, and rejects updates which do not lead to a net decrease in $f$. It only stops if it can't find such an update after many attempts. (To stay inside $[-5,5]\times [-5,5]$ this version of our algorithm uses projected gradients.)

So, how does our algorithm perform in practice?

When training a GAN on the mixture of four Gaussians dataset, we found that our algorithm avoids the cycling behavior observed in GDA. We ran each algorithm multiple times, and evaluated the results visually. By the 1500’th iteration GDA learned only one mode in 100% of the runs, and tended to cycle between two or more modes. In contrast, our algorithm was able to learn all four modes 68% of the runs, and three modes 26% of the runs.

Figure 4. GAN trained using GDA and our algorithm on a four Gaussian mixture dataset. While GDA cycles between the Gaussian modes (red dots), our algorithm learns all four modes.

When training on the 0-1 MNIST dataset, we found that GDA tends to briefly generate shapes that look like a combination of $0$’s and $1$’s, then switches to generating only $1$’s, and then re-learns how to generate $0$’s. In contrast, our algorithm seems to learn how to generate both $0$’s and $1$’s early on and does not stop generating either digit. We repeated this simulation multiple times for both algorithms, and visually inspected the images at the 1000’th iteration. GANs trained using our algorithm generated both digits by the 1000’th iteration in 86% of the runs, while those trained using GDA only did so in 23% of the runs.

Figure 5. We trained a GAN with GDA and our algorithm on the 0-1 MNIST dataset. During the first 1000 iterations, GDA (left) forgets how to generate $0$'s, while our algorithm (right) learns how to generate both $0$'s and $1$'s early on and does not stop generating either digit.

While here we have focused on comparing our algorithm to GDA, in our paper we also include a comparison to Unrolled GANs, which exhibits cycling between modes. We also present results for CIFAR-10 (see Figures 3 and 7 in our paper), where we compute FID scores to track the progress of our algorithm. See our paper for more details; the code is available on GitHub.


In this post we have shown how to develop a practical and convergent first-order algorithm for training GANs. Our algorithm synthesizes an approximation to the global max function based on first-order algorithms, random search using batch gradients, and simulated annealing. Our simulations show that a version of this algorithm can lead to more stable training of GANs. And yet the amount of memory and time required by each iteration of our algorithm is competitive with GDA. This post, together with the previous post, show that different local approximations to the global max function $\max_z f(x,z)$ can lead to different types of convergent algorithms for min-max optimization. We believe that this idea should be useful in other applications of min-max optimization.

at July 06, 2020 09:00 AM UTC

This post is about the work of Stuart Haber and W. Scott Stornetta from 1991 on How to Time-Stamp a Digital Document and their followup paper Improving the Efficiency and Reliability of Digital Time-Stamping. In many ways, this work introduced the idea of a chain of hashes to create a...

at July 06, 2020 02:58 AM UTC

In this post I speculated on why I could not find anywhere a table of which cases of Hilbert's 10th problem were solvable, unsolvable, and unknown. (I then made such a table. It was very clunky,  which may answer the question.)

I told my formal lang theory class about Hilbert's 10th problem as a natural example of an undecidable question- that is, an example that had nothing to do with Turing Machines. On the final I asked

Give an example of an undecidable problem that has nothing to do with Turing Machines.

Because of the pandemic this was a 2-day take home final which was open-book, open-notes, open-web. So they could have looked at my slides.

And indeed, most of them did give Hilbert's 10 problem (more formally, the set of all polynomials in many vars over Z which have a Diophantine solution).

But some did not. Some said there could never be such a problem (this is an incorrect answer), Some were incoherent. One just wrote ``Kruskal Trees''  (not sure if he was referring to MSTs or WQOs or to something that Clyde Kruskal did in class one day).

One student said that the problem of, given a CFG G, is the complement of L(G) also CFG.
This is indeed undecidable and does not have to do with TMs. I doubt the student could have proven that. I doubt I could have proven that. I do not doubt that my advisor Harry Lewis could have proven that, and indeed I emailed him asking for a proof and he emailed me a sketch, which I wrote out in more detail here.

The most interesting answer was given by some students who apparently looked at the web (rather than at my slides) for lists of problems and found the following called Matrix Mortality:

{ (M_1,...,M_L) : such that some product of these matrices (you are allowed to use a matrix many times) is the 0 matrix}

Why was this the most interesting? The TA did not know this problem was undecidable until he saw it on the exams and looked it up. I did not know it was undecidable until my TA told me.

I then raised the question: How many matrices to you need and how big do their dimensions have to be?

Unlike H10, there IS a table of this. In this paper they have such a table. I state some results:

6 matrices, 3x3
4 matrices, 5x5
3 matrices 9x9
2 matrices 15x15

2 matrices 2x2

So there are some gaps to fill, but there is not the vast gulf that exists between dec and undec for Hilberts 10th problem. I also note that the paper was about UNDEC but mentioned the DEC results, where as the papers on H10 about UNDEC seem to never mention the DEC.

I am glad to know another natural Undec problem and I will likely tell my students about it next spring. And much like H10, I won't be proving it.

An open problem in education: how come some of my students got it wrong? gave an answer that was not in my notes or slides? One student told me it was easier to google

Natural Undecidable Questions

then look through my slides. Another one said:

In class you said `this is a natural undecidable problem'.

On the exam you said `a problem that does not mention Turing Machines'

I did not know they were the same. 

That student submitted the Matrix problem stated above. It IS a fair point that `natural' is an
undefined term.  But the problem on the final used the well defined concept `does not mention Turing Machines'

by GASARCH ( at July 06, 2020 02:19 AM UTC

I’ve written a number of posts about curvilinear triangles that are not the Reuleaux triangle, including MIT’s Kresge Auditorium, triforce string art, valve covers, a patio table, and the logo of Whale Cove, Nunavut. I’ve long intended to write about another obvious topic in this theme, the curved-triangle rotor of the Wankel engine, but was finally pushed into doing so by seeing that two recent popular mathematics books, How Round Is Your Circle? (2008) and Icons of Mathematics (2011) repeat the falsehood that Wankel rotors are Reuleaux triangles. They are not.

Wikipedia has a good visualization of how Wankel engines work, which I’ve copied below. They go through the same four steps as a conventional four-stroke combustion engine, in which a piston pulls away from the combustion chamber, sucking in a mixture of fuel and air, pushes back towards the chamber, compressing the mixture, ignites the mixture, pushing the piston back out and applying force to the drive shaft, and then pushes back towards the chamber, pushing the exhaust out. The difference is that in a Wankel engine, these four steps happen at four different locations within the combustion chamber, as the gases within it are pushed around by a curved triangular piston, the rotor of the engine.

Animation of a Wankel engine by Y tambe from

The driveshaft in the engine is the fixed smaller gear in the center of the animation; in the actual engine, this gearwheel would itself be spinning, but this is not shown. The triangular rotor connects to the driveshaft by an eccentric planetary gear, and spins around the driveshaft like a hula hoop around a spinning dancer. The gears have teeth and radii in the ratio 3:2, causing the driveshaft to spin three times faster than the rotor. As it does so, the three corners of the rotor (the “apex seals”) stay in contact with the outer wall of the engine, called its stator, so that the gases in the engine do not leak between different phases.

The shape of the stator is not determined by the curve of the rotor itself, but only by the trajectory of the moving apex seals. This trajectory is a curve called an epitrochoid. If you’ve ever played with a spirograph, you know what an epitroichoid is: it’s what you get by fixing one circular disk, letting another circular disk rotate around it, placing a point somewhere within the rotating disk, and tracing the curve that it follows. Here’s another Wikipedia animation:

Animation of an epitrochoid by Sam Derbyshire from

Different ratios of radii between the inner and outer disk give you different numbers of lobes in the curve, and different placements of the moving point in the outer disk (closer to or farther from the disk center) give you curves that are closer to a circle or more curvy. Placing the moving point on the outer circle itself gives you pointy rather than curvy epitrochoids, and placing it even farther out turns the inner bulges of these curves into self-crossing loops.

Spirograph trajectories differ from rotating apex seal trajectories in at least three ways: in the Wankel engine, the central circle (the driveshaft) rotates rather than being held stationary, the outer circle (the planetary gear) surrounds the central circle rather than being outside it, and the point whose motion is being traced (the apex seal) is outside the outer circle rather than inside it. Nevertheless, the shape is still a two-lobed epitrochoid; see the “double generation theorem” of the Bernoullis, as described by Nash,1 for why the same curve can be generated in multiple ways. Modulo the scale of the whole system, there is one free parameter controlling the precise shape of this epitrochoid: the ratio of the distances from the center of the rotor to the apex seals and to the planetary gear. If the apex seals are too close in, the planetary gear will bash into into the stator; if they are too far out, the stator will be close to circular and there will be little change in pressure from one part of the combustion cycle to another, losing engine efficiency. The choice made in actual engines is not the one that places the apex seals as close as possible, but seems to involve more careful optimization that considers the shape and size of the regions formed by the rotor and stator at different stages of the combustion cycle.

Once the stator shape has been determined, one can then proceed to answer the question we started with: what is the shape of the rotor? The main design constraint is that it should touch or at least stay close to the inner bulge of the stator (on its “side seals”), to prevent exhaust gas from flowing back around to the intake. The shape that achieves this can be understood by a thought experiment in which we imagine the rotor as somehow being fixed in space while the vehicle containing it rotates around it, rather than vice versa. As the vehicle rotates, its stator passes through parts of the space that cannot be occupied by the rotor. The parts of space that remain untouched by the rotating stator are available to be used by the rotor, and should be used by it if we want a rotor that stays in contact with the stator on its side seals. Mathematically, this is described as an “envelope” of the positions of the rotating stator with respect to the fixed rotor. This envelope is a curved triangle, but not a Reuleaux triangle. Its curves are flatter than a Reuleaux triangle’s arcs, but also they are not circular arcs. As an envelope of algebraic curves, they are presumably algebraic themselves, but of higher order; trigonometric formulas are given by Shung and Pennock.2

In practice, the rotor shape varies from its ideal envelope-of-epitrochoid form, in a couple of different ways. First, as Drogosz explains,3 for ease of manufacturing it is often approximated by circular arcs rather than exactly following the envelope shape. As long as the approximation stays within the envelope, the rotor will avoid colliding with the stator, and the side seal contact is not so important near the corners of the triangle, so that’s where the approximation is most noticeable. Second, real Wankel rotors often have scoops taken out from the middles of their sides, to form mini-combustion chambers that guide and shape the combustion gases within the engine.

For more details of all this, see:

  1. Nash, David H. (1977), “Rotary engine geometry”, Mathematics Magazine 2: 87–89, doi:10.1080/0025570X.1977.11976621, JSTOR:2689731 

  2. Shung, J. B. & Pennock, G. R. (1994), “Geometry for trochoidal-type machines with conjugate envelopes”, Mechanism and Machine Theory 29 (1): 25–42, doi:10.1016/0094-114X(94)90017-5 

  3. Drogosz, P. (2010), “Geometry of the Wankel rotary engine”, Journal of KONES 17 (3): 69–74, 

(Discuss on Mastodon)

by David Eppstein at July 05, 2020 02:59 PM UTC

The Isolation Lemma states that when random weights are assigned to the elements of a finite set $E$, then in any given family of subsets of $E$, exactly one set has the minimum weight, with high probability. In this note, we present two proofs for the fact that it is impossible to efficiently derandomize the Isolation Lemma for arbitrary families. The first proof is from Chari, Rohatgi and Srinivasan and uses the potential method. An alternate proof is due to the first author of this note. It uses the polynomial method. However, it is not written anywhere. The main purpose of this note is to present that proof. Additionally we show that the above lower bounds are almost tight with respect to various parameters.

at July 05, 2020 06:22 AM UTC

Some different ideas for marking the Fourth

“Founding Frenemies” source

John Adams and Thomas Jefferson did not use Zoom. Their correspondence, from 1777 up to their deaths hours apart on July 4, 1826, fills a 600-page book.

Today, Independence Day in the US, we consider the kind of intellectual fireworks represented by the correspondence.

Jefferson and Adams were intellectual opposites as well as political rivals. Adams favored a strong central government to bridle human passions, whereas Jefferson’s support for the French Revolution continued beyond its devolution into the Reign of Terror. They debated many other points of politics, philosophy, and culture.

Abigail Adams, the wife of John, joined in some of the exchanges. Because she often stayed in Massachusetts while he was in Philadelphia or New York or elsewhere, the husband and wife exchanged many letters—over 1,100 in all. His letter to her on July 3, 1776, instituted the use of fireworks to celebrate anniversaries of the Declaration of Independence.

Today there is not much in the way of fireworks displays. Most have been canceled because we cannot allow crowds to view them. In the Buffalo area, some townships are having small displays with limited access, and some displays are being set on high points for possible area viewing. So we felt we should write about fireworks of a different kind, a kind that is not restricted by the pandemic and might thrive through it. But first we’ll make a point about the history of fireworks.

Fireworks: Ancient, Early, and Modern

Fireworks go back at least 1,100 years to China, where chemists discovered the fun of stuffing volatile compounds into tubes of bamboo or paper and setting them off. Some have pyrotechnics going back another 1,000 years, to about 200 BCE, insofar as bamboo was known to pop with a loud sound when dried and heated. Gunpowder traveled best of the compounds and made its way into Europe at least by the 1200s. The first recorded wide-scale fireworks display in England was in 1486 for the wedding of King Henry VII to Elizabeth of York, which ended the Wars of the Roses. Shakespeare mentions fireworks in Love’s Labours Lost. The Mughals in India from the 1500s to the 1800s made fireworks a diversion for noble women on the Diwali holiday:

Cleveland Museum of Art source

Our point is that 1776 isn’t even halfway back to the beginning of using fireworks for celebrations, even just in the West. Can we even call it “Early”? Lavish displays to mark major events were common by the mid-1700s. A royal display in 1749 was accompanied by orchestral music commissioned from George Frideric Handel and went ahead despite rain. Over 12,000 people also paid to attend the main rehearsal six days earlier, many braving an hours-long traffic jam on approaches to the London Bridge. That feels quite modern to us. Adams’s letter mentioned other social features we know today:

It ought to be solemnized with Pomp and Parade, with Shews, Games, Sports, Guns, Bells, Bonfires and Illuminations from one End of this Continent to the other from this Time forward forever more.

The pandemic has curtailed others of these. The major North American team sports have not resumed either. Some parades have been run in “reverse” mode: the floats and performers stay put while spectators drive by slowly in cars.

Adams’s letter has another, earlier, passage that chills today. The letter begins by saying that the Declaration was supposed to have been made in December, 1775, and enumerates plans the colonies had made contingent on this. He then says that what caused the plans to be aborted was an outbreak of disease:

All these Causes however in Conjunction would not have disappointed Us, if it had not been for a Misfortune, which could not be foreseen, and perhaps could not have been prevented, I mean the Prevalence of the small Pox among our Troops. . . . This fatal Pestilence compleated our Destruction.—It is a Frown of Providence upon Us, which We ought to lay to heart.

The ellipsis is in the letter—as Ken’s children have pointed out, trailing off thought with dots in letters or e-mails or Facebook posts or texts is a distinctive habit of us older folk. Thus a specific outbreak of a contagious disease changed our history then as now.

Ideas: Ancient, Early, and Modern

We have remarked on how the pandemic has affected opportunities to exchange ideas and how to compensate. One impacted series that both of us intended to visit this spring has been the series of workshops at the Simons Institute in Berkeley.

Still, the Simons Foundation has continued its other ways to stimulate ideas. Here we offer our congratulations to Venkatesan Guruswami, Omer Reingold, and David Woodruff, who have just been appointed as Simons investigators for 2020.

In briefly talking about their work, we want to make a point about how the pandemic enables taking the long view of ideas—in a way that appointments such as these promote. It is easy to get wrapped up in immediate aspects of a current hot problem and not be aware that it has a history. The history may not involve exactly the same ideas as the problem, but related ideas whose importance was appreciated much earlier. “Early” may not mean the Middle Ages or the 1700s as with fireworks, but it can mean times before any of us were born.

Venkatesan and Omer and David each have done some stellar research, broadly in various parts of theory. They each have many results, but we thought we would highlight just one result each. We picked a result that we think is representative, is deep, is beautiful, and is one that we personally admire the most.

“Ancient” Times

Venkatesan did important work on a problem that was created before complexity theory existed. Our favorite is his ground-breaking work on list decoding.

What is the best way to encode data to protect it against various kinds of errors? This is still open. But Venkatesan changed the landscape.

The questions about error correcting codes go back to the 1940’s. Usually the first results are credited to Richard Hamming in 1947. Soon the notion of list decoding was introduced. The cool idea is that doing not require an answer, but allow a list of possible answers. The hope is that with other information about the message we might be able to select the answer.

Venkatesan and Ken’s colleague Atri Rudra found explicit codes that achieve list-decoding capacity, that is, they have optimal redundancy.

What we like so much is the model is so natural and so powerful. There are many applications of list decoding to complexity theory. See Madhu Sudan’s survey for some additional comments.

Early Times

Omer did his most important work on problems that were first studied in the early days of complexity theory. Our favorite is his beautiful work on small-memory deterministic graph walks.

Is {\mathsf{L < NL}}? This is still open. But Omer made a huge contribution to our understanding of fundamental complexity classes. Romas Aleliunas, Dick Karp, Laszlo Lovasz, and Charlie Rackoff proved earlier that random small space could navigate undirected graphs provided they could flip coins. In a sense Omer removed the coins to get his result that undirected graph connectivity is in {\mathsf{L}}. The previous result was easy—I can say that because I (Dick) was a co-author on it—but Omer’s theorem is deep.

Omer’s proof drew heavily on expander graphs and the zig-zag product from his earlier work with Salil Vadhan and Avi Wigderson for creating them.

Modern Times

David did his most important work on problems that were only created relatively recently. Our favorite is his work on approximately counting distinct elements. This work is joint with Daniel Kane and Jelani Nelson and appeared at PODS 2010. It was the first streaming algorithm with an optimal combination of space usage and update time. Here is the relevant table from their paper (KNW):

Streaming algorithms are relatively new and parts of data science are newer. But working with data is old, as old as codes. This finally leads us to pose an outlandish question:

Can all of this work be usefully interpreted from the standpoint of coding theory?

This is outlandish, because the word “code” does not even appear in either Reingold’s paper or KNW. But part of holding coding theory to be a paradigm, as both Ken and I experienced in graduate school, is that its perspective should expand. Is this capable of creating intellectual fireworks? We’ll see.

Open Problems

Have a safe and happy fourth of July.

[some small fixes]

by RJLipton+KWRegan at July 05, 2020 12:41 AM UTC

Nagoya and Mie Universities (Japan) are looking for several postdoctoral researchers to work on quantum computing, especially on the following subjects:

1) quantum algorithms,
2) quantum complexity theory,
3) theoretical aspects of quantum programming languages,
4) development of software verification tools for quantum programs,
5) quantum information theory.


by shacharlovett at July 03, 2020 11:18 AM UTC