Theory of Computing Blog Aggregator

Authors: Laura Merker, Torsten Ueckerdt
Download: PDF
Abstract: We introduce the novel concepts of local and union book embeddings, and, as the corresponding graph parameters, the local page number ${\rm pn}_\ell(G)$ and the union page number ${\rm pn}_u(G)$. Both parameters are relaxations of the classical page number ${\rm pn}(G)$, and for every graph $G$ we have ${\rm pn}_\ell(G) \leq {\rm pn}_u(G) \leq {\rm pn}(G)$. While for ${\rm pn}(G)$ one minimizes the total number of pages in a book embedding of $G$, for ${\rm pn}_\ell(G)$ we instead minimize the number of pages incident to any one vertex, and for ${\rm pn}_u(G)$ we instead minimize the size of a partition of $G$ with each part being a vertex-disjoint union of crossing-free subgraphs. While ${\rm pn}_\ell(G)$ and ${\rm pn}_u(G)$ are always within a multiplicative factor of $4$, there is no bound on the classical page number ${\rm pn}(G)$ in terms of ${\rm pn}_\ell(G)$ or ${\rm pn}_u(G)$.

We show that local and union page numbers are closer related to the graph's density, while for the classical page number the graph's global structure can play a much more decisive role. We introduce tools to investigate local and union book embeddings in exemplary considerations of the class of all planar graphs and the class of graphs of tree-width $k$. As an incentive to pursue research in this new direction, we offer a list of intriguing open problems.

at July 24, 2019 01:26 AM UTC

Authors: Björn Feldkord, Till Knollmann, Manuel Malatyali, Friedhelm Meyer auf der Heide
Download: PDF
Abstract: We extend the Mobile Server Problem, introduced in SPAA'17, to a model where k identical mobile resources, here named servers, answer requests appearing at points in the Euclidean space. In order to reduce communication costs, the positions of the servers can be adapted by a limited distance m_s per round for each server. The costs are measured similar to the classical Page Migration Problem, i.e., answering a request induces costs proportional to the distance to the nearest server, and moving a server induces costs proportional to the distance multiplied with a weight D.

We show that, in our model, no online algorithm can have a constant competitive ratio, i.e., one which is independent of the input length n, even if an augmented moving distance of (1+\delta)m_s is allowed for the online algorithm. Therefore we investigate a restriction of the power of the adversary dictating the sequence of requests: We demand locality of requests, i.e., that consecutive requests come from points in the Euclidean space with distance bounded by some constant m_c. We show constant lower bounds on the competitiveness in this setting (independent of n, but dependent on k, m_s and m_c).

On the positive side, we present a deterministic online algorithm with bounded competitiveness when augmented moving distance and locality of requests is assumed. Our algorithm simulates any given algorithm for the classical k-Page Migration problem as guidance for its servers and extends it by a greedy move of one server in every round. The resulting competitive ratio is polynomial in the number of servers k, the ratio between m_c and m_s, the inverse of the augmentation factor 1/\delta and the competitive ratio of the simulated k-Page Migration algorithm.

at July 24, 2019 01:22 AM UTC

Authors: Angelos Charalambidis, Christos Nomikos, Panos Rondogiannis
Download: PDF
Abstract: A classical result in descriptive complexity theory states that Datalog expresses exactly the class of polynomially computable queries on ordered databases. In this paper we extend this result to the case of higher-order Datalog. In particular, we demonstrate that on ordered databases, for all $k\geq2$, $k$-order Datalog captures $(k-1)$-EXPTIME. This result suggests that higher-order extensions of Datalog possess superior expressive power and they are worthwhile of further investigation both in theory and in practice. This work is under consideration for publication in Theory and Practices of Logic Programming.

at July 24, 2019 01:21 AM UTC

Authors: Nicolas Berkouk, Grégory Ginot, Steve Oudot
Download: PDF
Abstract: In this paper we provide an explicit connection between level-sets persistence and derived sheaf theory over the real line. In particular we construct a functor from 2-parameter persistence modules to sheaves over $\mathbb{R}$, as well as a functor in the other direction. We also observe that the 2-parameter persistence modules arising from the level sets of Morse functions carry extra structure that we call a Mayer-Vietoris system. We prove classification, barcode decomposition, and stability theorems for these Mayer-Vietoris systems, and we show that the aforementioned functors establish a pseudo-isometric equivalence of categories between derived constructible sheaves with the convolution or (derived) bottleneck distance and the interleaving distance of strictly pointwise finite-dimensional Mayer-Vietoris systems. Ultimately, our results provide a functorial equivalence between level-sets persistence and derived pushforward for continuous real-valued functions.

at July 24, 2019 01:26 AM UTC

Authors: Neil Immerman, Rik Sengupta
Download: PDF
Abstract: In this note, we provide details of the $k$-dimensional Weisfeiler-Leman Algorithm and its analysis from Immerman-Lander (1990). In particular, we present an optimized version of the algorithm that runs in time $O(n^{k+1}\log n)$, where $k$ is fixed (not varying with $n$).

at July 24, 2019 12:00 AM UTC

Authors: Dániel Gerbner
Download: PDF
Abstract: We consider problems that can be solved by asking certain queries. The deterministic query complexity $D(P)$ of a problem $P$ is the smallest number of queries needed to ask in order to find the solution (in the worst case), while the non-deterministic query complexity $D_0(P)$ is the smallest number of queries needed to ask, in case we know the solution, to prove that it is indeed the solution (in the worst case). Equivalently, $D(P)$ is the largest number of queries needed to find the solution in case an Adversary is answering the queries, while $D_0(P)$ is the largest number of queries needed to find the solution in case an Adversary chooses the input. We define a series of quantities between these two values, $D_k(P)$ is the largest number of queries needed to find the solution in case an Adversary chooses the input, and answers the queries, but he can change the input at most $k$ times. We give bounds on $D_k(P)$ for various problems $P$.

at July 24, 2019 01:21 AM UTC

Authors: Shahaboddin Shamshirband, Amir Mosavi, Kwok-wing Chau
Download: PDF
Abstract: Intelligent algorithms are recently used in the optimization process in chemical engineering and application of multiphase flows such as bubbling flow. This overview of modeling can be a great replacement with complex numerical methods or very time-consuming and disruptive measurement experimental process. In this study, we develop the adaptive network-based fuzzy inference system (ANFIS) method for mapping inputs and outputs together and understand the behavior of the fluid flow from other output parameters of the bubble column reactor. Neural cells can fully learn the process in their memory and after the training stage, the fuzzy structure predicts the multiphase flow data. Four inputs such as x coordinate, y coordinate, z coordinate, and air superficial velocity and one output such as pressure gradient are considered in the learning process of the ANFIS method. During the learning process, the different number of the membership function, type of membership functions and the number of inputs are examined to achieve the intelligent algorithm with high accuracy. The results show that as the number of inputs increases the accuracy of the ANFIS method rises up to R^2>0.99 almost for all cases, while the increment in the number of rules has a effect on the intelligence of artificial algorithm. This finding shows that the density of neural objects or higher input parameters enables the moded for better understanding. We also proposed a new evaluation of data in the bubble column reactor by mapping inputs and outputs and shuffle all parameters together to understand the behaviour of the multiphase flow as a function of either inputs or outputs. This new process of mapping inputs and outputs data provides a framework to fully understand the flow in the fluid domain in a short time of fuzzy structure calculation.

at July 24, 2019 12:04 AM UTC

Authors: Sankardeep Chakraborty, Kunihiko Sadakane, Srinivasa Rao Satti
Download: PDF
Abstract: We present linear time {\it in-place} algorithms for several basic and fundamental graph problems including the well-known graph search methods (like depth-first search, breadth-first search, maximum cardinality search), connectivity problems (like biconnectivity, $2$-edge connectivity), decomposition problem (like chain decomposition) among various others, improving the running time (by polynomial multiplicative factor) of the recent results of Chakraborty et al. [ESA, 2018] who designed $O(n^3 \lg n)$ time in-place algorithms for a strict subset of the above mentioned problems. The running times of all our algorithms are essentially optimal as they run in linear time. One of the main ideas behind obtaining these algorithms is the detection and careful exploitation of sortedness present in the input representation for any graph without loss of generality. This observation alone is powerful enough to design some basic linear time in-place algorithms, but more non-trivial graph problems require extra techniques which, we believe, may find other applications while designing in-place algorithms for different graph problems in the future.

at July 23, 2019 12:00 AM UTC

Authors: IA Sainul, Sankha Deb, AK Deb
Download: PDF
Abstract: Robotic grasping of arbitrary objects even in completely known environments still remains a challenging problem. Most previously developed algorithms had focused on fingertip grasp, failing to solve the problem even for fully actuated hands/grippers during adaptive/wrapping type of grasps, where each finger makes contact with object at several points. Kinematic closed form solutions are not possible for such an articulated finger which simultaneously reaches several given goal points. This paper, presents a framework for computing best grasp for an underactuated robotic gripper, based on a novel object slicing method. The proposed method quickly find contacts using an object slicing technique and use grasp quality measure to find the best grasp from a pool of grasps. To validate the proposed method, implementation has been done on twenty-four household objects and toys using a two finger underactuated robot gripper. Unlike the many other existing approaches, the proposed approach has several advantages: it can handle objects with complex shapes and sizes; it does not require simplifying the objects into primitive geometric shape; Most importantly, it can be applied on point clouds taken using depth sensor; it takes into account gripper kinematic constraints and generates feasible grasps for both adaptive/enveloping and fingertip types of grasps.

at July 24, 2019 12:03 AM UTC

Authors: Rahul Vaze, Jayakrishnan Nair
Download: PDF
Abstract: Can the popular shortest remaining processing time (SRPT) algorithm achieve a constant competitive ratio on multiple servers when server speeds are adjustable (speed scaling) with respect to the flow time plus energy consumption metric? This question has remained open for a while, where a negative result in the absence of speed scaling is well known. The main result of this paper is to show that multi-server SRPT can be constant competitive, with a competitive ratio that only depends on the power-usage function of the servers, but not on the number of jobs/servers or the job sizes (unlike when speed scaling is not allowed). When all job sizes are unity, we show that round-robin routing is optimal and can achieve the same competitive ratio as the best known algorithm for the single server problem. Finally, we show that a class of greedy dispatch policies, including policies that route to the least loaded or the shortest queue, do not admit a constant competitive ratio. When job arrivals are stochastic, with Poisson arrivals and i.i.d. job sizes, we show that random routing and a simple gated-static speed scaling algorithm achieves a constant competitive ratio.

at July 24, 2019 01:25 AM UTC

Authors: Sayan Bandyapadhyay, Tanmay Inamdar, Shreyas Pai, Kasturi Varadarajan
Download: PDF
Abstract: In this paper, we consider the colorful $k$-center problem, which is a generalization of the well-known $k$-center problem. Here, we are given red and blue points in a metric space, and a coverage requirement for each color. The goal is to find the smallest radius $\rho$, such that with $k$ balls of radius $\rho$, the desired number of points of each color can be covered. We obtain a constant approximation for this problem in the Euclidean plane. We obtain this result by combining a "pseudo-approximation" algorithm that works in any metric space, and an approximation algorithm that works for a special class of instances in the plane. The latter algorithm uses a novel connection to a certain matching problem in graphs.

at July 23, 2019 12:00 AM UTC

Andreev's Problem asks the following: Given an integer $d$ and a subset of $S \subseteq \mathbb{F}_q \times \mathbb{F}_q$, is there a polynomial $y = p(x)$ of degree at most $d$ such that for every $a \in \mathbb{F}_q$, $(a,p(a)) \in S$? We show an $\text{AC}^0[\oplus]$ lower bound for this problem. This problem appears to be similar to the list recovery problem for degree $d$-Reed-Solomon codes over $\mathbb{F}_q$ which asks the following: Given subsets $A_1,\ldots,A_q$ of $\mathbb{F}_q$, output all (if any) Reed-Solomon codewords contained in $A_1\times \cdots \times A_q$. For our purpose, we study this problem when $A_1, \ldots, A_q$ are random subsets of a given size, which may be of independent interest.

at July 23, 2019 05:29 PM UTC

Let me report today on a major breakthrough in random graph theory and probabilistic combinatorics. Congratulations to Matan, Frank, and Vojtek!

Artist: Heidi Buck. “Catch a Dragon by the Tail 2”source )

Upper tails via high moments and entropic stability by Matan Harel, Frank Mousset, and Wojciech Samotij


Suppose that X is a bounded-degree polynomial with nonnegative coefficients on the p-biased discrete hypercube. Our main result gives sharp estimates on the logarithmic upper tail probability of X whenever an associated extremal problem satisfies a certain entropic stability property. We apply this result to solve two long-standing open problems in probabilistic combinatorics: the upper tail problem for the number of arithmetic progressions of a fixed length in the p-random subset of the integers and the upper tail problem for the number of cliques of a fixed size in the random graph G_{n,p}. We also make significant progress on the upper tail problem for the number of copies of a fixed regular graph H in G_{n,p}. To accommodate readers who are interested in learning the basic method, we include a short, self-contained solution to the upper tail problem for the number of triangles in G_{n,p} for all p=p(n) satisfying \frac {\log n}{n} \ll p \ll 1.

The introduction gives very nicely the rich history of the problem.  Here is a 2002 paper by Svante Janson and Andrzej Ruciński on the infamous upper tail.  (And a lot have happened since then on this problem and on non linear large deviation theory). Following, there is a lovely section with a short solution for the case of triangles.

Forthcoming reference [48] talks about lower tails! Stay tuned!

by Gil Kalai at July 23, 2019 06:51 AM UTC

Authors: Oscar Defrain, Lhouari Nourine, Simon Vilmin
Download: PDF
Abstract: It is well known that every closure system can be represented by an implicational base, or by the set of its meet-irreducible elements. In Horn logic, these are respectively known as the Horn expressions and the characteristic models. In this paper, we consider the problem of translating between the two representations in acyclic convex geometries. Quite surprisingly, we show that the problem in this context is already harder than the dualization in distributive lattices, a generalization of the well-known hypergraph dualization problem for which the existence of an output quasi-polynomial time algorithm is open. In light of this result, we consider a proper subclass of acyclic convex geometries, namely ranked convex geometries, as those that admit a ranked implicational base analogous to that of ranked posets. For this class, we provide output quasi-polynomial time algorithms based on hypergraph dualization for translating between the two representations. This improves the understanding of a long-standing open problem.

at July 23, 2019 11:37 PM UTC

Authors: Ronald de Wolf
Download: PDF
Abstract: This is a set of lecture notes suitable for a Master's course on quantum computation and information from the perspective of theoretical computer science. The first version was written in 2011, with many extensions and improvements in subsequent years. The first 10 chapters cover the circuit model and the main quantum algorithms (Deutsch-Jozsa, Simon, Shor, Hidden Subgroup Problem, Grover, quantum walks, Hamiltonian simulation and HHL). They are followed by 2 chapters about complexity, 4 chapters about distributed ("Alice and Bob") settings, and a final chapter about quantum error correction. Appendices A and B give a brief introduction to the required linear algebra and some other mathematical and computer science background. All chapters come with exercises, with some hints provided in Appendix C.

at July 23, 2019 11:29 PM UTC

Authors: Michal R. Przybylek, Pawel Siedlecki
Download: PDF
Abstract: In this paper we revisit the classical problem of polynomial interpolation, with a slight twist; namely, polynomial evaluations are available up to a group action of the unit circle on the complex plane. It turns out that this new setting allows for a phaseless recovery of a polynomial in a polynomial time.

at July 23, 2019 11:34 PM UTC

Authors: Anastasia Koloskova, Tao Lin, Sebastian U. Stich, Martin Jaggi
Download: PDF
Abstract: Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks, as well as for efficient scaling to large compute clusters. As current approaches suffer from limited bandwidth of the network, we propose the use of communication compression in the decentralized training context. We show that Choco-SGD $-$ recently introduced and analyzed for strongly-convex objectives only $-$ converges under arbitrary high compression ratio on general non-convex functions at the rate $O\bigl(1/\sqrt{nT}\bigr)$ where $T$ denotes the number of iterations and $n$ the number of workers. The algorithm achieves linear speedup in the number of workers and supports higher compression than previous state-of-the art methods. We demonstrate the practical performance of the algorithm in two key scenarios: the training of deep learning models (i) over distributed user devices, connected by a social network and (ii) in a datacenter (outperforming all-reduce time-wise).

at July 23, 2019 11:59 PM UTC

Authors: Sankardeep Chakraborty, Roberto Grossi, Kunihiko Sadakane, Srinivasa Rao Satti
Download: PDF
Abstract: Deterministic finite automata are one of the simplest and most practical models of computation studied in automata theory. Their conceptual extension is the non-deterministic finite automata which also have plenty of applications. In this article, we study these models through the lens of succinct data structures where our ultimate goal is to encode these mathematical objects using information-theoretically optimal number of bits along with supporting queries on them efficiently. Towards this goal, we first design a succinct data structure for representing any deterministic finite automaton $\mathcal{D}$ having $n$ states over a $\sigma$-letter alphabet $\Sigma$ using $(\sigma-1) n\log n + O(n \log \sigma)$ bits of space, which can determine, given an input string $x$ over $\Sigma$, whether $\mathcal{D}$ accepts $x$ in $O(|x| \log \sigma)$ time, using constant words of working space. When the input deterministic finite automaton is acyclic, not only we can improve the above space-bound significantly to $(\sigma -1) (n-1)\log n+ 3n + O(\log^2 \sigma) + o(n)$ bits, we also obtain optimal query time for string acceptance checking. More specifically, using our succinct representation, we can check if a given input string $x$ can be accepted by the acyclic deterministic finite automaton using time proportional to the length of $x$, hence, the optimal query time. We also exhibit a succinct data structure for representing a non-deterministic finite automaton $\mathcal{N}$ having $n$ states over a $\sigma$-letter alphabet $\Sigma$ using $\sigma n^2+n$ bits of space, such that given an input string $x$, we can decide whether $\mathcal{N}$ accepts $x$ efficiently in $O(n^2|x|)$ time. Finally, we also provide time and space-efficient algorithms for performing several standard operations such as union, intersection, and complement on the languages accepted by deterministic finite automata.

at July 23, 2019 11:55 PM UTC

Authors: Maciej Drwal
Download: PDF
Abstract: We consider the robust version of items selection problem, in which the goal is to choose representatives from a family of sets, preserving constraints on the allowed items' combinations. We prove NP-hardness of the deterministic version, and establish polynomially solvable special cases. Next, we consider the robust version in which we aim at minimizing the maximum regret of the solution under interval parameter uncertainty. We show that this problem is hard for the second level of polynomial-time hierarchy. We develop an exact solution algorithm for the robust problem, based on cut generation, and present the results of computational experiments.

at July 23, 2019 11:37 PM UTC

Authors: Edith Hemaspaandra, Lane A. Hemaspaandra, Jörg Rothe
Download: PDF
Abstract: Prior work on the complexity of bribery assumes that the bribery happens simultaneously, and that the briber has full knowledge of all voters' votes. But neither of those assumptions always holds. In many real-world settings, votes come in sequentially, and the briber may have a use-it-or-lose-it moment to decide whether to bribe/alter a given vote, and at the time of making that decision, the briber may not know what votes remaining voters are planning on casting.

In this paper, we introduce a model for, and initiate the study of, bribery in such an online, sequential setting. We show that even for election systems whose winner-determination problem is polynomial-time computable, an online, sequential setting may vastly increase the complexity of bribery, in fact jumping the problem up to completeness for high levels of the polynomial hierarchy or even PSPACE. On the other hand, we show that for some natural, important election systems, such a dramatic complexity increase does not occur, and we pinpoint the complexity of their bribery problems in the online, sequential setting.

at July 23, 2019 11:20 PM UTC

Authors: Samuel C. Gutekunst, David P. Williamson
Download: PDF
Abstract: The traveling salesman problem (TSP) is a fundamental problem in combinatorial optimization. Several semidefinite programming relaxations have been proposed recently that exploit a variety of mathematical structures including, e.g., algebraic connectivity, permutation matrices, and association schemes. The main results of this paper are twofold. First, de Klerk and Sotirov [9] present an SDP based on permutation matrices and symmetry reduction; they show that it is incomparable to the subtour elimination linear program, but generally dominates it on small instances. We provide a family of \emph{simplicial TSP instances} that shows that the integrality gap of this SDP is unbounded. Second, we show that these simplicial TSP instances imply the unbounded integrality gap of every SDP relaxation of the TSP mentioned in the survey on SDP relaxations of the TSP in Section 2 of Sotirov [24]. In contrast, the subtour LP performs perfectly on simplicial instances. The simplicial instances thus form a natural litmus test for future SDP relaxations of the TSP.

at July 23, 2019 11:58 PM UTC

Authors: Jacob Turner
Download: PDF
Abstract: Since the first solutions finding minimally weighted routes in weighted digraphs, a plethora of literature has appeared improving the performance of shortest-path queries for use in real-world applications. In this paper, we detail how an advanced pre-processing technique for routing algorithms (which create objects known as Contraction Hierarchies) may be combined with the connection scan algorithm, an algorithm originally designed to work with public transportation networks using time tables. This provides an improvement over bi-directional Dijkstra or A${}^*$ search on Contraction Hierarchies.

at July 23, 2019 11:36 PM UTC

Authors: Minming Li, Pinyan Lu, Yuhao Yao, Jialin Zhang
Download: PDF
Abstract: In this paper, we study the two-facility location game on a line with optional preference where the acceptable set of facilities for each agent could be different and an agent's cost is his distance to the closest facility within his acceptable set. The objective is to minimize the total cost of all agents while achieving strategyproofness. We design a deterministic strategyproof mechanism for the problem with approximation ratio of 2.75, improving upon the earlier best ratio of n/2+1.

at July 23, 2019 11:43 PM UTC

Authors: Marc Hellmuth, Manuela Geiß, Peter F. Stadler
Download: PDF
Abstract: Reciprocal best match graphs (RBMGs) are vertex colored graphs whose vertices represent genes and the colors the species where the genes reside. Edges identify pairs of genes that are most closely related with respect to an underlying evolutionary tree. In practical applications this tree is unknown and the edges of the RBMGs are inferred by quantifying sequence similarity. Due to noise in the data, these empirically determined graphs in general violate the condition of being a ``biologically feasible'' RBMG. Therefore, it is of practical interest in computational biology to correct the initial estimate. Here we consider deletion (remove at most $k$ edges) and editing (add or delete at most $k$ edges) problems. We show that the decision version of the deletion and editing problem to obtain RBMGs from vertex colored graphs is NP-hard. Using known results for the so-called bicluster editing, we show that the RBMG editing problem for $2$-colored graphs is fixed-parameter tractable.

A restricted class of RBMGs appears in the context of orthology detection. These are cographs with a specific type of vertex coloring known as hierarchical coloring. We show that the decision problem of modifying a vertex-colored graph (either by edge-deletion or editing) into an RBMG with cograph structure or, equivalently, to an hierarchically colored cograph is NP-complete.

at July 23, 2019 12:00 AM UTC

Authors: Alex L. Wang, Fatma Kilinc-Karzan
Download: PDF
Abstract: We consider the Generalized Trust Region Subproblem (GTRS) of minimizing a nonconvex quadratic objective over a nonconvex quadratic constraint. A lifting of this problem recasts the GTRS as minimizing a linear objective subject to two nonconvex quadratic constraints. Our first main contribution is structural: we give an explicit description of the convex hull of this nonconvex set in terms of the generalized eigenvalues of an associated matrix pencil. This result may be of interest in building relaxations for nonconvex quadratic programs. Moreover, this result allows us to reformulate the GTRS as the minimization of two convex quadratic functions in the original space. Our next set of contributions is algorithmic: we present an algorithm for solving the GTRS up to an epsilon additive error based on this reformulation. We carefully handle numerical issues that arise from inexact generalized eigenvalue and eigenvector computations and establish explicit running time guarantees for these algorithms. Notably, our algorithms run in linear (in the size of the input) time. Furthermore, our algorithm for computing an epsilon-optimal solution has a slightly-improved running time dependence on epsilon over the state-of-the-art algorithm. Our analysis shows that the dominant cost in solving the GTRS lies in solving a generalized eigenvalue problem -- establishing a natural connection between these problems. Finally, generalizations of our convex hull results allow us to apply our algorithms and their theoretical guarantees directly to equality-, interval-, and hollow- constrained variants of the GTRS. This gives the first linear-time algorithm in the literature for these variants of the GTRS.

at July 23, 2019 11:42 PM UTC

Authors: Xiao-ke Zhu, Tai-ning Chen, Jing He, Wei Zhou
Download: PDF
Abstract: CPU-SIMD/GPU/TPUs will be increasingly powerful. The algorithm using neural network and heterogeneous computing framework will bring significant performance improvement. In this paper we prove a novel neural network-based sorting algorithm, NNS which hold lower time complexity than $O(nlogn)$ and easy implement in heterogeneous framework executed by CPU and GPU. Our initial results show that our learned sorting algorithm can increases by $2$X than std::sort(). More importantly, this work provides just a glimpse of using neural network to enhance or even replace classical algorithm and also the benefit of designing algorithm specifically for heterogeneous computing frameworks.

at July 23, 2019 11:45 PM UTC

Authors: Jayadev Acharya, Clément L. Canonne, Yanjun Han, Ziteng Sun, Himanshu Tyagi
Download: PDF
Abstract: We study goodness-of-fit of discrete distributions in the distributed setting, where samples are divided between multiple users who can only release a limited amount of information about their samples due to various information constraints. Recently, a subset of the authors showed that having access to a common random seed (i.e., shared randomness) leads to a significant reduction in the sample complexity of this problem. In this work, we provide a complete understanding of the interplay between the amount of shared randomness available, the stringency of information constraints, and the sample complexity of the testing problem by characterizing a tight trade-off between these three parameters. We provide a general distributed goodness-of-fit protocol that as a function of the amount of shared randomness interpolates smoothly between the private- and public-coin sample complexities. We complement our upper bound with a general framework to prove lower bounds on the sample complexity of this testing problems under limited shared randomness. Finally, we instantiate our bounds for the two archetypal information constraints of communication and local privacy, and show that our sample complexity bounds are optimal as a function of all the parameters of the problem, including the amount of shared randomness.

A key component of our upper bounds is a new primitive of domain compression, a tool that allows us to map distributions to a much smaller domain size while preserving their pairwise distances, using a limited amount of randomness.

at July 23, 2019 11:36 PM UTC

Authors: Will Ma, David Simchi-Levi, Jinglong Zhao
Download: PDF
Abstract: We study an online knapsack problem where the items arrive sequentially and must be either immediately packed into the knapsack or irrevocably discarded. Each item has a different size and the objective is to maximize the total size of items packed. While the competitive ratio of deterministic algorithms for this problem is known to be 0, the competitive ratio of randomized algorithms has, surprisingly, not been considered until now. We derive a random-threshold algorithm which is 0.432-competitive, and show that our threshold distribution is optimal.

We also consider the generalization to multiple knapsacks, where an arriving item has a different size in each knapsack and must be placed in at most one. This is equivalent to the Adwords problem where item truncation is not allowed. We derive a randomized algorithm for this problem which is 0.214-competitive.

at July 23, 2019 11:37 PM UTC

(This is a joint post with David Marcus. You'll see why later.)

In a prior I posed two infinite hat problems. Today I post the solutions. Actually this is a copy of my last post with the solutions added, so it is self contained.

A Hat Problem that you have probably seen:

1) There are an infinite number of people, numbered 1,2,3,... There are 2 colors of hats. They can all see everyone's hat but their own.

2) The adversary is going to put hats on all the people. They will guess their own hat color at the same time.

3) The people can discuss strategy ahead of time, but must use a deterministic strategy and the adversary knows the strategy.

4) The people want to minimize how many they get wrong.

5) The adversary puts on hats to maximize how many they get wrong.

I ask two questions (the answers are in a document I point to) and one meta-question:

Q1: Is there a solution where they get all but a finite number of the guesses right? (If you have read my prior post on hat puzzles, here then you can do this one.)

Q2: Is there a solution where they get all but at most (say) 18 wrong.

Answers to Q1 and Q2 are here.

How did I get into this problem? I was looking at hat problems a while back. Then I began discussing Q1 and Q2 by email (Does the term discussing have as a default that it is by email?) with David Marcus who had just read the chapter of Problems with a Point on hat puzzles. After a few emails back and fourth, he began looking on the web for answers. He found one. There is a website of hat puzzles! It was MY website papers on Hat Puzzles! It is here. And on it was a relevant paper here. We did not find any other source of the problem or its solution.

Q3: How well known is problem Q2 and the solution? I've seen Q1 around but the only source on Q2 that I know of is that paper, and now this blog post. So, please leave a comment telling me if you have seen Q2 and/or the solution someplace else, and if so where.

The responses to my last post indicated that YES the problem was out there, but the proof that you could not get all-but-18 was not well known.

I THINK that all of the proofs that you can't do all-but-18 in the comment of the last post were essentially the same as the solution I pointed to in this blog. I would be interested if there is an alternative proof.

by GASARCH ( at July 22, 2019 03:00 AM UTC

The catalytic Turing machine is a model of computation defined by Buhrman, Cleve, Kouck, Loff, and Speelman (STOC 2014). Compared to the classical space-bounded Turing machine, this model has an extra space which is filled with arbitrary content in addition to the clean space. In such a model we study if this additional filled space can be used to increase the power of computation or not, with the condition that the initial content of this extra filled space must be restored at the end of the computation. In this paper, we define the notion of unambiguous catalytic Turing machine and prove that under a standard derandomization assumption, the class of problems solved by an unambiguous catalytic Turing machine is same as the class of problems solved by a general nondeterministic catalytic Turing machine in the logspace setting.

at July 21, 2019 11:21 AM UTC

A twit long summary: The cyclic polytope is wonderful and whenever we construct an analogous object we are happy. Examples: Neighborly cubic polytopes; The amplituhedron; and as of last week, the Novik-Zheng new construction of neighborly centrally symmetric spheres!

At last: Neighborly CS spheres!

The news: Isabella Novik and Hailun Zheng’ paper  Highly neighborly centrally symmetric spheres, resolves an old standing problem in this field.

Here is the abstract:

In 1995, Jockusch constructed an infinite family of centrally symmetric 3-dimensional simplicial spheres that are cs-2-neighborly. Here we generalize his construction and show that for all d ≥ 4 and n ≥ d, there exists a centrally symmetric (d − 1)-dimensional simplicial sphere with 2n vertices that is cs-[d/2]-neighborly. This result combined with work of Adin and Stanley completely resolves the upper bound problem for centrally symmetric simplicial spheres.

Congratulations to Isabella and Hailun!

Some background to the Novik and Zheng breakthrough

Centrally symmetric bodies: A centrally symmetric (cs) polytope convex body in \mathbb R^d satisfies x \in P implies -x \in P. Centrally symmetric bodies are the unit balls of normed spaces.

Centrally symmetric simplicial spheres: A triangulation K of a  (d-1)-dimensional sphere with a set V of vertices is centrally symmetric if there is an involution \phi on V that \phi maps a face of K to a face of K and for every vertex v, \phi(v) \ne v and \{v , \phi (v)\} is not an edge of K.  The boundary complex of a cs d-polytopes is a cs triangulation of S^{d-1}.

Neighborliness. A simplicial complex K  is m-neighborly of every set of m vertices of P form a face.  (The definition was first considered  for simplicial d-polytopes P.) The cyclic d-polytope with n vertices is [d/2]-neighborly. (The only ([d/2]+1)-neighborly simplicial (d-1)-sphere is the simplex. There are many other [d/2]-neighborly simplicial $d$-polytopes and (d-1)-spheres.

cs-Neighborliness. Let K be a simplicial complex with an involution \phi on its vertices which acts on K (maps faces to faces) and has the property that \phi(v) is not adjacent to v (and \phi (v) \ne v). We will call v and \phi (v) antipodal. K is cs-m-neighborly if every set of m vertices that contains no pair of antipodal vertices is a face of K. The only cs-m-neighborly simplicial sphere is the boundary complex of the cross polytope.

The existence of cs-[d/2]-neighborly spheres. It was an important open question to understand if cs [d/2]-neighborly simplicial spheres exist. (The only cs-([d/2]+1)-neighborly spheres is the boundary complex of the cross polytope.)  The first example (whoch is not a cross polytope) was given by Grunbaum in 1969 in his paper “The importance of being straight(?).” In 1995  Jockusch constructed an infinite family cs-2-neighborly centrally symmetric 3-dimensional simplicial spheres. This problem has now been solved by Novik and Zheng.

Neighborly centrally symmetric polytopes.  In the 1960s Grünbaum noted the big difference between neighborly centrally symmetric spheres and centrally symmetric polytopes. He proved (This is Theorem 4.1 in his book “Convex polytopes”) that no cs-2-neighborly 4 polytope with 12 vertices exists.  This is an example of the important themes of “streightening” or “linearizing” combimatorial objects and  of extending theorems from the “streight” or “linear” case to more general combinatorial settings.)

This result by Grünbaum was extended in various directions. Let me mention two major results in the field:

Theorem McMullen and Shephard (1968): A cs d-dimensional polytope with 2(d + 2) vertices cannot be more than cs-⌊(d + 1)/3⌋-neighborly.

Theorem Linial and Novik (2006): A cs-2-neighborly d-dimensional polytope has at most  2^d;

Novik (2017) constructed  cs-2-neighborly d polytopes with 2^{d-1}+1 vertices. She used a 2017 breakthrough construction by Gerencsér–Harang of an acute set of size 2^{d-1}+1 in \mathbb R^d. (A set S is acute if every three points from S determine an acute triangle.)

Face numbers of centrally-symmetric polytopes and spheres. As the abstract asserts the new construction is related to questions about face numbers of centrally symmetric polytopes, spheres and other cellular objects. In fact, this was the next item in our planned posts on algebraic combinatorics of cellular objects. (The first and only post so far is here.) Here is a recent survey by Isabella Novik  A tale on centrally symmetric  polytopes and spheres.

Upper bound theorems. Neighborly polytopes and spheres are the equality cases of the upper bound theorem (proved by McMullen for polytopes and by Stanley for spheres). A version of the upper bound inequality for centrally symmetric spheres was proved by Adin and Stanley and the new construction shows that the Adin-Stanley inequality is tight. For more on the upper bound theorem and neighborliness see Section 2 of my 2000 survey  Combinatorics with geometric flavor.  See also the post How the g-conjecture came about and  and the post Richard Stanley: How the Proof of the Upper Bound Theorem (for spheres) was Found.



by Gil Kalai at July 20, 2019 06:33 PM UTC

While I wait to board a flight at my favorite location on earth—Philadelphia International Airport—I figured I might as well blog something to mark the 50th anniversary of Apollo 11. (Thanks also to Joshua Zelinsky for a Facebook post that inspired this.)

I wasn’t alive for Apollo, but I’ve been alive for 3/4 of the time after it, even though it now seems like ancient history—specifically, like a Roman cathedral being gawked at by a medieval peasant, like an achievement by some vanished, more cohesive civilization that we can’t even replicate today, let alone surpass.

Which brings me to a depressing mystery: why do so many people now deny that humans walked on the moon at all? Like, why that specifically? While they’re at it, why don’t they also deny that WWII happened, or that the Beatles existed?

Surprisingly, skepticism of the reality of Apollo seems to have gone all the way back to the landings themselves. One of my favorite stories growing up was of my mom, as a teenager, working as a waitress at an Israeli restaurant in Philadelphia, on the night of Apollo 11 landing. My mom asked for a few minutes off to listen to news of the landing on the radio. The owners wouldn’t grant it—explaining that it was all Hollywood anyway, just some actors in spacesuits on a sound stage, and obviously my mom wasn’t so naïve as to think anyone was actually walking to the moon?

Alas, as we get further and further from the event, with no serious prospect of ever replicating it past the stage of announcing an optimistic timetable (nor, to be honest, any scientific reason to replicate it), as the people involved die off, and as our civilization becomes ever more awash in social-media-fueled paranoid conspiracies, I fear that moon-landing denalism will become more common.

Because here’s the thing: Apollo could happen, but only because of a wildly improbable, once-in-history confluence of social and geopolitical factors. It was economically insane, taking 100,000 people and 4% of the US federal budget for some photo-ops, a flag-planting, some data and returned moon rocks that had genuine scientific value but could’ve been provided much more cheaply by robots. It was dismantled immediately afterwards like a used movie set, rather than leading to any greater successes. Indeed, manned spaceflight severely regressed afterwards, surely mocking the expectations of every last science fiction fan and techno-utopian who was alive at that time.

One could summarize the situation by saying that, in certain respects, the Apollo program really was “faked.” It’s just that the way they “faked” it, involved actually landing people on the moon!

by Scott at July 19, 2019 09:40 PM UTC

This is a continuation of Julien Mairal‘s guest post on CNNs, see part I here.

Stability to deformations of convolutional neural networks

In their ICML paper Zhang et al. introduce a functional space for CNNs with one layer, by noticing that for some dot-product kernels, smoothed variants of rectified linear unit activation functions (ReLU) live in the corresponding RKHS, see also this paper and that one. By following a similar reasoning with multiple layers, it is then possible to show that the functional space described in part I \{ f_w: x \mapsto \langle w , \Phi_n(x_0) \rangle; w \in L^2(\Omega,\mathcal{H}_n) \} contains CNNs with such smoothed ReLU, and that the norm \|f_w\| of such networks can be controlled by the spectral norms of filter matrices. This is consistent with previous measures of complexity for CNNs, see this paper by Bartlett et al.

A perhaps more interesting finding is that the abstract representation \Phi_n(x), which only depends on the network architecture, may provide near-translation invariance and stability to small image deformations while preserving information—that is, x can be recovered from \Phi_n(x). The original characterization we use was introduced by Mallat in his paper on the scattering transform—a multilayer architecture akin to CNNs based on wavelets, and was extended to \Phi_n by Alberto Bietti, who should be credited for all the hard work here.

Our goal is to understand under which conditions it is possible to obtain a representation that (i) is near-translation invariant, (ii) is stable to deformations, (iii) preserves signal information. Given a C^1-diffeomorphism \tau: \mathbb{R}^2 \to \mathbb{R}^2 and denoting by L_\tau x(u) = x(u-\tau(u)) its action operator (for an image defined on the continuous domain \mathbb{R}^2), the main stability bound we obtain is the following one, see Theorem 7 in Mallat’s paper if \|\nabla \tau\|_\infty \leq 1/2, for all x,

    \[ \| \Phi_n(L_\tau x) - \Phi_n(x)\| \leq \left ( C_1 (1+n) \|\nabla \tau\|_\infty + \frac{C_2}{\sigma_n} \|\tau\|_\infty \right) \|x\|, \]

where C_1, C_2 are universal constants, \sigma_n is the scale parameter of the pooling operator A_n corresponding to the “amount of pooling” performed up to the last layer, \|\tau\|_\infty is the maximum pixel displacement and \|\nabla \tau\|_\infty represents the maximum amount of deformation, see the paper for the precise definitions of all these quantities. Note that when C_2/\sigma_n \to 0, the representation \Phi_n becomes translation invariant: indeed, consider the particular case of \tau being a translation, then \nabla \tau=0 and \|\Phi_n(L_\tau x) - \Phi_n(x)\| \to 0.

The stability bound and a few additional results tell us a few things about the network architecture: (a) small patches lead to more stable representations (the dependency is hidden in C_1); (b) signal preservation for discrete signals requires small subsampling factors (and thus small pooling) between layers. In such a setting, the scale parameter \sigma_n still grows exponentially with n and near translation invariance may be achieved with several layers.

Interestingly, we may now come back to the Cauchy-Schwarz inequality from part 1, and note that if \Phi_n is stable, the RKHS norm \|f\| is then a natural quantity that provides stability to deformations to the prediction function f, in addition to measuring model complexity in a traditional sense.

Feature learning in RKHSs and convolutional kernel networks

The previous paragraph is devoted to the characterization of convolutional architectures such as CNNs but the previous kernel construction can in fact be used to derive more traditional kernel methods. After all, why should one spend efforts defining a kernel between images if not to use it?

This can be achieved by considering finite-dimensional approximations of the previous feature maps. In order to shorten the presentation, we simply describe the main idea based on the Nystrom approximation and refer to the paper for more details. Approximating the infinite-dimensional feature maps x_k (see the figure at the top of part I) can be done by projecting each point in \mathcal{H}_k onto a p_k-dimensional subspace \mathcal{F}_k leading to a finite-dimensional feature map \tilde{x}_k akin to CNNs, see the figure at the top of the post.

By parametrizing \mathcal{F}_k=\text{span}(\varphi_k(z_1),\varphi_k(z_2),\ldots,\varphi_k(z_{p_k})) with p_k anchor points Z=[z_1,\ldots,z_{p_k}], and using a dot-product kernel, a patch z from \tilde{x}_{k-1} is encoded through the mapping function

    \[ \psi_k(z) = \|z\| \kappa_k( Z^\top Z)^{-1/2} \kappa_k\left( Z^\top \frac{z}{\|z\|} \right), \]

where \kappa_k is applied pointwise. Then, computing \tilde{x}_k from \tilde{x}_{k-1} admits a CNN interpretation, where only the normalization and the matrix multiplication by \kappa_k( Z^\top Z)^{-1/2} are not standard operations. It remains now to choose the anchor points:

  • kernel approximation: a first approach consists of using a variant of the Nystrom method, see this paper and that one. When plugging the corresponding image representation in a linear classifier, the resulting approach behaves as a classical kernel machine. Empirically, we observe that the higher the number of anchor points, the better the kernel approximation, and the higher the accuracy. For instance, a two-layer network with a 300k-dimensional representations achieves about 86\% accuracy on CIFAR-10 without data augmentation (see here).
  • back-propagation, feature selection: learning the anchor points Z can also be done as in a traditional CNN, by optimizing them end-to-end. This allows using deeper lower-dimensional architectures and empirically seems to perform better when enough data is available, e.g., 92\% accuracy on CIFAR-10 with simple data augmentation. There, the subspaces \mathcal{F}_k are not learned anymore to provide the best kernel approximation, but the model seems to perform a sort of feature selection in each layer’s RKHS \mathcal{H}_k, which is not well understood yet (This feature selection interpretation is due to my collaborator Laurent Jacob).

Note that the first CKN model published here was based on a different approximation principle, which was not compatible with end-to-end training. We found this to be less scalable and effective.

Other links between neural networks and kernel methods

Finally, other links between kernels and infinitely-wide neural networks with random weights are classical, but they were not the topic of this blog post (they should be the topic of another one!). In a nutshell, for a large collection of weights distributions and nonlinear functions s: \mathbb{R} \to \mathbb{R}, the following quantity admits an analytical form

    \[ K(x,x') = \E_{w}[ s(w^\top x) s(w^\top x')], \]

where the terms s(w^\top x) may be seen as an infinitely-wide single-layer neural network. The first time such a relation appears is likely to be in the PhD thesis of Radford Neal with a Gaussian process interpretation, and it was revisited later by Le Roux and Bengio and by Cho and Saul with multilayer models.

In particular, when s is the rectified linear unit and w follows a Gaussian distribution, it is known that we recover the arc-cosine kernel. We may also note that random Fourier features also yield a similar interpretation.

Other important links have also been drawn recently between kernel regression and strongly over-parametrized neural networks, see this paper and that one, which is another exciting story.

by Sebastien Bubeck at July 17, 2019 04:02 PM UTC

by Nataly Brukhim, Naman Agarwal and Elad Hazan, based on this paper 

In a famous 1906 competition at a local fair in Plymouth, England, participants were asked to guess the weight of an ox. Out of a crowd of hundreds, no one came close to the ox’s actual weight, but the average of all guesses was almost correct. How is it that combining the opinions of laymen can somehow arrive at highly reasoned decisions, despite the weak judgment of individual members? This concept of harnessing wisdom from weak rules of thumb to form a highly accurate prediction rule, is the basis of ensemble methods and boosting. Boosting is a theoretically sound methodology that has transformed machine learning across a variety of applications; in classification and regression tasks, online learning, and many more.

In the case of online learning, examples for training a predictor are not available in advance, but are revealed one at a time. Online boosting combines a set of online prediction rules, or weak learners. At every time step, each weak learner outputs a prediction, suffers some loss and is then updated accordingly. The performance of an online learner is measured using the regret criterion, which compares the accumulated loss over time with that of the best fixed decision in hindsight. A Boosting algorithm can choose which examples are fed to each of the weak learners, as well as the losses they incur. Intuitively, the online booster can encourage some weak learners to become really good in predicting certain common cases, while allowing others to focus on edge cases that are harder to predict. Overall, the online boosting framework can achieve low regret guarantees based on the learners’ individual regret values.

However, online learning can become more challenging when our actions have consequences on the environment. This can be illustrated with the following experiment: imagine learning to balance a long pole on your hand. When you move your hand slightly, the pole tilts. You then move your hand in the opposite direction, and it bounces back and tilts to the other side. One jerk the wrong way might have you struggling for a good few seconds to rebalance. In other words, a sequence of decisions you made earlier determines whether or not the pole is balanced at any given time, rather than the single decision you make at that point.

More generally, consider cases when our environment has a state, and is in some sense “remembering” our past choices. A stateful framework, able to model a wide range of such phenomena, is a dynamical system. A dynamical system can be thought of as a function that determines, given the current state, what the state of the system will be in the next time step. Think of the physical dynamics that determines our pole’s position based on sequential hand movements. Other intuitive examples are the fluctuations of stock prices in the stock market, or the local weather temperatures; these can all be modeled with dynamical systems.

So how can boosting help us make better predictions for a dynamical system? In recent work we propose an algorithm, which we refer to as DynaBoost, that achieves this goal. In the paper we provide theoretical regret bounds, as well as an empirical evaluation in a variety of applications, such as online control and time-series prediction.

Learning for Online Control

Control theory is a field of applied mathematics that deals with the control of various physical processes and engineering systems. The objective is to design an action rule, or controller, for a dynamical system such that steady state values are achieved quickly, and the system maintains stability.

Consider a simple Linear Dynamical System (LDS):

x_{t+1} = A x_t + B u_t + w_t

where x_t,u_t are the state and control values at time t, respectively. Assume a known transition dynamics specified by the matrices A and B, and an arbitrary disturbance to the system given by w_t. The goal of the controller is to minimize a convex cost function c_t(x_t,u_t).

A provably optimal controller for the Gaussian noise case (where w_t are normally distributed) and when the cost functions are quadratic, is the Linear Quadratic Regulator (LQR). LQR computes a pre-fixed matrix K such that u_t^{LQR} = K x_t. In other words, LQR computes a linear controller – which linearly maps the state into a control at every time step.

A recent advancement in online control considers arbitrary disturbances, as opposed to normally distributed noise. In this more general setting, there is no closed form for the optimal controller. Instead, it is proposed to use a weighted sum of previously observed noises, i.e., u_t^{WL} = K x_t + \sum_{i=1}^H M_i w_{t-i} , where M_1,...,M_H are learned parameters, updated in an online fashion. This method is shown to attain vanishing average regret compared to the best fixed linear controller in hindsight, and is applicable for general convex cost functions as opposed to only quadratics.

Crucially, the state-dependent term Kx_t is not learned. Since the learned parameters of the above controller therefore considers only a fixed number of recent disturbances, we can apply existing online convex optimization techniques developed for learning with loss functions that have bounded memory.

Boosting for Online Control

Using the insights described above to remove state in online control, we can now use techniques from online boosting. DynaBoost maintains multiple copies of the base-controller above, with each copy corresponding to one stage in boosting. At every time step, the control u_t is obtained from a convex combination of the base-controllers’ outputs u_t^{WL(1)},...,u_t^{WL(N)}. To update each base-controller’s parameters, DynaBoost feeds each controller with a residual proxy cost function, and seeks to obtain a minimizing point in the direction of the residual loss function’s gradient. Stability ensures that minimizing regret over the proxy costs (which have finite memory) suffices to minimize overall regret. See our paper for the detailed description of the algorithm and its regret guarantees.

Sanity check experiment

We first conducted experiments for the standard LQR setting with i.i.d. Gaussian noise and known dynamics. We applied our boosting method to the non-optimal controller with learned parameters (Control-WL), and we observe that boosting improves its loss and achieves near-optimal performance (here the optimal controller is given by the fixed LQR solution). We have tested an LDS of different dimensions d = 1,10,100, and averaged results over multiple runs.

Correlated noise experiment

When the disturbances are not independently drawn, the LQR controller is not guaranteed to perform optimally. We experimented with two LDS settings with correlated disturbances in which (a) the disturbances w_t are generated from a Gaussian random-walk, and (b) where they are generated by a sine function applied to the time index. In these cases, boosted controllers perform better compared to the “weak” learned controller, and can also outperform the fixed LQR solution. We have also tested a Recurrent Neural Network, and observed that boosting is effective for RNNs as well.

Inverted Pendulum experiment

A more challenging experiment with a non-linear dynamical system is the Inverted Pendulum experiment. This is very similar to the pole balancing example we discussed above, and is a standard benchmark for control methods. The goal is to balance the inverted pendulum by applying torque that will stabilize it in a vertically upright position, in the presence of noise. In our experiments, we used correlated disturbances from a Gaussian random-walk. We follow the dynamics implemented in OpenAI Gym, and test the performance of different controllers: LQR, a learned controller, and boosting. The video below visualizes this experiment:

When averaging the loss value over multiple experiment runs, we get the following plot:

It can be seen that the learned controller performs much better than the LQR in the presence of correlated noise, and that boosting can improve its stability and achieve lower average loss.

Boosting for Time-Series Prediction

Similarly to the control setting, in time-series prediction tasks it is sufficient to use fixed horizons, and online boosting can be efficiently applied here as well. In time-series prediction, the data is often assumed to be generated from an autoregressive moving average (ARMA) model:

x_{t} = \sum_{i=1}^k \alpha_i x_{t-i} + \sum_{j=1}^q \beta_j w_j + w_t

In words, each data point x_t is given by a weighted sum of previous points, previous noises and a new noise term w_t, where \alpha,\beta are the coefficients vectors.

To test our boosting method, We experimented with 4 simulated settings: 1) normally distributed noises, 2) coefficients of the dynamical system slowly change over time, 3) a single, abrupt, change of the coefficients, and, 4) correlated noise: Gaussian random walk.

The weak learners tested here are the ARMA-ONS (online newton step) and ARMA-OGD (online gradient descent) algorithms for time-series prediction (See this paper for more details). We applied our boosting method, as well as a fast version of it, which applies to quadratic loss functions (we used squared difference in this case).

1) Gaussian Noise 2) Changing Coefficients

3) Abrupt Change 4) Correlated Noise

We can see in the plots above that all weak learners’ loss values (red) can be improved by online boosting methods (blue). A similar observation arises when experimenting with real-world data; we experimented with the Air Quality dataset from the UCI Machine Learning repository, that contains hourly averaged measurements of air quality properties from an Italian city throughout one year, as measured by chemical sensors. We apply similar weak learners to this task, as well as our boosting algorithms. Here we again obtain better averaged losses for boosted methods (blue) compared to the baselines (red).

by Elad Hazan at July 17, 2019 02:24 PM UTC

This post about the Rieman zeta function, among the most important and mysterious mathematical objects is kindly written by Dan Romik. It is related to his paper Orthogonal polynomial expansions for the Riemann xi function,  that we mentioned in this post.

Dan Romik on the Riemann zeta function

Recently when I was thinking about the Riemann zeta function, I had the double thrill of discovering some new results about it, and then later finding out that my new ideas were closely related to some very classical ideas due to two icons of twentieth-century mathematics, George Pólya and Pál Turán. When you are trying to stand on the shoulders of giants, it’s nice to see other giants right there beside you trying to do the same!

It all goes back to one of the most famous problems in mathematics, the Riemann Hypothesis (RH). Both Pólya and Turán were rather enamored with this problem and published about it extensively; Pólya was said to have been preoccupied with the problem to the very end of his life.(1) And they both recognized that an important first step in trying to prove something about the zeros of the zeta function is having a good representation
for the Riemann zeta function. After all, there are many different formulas that can be used to define or compute the zeta function. If you don’t choose the right one, you probably won’t get very far with your analysis.

Pólya in one of his famous attacks on the problem considered the representation of the zeta function (or more precisely of the Riemann xi function, which is a symmetrized and better-behaved version of the zeta function; see below) as a Fourier transform—a standard representation due (essentially) to Riemann. I’ll have more to say about that later.

Turán also looked at the Riemann xi function, and instead of working with one of the standard “named” representations such as the Fourier transform or Taylor series, looked around a bit more intentionally for a representation of the function that seemed particularly suited to answering the specific question of whether the zeros all lie on a line. In a 1950 address to the Hungarian Academy of Sciences, he put forward his ideas about what he thought was the correct representation to look at: the infinite series expansion of the xi function in the Hermite polynomials. About eighty years after Turán’s discovery, my own investigations led me to discover [5] that the Hermite polynomials are not the only polynomials in which it’s interesting to expand the Riemann xi function. It turns out that there are at least two other families of polynomials for which the respective expansions are no less (and, in some ways, more) well-behaved. My motto for these polynomial families, which are known to experts in special functions but have until now been somewhat esoteric (though I hope that is about to change), is that they are “the coolest polynomials that you never heard of.”

Let’s look at some of the technical details so that I can explain why these new expansions are interesting, and how they relate to Turán’s work and ultimately back to Pólya’s ideas and one of the particular threads that grew out of them. First, define the Riemann xi function as

\displaystyle \xi(s) = \frac12 s(s-1) \pi^{-s/2} \Gamma\left(\frac{s}{2}\right) \zeta(s) \qquad (s\in\mathbb{C}),

where {\Gamma(z)} is the Euler gamma function and {\zeta(s)} is the Riemann zeta function. It’s also common to denote
\displaystyle \Xi(t) = \xi\left(\frac12+it\right) \qquad (t\in\mathbb{C}).

This is Riemann’s “capital xi” function, which is still usually referred to as Riemann’s xi function. (This seems reasonable: the two functions are the same up to a trivial linear change of variables.) The main point of these definitions is that {\Xi(t)} is an entire function of the complex variable {t}, and that RH can now be reformulated as the statement that the zeros of {\Xi(t)} all lie on the real line. Moreover, the famous functional equation satisfied by the Riemann zeta function maps to the statement that {\Xi(t)} is an even function.
Now consider the following four ways of representing the xi function:

\displaystyle \Xi(t) = \sum_{n=0}^\infty (-1)^n a_{2n} t^{2n},~~~~~(1)

\displaystyle \Xi(t) = \sum_{n=0}^\infty (-1)^n b_{2n} H_{2n}(t),~~~~~(2)

\displaystyle\Xi(t) = \sum_{n=0}^\infty (-1)^n c_{2n} f_{2n}\left(\frac{t}{2}\right),~~~~~(3)

\displaystyle \Xi(t) = \sum_{n=0}^\infty (-1)^n d_{2n} g_{2n}\left(\frac{t}{2}\right).~~~~~(4)

Here, the first representation (1) is simply the Taylor expansion of {\Xi(t)}, which contains only even terms since {\Xi(t)} is an even function. The numbers {a_{2n}} are (up to the {(-1)^n} sign factor) the Taylor coefficients. Some attempts have been made to understand them, and one interesting and fairly trivial observation (again going back to facts already known to Riemann) is that they are all positive. Some additional and less trivial things can be said—see for example Section 6.1 of my paper [5], and the recent paper by Griffin, Ono, Rolen and Zagier [2]. But at the end of the day, no one has yet succeeded in using the Taylor expansion to prove anything new about the location of the zeros.

The second representation (2) is the infinite series expansion of {\Xi(t)} in the classical sequence of Hermite polynomials, defined by the well-known formula

\displaystyle H_n(t) = (-1)^n e^{t^2} \frac{d^n}{dt^n} \left( e^{-t^2} \right).

This is the representation whose use was advocated by Turán. His reasoning was that expanding a function of a complex variable (for example, in the simplest case, a polynomial) in monomials {t^n} doesn’t provide useful information to easily decide if the function has only real zeros, because the monomials have, roughly speaking, radial symmetry: their level curves are concentric circles. The Hermite polynomials on the other hand, at least heuristically, have level curves that are closer to being straight lines parallel to the real axis, Turán argued; thus, they are more suited to the geometry of the problem we are trying to solve.

Turán’s case for supporting the Hermite polynomials as the right basis to use is quite detailed—you can read about it in his papers [6,7,8] (and no, he was not able to actually prove anything about the zeros of {\Xi(t)}; this is a common theme in most of the attacks on RH to date…). I’ll simply mention that again one interesting and fairly easy observation is that the coefficients {b_{2n}} in the expansion (2)—adjusted through the introduction of the sign factor {(-1)^n}—end up being positive numbers. Their asymptotic behavior can also be analyzed: I prove a result about this in my paper (though it’s not particularly pretty).

Now comes the part that to me seems the most exciting, involving the expansions (3) and (4). These are the expansions in the more exotic families of polynomials

\displaystyle f_n(x)=(-i)^n \sum_{k=0}^n 2^k\binom{n+\frac12}{n-k}\binom{-\frac34+ix}{k},

\displaystyle g_n(x)= (-i)^n \sum_{k=0}^n \frac{(n+k+1)!}{(n-k)!(3/2)_k^2} \binom{-\frac34+ix}{k}

(where {(3/2)_n} is a Pochhammer symbol), mildly rescaled by replacing {x}  with {t/2}. In the terminology of the theory of orthogonal polynomials, the family {f_n(x)} is a special case of a two-parameter family {P_n^{(\lambda)}(x;\phi)} known as the Meixner-Pollaczek polynomials, with the parameters taking the particular values {\phi=\frac{\pi}{2}, \lambda=\frac{3}{4}}. Similarly, the family {g_n(x)} is a special case of the four-parameter family {p_n(x;a,b,c,d)} known as the continuous Hahn polynomials, with the parameters taking the particular values {a=b=c=d=\frac{3}{4}}. Their main characterizing property is that they are orthogonal sequences of polynomials for two specific weight functions on {\mathbb{R}}: the {f_n(x)} are orthogonal with respect to the weight function {w_1(x)=\left|\Gamma\left(\frac34+ix\right)\right|^2}, and the {g_n(x)} are orthogonal with respect to {w_2(x)=\left|\Gamma\left(\frac34+ix\right)\right|^4}. Again, fairly esoteric. But interesting!

There are several things that make the expansions (3)–(4) well-behaved. First, the coefficients {c_{2n}}, {d_{2n}} are again positive. This actually seems pretty relevant for questions like RH: for example, if we consider “toy” versions of (1)–(3) in which the coefficient sequences {a_n}, {b_n} and {c_n} are replaced by the sequence {\alpha^n} for fixed {0<\alpha<1}, all three expansions sum up to rescaled cosines, which are entire functions that of course have only real zeros. (Without the {(-1)^n} factor, we would get a hyperbolic cosine, which has imaginary zeros.)

Second, one can derive asymptotics for {c_{2n}} and {d_{2n}}, and they are quite a bit nicer than the asymptotic formulas for the Taylor and Hermite expansion coefficients. In my paper, I proved that \displaystyle c_{2n} \sim A \sqrt{n} e^{-B \sqrt{n}}, \qquad d_{2n} \sim C n^{4/3} e^{-D n^{2/3}} \qquad \textrm{as }n\rightarrow\infty, where {A,B,C,D} are the constants \displaystyle A = 16\sqrt{2}\pi^{3/2}, \qquad B = 4\sqrt{\pi}, \qquad C = \frac{128 \times 2^{1/3} \pi^{2/3} e^{-2\pi /3}}{\sqrt{3}}, \qquad D = 3 (4\pi)^{1/3}.

Third, the expansions have some conceptual meaning: (3) turns out to be equivalent to the expansion of the elementary function {\frac{d^2}{du^2} (u \coth(\pi u))}, {u>0}, in an orthogonal basis of functions related to the Laguerre polynomials {L_n^{1/2}(x)}. And analogously, (4) arises out of the expansion of a certain auxiliary function {\tilde{\nu}(u)} (I won’t define it here) in yet another classical family of orthogonal polynomials, the Chebyshev polynomials of the second kind.

Fourth (and fifth, sixth, …): the expansions are just… nice, in the sense that they arise in a way that seems natural when one asks certain questions, that they have excellent convergence properties, and that the coefficients {c_{2n}} and {d_{2n}} have several elegant formulas, each revealing something interesting about them. Read the paper to understand more.

I said I will get back to Pólya’s work on RH. This post is already quite long so I will say only a little bit about this. One of Pólya’s major discoveries was that there are operations on entire functions that (under certain mild assumptions) preserve the property of the function having only real zeros. Specifically this is the case for the operation of multiplying the Fourier transform of the function by the factor {e^{\lambda u^2}} for {\lambda>0}  (where {u} is the frequency variable). This opens the way to defining a family of deformations {\Xi_\lambda(t)} of the Riemann xi function arising out of this operation, and trying to generalize RH by asking for which values of {\lambda} it is the case that {\Xi_\lambda(t)} has only real zeros. Since Pólya’s work, and important later extensions of it by De Bruijn and Newman, this has become a very active topic of research, nowadays referred to under the name of the De Bruijn-Newman constant.
See the recent survey of Newman and Wu [3], a 2018 paper by Rodgers and Tao [4] proving a major conjecture of Newman, and the recent paper [9] by the Polymath15 project (mentioned by Gil in his earlier post), for the latest on this subject.

The connection I found between this topic and the idea of expanding the Riemann xi function in families of orthogonal polynomials is the following: expansions such as (2)–(4) suggest yet another natural way of “deforming” the Riemann xi function by adding a parameter {\alpha}: simply multiply the {n}th term in the expansion by {\alpha^{2n}} (the linear operator that does this is called the Poisson kernel, and generalizes the standard Poisson kernel from complex analysis and the theory of harmonic functions). It turns out—and is actually easy to prove, and really isn’t terribly surprising in the grand scheme of things—that in the case of the Hermite expansion (2), this family of deformations is the same, up to some trivial reparametrization, as the family of deformations {\Xi_\lambda(t)} that was studied in connection with the work of Pólya, De Bruijn, Newman and their successors. A nice connection between two threads of research that were not previously recognized as being related to each other, I think. Furthermore, this suggests that the Poisson kernel and associated deformations may yet have an important role to play in the context of the new expansions in the orthogonal polynomial families {f_n} and {g_n}, where we get genuinely new families of deformations of the Riemann xi function. I explore this idea in my paper and it leads to some interesting things.

So let’s summarize. The key questions you are no doubt wondering about are: where does any of this lead? And do these new ideas say anything really useful or especially relevant for the Riemann hypothesis? The answer is that I don’t know (and I’m wondering about the same things). That being said, these orthogonal polynomial expansions seem quite interesting in their own right. The Riemann zeta function is a mysterious object, and there are other things we wish to understand about it beside where its zeros are, so it’s always good to have additional points of view from which to approach it. Moreover, even on the question of the zeros there are reasons to be cautiously optimistic that this approach may have something useful to offer; see Chapter 7 of my paper for a brief discussion of why that is the case.


[1] D. Albers and G. L. Alexanderson, editors. Mathematical People: Profiles and Interviews. A K Peters, 2008.

[2] M. Griffin, K. Ono, L. Rolen and D. Zagier. Jensen polynomials for the Riemann zeta function and other sequences. Preprint (2019), arXiv:1902.07321.

[3] C. M. Newman. Constants of de Bruijn-Newman type in analytic number theory and statistical physics. To appear in Bull. Amer. Math. Soc.

[4] B. Rodgers and T. Tao. The De Bruijn-Newman constant is nonnegative. Preprint (2018), arXiv:1801.05914.

[5] D. Romik. Orthogonal polynomial expansions for the Riemann xi function. Preprint (2019), arXiv:1902.06330.

[6] P. Turán. Sur l’algèbre fonctionelle. Pages 279–290 in: Comptes Rendus du Premier Congrès des Mathématiciens Hongrois, 27 Août–2 Septembre 1950. Akadémiai Kiadó, 1952. Reprinted in Collected Papers of Paul Turán, Ed. P. Erdos, Vol. 1, pp. 677–688. Akadémiai Kiadó, 1990. An English translation of the paper by Dan Romik On functional algebra.

[7] P. Turán. Hermite-expansion and strips for zeros of polynomials. Arch. Math. 5 (1954), 148–152. Reprinted in Collected Papers of Paul Turán, Ed. P. Erdos, Vol. 1, pp. 738–742. Akadémiai Kiadó, 1990.

[8]  P. Turán. To the analytical theory of algebraic equations. Bulgar. Akad. Nauk. Otd. Mat. Fiz. Nauk. Izv. Mat. Inst. 3 (1959), 123–137. Reprinted in Collected Papers of Paul Turán, Ed. P. Erdos, Vol. 2, pp. 1080–1090. Akadémiai Kiadó, 1990.

[9] D.H.J. Polymath. Effective approximation of heat flow evolution of the Riemann ξ function, and a new upper bound for the de Bruijn-Newman constant. Preprint (2019), arXiv:1904.12438.


(1) Alexanderson writes in [1, p. 259]: “A week or so before he died, Pólya asked me to look on his desk at home for some papers on which he said he had written down some interesting ideas he had for proving RH. Of course I could find no such notes, but until the day he died he was thinking about that famous problem.”


(2) Turán’s Hungarian Academy of Sciences talk was published in a rather obscure French-language paper [6] that seems to have been largely forgotten. It’s an interesting read nonetheless, and to make it more accessible to anyone who may be interested, I recently translated it to English.


(3) Turán mentions in [8] that he discovered the results on the Hermite expansion in 1938–39, but they were not published until much later. Clearly this was not a convenient time in history for publishing such discoveries; Turán, a Hungarian Jew, spent much of World War II interned in labor camps in Hungary.

by Gil Kalai at July 17, 2019 06:22 AM UTC

Some formative books in mathematics and computing theory

LSE source: “Calculus on Clay?”

Norman Biggs is the author of the wonderful book Algebraic Graph Theory. Both Ken and I read it long ago, and both of us have it out now because of its relevance to Hao Huang’s beautiful short proof of the Boolean Sensitivity Conjecture.

Today we wish to ask, What are your top five favorite books on mathematics and theory for summer reading?

There’s an aporia in that question. A working definition of aporia is: “a self-contradiction that isn’t.” The point is that books for summer reading should be new, so how would you already know which are your favorites? Well, we are thinking of books that are so rich you can always find new things in them—and that also played formative roles earlier in our careers.

Ken knew Biggs during his first year at Oxford when Biggs was visiting there from London. He took part in a weekly sitting-room seminar organized by Peter Neumann. Biggs’s book was a central reference for Ken’s undergraduate senior thesis at Princeton, and both he and Ken presented material based on it.

Best Five Books—Dick

Here are my votes for all-time best books in mathematics and in computer science theory.

{\bullet } Algebraic Graph Theory, by Norman Biggs. A wonderful book. First appeared in 1974.

{\bullet } An Introduction to Probability Theory and Its Applications, Vol. 1, by William Feller. This is the book I used to learn probability theory.

{\bullet } An Introduction to the Theory of Numbers, by Godfrey Hardy and Edward Wright. Now updated by Andrew Wiles, Roger Heath-Brown, and Joseph Silverman.

{\bullet } Elements of Number Theory, by Ivan Vinogradov. Another small book that is loaded with ideas.

{\bullet } The Art of Counting, by Paul Erdős and Joel Spencer. This book changed my life. Today the book is of course The Probabilistic Method, by Noga Alon and Joel Spencer.

Best Five Books—Ken

Ken reaches back to his teen years but it’s still the same span of years as my list. Here he tells it:

{\bullet} All books by Martin Gardner—in particular, the books of collections of his “Mathematical Games” columns in Scientific American. Here is an overview.

{\bullet} Scarne on Dice and Scarne on Cards. Originally it was neither of these books—nor John Scarne’s Complete Guide to Gambling—but a different book on in which both Scarne and Gardner figured prominently. Alas I, Ken, cannot trace it. That’s what I used to learn probability theory.

{\bullet} Spectra of Graphs, by Dragoš Cvetković, Michael Doob, and Horst Sachs. I could put Biggs’s book here, but this is the one that got me on to the whole subject just before my senior year at Princeton. It was fresh out in 1980—I recall the tactile sensation of the dark green spanking new cover in the Fine Hall Library’s copy. A great book with pictures and algebra.

{\bullet} Ideals, Varieties, and Algorithms, by David Cox, John Little, and Donal O’Shea. Fast forward to 1997. Having realized that techniques from algebraic geometry could surmount the “Natural Proofs” barrier (see also GCT), I went whole-hog after it. See “Manic Monomials” in this post for one thing that tripped it up. The book remains incredibly stimulating. It has a sequel, Using Algebraic Geometry.

{\bullet} Quantum Computation and Quantum Information by Michael Nielsen and Isaac Chuang. As with Hardy and Wright, it has its own Wikipedia page. Dick and I can say this is nominating a competitor, but Chaung & Nielsen is really in a class by itself for the sheer richness and writing style. One odd mark of its influence: In 2006 when I reacted to the sensational and frightening accusations of cheating at the world championship match, my first thought was to apply distributional distance measures of the kind used in its later chapters. Among such measures is (quantum) fidelity, and although I focused more on Jensen-Shannon divergence before deciding on simpler stuff, my chess research website retains “fidelity” in its name as part of a multi-way reference to FIDE, faith, and playing in good faith.

Open Problems

What books most influenced you? What are your votes for the best books that might influence others?

by RJLipton+KWRegan at July 17, 2019 04:26 AM UTC

(I will post the solution to the problem in the last blog later in the week---probably Thursday. Meanwhile, enjoy these thoughts from Samir Khuller on the TCS Women 2019 meeting.)

Guest Post by Samir Khuller:

Am I even allowed here?” was the first thought that crossed my mind when I entered the room. It was packed with women (over 95%), however a few minutes later, several men had trickled in. I was at the TCS Women spotlight workshop on the day before STOC. Kudos to Barna Saha, Sofya Raskhodnikova, and Virginia Vassilevska Williams for putting this grand (and long needed) event together, which serves as a role model and showcases some of the recent work by rising stars. In addition to the Sun afternoon workshop, the event was followed by both an all women panel and a poster session (which I sadly did not attend).

The rising stars talks were given by Naama Ben-David (CMU), Andrea Lincoln (MIT), Debarati Das (Charles University) and Oxana Poburinnaya (Boston U). After a short break the inspirational talk was by Ronitt Rubinfeld from MIT.  Ronitt’s talk was on the topic of Program Checking, but she made it inspirational by putting us in her shoes as a young graduate student, three decades back, trying to make a dent in research by working on something that her advisor Manuel Blum, and his senior graduate student Sampath Kannan had been working on, and I must say she made a pretty big dent in the process! She also related those ideas to other pieces of work done since in a really elegant manner and how these pieces of work lead to work on property testing.

I am delighted to say that NSF supported the workshop along with companies such as Amazon, Akamai, Google and Microsoft. SIGACT plans to be a major sponsor next year.

The Full program for the workshop is at the following URLhere.

by GASARCH ( at July 16, 2019 11:38 PM UTC

A $k$-uniform hypergraph is said to be $r$-rainbow colorable if there is an $r$-coloring of its vertices such that every hyperedge intersects all $r$ color classes. Given as input such a hypergraph, finding a $r$-rainbow coloring of it is NP-hard for all $k \ge 3$ and $r \ge 2$. Therefore, one settles for finding a rainbow coloring with fewer colors (which is an easier task). When $r=k$ (the maximum possible value), i.e., the hypergraph is $k$-partite, one can efficiently $2$-rainbow color the hypergraph, i.e., $2$-color its vertices so that there are no monochromatic edges. In this work we consider the next smaller value of $r=k-1$, and prove that in this case it is NP-hard to rainbow color the hypergraph with $q := \lceil \frac{k-2}{2} \rceil$ colors. In particular, for $k \le 6$, it is NP-hard to $2$-color $(k-1)$-rainbow colorable $k$-uniform hypergraphs. Our proof follows the algebraic approach to promise constraint satisfaction problems. It proceeds by characterizing the polymorphisms associated with the approximate rainbow coloring problem, which are rainbow colorings of some product hypergraphs on vertex set $[r]^n$. We prove that any such polymorphism $f: [r]^n \to [q]$ must be $C$-fixing, i.e., there is a small subset $S$ of $C$ coordinates and a setting $a \in [q]^S$ such that fixing $x_{|S} = a$ determines the value of $f(x)$. The key step in our proof is bounding the sensitivity of certain rainbow colorings, thereby arguing that they must be juntas. Armed with the $C$-fixing characterization, our NP-hardness is obtained via a reduction from smooth Label Cover.

at July 16, 2019 01:19 AM UTC

We prove that for every constant $c$ and $\epsilon = (\log n)^{-c}$, there is no polynomial time algorithm that when given an instance of 3LIN with $n$ variables where an $(1 - \epsilon)$-fraction of the clauses are satisfiable, finds an assignment that satisfies at least $(\frac{1}{2} + \epsilon)$-fraction of clauses unless $\mathbf{NP} \subseteq \mathbf{BPP}$. The previous best hardness using a polynomial time reduction achieves $\epsilon = (\log \log n)^{-c}$, which is obtained by the Label Cover hardness of Moshkovitz and Raz [J. ACM, 57(5), 2010] followed by the reduction from Label Cover to 3LIN of Hastad [J. ACM, 48(4):798--859, 2001]. Our main idea is to prove a hardness result for Label Cover similar to Moshkovitz and Raz where each projection has a linear structure. This linear structure of Label Cover allows us to use Hadamard codes instead of long codes, making the reduction more efficient. For the hardness of Linear Label Cover, we follow the work of Dinur and Harsha [SIAM J. Comput., 42(6):2452--2486, 2013] that simplified the construction of Moshkovitz and Raz, and observe that running their reduction from a hardness of the problem LIN (of unbounded arity) instead of the more standard problem of solving quadratic equations ensures the linearity of the resultant Label Cover.

at July 16, 2019 01:10 AM UTC

by David Eppstein at July 15, 2019 09:57 PM UTC