Two postdoctoral positions, each for one year renewable to a second, are available to work in Luca Trevisan’s group at Bocconi University on topics related to average-case analysis of algorithms, approximation algorithms, and combinatorial constructions.
The positions have a very competitive salary and relocation benefits. Funding for travel is available.
Website: https://www.unibocconi.eu/wps/wcm/connect/3ecc85da-ac66-46ac-9db7-09ac0ef9b715/Call-2ADR-01B1-Bidsa-Erc.pdf?MOD=AJPERES&CVID=mUjaHAv
Email: L.Trevisan@unibocconi.it
I am recruiting for two postdoctoral positions, each for one year renewable to a second, to work with me at Bocconi University on topics related to average-case analysis of algorithms, approximation algorithms, and combinatorial constructions.
The positions have a very competitive salary and relocation benefits. Funding for travel is available.
Application information is at this link. The deadline is December 15. If you apply, please also send me an email (L.Trevisan at unibocconi.it) to let me know.
Computer Science and Engineering at the University of Michigan currently invites applications for multiple tenure-track and teaching faculty (lecturer) positions. We seek exceptional candidates at all levels in all areas across computer science and computer engineering. We also have a targeted search for an endowed professorship in theoretical computer science (the Fischer Chair).
Website: https://cse.engin.umich.edu/about/faculty-hiring/
Email: kuipers@umich.edu
Authors: Philipp Bamberger, Fabian Kuhn, Yannic Maus
Download: PDF
Abstract: We show that the $(degree+1)$-list coloring problem can be solved
deterministically in $O(D \cdot \log n \cdot\log^3 \Delta)$ rounds in the CONGEST
model, where $D$ is the diameter of the graph, $n$ the number of nodes, and
$\Delta$ is the maximum degree. Using the network decomposition algorithm from
Rozhon and Ghaffari this implies the first efficient deterministic, that is,
$\text{poly}\log n$-time, CONGEST algorithm for the $(\Delta+1)$-coloring and the
$(degree+1)$-list coloring problem. Previously the best known algorithm
required $2^{O(\sqrt{\log n})}$ rounds and was not based on network
decompositions.
Our results also imply deterministic $O(\log^3 \Delta)$-round algorithms in MPC and the CONGESTED CLIQUE.
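As a point of reference for the problem statement only, here is a minimal sequential sketch of why a $(degree+1)$-list coloring always exists: when a node is processed, its neighbours can block at most $\deg(v)$ colors, so a list of size $\deg(v)+1$ always has a free one. This is not the distributed CONGEST algorithm from the abstract; the function name and the example graph are hypothetical.

```python
def greedy_list_coloring(adj, lists):
    """Sequentially color a graph from per-node color lists.

    adj:   dict mapping each node to the set of its neighbours.
    lists: dict mapping each node to its allowed colors, assumed to satisfy
           len(lists[v]) >= len(adj[v]) + 1 for every node v.
    """
    color = {}
    for v in adj:
        # Colors already used by neighbours that were processed earlier.
        taken = {color[u] for u in adj[v] if u in color}
        # At most deg(v) colors are taken, so a free one exists in the list.
        color[v] = next(c for c in lists[v] if c not in taken)
    return color


# Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
lists = {0: [5, 7, 9], 1: [5, 6, 7], 2: [5, 6, 7, 8], 3: [5, 8]}
coloring = greedy_list_coloring(adj, lists)
```

The difficulty addressed in the abstract is doing this deterministically with small messages and few rounds, which the sequential loop above does not attempt.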
Authors: Xi Li, Mingyou Wu, Hanwu Chen
Download: PDF
Abstract: In this work, we consider the application of continuous time quantum
walking (CTQW) to the Maximum Clique (MC) Problem. Performing CTQW on graphs will
generate distinct periodic probability amplitude for different vertices. We
will show that the intensity of the probability amplitude at frequency indeed
indicates the clique structure of some special kinds of graphs. We also propose
recursive algorithms for finding the maximum clique that run in time $O(N^5)$ on
classical computers. We have experimented on random graphs where each edge
exists with probability 0.3, 0.5, or 0.7. Although we found no counterexamples
on random graphs, we do not know whether these algorithms are universal.
Authors: Peter Bubenik, Alex Elchesen
Download: PDF
Abstract: We undertake a formal study of persistence diagrams and their metrics. We
show that barcodes and persistence diagrams together with the bottleneck
distance and the Wasserstein distances are obtained via universal constructions
and thus have corresponding universal properties. In addition, the
1-Wasserstein distance satisfies Kantorovich-Rubinstein duality. Our
constructions and results apply to any metric space with a distinguished
basepoint. For example, they can also be applied to multiparameter persistence
modules.
Authors: Ondřej Benedikt, István Módos, Zdeněk Hanzálek
Download: PDF
Abstract: This paper addresses a single machine scheduling problem with non-preemptive
jobs to minimize the total electricity cost. Two recent trends in
energy-aware scheduling are considered, namely variable energy pricing
and the power-saving states of a machine. Scheduling of the jobs and the
machine states are considered jointly to achieve the highest possible savings.
Although this problem has been previously addressed in the literature, the
reported results of the state-of-the-art method show that the optimal solutions
can be found only for instances with up to 35 jobs and 209 intervals within 3
hours of computation. We propose an elegant pre-processing technique called
SPACES for computing the optimal switching of the machine states with respect
to the energy costs. The optimal switchings are associated with the shortest
paths in an interval-state graph that describes all possible transitions
between the machine states in time. This idea allows us to implement efficient
integer linear programming and constraint programming models of the problem
while preserving the optimality. The efficiency of the models lies in the
simplification of the optimal switching representation. The results of the
experiments show that our approach outperforms the existing state-of-the-art
exact method. On a set of benchmark instances with varying sizes and different
state transition graphs, the proposed approach finds the optimal solutions even
for the large instances with up to 190 jobs and 1277 intervals within an hour
of computation.
Authors: Thomas Fernique
Download: PDF
Abstract: We consider circle packings in the plane with circles of sizes $1$, $r\simeq
0.834$ and $s\simeq 0.651$. These sizes are algebraic numbers which allow a
compact packing, that is, a packing in which each hole is formed by three
mutually tangent circles. Compact packings are believed to maximize the density
when they are possible. We prove that it is indeed the case for these sizes.
The proof should be generalizable to other sizes which allow compact packings
and is a first step towards a general result.
Authors: Benjamin Coleman, Anshumali Shrivastava
Download: PDF
Abstract: Kernel density estimation is a simple and effective method that lies at the
heart of many important machine learning applications. Unfortunately, kernel
methods scale poorly for large, high dimensional datasets. Approximate kernel
density estimation has a prohibitively high memory and computation cost,
especially in the streaming setting. Recent sampling algorithms for high
dimensional densities can reduce the computation cost but cannot operate
online, while streaming algorithms cannot handle high dimensional datasets due
to the curse of dimensionality. We propose RACE, an efficient sketching
algorithm for kernel density estimation on high-dimensional streaming data.
RACE compresses a set of N high dimensional vectors into a small array of
integer counters. This array is sufficient to estimate the kernel density for a
large class of kernels. Our sketch is practical to implement and comes with
strong theoretical guarantees. We evaluate our method on real-world
high-dimensional datasets and show that our sketch achieves 10x better
compression compared to competing methods.
Authors: Jeff Erickson, Ivor van der Hoog, Tillmann Miltzow
Download: PDF
Abstract: We propose a new paradigm for robust geometric computations that complements
the classical fixed precision paradigm and the exact geometric computation
paradigm. We provide a framework where we study algorithmic problems under
smoothed analysis of the input, the relaxation of the problem requirements, or
the witness of a recognition problem. Our framework specifies a widely
applicable set of prerequisites that make real RAM algorithms suitable for
smoothed analysis. We prove that suitable algorithms can (under smoothed
analysis) be robustly executed with expected logarithmic bit-precision. This
shows in a formal way that inputs which need high bit-precision are contrived
and that these algorithms are likely robust for realistic input. Interestingly,
our techniques generalize to problems with a natural notion of resource
augmentation (geometric packing, the art gallery problem) and recognition
problems (recognition of realizable order types or disk intersection graphs).
Our results also have theoretical implications for some ER-hard problems: These problems have input instances where their real verification algorithm requires at least exponential bit-precision which makes it difficult to place these ER-hard problems in NP. Our results imply for a host of ER-complete problems that this exponential bit-precision phenomenon comes from nearly degenerate instances.
It is not evident that problems that have a real verification algorithm belong to ER. Therefore, we conclude with a real RAM analogue to the Cook-Levin Theorem. This gives an easy proof of ER-membership, as real verification algorithms are much more versatile than ETR-formulas.
Authors: P. Mirabal, J. Abreu, D. Seco
Download: PDF
Abstract: Strings are a natural representation of biological data such as DNA, RNA and
protein sequences. The problem of finding a string that summarizes a set of
sequences has direct application in relative compression algorithms for genome
and proteome analysis, where reference sequences need to be chosen. Median
strings have been used as representatives of a set of strings in different
domains. However, several formulations of those problems are NP-Complete.
Alternatively, heuristic approaches that iteratively refine an initial coarse
solution by applying edit operations have been proposed. Recently, we
investigated the selection of the optimal edit operations to speed up
convergence without spoiling the quality of the approximated median string. We
propose a novel algorithm that outperforms state-of-the-art heuristic
approximations to the median string in terms of convergence speed by estimating
the effect of a perturbation in the minimization of the expressions that define
the median strings. We present a corpus of comparative experiments to validate
these results.
Authors: Tianhao Wang, Min Xu, Bolin Ding, Jingren Zhou, Cheng Hong, Zhicong Huang, Ninghui Li, Somesh Jha
Download: PDF
Abstract: When collecting information, local differential privacy (LDP) alleviates
privacy concerns of users because their private information is randomized
before being sent to the central aggregator. However, LDP results in loss of
utility due to the amount of noise that is added to each individual data item.
To address this issue, recent work introduced an intermediate server with the
assumption that this intermediate server did not collude with the aggregator.
Using this trust model, one can add less noise to achieve the same privacy
guarantee; thus improving the utility.
In this paper, we investigate this multiple-party setting of LDP. We first analyze the threat model and identify potential adversaries. We then make observations about existing approaches and propose new techniques that achieve a better privacy-utility tradeoff than existing ones. Finally, we perform experiments to compare different methods and demonstrate the benefits of using our proposed method.
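To make the baseline concrete, here is a minimal sketch of binary randomized response, a standard LDP mechanism; it is not one of the multi-party techniques proposed in the paper. Each user reports their true bit with probability $e^{\epsilon}/(e^{\epsilon}+1)$ and the flipped bit otherwise, and the aggregator debiases the observed frequency. The function names, the population, and the choice $\epsilon = \ln 3$ are assumptions made for illustration.

```python
import math
import random


def randomize_bit(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < keep else 1 - bit


def estimate_fraction(reports, epsilon):
    """Debias the observed fraction of 1s.

    E[observed] = (1 - keep) + p * (2 * keep - 1), so we invert that map
    to get an unbiased estimate of the true fraction p.
    """
    keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - keep)) / (2.0 * keep - 1.0)


rng = random.Random(0)
epsilon = math.log(3)                      # keep probability is 0.75
true_bits = [1] * 6000 + [0] * 14000       # true fraction of 1s is 0.3
reports = [randomize_bit(b, epsilon, rng) for b in true_bits]
estimate = estimate_fraction(reports, epsilon)
```

The division by $2\,\mathrm{keep} - 1$ is where the utility loss shows up: the smaller $\epsilon$ is, the closer that factor is to zero and the more the sampling noise is amplified.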
The Department of Computer Science at The George Washington University invites applications for two tenure track positions at the Assistant, Associate or Full Professor level, beginning as early as Fall 2020. One position focuses on Machine Learning and related areas; the other position welcomes all areas of theoretical and applied computer science.
Website: https://www.gwu.jobs/postings/72053
Email: cssearch@gwu.edu
We hit the mother lode of property testing papers this month. Stick with us, as we cover 10 (!) papers that appeared online in November.
Testing noisy linear functions for sparsity, by Xue Chen, Anindya De, and Rocco A. Servedio (arXiv). Given samples from a noisy linear model \(y = w\cdot x + \mathrm{noise}\), test whether \(w\) is \(k\)-sparse, or far from being \(k\)-sparse. This is a property testing version of the celebrated sparse recovery problem, whose sample complexity is well-known to be \(O(k\log n)\), where the data lies in \(\mathbb{R}^n\). This paper shows that the testing version of the problem can be solved (tolerantly) with a number of samples independent of \(n\), assuming technical conditions: the coordinates of \(x\) are i.i.d. and non-Gaussian, and the noise distribution is known to the algorithm. Surprisingly, all these conditions are needed; otherwise, the dependence on \(n\) is \(\tilde \Omega(\log n)\), essentially the same as the recovery problem.
Pan-Private Uniformity Testing, by Kareem Amin, Matthew Joseph, Jieming Mao (arXiv). Differentially private distribution testing has now seen significant study, in both the local and central models of privacy. This paper studies distribution testing in the pan-private model, which is intermediate: the algorithm receives samples one by one in the clear, but it must maintain a differentially private internal state at all time steps. The sample complexity turns out to be qualitatively intermediate to the two other models: testing uniformity over \([k]\) requires \(\Theta(\sqrt{k})\) samples in the central model, \(\Theta(k)\) samples in the local model, and this paper shows that \(\Theta(k^{2/3})\) samples are necessary and sufficient in the pan-private model.
Almost Optimal Testers for Concise Representations, by Nader Bshouty (ECCC). This work gives a unified approach for testing for a plethora of different classes which possess some sort of sparsity. These classes include \(k\)-juntas, \(k\)-linear functions, \(k\)-terms, various types of DNFs, decision lists, functions with bounded Fourier degree, and much more.
Unified Sample-Optimal Property Estimation in Near-Linear Time, by Yi Hao and Alon Orlitsky (arXiv). This paper presents a unified approach for estimating several distribution properties with both near-optimal time and sample complexity, based on piecewise-polynomial approximation. Some applications include estimators for Shannon entropy, power sums, distance to uniformity, normalized support size, and normalized support coverage. More generally, results hold for all Lipschitz properties, and consequences include high-confidence property estimation (outperforming the “median trick”) and differentially private property estimation.
Testing linear-invariant properties, by Jonathan Tidor and Yufei Zhao (arXiv). This paper studies property testing of functions which are, in a formal sense, definable by restrictions to subspaces of bounded degree. This class of properties is a broad generalization of testing whether a function is linear, or a degree-\(d\) polynomial (for constant \(d\)). The algorithm is the oblivious one, which simply repeatedly takes random restrictions and tests whether the property is satisfied or not (similar to the classic linearity test of BLR, along with many others).
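For readers who have not seen it, the classic BLR linearity test mentioned above can be sketched as follows. This is only the textbook test for linearity over \(\mathbb{F}_2^n\), not the general tester from the paper; encoding inputs as \(n\)-bit integers and the helper names are assumptions made for illustration.

```python
import random


def blr_linearity_test(f, n, trials=100, rng=None):
    """Accept iff f(x) + f(y) = f(x + y) over GF(2) on every sampled pair.

    f maps n-bit integers to {0, 1}; bitwise XOR plays the role of addition
    in GF(2)^n. Linear functions are always accepted (one-sided error).
    """
    rng = rng or random.Random()
    for _ in range(trials):
        x = rng.getrandbits(n)
        y = rng.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):
            return False   # found a violated triple, so f is not linear
    return True


def parity_on(mask):
    """Linear function: parity of the input bits selected by mask."""
    return lambda x: bin(x & mask).count("1") % 2


def and_of_two_bits(x):
    """Non-linear function: AND of the two lowest bits."""
    return (x & 1) & ((x >> 1) & 1)
```

For `and_of_two_bits`, a single random pair violates the identity with probability 3/8, so 100 independent trials reject it except with probability \((5/8)^{100}\).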
Approximating the Distance to Monotonicity of Boolean Functions, by Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, Erik Waingarten (ECCC). This paper studies the following fundamental question in tolerant testing: given a Boolean function on the hypercube, test whether it is \(\varepsilon'\)-close or \(\varepsilon\)-far from monotone. It is shown that there is a non-adaptive polynomial query algorithm which can solve this problem for \(\varepsilon' = \varepsilon/\tilde \Theta(\sqrt{n})\), implying an algorithm which can approximate distance to monotonicity up to a multiplicative \(\tilde O(\sqrt{n})\) (addressing an open problem by Sesh). They also give a lower bound demonstrating that improving this approximation factor significantly would necessitate exponentially-many queries. Interestingly, this is proved for the (easier) erasure-resilient model, and also implies lower bounds for tolerant testing of unateness and juntas.
Testing Properties of Multiple Distributions with Few Samples, by Maryam Aliakbarpour and Sandeep Silwal (arXiv). This paper introduces a new model for distribution testing. Generally, we are given \(n\) samples from a distribution which is either (say) uniform or far from uniform, and we wish to test which is the case. The authors here study the problem where we are given a single sample from \(n\) different distributions which are either all uniform or far from uniform, and we wish to test which is the case. By additionally assuming a structural condition in the latter case (it is argued that some structural condition is necessary), they give sample-optimal algorithms for testing uniformity, identity, and closeness.
Random Restrictions of High-Dimensional Distributions and Uniformity Testing with Subcube Conditioning, by Clément L. Canonne, Xi Chen, Gautam Kamath, Amit Levi, and Erik Waingarten (ECCC, arXiv). By now, it is well-known that testing uniformity over the \(n\)-dimensional hypercube requires \(\Omega(2^{n/2})\) samples — the curse of dimensionality quickly makes this problem intractable. One option is to assume that the distribution is product, which causes the complexity to drop to \(O(\sqrt{n})\). This paper instead assumes one has stronger access to the distribution — namely, one can receive samples conditioned on being from some subcube of the domain. With this, the paper shows that the complexity drops to the near-optimal \(\tilde O(\sqrt{n})\) samples. The related problem of testing whether a distribution is either uniform or has large mean is also considered.
Property Testing of LP-Type Problems, by Rogers Epstein, Sandeep Silwal (arXiv). An LP-Type problem (also known as a generalized linear program) is an optimization problem sharing some properties with linear programs. More formally, it consists of a set of constraints \(S\) and a function \(\varphi\) which maps subsets of \(S\) to some totally ordered set, such that \(\varphi\) possesses monotonicity and locality properties. This paper considers the problem of testing whether \(\varphi(S) \leq k\), or whether at least an \(\varepsilon\)-fraction of constraints in \(S\) must be removed for \(\varphi(S) \leq k\) to hold. This paper gives an algorithm with query complexity \(O(\delta/\varepsilon)\), where \(\delta\) is a dimension measure of the problem. This is applied to testing problems for linear separability, smallest enclosing ball, smallest intersecting ball, and smallest volume annulus. The authors also provide lower bounds for some of these problems.
Near-Optimal Algorithm for Distribution-Free Junta Testing, by Xiaojin Zhang (arXiv). This paper presents an (adaptive) algorithm for testing juntas, in the distribution-free model with one-sided error. The query complexity is \(\tilde O(k/\varepsilon)\), which is nearly optimal. Algorithms with this sample complexity were previously known under the uniform distribution, or with two-sided error, but this is the first paper to achieve it in the distribution-free model with one-sided error.
By Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, and Ilya Sutskever
This is a lightly edited and expanded version of a post on the OpenAI blog about our recent paper. While I usually don’t advertise my own papers on this blog, I thought this might be of interest to theorists, and a good follow up to my prior post. I promise not to make a habit out of it. –Boaz
TL;DR: Our paper shows that double descent occurs in conventional modern deep learning settings: visual classification in the presence of label noise (CIFAR 10, CIFAR 100) and machine translation (IWSLT’14 and WMT’14). As we increase the number of parameters in a neural network, initially the test error decreases, then increases, and then, just as the model is able to fit the train set, it undergoes a second descent, again decreasing as the number of parameters increases. This behavior also extends over train epochs, where a single model undergoes double-descent in test error over the course of training. Surprisingly (at least to us!), we show these phenomena can lead to a regime where “more data hurts”: training a deep network on a larger train set actually performs worse.
Open a statistics textbook and you are likely to see warnings against the danger of “overfitting”: If you are trying to find a good classifier or regressor for a given set of labeled examples, you would be well-advised to steer clear of having so many parameters in your model that you are able to completely fit the training data, because you risk not generalizing to new data.
The canonical example for this is polynomial regression. Suppose that we get n samples of the form (x, p(x)+noise) where x is a real number and p(x) is a cubic (i.e. degree 3) polynomial. If we try to fit the samples with a degree 1 polynomial (a linear function), then we would get many points wrong. If we try to fit it with just the right degree, we would get a very good predictor. However, as the degree grows, the fit gets worse until the degree is large enough to fit all the noisy training points, at which point the regressor is terrible, as shown in this figure:
It seems that the higher the degree, the worse things are, but what happens if we go even higher? It seems like a crazy idea: why would we increase the degree beyond the number of samples? But it corresponds to the practice of having many more parameters than training samples in modern deep learning. Just like in deep learning, when the degree is larger than the number of samples, there is more than one polynomial that fits the data, but we choose a specific one: the one found by running gradient descent.
Here is what happens if we do this for degree 1000, fitting a polynomial using gradient descent (see this notebook):
We still fit all the training points, but now we do so in a more controlled way which actually tracks quite closely the ground truth. We see that despite what we learn in statistics textbooks, sometimes overfitting is not that bad, as long as you go “all in” rather than “barely overfitting” the data. That is, overfitting doesn’t hurt us if we take the number of parameters to be much larger than what is needed to just fit the training set — and in fact, as we see in deep learning, larger models are often better.
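For readers who want to play with this outside the notebook, here is a minimal standard-library sketch under stated assumptions; it is not the notebook's code. Instead of running gradient descent, it computes the minimum-norm interpolant in closed form, which is the solution that gradient descent started from zero converges to on a least-squares objective. The Legendre basis, the hypothetical cubic, the equally spaced sample points, and the degree of 60 are all choices made for illustration.

```python
import random


def legendre_features(x, degree):
    """Return [P_0(x), ..., P_degree(x)] via the three-term recurrence."""
    feats = [1.0, x]
    for k in range(1, degree):
        feats.append(((2 * k + 1) * x * feats[k] - k * feats[k - 1]) / (k + 1))
    return feats[: degree + 1]


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def min_norm_fit(xs, ys, degree):
    """Minimum-norm coefficients w = Phi^T (Phi Phi^T)^{-1} y (needs degree + 1 >= len(xs))."""
    phi = [legendre_features(x, degree) for x in xs]
    gram = [[sum(p * q for p, q in zip(r1, r2)) for r2 in phi] for r1 in phi]
    alpha = solve(gram, ys)
    return [sum(alpha[i] * phi[i][j] for i in range(len(xs)))
            for j in range(degree + 1)]


def predict(w, x):
    """Evaluate the fitted polynomial at x."""
    return sum(wj * fj for wj, fj in zip(w, legendre_features(x, len(w) - 1)))


def cubic(x):
    """Hypothetical ground-truth cubic p(x)."""
    return 2 * x ** 3 - x


rng = random.Random(0)
train_x = [-0.9 + 0.2 * i for i in range(10)]                # 10 samples in [-1, 1]
train_y = [cubic(x) + rng.gauss(0, 0.1) for x in train_x]    # noisy labels
w = min_norm_fit(train_x, train_y, degree=60)                # 61 parameters, 10 samples
train_err = max(abs(predict(w, x) - y) for x, y in zip(train_x, train_y))
```

Evaluating `predict(w, x)` on a fine grid and comparing it with `cubic(x)` is an easy way to see how the heavily over-parameterized fit behaves between the training points.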
The above is not a novel observation. Belkin et al. called this phenomenon “double descent” and this goes back to even earlier works. In this new paper we (Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, and Ilya Sutskever) extend the prior works and report on a variety of experiments showing that “double descent” is widely prevalent across several modern deep neural networks and for several natural tasks such as image recognition (for the CIFAR 10 and CIFAR 100 datasets) and language translation (for IWSLT’14 and WMT’14 datasets). As we increase the number of parameters in a neural network, initially the test error decreases, then increases, and then, just as the model is able to fit the train set, it undergoes a second descent, again decreasing as the number of parameters increases. Moreover, double descent also extends beyond number of parameters to other measures of “complexity” such as the number of training epochs of the algorithm.
The take-away from our work (and the prior works it builds on) is that neither the classical statisticians’ conventional wisdom that “too large models are worse” nor the modern ML paradigm that “bigger models are always better” always holds. Rather, it all depends on whether you are on the first or second descent. Furthermore, these insights also allow us to generate natural settings in which even the age-old adage of “more data is always better” is violated!
In the rest of this blog post we present a few sample results from this recent paper.
We observed many cases in which, just like in the polynomial interpolation example above, the test error undergoes a “double descent” as we increase the complexity of the model. The figure below demonstrates one such example: we plot the test error as a function of the complexity of the model for ResNet18 networks. The complexity of the model is the width of the layers, and the dataset is CIFAR10 with 15% label noise. Notice that the peak in test error occurs around the “interpolation threshold”: when the models are just barely large enough to fit the train set. In all cases we’ve observed, changes which affect the interpolation threshold (such as changing the optimization algorithm, changing the number of train samples, or varying the amount of label noise) also affect the location of the test error peak correspondingly.
We found that the double descent phenomenon is most prominent in settings with added label noise; without it, the peak is much smaller and easy to miss. But adding label noise amplifies this general behavior and allows us to investigate it easily.
Using the model-wise double descent phenomenon we can obtain examples where training on more data actually hurts. To see this, let’s look at the effect of increasing the number of train samples on the test error vs. model size graph. The plot below shows Transformers trained on a language-translation task (with no added label noise):
On the one hand, (as expected) increasing the number of samples generally shifts the curve downwards towards lower test error. On the other hand, it also shifts the curve to the right: since more samples require larger models to fit, the interpolation threshold (and hence, the peak in test error) shifts to the right. For intermediate model sizes, these two effects combine, and we see that training on 4.5x more samples actually hurts test performance.
There is a regime where training longer reverses overfitting. Let’s look closer at the experiment from the “Model-wise Double Descent” section, and plot Test Error as a function of both model-size and number of optimization steps. In the plot below to the right, each column tracks the Test Error of a given model over the course of training. The top horizontal dotted-line corresponds to the double-descent of the first figure. But we can also see that for a fixed large model, as training proceeds test error goes down, then up and down again—we call this phenomenon “epoch-wise double-descent.”
Moreover, if we plot the Train error of the same models and the corresponding interpolation contour (dotted line) we see that it exactly matches the ridge of high test error (on the right).
In general, the peak of test error appears systematically when models are just barely able to fit the train set.
Our intuition is that for models at the interpolation threshold, there is effectively only one model that fits the train data, and forcing it to fit even slightly-noisy or mis-specified labels will destroy its global structure. That is, there are no “good models”, which both interpolate the train set, and perform well on the test set. However in the over-parameterized regime, there are many models that fit the train set, and there exist “good models” which both interpolate the train set and perform well on the distribution. Moreover, the implicit bias of SGD leads it to such “good” models, for reasons we don’t yet understand.
The above intuition is theoretically justified for linear models, via a series of recent works including [Hastie et al.] and [Mei-Montanari]. We leave fully understanding the mechanisms behind double descent in deep neural networks as an important open question.
The experiments above are especially interesting (in our opinion) because of how they can inform ML theory: any theory of ML must be consistent with “double descent.” In particular, one ambitious hope for what it means to “theoretically explain ML” is to prove a theorem of the form:
“If the distribution satisfies property X and architecture/initialization satisfies property Y, then SGD trained on ‘n’ samples, for T steps, will have small test error with high probability”
For values of X, Y, n, T, “small” and “high” that are used in practice.
However, these experiments show that these properties are likely more subtle than we may have hoped for, and must be non-monotonic in certain natural parameters.
This rules out even certain natural “conditional conjectures” that we may have hoped for, for example the conjecture that
“If SGD on a width W network works for learning from ‘n’ samples from distribution D, then SGD on a width W+1 network will work at least as well”
Or the conjecture
“If SGD on a certain network and distribution works for learning with ‘n’ samples, then it will work at least as well with n+1 samples”
It also appears to conflict with a “2-phase” view of the trajectory of SGD, as an initial “learning phase” and then an “overfitting phase” — in particular, because the overfitting is sometimes reversed (at least, as measured by test error) by further training.
Finally, the fact that these phenomena are not specific to neural networks, but appear to hold fairly universally for natural learning methods (linear/kernel regression, decision trees, random features) gives us hope that there is a deeper phenomenon at work, and we are yet to find the right abstraction.
We especially thank Mikhail Belkin and Christopher Olah for helpful discussions throughout this work. The polynomial example is inspired in part by experiments in [Muthukumar et al.].
The Computer Science Department at UT Austin invites applications for a Postdoctoral Fellow in theoretical computer science for the 2020-21 academic year. The Fellow will work with Dana Moshkovitz and David Zuckerman on pseudorandomness and computational complexity. Review of applicants will begin on January 15, but applications will be accepted until the position is filled.
Website: https://utaustin.wd1.myworkdayjobs.com/UTstaff/job/UT-MAIN-CAMPUS/Postdoctoral-Fellow_R_00006957
Email: maguilar@cs.utexas.edu
Applications are invited for a postdoctoral position in algorithms in the School of Informatics at the University of Edinburgh. The position is for a period of up to 3 years, and can start at any time in 2020.
Website: https://www.vacancies.ed.ac.uk/pls/corehrrecruit/erq_jobspec_version_4.jobspec?p_id=050307
Email: h.sun@ed.ac.uk
The Algorithms and Randomness Center (ARC) at Georgia Institute of Technology is seeking multiple postdoctoral fellows starting Fall 2020. ARC has faculty associated with multiple departments including CS, Math, ISyE and EE. The candidate will work on any aspect of algorithms and optimization, broadly interpreted, and collaborate with ARC faculty.
Website: https://www.isye.gatech.edu/about/employment-opportunities/postdoctoral-fellow-arc
Email: arc-postdoc@cc.gatech.edu
Bopuifs fodszqujpo qspcmfn.
“Unsung Entrepreneur” source
Adolph Ochs was the owner of the New York Times. In 1897 he created the paper’s slogan, “All the News That’s Fit to Print.” We at GLL would like some suggestions on our own slogan. Send us your ideas. Please no suggestion of “All the news that fits we print,” as that is already out there.
Today Ken and I wish to comment on a recent article in the NYT that was on end-to-end encryption.
The article leads by saying:
A Justice Department official hinted on Monday that a yearslong fight over encrypted communications could become part of a sweeping investigation of big tech companies.
Of course, end-to-end encryption scrambles messages so that only the sender and receiver can decode the message. Other methods are weaker: some only encrypt messages as they enter part of the network. This means that one must trust the network to keep your message secret. Thus the end-to-end method reduces the number of parties that one must trust.
In 1912, Ochs was a party to encryption that was literally end-to-end on the globe. The New York Times had bought exclusive American rights to report Roald Amundsen’s expedition to the South Pole. When Amundsen returned to Hobart, Tasmania, he sent a coded cable to his brother Leon who was acting as conduit to the Times and the London Daily Chronicle. The brother pronounced the coast clear for Amundsen to communicate directly to the papers. The stories were still quickly plagiarized once the first one appeared in the Times, and Ochs had to defend his rights with lawsuits.
There is an ongoing interest in using end-to-end encryption to protect more and more of our messages. And this interest leads to several hard problems.
The main one addressed by the NYT article is: Does this type of encryption protect bad actors? Many believe that encryption makes it impossible to track criminals. Many in law enforcement, for example, wish to have the ability to access any messages, at least on a court order. Some countries are likely to make this the law—that is, they will insist that they always can access any message. A followup NYT article described debates within Interpol about these matters.
The above problem is not what we wish to talk about today. We want to raise another problem.
How do we know that our messages are being properly encrypted?
We could check that our app is in end-to-end mode. The app will say “yes”. The problem is that this does not prove anything. The deeper question is how do we know that messages are correctly encrypted. Indeed.
Suppose that we are told that a message has been sent to another person in encrypted form. How do we know that this has been done? Several issues come to mind:
The app could lie. The app could for example say it is encrypting your message and it did not.
The app could mislead. The app could send an encrypted message and also send the clear message to whomever it wishes.
The app could be wrong. The app could think that the message was properly encrypted. The key, for example, could be a weak key.
The app method could be flawed. The app’s method could be incorrect. The method used might be vulnerable to known or unknown attacks.
Authenticated encryption seems to cover only part of the need. It can confirm the identity of the sender and that the ciphertext has not been tampered with. This is, however, a far cry from verifying that the encryption itself is proper and free of holes that could make it easy to figure out. Our point is also aside from problems with particular end-to-end implementations such as those considered in this 2017 paper.
Bopuifs fodszqujpo qspcmfn was encrypted with the simple key of replacing each letter by the next letter of the alphabet.
The point of this silly example is that it might have been encrypted by a harder method, but it was only encrypted by a trivial substitution method. Nevertheless, Google could not figure it out:
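For readers who want to check the example by hand, here is a minimal Python sketch of a shift cipher. The function name `shift_letters` is ours, chosen for illustration, and the sketch assumes the key is a uniform shift of one letter, which is consistent with the ciphertext in the title.

```python
import string


def shift_letters(text: str, shift: int) -> str:
    """Shift every ASCII letter by `shift` positions, wrapping around the alphabet.

    Non-letters (spaces, punctuation) are left unchanged.
    """
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


ciphertext = "Bopuifs fodszqujpo qspcmfn."

# Assumed key: each letter was moved forward by one, so decrypt by moving back one.
print(shift_letters(ciphertext, -1))

# Without knowing the key, trying all 26 shifts is enough to break such a cipher.
candidates = {s: shift_letters(ciphertext, -s) for s in range(26)}
```

Since there are only 26 possible shifts, exhaustive search over `candidates` recovers the plaintext immediately, which is what makes this kind of substitution trivial.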
Applications are invited for a postdoc position hosted by Hsin-Hao Su at Boston College. Areas of specific interest include, but are not limited to, distributed graph algorithms, local algorithms, dynamic graph algorithms, gossip algorithms, and MPC algorithms. The position can start at any time in 2020 after February. The position is for a period of up to two years.
Website: https://sites.google.com/site/distributedhsinhao/postdoc
Email: suhx@bc.edu
Applications are invited for a postdoctoral position in theoretical computer science in the School of Computer and Communication Sciences at EPFL. The position is for a period of up to two years, and comes with a competitive salary as well as a generous travel allowance.
Website: https://theory.epfl.ch/kapralov/postdoc.html
Email: michael.kapralov@epfl.ch
The College of Information and Computer Sciences at the University of Massachusetts
Amherst invites applications for tenure-track faculty in Theoretical Computer Science at the
Associate and Assistant Professor levels. Exceptional candidates at other ranks may be
considered.
Website: http://careers.umass.edu/amherst/en-us/job/502778/assistantassociate-professortheory
Email: facrec@cs.umass.edu
Ariel Yadin’s homepage contains great lecture notes on harmonic functions on groups and on various other topics.
I have a lot of things to discuss and to report: exciting developments in the analysis of Boolean functions; much to report on algebraic, geometric and probabilistic combinatorics following our Oberwolfach summer meeting; much to tell about our Kazhdan seminar on quantum computation and symplectic geometry; a lot of exciting math and TCS activities in Israel; exciting things that Avi Wigderson told me on noncommutative optimization and moment maps; and, of course, last but not least, the exciting Google supremacy demonstration that I most recently wrote about in my post Gil’s Collegial Quantum Supremacy Skepticism FAQ. In particular, the unbelievable local-to-global fidelity Formula (77), and (NEW) a poem by Peter Shor for quantum computer skeptics. More poems are most welcome!
With all these excitements, plans, and blog duties it looks like this is the right time to take a pause for a Test Your Intuition post. (Based on chats with Asaf Nachmias, Jonathan Hermon and Itai Benjamini.)
Consider the discrete cube as a graph: the vertices are all the 0-1 vectors of length $n$, and two vertices are adjacent if they differ in one coordinate.
A (lazy) simple random walk is described as follows: You start at the all 0 vertex. At each step you stay at your current vertex with probability 1/2 and, with probability 1/2, you move to a neighbor of the current vertex chosen uniformly at random.
Your position after any fixed number of steps is a random variable describing a probability distribution on the vertices of the discrete cube. Now, let’s fix once and for all the value of $\epsilon$ to be 0.1.
Test your intuition: How many steps T(n) does it take until the distribution of your position is $\epsilon$-close to the uniform distribution in total variation distance? (The total variation distance is 1/2 the $L^1$ distance.)
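For small cubes the quantity in this question can be computed exactly, which may help calibrate intuition before voting. The sketch below is only an illustration under stated assumptions: the function names are ours, and it uses the observation that, starting from the all 0 vertex, the walk's distribution is constant on vertices with the same number of 1s, so the total variation distance to uniform equals the total variation distance between the distribution of the number of 1s and the Binomial($n$, 1/2) distribution.

```python
from math import comb


def lazy_step(weights, n):
    """One lazy step applied to the distribution of the number of 1-coordinates."""
    new = [0.0] * (n + 1)
    for k, p in enumerate(weights):
        if p == 0.0:
            continue
        new[k] += 0.5 * p                        # stay put with probability 1/2
        if k < n:
            new[k + 1] += 0.5 * p * (n - k) / n  # flip one of the n-k zeros
        if k > 0:
            new[k - 1] += 0.5 * p * k / n        # flip one of the k ones
    return new


def tv_from_weights(weights, n):
    """Total variation distance (half the L1 distance) to the uniform distribution."""
    return 0.5 * sum(abs(weights[k] - comb(n, k) / 2**n) for k in range(n + 1))


def tv_to_uniform(n, t):
    """Total variation distance to uniform after t lazy steps from the all-0 vertex."""
    weights = [1.0] + [0.0] * n
    for _ in range(t):
        weights = lazy_step(weights, n)
    return tv_from_weights(weights, n)


def steps_until_close(n, eps=0.1):
    """Smallest number of steps after which the distance is at most eps."""
    weights = [1.0] + [0.0] * n
    t = 0
    while tv_from_weights(weights, n) > eps:
        weights = lazy_step(weights, n)
        t += 1
    return t


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, steps_until_close(n))
```

Each step costs time linear in $n$, so values of $n$ in the hundreds are feasible; the printed table only gives finite data points and does not by itself settle the asymptotic question.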
We can also ask: How many steps M(n) does it take until the distribution of your position is close to the uniform distribution at every vertex, that is, pointwise rather than only in total variation?
For this question there is a simple analysis based on the coupon collector problem.
We can also consider intermediate measures of proximity, like the entropy:
How many steps H(n) does it take until the entropy of the distribution of your position is $\epsilon$-close to the entropy of the uniform distribution?
Let me try now to test your more detailed intuition: For the public opinion poll below we say that X behaves like Y if their ratio tends to one as $n$ tends to infinity, and that X is really smaller than Y if their ratio X/Y tends to a limit smaller than 1.
Let’s try something new: “Share your knowledge (SYK):” What other distances between probability distributions do you recommend? Tell us about them!
Ajtai, Komlos and Szemeredi proved that when you choose every edge of the discrete -cube with probability greater than a giant component emerges! Now, choose every edge with probability and start a simple random walk from a vertex of the giant component. Itai Benjamini conjectured that it will take roughly steps to approximately reach the stationary distribution. This seems very difficult.
Hoffman’s packing puzzle, and its connection to the inequality of arithmetic and geometric means. The one I have is not quite so colorful as the illustration for this new Wikipedia article. My father-in-law made it for me some 30 years ago; you can see it in a corner of the photo at this post. I don’t unpack it very often, though, because I lost track of the handwritten table of solutions that I made when I first got it and it’s quite difficult to re-pack.
How the Iranian government shut off the internet. According to this story, they have effected “a near-total internet and mobile data blackout” in an attempt to quell gasoline-price protests.
A Market for TCS Papers?? (With Vijay Vazirani, on the “Turing’s Invisible Hand” blog.) The current situation with theoretical computer science conference reviewing is a mess of long publication delays and reviewer overload caused by repeated submissions and rejections. Vijay and I argue that it should instead be treated as a matching market with pooled submissions and stable matching, getting better results for less time and effort.
SODA 2020 accepted papers. It only lists titles and authors, but if you notice a title you find intriguing you can find more detail elsewhere. However, this depends on avoiding obscure titles; if, say, you found a breakthrough on clustered planarity showing that it’s in polynomial time, but you titled your paper “Atomic Embeddability, Clustered Planarity, and Thickenability”, others might not notice.
Did you know that William Chapple discovered Euler’s formula for circumcenter-incenter distance before Euler, Poncelet’s porism on families of triangles inscribed and circumscribed by the same two circles before Poncelet, and was the first to publish a proof that Euclid missed, on the existence of orthocenters of triangles? Did you know that a street in Witheridge is named for him? Have you even heard of William Chapple before? Or Witheridge? Now you have.
Portal Icosahedron by Anthony James. An icosahedral frame, infinity mirrors, and LED lighting create a view into an infinite icosahedral grid, creating an effect that, in the jargon of the art world, “is both esoteric and industrial, orphic and distinctly concrete”. Whatever that’s supposed to mean.
Get ready to change all of your bookmarks for non-profit organizations as the top-level .org domain name registry is sold to profiteers, drops its own non-profit status, and eliminates price caps on domain name renewals.
Ian Wanless on mathematician Eliyahu Rips and his Ig Nobel Prize for Literature. An entertaining general-audience talk; audio only.
Charles Darwin’s first drawing of an evolutionary tree.
Bechdelgrams illustrate whether a movie passes the Bechdel test. A nice use of color to highlight the information you’re looking for in a social network: Here, the network consists of interactions between characters in a film, and the women and conversations not about men are given distinctive colors to show the test criteria: does the film have at least two named female characters, who speak to each other, about something other than men?
Rogan Brown creates intricate paper sculptures inspired by microorganisms.
Some recent open-access conference proceedings: 27th European Symp. on Algorithms (ESA); 30th Int. Symp. on Algorithms and Computation (ISAAC); 22nd Japan Conf. on Discrete and Computational Geometry, Graphs, and Games (JCDCGGG). JCDCGGG is not very selective (think CCCG but more so), but I have a paper there with several co-authors on ununfoldable polyhedra with few vertices.
The Nefertiti bust meets the 21st century. Interesting essay on claims of intellectual property on ancient artifacts (in this case a high-resolution 3d scan of a bust of Nefertiti), clearly invalid under both US law and still-being-implemented EU law and “dangerously close to committing copy fraud”.