Thursday, September 18, 2014

Eratosthenes-Legendre sieve

In this post, we present the basic idea of sieve methods and derive the simple Eratosthenes-Legendre sieve. Subsequently, there will be posts about more advanced sieve topics and their applications in number theory. Sieve theory offers tools for estimating various quantities in number theory (and also other branches of mathematics), in particular ones related to primes. The first idea of sieve theory was already invented by Eratosthenes around 2200 years ago. He devised a simple but rather efficient algorithm for finding all the primes up to $x$. Unfortunately, Eratosthenes' algorithm is so famous that the generality of the idea behind it is often forgotten. The idea allows us to estimate theoretically the number of primes up to $x$, or more generally many other similar sets of numbers, instead of taking a fixed $x$ such as $10^{10}$ and doing a boring computation. The simple but important idea can be formulated as follows : If we want to compute the number of those elements of a set $A\subset \mathbb{Z_+}$ with $|A|=x$ that have no prime factors below $z$, we should first take all the $x$ numbers, then subtract the number of even ones, subtract the number of multiples of $3$, subtract the number of multiples of $5$, add the number of multiples of $6$ (they were discarded twice), and so forth. The process goes through all the numbers $d$ that divide the product of the primes up to $z$ and whether we subtract or add depends on the parity of the number of prime divisors of $d$. It was probably Legendre who first wrote down this observation as a formula $-$ around 2000 years after Eratosthenes' algorithm $-$ and it is essentially the principle of inclusion and exclusion from combinatorics. Below we briefly discuss sieve methods in general. Then we derive the Eratosthenes-Legendre sieve and present some applications of it.

Introduction

Although the Eratosthenes-Legendre sieve already yields some nontrivial results, it is very weak, and for that reason not much was done with sieves until the 20th century. Around 1915, the situation changed when Brun developed a more refined combinatorial sieve and proved impressive results such as the convergence of the sum of reciprocals of twin primes and the existence of infinitely many $n$ such that $n$ and $n+2$ have at most $9$ prime factors. This inspired new research on sieve methods, and in the 1940s Selberg invented a weighted sieve which, among other things, gives good inequalities for the number of primes in short intervals without even using asymptotic notation. In addition, Linnik, Rényi, Roth, Bombieri, Montgomery and Vaughan developed the large sieve in the 1940s-1970s, using analytic ideas, and it was used to prove for example the Bombieri-Vinogradov theorem, which can be thought of as an averaged version of the generalized Riemann hypothesis. Many other sieves have been invented as well, and many of them give rise to new results even today.

Sieve methods have been used to prove many famous conjectures in number theory for semiprimes (products of at most two primes) instead of primes. For instance, Chen showed in 1973 that every large enough even integer is the sum of a prime and a semiprime; this is an approximation to Goldbach's strong conjecture. He also showed that there are infinitely many primes $p$ such that $p+2$ is a semiprime, which is the analogue of the twin prime conjecture for semiprimes. In addition, many sparse sequences are known to contain infinitely many semiprimes. For example, it was shown by Iwaniec in 1978 that there are infinitely may semiprimes of the form $n^2+1$, but for primes this is an open problem. The main reason why such results have not been obtained for primes is the parity problem: Sieves are generally unable to distinct between numbers with even and odd number of prime factors. In some problems, this difficulty has been overcome, but not with purely sieve theoretic methods. For example, Selberg and Erdös gave an elementary (no hard analysis) proof of the prime number theorem, and Friedlander and Iwaniec showed that the thin sequence $a^4+b^2$ has infinitely many primes.

Concerning recent applications of sieves, one must mention Zhang's result from 2013 that there are infinitely many pairs of primes that are at most $M$ apart for some constant $M$ (for which Zhang gave the bound $7\cdot 10^7$). To emphasize the strength of this result compared to what was known before, it can be remarked that only in 2005 it was proved that $\displaystyle\inf_{n\geq 1}\frac{p_{n+1}-p_n}{\log p_n}=0,$ where $p_n$ is the $n$th prime, by Goldston, Pintz and Yildirim (in fact, Zhang used some ideas from their paper). In 2013 and 2014, Zhang's results have been improved by many authors, including the Polymath project, Maynard and Tao, and the results have been generalized to $m$-tuples of primes instead of pairs of primes and the value of $M$ has been reduced significantly (to around $250$). Also the remarkable result by Green and Tao in 2004 that the primes contain arbitrarily long arithmetic progressions uses ideas from sieve theory and the work of Goldston and Yildirim.

General setting in sieve theory

A general question in sieve theory is the following: Given a set $A\subset \mathbb{Z}_+$ with $x$ elements, a set $P\subset \mathbb{P}$ of primes and a sifting parameter $z$, what can be said about $S(A,P,z)$, which counts the number of $a\in A$ that are coprime to all the primes of $P$ up to $z$? It turns out, as Eratosthenes hinted, that the size of $S(A,P,z)$ depends crucially on the sizes of $A_d=\{a\in A:a\equiv 0 \pmod d\}$. If we know our set $A$ relatively well, we can estimate $|A_d|$ and hence $S(A,P,z)$, that is the ''almost primes'' of $A$. If $z=(\max_{a\in A} a)^{\frac{1}{k}}$, then $S(A,P,z)$ counts the numbers $a\in A\cap (z,x]$ with at most $k$ prime factors.

For estimating $|A_d|$, it is generally assumed that $|A_d|=\frac{\omega(d)}{d}|A|+R_d$, where the error $R_d$ is small, often $|R_d|\leq \omega(d)$ is supposed. In other words, the proportion of elements of $A$ that are divisible by $d$ is $\frac{\omega(d)}{d}$. The assumption on the error term may seem strange at first, but in most applications it is satisfied because of the Chinese remainder theorem (or for trivial reasons). Example 6 above, however, is an important exception. The Chinese remainder theorem, when it applies, also gives that $\omega(d)$ is multiplicative, that is $\omega(ab)=\omega(a)\omega(b)$ for coprime $a$ and $b$. This assumption is crucial, as it says that divisibility of elements of $A$ by different primes is in a way independent. As an element of $A$ is not divisible by $p$ with probability $1-\frac{\omega(p)}{p}$ (with a very small error), independence ought to give us
\begin{eqnarray}S(A,P,z)=|A|\prod_{p\leq z \atop p\in P}\left(1-\frac{\omega(p)}{p}\right)+E(x,z),\end{eqnarray}
where $E$ is a (hopefully small) error term. We denote
\begin{eqnarray}W(z)=\prod_{p\leq z \atop p\in P}\left(1-\frac{\omega(p)}{p}\right),\end{eqnarray} so that our sieve prediction is just $S(A,P,z)=xW(z)+E(x,z)$.

If $|A_d|$ are known quite accurately, so are the individual terms of $W(z)$, and then estimating $W(z)$ well is not usually very difficult. What is more interesting is the behavior of the error $E(x,z)$ (it depends on $A$ and $P$ as well). Unfortunately, the Eratosthenes-Legendre sieve involves so long sums that $E(x,z)$ is about $2^z$ even in the simplest cases. Therefore we may not choose $z$ to be any power of $x$, so we cannot deduce any lower bounds for almost primes. However, with a simple Brun sieve one can cut the error to about $z^{\log \log z}$, and with a refined Brun sieve one can even get $E(x,z)$ to be bounded by $z^k$ for a fixed $k$, and the Selberg sieve is capable of giving even $k=2$ in some cases. Results that give the asymptotic $S(A,P,z)\sim xW(z)$, where $z=f(x)$ and $f$ is not too large, are called fundamental lemmas. If $f(x)\leq x^{\frac {1}{k}}$ for large enough $k$, one can indeed obtain such an asymptotic, but for large enough $z$ the asymptotic fails, as we shall see. The usefulness of the notations above will be clarified with examples.

Example 1. Let $A=[1,x],P=\mathbb{P}.$ Then $|A_d|=\lfloor\frac{x}{d} \rfloor=\frac{1}{d}x+R_d$, where $|R_d|\leq 1,$ so that $\omega(d)=1$. Now $S(A,P,z)=\pi(x,z)$, where $\pi(x,z)$ counts all the positive integers up to $x$ that are coprime to the primes below $z$.

Example 2. Let $A=\{an+b:n\leq x\}$ for some integers $a>0$ and $b$. Let $P=\mathbb{P}\setminus S$, where $S$ is the set of prime diviors of $a$. For $d$ coprime to $a$, $|A_d|=\frac{x}{d}+R_d,|R_d|\leq 1$ since the congruence $an+b\equiv 0 \pmod d$ has a unique solution. Now $S(A,P,z)$ counts the integers in this arithmetic progression being coprime to the primes below $z$ that do not divide $a$.

Example 3. Let $A=\{n^2+1:n\leq x\}.$ For odd primes $p$, $n^2+1\equiv 0 \pmod p$ has either two or no solutions $\pmod p$; the first case happens exactly when $p\equiv 1 \pmod 4$. By the Chinese remainder theorem, $n^2+1\equiv 0 \pmod d$ has $2^{\nu(d')}$ solutions for square-free values of $d$ if $d$ has no prime factors $p\equiv -1 \pmod 4$, where $\nu(d')$ is the number of prime divisors of $d'$ and $d'$ is the odd part of $d$. Hence our assumptions are satisfied with this function $\omega(d)$, and for example $S(A,P,\sqrt{x})$ counts the primes of the form $n^2+1$ for $\sqrt{x}<n\leq x$.

Example 4. Let $A=\{n(n+2):n\leq x\}, P=\mathbb{P}.$ Then $n(n+2)\equiv 0\pmod p$ has $2$ solutions $\pmod p$, except for $p=2$ there is one solution. By the Chinese remainder theorem, $\omega(d)=2^{\omega(d')}$ for all square-free $d$ (so this time there is no restriction on the prime divisors of $d$). The quantity $S(A,P,\sqrt{x})$ is clearly the number of twin primes on $(\sqrt{x},x].$

Example 5. Let $A=\{|(n-1^2)(n-2^2)...(n-r^2)|: n\leq x\},P=\mathbb{P}\setminus{2}$, where $r\geq z$. Then for every prime $2<p\leq z$, $|(n-1^2)(n-2^2)...(n-r^2)|\equiv 0 \pmod p$ is equivalent to $n$ being a quadratic residue $\pmod p.$ It is easy to see that there are $\frac{p-1}{2}$ quadratic resdidues $\pmod p$ for odd primes $p$, so by the Chinese remainder theorem, $\omega(d)=\prod_{p\mid d}\frac{p-1}{2}$ for square-free odd $d$. This time $S(A,P,z)$ tells the number of $n\leq x$ such that $(n-1^2)(n-2^2)...(n-r^2)\neq 0\pmod p$ for all primes $p\leq z$, that is $n$ is a quadratic nonresidue modulo all the odd primes up to $z$. This example differ from the others by the fact that $\omega(p)$ is now large.

Example 6. Let $A=\{p+2:p\leq y\}$, where $p$ runs through primes and $\pi(y)=x$. Also let $P=\mathbb{P}\setminus{2}$.  Thisgives a different formulation of the twin prime problem compared to Example 4; in particular, now $\omega(p)=\frac{p}{p-1}$ for odd primes $p$. There are two important points in this example. One is that $\omega(p)$ tends to $1$ and one can say that $\omega(p)=1$ asymptotically "on average" (this can and will be made precise in later posts), while the best uniform bound for $\omega(p)$ is $\frac{3}{2}$. This indicates that sometimes one gets better results assuming average conditions. Another point is that there is a trade-off between the approaches of Examples 4 and 6. The set in Example 4 is easier to handle, while in Example 6 estimating $A_d$ requires deep knowledge on primes in arithmetic progressions (and $|R_d|\leq \omega(d)$ is not satisfied pointwise but in an average sense), but will lead to better results.

We introduce a couple of more notations. By $P(z)$ we denote $\prod_{p\leq z,p\in P}p$. We use the usual asymptotic notation, so that $f(x)=O(g(x))$ means that $\left|\frac{f(x)}{g(x)}\right|$ is bounded as $x\to \infty$, and this can also be denoted $f\ll g.$ The notation $f=o(g)$ in turn means that $\frac{f(x)}{g(x)}\to 0$ as $x\to \infty$.

Eratosthenes-Legendre sieve

We already wrote the observation of Eratosthenes and Legendre in words, somewhat vaguely though. To write it as a formula, we apply the principle of inclusion and exclusion from elementary combinatorics.

Lemma 1 (Inclusion-exclusion). Let $S_1,...,S_n$ be finite sets. Then
\begin{eqnarray}\left|\bigcup_{i=1}^n S_i\right|&=\sum_{i=1}^n |S_i|-\sum_{1\leq i<j\leq n}|S_{i}\cap S_j|+\sum_{1\leq i<j<k\leq n}|S_i\cap S_j\cap S_k|-...\\&+(-1)^{n-1}|S_1\cap...\cap S_n|.\end{eqnarray}
Proof. Let $x\in \bigcup_{i=1}^n S_i$, and let $x$ belong to exactly $m$ of the sets $S_i$. Then $x$ is counted in the formula
\begin{eqnarray}\binom{m}{1}-\binom{m}{2}+\binom{m}{3}-...+(-1)^{m-1}\binom{m}{m}\end{eqnarray}
times. We show by induction that this number is always $1$. For $m=1$ it is trivial and assuming the case $m-1$ and using $\binom{m}{k}=\binom{m-1}{k}+\binom{m-1}{k-1}$ we get
\begin{eqnarray}\sum_{k=1}^m(-1)^{k-1}\binom{m}{k}&=&\sum_{k=1}^m(-1)^{k-1}\binom{m-1}{k}+\sum_{k=1}^m(-1)^{k-1}\binom{m-1}{k-1}\\&=&\sum_{k=1}^{m-1}(-1)^{k-1}\binom{m-1}{k}+\sum_{k=0}^{m-1}(-1)^{k}\binom{m-1}{k}\\&=&1+1-(-1)^0\binom{m-1}{0}=1.\end{eqnarray}
This proves the principle of inclusion and exclusion. ■

Applying the principle to the setting of sieves, we obtain the following.

Lemma 2 (Elementary sieve formula). With the same notations as above,
\begin{eqnarray}S(A,P,z)=\sum_{d\mid P(z)}\mu(d)|A_d|,\end{eqnarray}
where $\mu$ is the Möbius function: $\mu(1)=1,\mu(p_1...p_k)=(-1)^k$ if $p_i$ are distinct primes and $\mu(d)=0$ otherwise.

Proof. By inclusion and exclusion,
\begin{eqnarray}S(A,P,z)=|A|-\sum_{p\mid P(z)}|A_p|+\sum_{pq\mid P(z)}|A_{pq}|-\sum_{pqr\mid P(z)}|A_{pqr}|-...,\end{eqnarray}
where $p,q,r$ are primes, and this is clearly equal to $\sum_{d\mid P(z)}\mu(d)|A_d|$. ■

From the lemma above, it is easy to derive the Eratosthenes-Legendre sieve. By our sieve assumption, $|A_d|=\frac{\omega(d)}{d}|A|+R_d$, where $|R_d|\leq \omega(d)$ and $\omega$ is multiplicative. Hence
\begin{eqnarray}S(A,P,z)=|A|\sum_{d\mid P(z)}\mu(d)\frac{\omega(d)}{d}+\theta \sum_{d\mid P(z)} |R_d|,\end{eqnarray}
where $|\theta|\leq 1$ but $\theta$ depends on the variables. Since
\begin{eqnarray}\prod_{i=1}^n (1-a_i)=1-\sum_{i=1}^n a_i+\sum_{1\leq i<j\leq n}a_ia_j-\sum_{1\leq i<j<k\leq n}a_ia_ja_k+...,\end{eqnarray}
we observe that
\begin{eqnarray}S(A,P,z)&=&|A|\prod_{p\leq z\atop p\in P}\left(1-\frac{\omega(p)}{p}\right)+\theta\sum_{d\mid P(z)} |R_d|&=&xW(z)+\theta \sum_{d\mid P(z)}\omega(d)\\&=&xW(z)+\theta \left(\sum_{p\leq z\atop p\in P}\omega(p)\right)^{\pi(z)},\end{eqnarray}
since multiplying out $\left(\sum_{p\leq z, p\in P}\omega(p)\right)^{\pi(z)}$, we get by multiplicativity all the desired terms, in addition to something else that is positive. The value of $\theta$ changes from line to line, but it is always bounded by $1$ in modulus. The Eratosthenes-Legendre sieve is now derived.

Theorem (Eratosthenes-Legendre sieve). With the notations above,and assuming $|A_d|=\frac{\omega(d)}{d}+R_d$ with $\omega(d)$ multiplicative and $|R_d|\leq \omega(d)$, we have
\begin{eqnarray}S(A,P,z)=xW(z)+\theta \left(\sum_{p\leq z\atop p\in P}\omega(p)\right)^{\pi(z)}\end{eqnarray}
where $|\theta|\leq 1$ (it may depend on all the variables).

Applications

We give now some applications of the Eratosthenes-Legendre sieve. First we consider $\pi(x,z)$, that is we have the situation of Example 1 and we count the positive integers up to $x$ with no prime factors below $z$. The sieve gives
\begin{eqnarray}S(A,P,z)=\pi(x,z)&=&x\prod_{p\leq z}\left(1-\frac{1}{p}\right)+O(2^{\pi(z)})\\&=& (1+o(1))\frac{e^{-\gamma x}}{\log z}+O(2^{\pi(z)})\end{eqnarray}
by Mertens' theorem (see this post), where $\gamma$ is Euler's constant. In particular, for $z=f(x)\leq \log x$ we have
\begin{eqnarray}\pi(x,z)\sim \frac{e^{-\gamma x}}{\log z}\end{eqnarray}
as $x\to \infty$, so we have an asymptotic formula for $\pi(x,z)$ when $z$ grows very slowly in $x$. Even if we assumed the prime number theorem $\pi(z)\sim \frac{z}{\log z},$ the choice $z=2\log x \log \log x$ would already produce an error term that is larger than $x$ so that the formula becomes worthless. Therefore the only thing we can say about $\pi(x)$ using this simple sieve is that
\begin{eqnarray}\pi(x)\leq \pi(x,\log x)+\log x\ll \frac{x}{\log \log x}.\end{eqnarray}
Nevertheless, this gives an easy proof of the fact that primes have density zero in the positive integers. We could now ask, what happens if we have a sieve with a better error term; could we prove something stronger, perhaps even the prime number theorem? The answer to the first question is yes; for example a strong form of the Brun sieve proves that $\pi(x)\ll \frac{x}{\log x}$ (and the proof is just elementary although tedious combinatorics). However, the second question has a negative answer if we assume that our sieve is of the form $S(A,P,z)=xW(z)+E(x,z)$, where the error is small enough. Indeed, we would get
\begin{eqnarray}\pi(x)=\pi(x,\sqrt{x})+O(\sqrt x)=(1+o(1))\frac{e^{-\gamma} x}{\log x}\end{eqnarray}
which contradicts the prime number theorem. This shows that no sieve of the form described above gives always an asymptotic formula. Nevertheless, we mentioned in the introduction that there is an elementary proof of the prime number theorem, by Erdös and Selberg, and it uses sieve theoretic ideas that are present for example in Bombieri's sieve.

Next we show that almost all positive integers are not of the form $a^2+b^2$, where $a$ and $b$ are integers. There is an elementary theorem, attributed to Fermat, which says that $n$ is a sum of two squares if and only if each prime factor $p$ of $n$ such that $p\equiv -1 \pmod 4$ occurs to an even exponent. We only need one side of the statement, namely that each prime factor $p\equiv -1 \pmod 4$ must have an even exponent, and this follows just from the fact that $-1$ is a quadratic nonresidue $\pmod p$ for $p\equiv -1\pmod 4$. Let $N(x)$ be the number of $n\leq x$ that are sums of two squares, and let $N'(x)$ be the number of square-free $n\leq x$ with the same property. Then $N(x)=N'(x)+\sum_{k\leq x}N'\left(\frac{x}{k^2}\right)$, and if $N'(x)\leq \varepsilon x$ for all large enough $x$,then $N(x)\leq x(\varepsilon+\varepsilon\sum_{n=1}^{\infty}n^{-2})<3\varepsilon x$, so it suffices to show $N'(x)=o(x)$.

Let $A=[1,x], P=\mathbb{P}_{4,-1}$, where $\mathbb{P}_{a,b}=\{p\in \mathbb{P}:p\equiv b \pmod a\}$. Then clearly $\omega(d)=1$ for $d\mid P(z)$ and $|R_d|\leq 1$. The Eratosthenes-Legendre sieve gives
\begin{eqnarray}N'(x)&\leq& |\{n\leq x: n \quad \text{has no prime factor}\quad p\equiv -1 \pmod 4\}|\\&\leq& S(A,P,\log x)\leq x\prod_{p\leq \log x \atop p\equiv -1 \pmod 4}\left(1-\frac{1}{p}\right)+O(2^{\log x}),\end{eqnarray}
and this is $o(x)$, for the product above converges to zero as $x\to \infty$ (there is a rather easy elementary proof for it; the proof is quite similar to proving $\prod_{p}(1-\frac{1}{p})=0$ or equivalently $\sum_{p}\frac{1}{p}=\infty$). Much better asymptotics can be derived for $N(x)$ using more sophisticated methods; Landau showed that $N(x)\sim \frac{cx}{\sqrt{\log x}}$ for a certain constant $c$.

Lastly we consider the number of square-free integers up to $x$, which we denote by $S(x)$. This time we do not quite sieve using primes but using squares of primes instead. It is clear that similarly as for primes, inclusion and exclusion gives
\begin{eqnarray}S(x)=x-\sum_{p\leq \sqrt{x}}\left\lfloor\frac{x}{p^2} \right\rfloor+\sum_{pq\leq \sqrt{x}}\left\lfloor\frac{x}{(pq)^2} \right\rfloor-...\end{eqnarray}
This time the sums are much shorter than in the corresponding formula for $\pi(x)$, and indeed our error is at most
\begin{eqnarray}\pi(\sqrt{x})+\pi_2(\sqrt{x})+...+\pi_k(\sqrt{x}),\end{eqnarray}
where $\pi_m(y)$ counts those $n\leq y$ that have exactly $m$ prime divisors and $k$ is the largest integer with $\pi_k(\sqrt{x})>0$. The expression above is just $\lfloor\sqrt{x}\rfloor$, so
\begin{eqnarray}S(x)=x-\sum_{p\leq \sqrt{x}}\frac{x}{p^2} +\sum_{pq\leq \sqrt{x}}\frac{x}{(pq)^2} -...+\theta \sqrt{x}=x\prod_{p\leq x}\left(1-\frac{1}{p^2}\right)+\theta\sqrt{x},\quad |\theta|\leq 1.\end{eqnarray}
By the Euler product (which can be proved easily multiplying out and using the fundamental theorem of arithmetic),
\begin{eqnarray}\sum_{n=1}^{\infty}n^{-s}=\prod_{p}\frac{1}{1-p^{-s}},\quad s>1,\end{eqnarray}
so
\begin{eqnarray}\prod_{p}\left(1-\frac{1}{p^2}\right)=\left(\sum_{n=1}^{\infty}n^{-2}\right)^{-1}=\frac{6}{\pi^2}\end{eqnarray}
by the Basel problem (see this post). We have
\begin{eqnarray}\prod_{p\geq x}\left(1-\frac{1}{p^2}\right)^{-1}&\leq& \prod_{n\geq x}\left(1-\frac{1}{n^2}\right)^{-1}\\&=&\prod_{n\geq x}\left(1+\frac{1}{n^2-1}\right)\\&\leq& \exp\left(\sum_{n\geq x}\frac{1}{n^2-1}\right)\quad \text{since}\quad 1+y\leq e^y\\&\leq& e^{\frac{c}{x}}\leq \left(1+\frac{2c}{x}\right)\end{eqnarray}
for large enough $x$, so $S(x)=\frac{6}{\pi^2}x+\theta(\sqrt{x}+2c)$. We conclude that for this problem elementary sifting was able to give a very precise result, and as a corollary we get the elegant fact that square-free integers have density $\frac{6}{\pi^2}$.