Derivation of the sieve
In the 1940's, Selberg found a way to sharpen the already good inequalities given by the Brun sieve, by using a different method. The basic principle is astonishingly simple, but turns out to be fruitful. For any set $A\subset \mathbb{Z}_+$(with $x$ elements) and any set $P$ of primes, we have
\[\begin{eqnarray}S(A,P,z)=\sum_{a\in A\atop (a,P(z)=1)}1\leq \sum_{a\in A}\left(\sum_{d\mid a\atop d\mid P(z)} \lambda_d\right)^2\end{eqnarray}\]
for arbitrary real numbers $\lambda_1,\lambda_2,...$ with $\lambda_1=1$. Simplifying, this expression becomes
\[\begin{eqnarray}\sum_{d_1,d_2\mid P(z)}\lambda_{d_1}\lambda_{d_2}A_{[d_1,d_2]}\end{eqnarray}\]
(where $[\cdot,\cdot]$ denotes the least common multiple), for each term $\lambda_{d_1}\lambda_{d_2}$ is counted as many times as there are elements of $A$ that are divisible by $d_1$ and $d_2$. Writing $|A_d|=\frac{\omega(d)}{d}x+R_d$ as usual, with $\omega(\cdot)$ being multiplicative, we obtain
\[\begin{eqnarray}S(A,P,z)\leq x\sum_{d_1,d_2\mid P(z)}\frac{\omega([d_1,d_2])}{[d_1,d_2]}\lambda_{d_1}\lambda_{d_2}+\sum_{d_1,d_2\mid P(z)}|\lambda_{d_1}\lambda_{d_2}||R_{[d_1,d_2]}|\quad (1).\end{eqnarray}\]
The first term is regarded as the main term, while the second is the error. The basic principle is just that we should minimize the main term as a function of the $\lambda_d$'s, with the constraint $\lambda_1=1$. For convenience, we assume $\lambda_d=0$ for $d>z$. Our task is just to minimize a quadratic form, but setting all partial derivatives equal to zero would lead to a rather unpleasant looking system of equations, so we first diagonalize the form. Using $mn=(m,n)[m,n]$, the form becomes
\[\begin{eqnarray}\sum_{d_1,d_2\mid P(z)}\frac{(d_1,d_2)}{\omega((d_1,d_2))}\frac{\lambda_{d_1}\omega(d_1)}{d_1}\cdot \frac{\lambda_{d_2}\omega(d_2)}{d_2}.\end{eqnarray}\]
We have $\frac{p}{\omega(p)}=1+\frac{\omega(p)}{p-\omega(p)}=1+\frac{1}{g(p)}$, where $g(p)$ is as in earlier posts (it is defined like this for primes and it is extended multiplicatively to square free numbers), so we can rewrite the above expression as
\[\begin{eqnarray}&&\sum_{d_1,d_2\mid P(z)}\prod_{p\mid (d_1,d_2)}\left(1+\frac{1}{g(p)}\right)\frac{\lambda_{d_1}\omega(d_1)}{d_1}\cdot \frac{\lambda_{d_2}\omega(d_2)}{d_2}\\&=&\sum_{d_1,d_2\mid P(z)}\sum_{k\mid(d_1,d_2)}\frac{1}{g(k)}\frac{\lambda_{d_1}\omega(d_1)}{d_1}\cdot \frac{\lambda_{d_2}\omega(d_2)}{d_2}\\&=&\sum_{k\mid P(z)}\frac{1}{g(k)}\left(\sum_{d\mid P(z)\atop d\equiv 0\pmod k}\frac{\omega(d)}{d}\lambda_d\right)^2.\end{eqnarray}\]
Now denote
\[\begin{eqnarray}\xi_k=\sum_{d\mid P(z)\atop d\equiv 0\pmod k}\frac{\omega(d)}{d}\lambda_d;\end{eqnarray}\]
we are going to minimize the diagonal form $\sum_{k\mid P(z)}\frac{1}{g(k)}\xi_k^2$ subject to $\lambda_1=1$ and $\lambda_d=0$ for $d>z$ (for convenience, $\lambda_d$ is defined for every positive integer $d$ but is supported of the numbers up to $z$ dividing $P(z)$). First we should express the condition $\lambda_1=1$ in terms of the $\xi_k$'s.
Since
\[\begin{eqnarray}\xi_1=\sum_{d=1}^{\infty}\frac{\omega(d)}{d}\lambda_d,\quad \xi_2=\sum_{d=1}^{\infty}\frac{\omega(2d)}{2d}\lambda_{2d},...,\end{eqnarray}\]
inclusion and exclusion gives
\[\begin{eqnarray}1=\lambda_1\omega(1)&=&\sum_{n=1}^{\infty}\mu(n)\xi_n=\sum_{n\mid P(z)\atop n\leq z}\mu(n)\xi_n,\end{eqnarray}\]
and more generally
\[\begin{eqnarray}\frac{\omega(d)}{d}\lambda_d&=&\sum_{n=1}^{\infty}\mu(n)\xi_{dn}=\sum_{n\mid P(z)\atop n\leq \frac{z}{d}}\mu(n)\xi_{dn}\quad \quad\quad (2).\end{eqnarray}\]
We can now easily minimize the quadratic form under consideration. By Cauchy-Schwarz,
\[\begin{eqnarray}1=\sum_{n\mid P(z)\atop n\leq z}\mu(n)\xi_n&=&\sum_{n\mid P(z)}\mu(n)\sqrt{g(n)}\cdot \frac{\xi_n}{\sqrt{g(n)}}\\&\leq& \left(\sum_{n\mid P(z)\atop n\leq z}\mu^2(n)g(n) \sum_{n\mid P(z)\atop n\leq z} \frac{1}{g(n)}\xi_n^2\right)^{\frac{1}{2}},\end{eqnarray}\]
so
\[\begin{eqnarray}\sum_{n\mid P(z)\atop n\leq z} \frac{1}{g(n)}\xi_n^2\geq \left(\sum_{n\mid P(z)\atop n\leq z}\mu^2(n)g(n)\right)^{-1}:=\frac{1}{G(z)}.\end{eqnarray}\]
Equality is attained if and only if $\xi_n=\frac{\mu(n)g(n)}{G(z)}$ for all $n\leq z$ such that $n\mid P(z)$.
After this, we just need to estimate the error term in $(1)$. We make the usual assumption $|R_d|\leq \omega(d)$. It turns out that actually $|\lambda_d|\leq 1$ for all $d$. By $(2),$ the parameters $\lambda_d$ are given by
\[\begin{eqnarray}\lambda_d&=&\frac{d}{\omega(d)}G(z)^{-1}\sum_{n\mid P(z)\atop n\leq \frac{z}{d}}\mu(n)\mu(dn)g(dn)\\&=&\frac{d\mu(d)g(d)}{\omega(d)}G(z)^{-1}\sum_{n\mid P(z)\atop n\leq \frac{z}{d}, (n,d)=1}\mu^2(n)g(n)\\&=&\frac{\mu(d)}{\prod_{p\mid d}\left(1-\frac{\omega(p)}{p}\right)}G(z)^{-1}\sum_{n\mid P(z)\atop n\leq \frac{z}{d}, (n,d)=1}\mu^2(n)g(n)\\&:=&\frac{\mu(d)}{\prod_{p\mid d}\left(1-\frac{\omega(p)}{p}\right)}\cdot\frac{G_d(\frac{z}{d})}{G(z)},\end{eqnarray}\]
where we use the fact that $\mu(nd)=0$ unless $(n,d)=1$.
To see that $\lambda_d$ is bounded by $1$ in modulus, we notice that
\[\begin{eqnarray}G(z)&=&\sum_{\ell\mid d}\sum_{\substack{n\mid P(z)\\n\leq z\\ (n,d)=\ell}}\mu^2(n)g(n)=\sum_{\ell\mid d}\sum_{\substack{m\mid P(z)\\m\leq \frac{z}{\ell}\\ (m,d)=\ell}}\mu^2(m\ell)g(m\ell)\\&=&\sum_{\ell\mid d}\mu^2(\ell)g(\ell)G_{\ell}\left(\frac{z}{\ell}\right)\geq \sum_{\ell\mid d}\mu^2(\ell)g(\ell)G_d\left(\frac{z}{d}\right)\\&=&\prod_{p\mid d}(1+g(p))\cdot G_d\left(\frac{z}{d}\right)=\prod_{p\mid d}\frac{1}{1-\frac{\omega(p)}{p}}\cdot G_d\left(\frac{z}{d}\right),\end{eqnarray}\]
so indeed $|\lambda_d|\leq 1$. Now the error term in $(1)$ is at most
\[\begin{eqnarray}\sum_{d_1,d_2\mid P(z)\atop d_1,d_2\leq z}|R_{[d_1,d_2]}|\quad \quad (3).\end{eqnarray}\]
The number of pairs of integers with $[d_1,d_2]=d$ for a fixed square-free $d$ is $3^{\nu(d)}$. Namely, if $d=p_1...p_{\nu(d)}$, then $d_1=\prod_{i\in S}p_i$ for some $S\subset\{1,...,\nu(d)\}$, so $d_2=\prod_{i\in T}p_i$ for some $T$ containing $\{1,...,\nu(d)\}\setminus S.$ If the size of $S$ is $r$, there are $\binom{\nu(d)}{r}$ choices for $S$ and $2^{\nu(d)-r}$ choices for $T$, so the binomial identity $\sum_{r=0}^{\nu(d)}\binom{\nu(d)}{r}2^r=3^{\nu(d)}$ proves the desired statement.
Hence the error is not greater than
\[\begin{eqnarray}\sum_{d\mid P(z)\atop d\leq z^2}3^{\nu(d)}\omega(d)&\leq& z^2\sum_{d\mid P(z)\atop d \leq z^2}\frac{3^{\nu(d)}\omega(d)}{d}\\&=&z^2\prod_{p\leq z\atop p\in P}\left(1+\frac{3\omega(p)}{p}\right)\\&\leq& z^2\prod_{p\leq z\atop p\in P}\left(1+\frac{\omega(p)}{p}\right)^3 &\leq& z^2\prod_{p\leq z\atop p\in P}\left(1-\frac{\omega(p)}{p}\right)^{-3}\\&:=&\frac{z^2}{W(z)^3}.\end{eqnarray}\]
We have now derived Selberg's sieve.
Theorem 1. Let $A$ be an arbitary set of positive integers with $x$ elements, and $P$ an arbitrary subset of primes. Assume $A_d=\frac{\omega(d)}{d}+R_d$ with $\omega(d)$ multiplicative and $|R_d|\leq \omega(d)$. Then for any $z\geq 2$,
\[\begin{eqnarray}S(A,P,z)\leq \frac{x}{G(z)}+\frac{z^2}{W(z)^3}.\end{eqnarray}\]
Without assuming $|R_d|\leq \omega(d)$, we still have
\[\begin{eqnarray}S(A,P,z)\leq \frac{x}{G(z)}+\sum_{d\leq z^2\atop d\mid P(z)}3^{\nu(d)}|R_d|.\end{eqnarray}\]
Without assuming $|R_d|\leq \omega(d)$, we still have
\[\begin{eqnarray}S(A,P,z)\leq \frac{x}{G(z)}+\sum_{d\leq z^2\atop d\mid P(z)}3^{\nu(d)}|R_d|.\end{eqnarray}\]
Proof. Given above. ■
The inequalities above has many good features. First, the assumptions are minimal. Second, $z$ is not assumed to be large enough and we do not need to adjust any further parameters. Third, and most importantly, it gives sharp results, as we will see below.
Applications to primes in arithmetic progressions
We intend to give good upper bounds for the number of primes on (short) intervals using Selberg's sieve. When sifting the primes in the arithmetic progression $qn+a$, we should take $A=\{n\geq 0:qn+a\in [x-y,x]\}$, $P=\mathbb{P}\setminus\{p:p\mid q\}$ and $\omega(p)=1$ for $p\in P$, $\omega(p)=0$ for other primes. The first thing we have to do is to estimate $G(z)$ in this situation. As $g(p)=\frac{1}{p-1}=\frac{1}{\varphi(p)}$, we have $g(d)=\frac{1}{\varphi(d)}$ for $d$ coprime to primes not in $P$. Therefore
\[\begin{eqnarray}G(z)=H_q(z):=\sum_{d\leq z\atop (d,q)=1}\frac{\mu^2(d)}{\varphi(d)}.\end{eqnarray}\]
Notice that
\[\begin{eqnarray}H_1(z)=\sum_{\ell\mid q}\sum_{d\leq z\atop (d,q)=\ell}\frac{\mu^2(d)}{\varphi(d)}=\sum_{\ell\mid q}\frac{\mu^2(\ell)}{\varphi(\ell)}\sum_{m\leq \frac{z}{\ell}\atop (m,q)=1}\frac{\mu^2(m)}{\varphi(m)}=\sum_{\ell\mid q}\frac{\mu^2(\ell)}{\varphi(\ell)}H_q\left(\frac{z}{\ell}\right),\end{eqnarray}\]
so
\[\begin{eqnarray}H_q(z)\geq \frac{H_1(z)}{\sum_{\ell\mid q}\frac{\mu^2(\ell)}{\varphi(\ell)}}=H_1(z)\prod_{p\mid q}\left(1-\frac{1}{p}\right).\end{eqnarray}\]
Hence it suffices to estimate $H_1$ from below in order to give an upper bound for $\frac{1}{G(z)}$. It holds that
\[\begin{eqnarray}H_1(z)=\sum_{d\leq z}\frac{\mu^2(d)}{d}\prod_{p\mid d}\left(1+\frac{1}{p}+\frac{1}{p^2}+...\right)\geq \sum_{n\leq z}\frac{1}{n}\geq \log z\end{eqnarray}\]
because every positive integer $n\leq z$ can be written in the form $n=dp_1^{k_1}...p_r^{k_r}$ where $d\leq z$ is square-free and $p_1,...,p_r\mid d$ are primes. We have now
\[\begin{eqnarray}H_q(z)\geq \frac{\varphi(q)}{q}\log z,\quad (5)\end{eqnarray}\]
so we conclude
\[\begin{eqnarray}S(A,P,z)\leq \frac{y}{\varphi(q)\log z}+\frac{z^2}{W(z)^3}.\end{eqnarray}\]
However, the error term can be improved. As $|R_d|\leq 1$ for all $d$, returning to (3) and estimating that error term trivially (using the fact that the number of square-free $d\leq z$ is certainly less than $z-1$), we get
\[\begin{eqnarray}\pi(x;q,a)-\pi(x-y;q,a)\leq \frac{y}{\varphi(q)\log z}+(z-1)^2+\pi(z)+1\leq \frac{y}{\varphi(q)\log z}+z^2.\end{eqnarray}\]
Taking $z=\frac{\sqrt{y}}{\sqrt{\varphi(q)}\log y}$, we get
\[\begin{eqnarray}\pi(x;q,a)-\pi(x-y;q,a)&\leq& \frac{2y}{\varphi(q)\log \frac{y}{q\log^2 y}}+\frac{y}{\varphi(q)\log^2 y}\\&\leq& \frac{2y}{\varphi(q)\log \frac{y}{q\log^2 y}}\left(1+\frac{1}{2\log y}\right).\quad (4)\end{eqnarray}\]
This is the Brun-Titchmarsh inequality, except that the second factor could be removed and $\log \frac{y}{q\log^2 y}$ replaced with $\log\frac{y}{q}$ (in that case, the inequality is due to Montgomery and Vaughan). One can obtain the latter improvement (while increasing the second factor to $1+\frac{8}{\log \frac{y}{q}}$) by estimating the error term $(\sum_{d\leq z,(d,q)=1}\lambda_d)^2$ very accurately using the explicit formula for $\lambda_d$ and rather tedious computations that bound the error by $24\frac{q}{\varphi(q)}\frac{z^2}{\log^2 z}$ for $z\geq e^6$; see Halberstam and Richert. With even more careful estimation of the sum over $\lambda_d$, one can prove the inequality of Montgomery and Vaughan. In any case, we arrived at the following.
Theorem 2. For any fixed $q$, we have
\[\begin{eqnarray}\pi(x;q,a)-\pi(x-y;q,a)\leq \frac{(2+o(1))y}{\varphi(q)\log y}\end{eqnarray}\]
as $y\to \infty$.
We make several remarks about this inequality. First of all, it is off by a factor of two from what we expect and what we have in the prime number theorem. Nevertheless, the Brun-Titchmarsh inequality gives upper bounds that are (conjectured to be) correct up to the factor $2$ on intervals on which the prime number theorem or the Riemann hypothesis say nothing about the number of primes. Indeed, let $\pi(x;q,a)=\frac{1}{\varphi(q)}Li(x)+E(x;q,a)$, where $Li(x)=\int_{2}^{x}\frac{dt}{\log t}$ is the logarithmic integral and the approximation is based on the heuristic that $n$ is a prime congruent to $a\pmod q$ with "probability" $\frac{1}{\varphi(q)\log n}$. Then
\[\begin{eqnarray}\pi(x;q,a)-\pi(x-y;q,a)&=&\frac{1}{\varphi(q)}(Li(x)-Li(x-y))+E(x;q,a)-E(x-y;q,a)\\&\leq& \frac{(1+o(1))y}{\log x}+E(x;q,a)-E(x-y;q,a).\end{eqnarray}\]
Using the error bound $E(x;q,a)\leq\frac{c_{q,k} x}{\log^k x}$ in the prime number theorem for arithmetic progressions, we get
\[\begin{eqnarray}\pi(x;q,a)-\pi(x-y;q,a)\leq \frac{(1+o(1))y}{\log x}+\frac{c_{q,k}x}{\log^k x}.\end{eqnarray}\]
because we can expect no cancellation in the two errors. Despite being a good bound on long intervals, as soon as $y=o\left(\frac{x}{\log^{k}x}\right)$, the bound becomes worse than the trivial one. Assuming the generalized Riemann hypothesis , we have
\[\begin{eqnarray}\pi(x;q,a)-\pi(x-y;q,a)\leq \frac{(1+o(1))y}{\log x}+c_q\sqrt{x}\log x,\end{eqnarray}\]
and even this is useless on intervals of length $y\ll \sqrt{x}$. Therefore, the theorem above and its refinements are very powerful tools on short intervals. In addition, formulas such as $(4)$ or the inequality of Montgormery and Vaughan are completely explicit unlike the prime number theorem (there are explicit versions of the prime number theorem though, for instance by Rosser and Schoenfeld, but proving them is more complicated than proving the explicit inequalities mentioned above. For primes in arithmetic progressions the possibility of an exceptional Siegel zero of an $L$-function complicates estimations a lot). It is also good to note that the constant $2$ in the Brun-Titchmarsh inequality has not been improved, and in fact cannot be without proving that there are no Siegel zeros.
Upper bounds for Goldbach's problem and linear equations in primes
Next we consider applications of Selberg's sieve to the number of representations of an integer as the sum of two primes and the number of primes $p$ such that $q=ap+b$ is also prime. Such primes include the twin primes and the Sophie Germain primes. The refined Brun's sieve from this previous post would already give bounds for the number of these integers, but with an unspecified constant factor (or a rather large constant if one makes everything explicit). There are however a couple of good reasons to do the necessary work with Selberg's sieve. One is that the estimates we get are only off by a factor of $4$ from the conjectured ones, and if one obtained an estimate that is off by a factor of less than $2$, that would prove that there are no Siegel zeros. Another reason is that the proof is a nice application of the Bombieri-Vinogradov theorem, and assuming it the proof is not very complicated.
To consider the number of representations of an even integer $N$ as the sum of two primes, set $A=\{N-p:p\leq N,p\in \mathbb{P}\}$, $P=P_N:=\mathbb{P}\setminus\{p:p\mid N\}$. The reason we do not sift all the integers in $A'=\{n\leq x:n(N-n)\}$ by the primes up to $\sqrt{N}$ and coprime to $N$ is that for this set $A'$ the function $\omega(d)$ would be $2^{\nu(d)}$ for $d$ coprime to $N$, while for the set $A$ that we choose it is much smaller, namely $\frac{d}{\varphi(d)}$.
We are interested in the primes in $A$, and we are not sifting with the prime divisors of $N$ as $A_p=1$ for $p\mid N$. Let us write $|A_d|=\frac{1}{\varphi(d)}\pi(N)+E(N,N,d)$ for $d$ square-free and coprime to $N$. Here $E(x;q,a)$ has the same meaning as above. The most efficient proofs of the prime number theorem in arithmetic progressions give essentially $E(x;q,a)\ll_q x\exp(-(\log x)^{\frac{1}{3}})$ (a bit better), but still the error $E(N,N, d)$ is not even close to being bounded by $\omega(d)=\frac{d}{\varphi(d)}$ since the error depends crucially on $N$. The Bombieri-Vinogradov theorem is therefore required to establish that the error is in fact small when we average over $d\leq z$.
We should first compute $G(z)$. Note that $g(p)=\frac{\frac{p}{p-1}}{p-\frac{p}{p-1}}=\frac{1}{p-2}=\frac{1}{p-1}(1+g(p))$, so
\[\begin{eqnarray}G(z)&=&\sum_{n\leq z\atop (n,N)=1}\mu^2(n)g(n)=\sum_{n\leq z\atop (n,N)=1}\frac{\mu^2(n)}{\varphi(n)}\prod_{p\mid N}(1+g(p))\\&=&\sum_{n\leq z\atop (n,N)=1}\frac{\mu^2(n)}{\varphi(n)}\sum_{d\mid N}g(d)\\&=&\sum_{d\leq z\atop (d,N)=1}g(d)\sum_{\substack{n\leq z\\(n,N)=1\\ d\equiv 0\pmod N}}\frac{\mu^2(n)}{\varphi(n)}\\&=&\sum_{d\leq z\atop (d,N)=1}\frac{\mu^2(d)}{\varphi(d)}g(d)\sum_{k\leq \frac{z}{d}\atop (k,dN)=1}\frac{\mu^2(k)}{\varphi(k)}\\&\geq&\prod_{p\mid N}\left(1-\frac{1}{p}\right) \sum_{d\leq z\atop (d,N)=1}\frac{\mu^2(d)}{\varphi(d)}g(d)\prod_{p\mid d}\left(1-\frac{1}{p}\right)\log\frac{z}{d}\\&=&\prod_{p\mid N}\left(1-\frac{1}{p}\right)\left(\sum_{d\leq z \atop (d,N)=1}\frac{\mu^2(d)g(d)}{d}\log z-\sum_{d\leq z \atop (d,N)=1}\frac{\mu^2(d)g(d)}{d}\log d\right)\end{eqnarray}\]
where the inequality follows from formula $(5)$. The expression above is further equal to
\[\begin{eqnarray}\prod_{p\mid N}\left(1-\frac{1}{p}\right)\left(\sum_{d=1\atop (d,N)=1}^{\infty}\frac{\mu^2(d)g(d)}{d}\log z-\sum_{d=1\atop (d,N)=1}^{\infty}\frac{\mu^2(d)g(d)}{d}\log d+O\left(\frac{\log z}{z}\right)\right)\end{eqnarray}\]
We simplify the sums as follows:
\[\begin{eqnarray}\sum_{d=1\atop (d,N)=1}^{\infty}\frac{\mu^2(d)g(d)}{d}&=&\prod_{p\nmid N}\left(1+\frac{g(p)}{p}\right)\\&=&\prod_{p\nmid N}\left(1+\frac{1}{p(p-2)}\right),\\\sum_{d=1\atop (d,N)=1}^{\infty}\frac{\mu^2(d)g(d)}{d}\log d&=&\sum_{d=1\atop (d,N)=1}^{\infty}\frac{\mu^2(d)g(d)}{d}\sum_{p\mid d}\log p\\&=&\sum_{p\nmid N}\log p\sum_{\substack{d=1\\(d,N)=1\\d\equiv 0\pmod p}}^{\infty}\frac{\mu^2(d)g(d)}{d}\\&=&\sum_{p\nmid N}\log p\cdot\frac{g(p)}{p}\sum_{k=1\atop (k,pN)=1}^{\infty}\frac{\mu^2(k)g(k)}{k}\\&=&\prod_{p\nmid N}\left(1+\frac{1}{p(p-2)}\right)\sum_{p\nmid N}\frac{\log p}{p(p-2)}\left(1+\frac{1}{p(p-2)}\right)^{-1}\\&=&\prod_{p\nmid N}\left(1+\frac{1}{p(p-2)}\right)\sum_{p\nmid N}\frac{\log p}{(p-1)^2}\end{eqnarray}\]
Putting this all together yields
\[\begin{eqnarray}&&S(A,P,z)\\&\leq& \pi(N)\prod_{p\nmid N}\left(1+\frac{1}{p(p-2)}\right)^{-1}\prod_{p\mid N}\left(1-\frac{1}{p}\right)^{-1}\left(\log z-\sum_{p}\frac{\log p}{(p-1)^2}\right)^{-1}\left(1+O(z^{-1})\right)\\&+&\sum_{d\leq z^2\atop (d,N)=1}\mu^2(d)3^{\nu(d)}|E(N;N,d)|,\quad\quad (5)\\\end{eqnarray}\]
by Selberg's sieve (Theorem 1). The two products can easily be rewritten as
\[\begin{eqnarray}\left(\frac{1}{2}\prod_{p>2}\left(1+\frac{1}{p(p-2)}\right)\prod_{2<p\mid N}\frac{p-2}{p-1}\right)^{-1}=2\prod_{p>2}\left(1-\frac{1}{(p-1)^2}\right)\prod_{2<p\mid N}\frac{p-1}{p-2},\quad (6)\end{eqnarray}\]
so
\[\begin{eqnarray}&S(A,P,z)\leq 2\mathcal{G}\prod_{2<p\mid N}\frac{p-1}{p-2}\cdot\frac{N}{\log N \log z}\left(1+O\left(\frac{1}{\log z}\right)\right)\\&+\sum_{d\leq z^2\atop (d,N)=1}\mu^2(d)3^{\nu(d)}|E(N;d,N)|,\end{eqnarray}\]
where $\mathcal{G}$ is the first product in $(6)$, and we used $(\log z-c)^{-1}=(\log z)^{-1}(1+O((\log z)^{-1}))$ and $\pi(N)=\frac{N}{\log N}(1+O(\frac{1}{\log N}))$. Now it remains to estimate the error term. By Cauchy-Schwarz,
\[\begin{eqnarray}\sum_{d\leq z^2\atop (d,N)=1}\mu^2(d)3^{\nu(d)}|E(N;d,N)|&\leq& \sum_{d\leq z^2\atop (d,N)=1}\sqrt{d}\cdot3^{\nu(d)}\cdot \frac{|E(N;d,N)|}{\sqrt{d}}\\&\leq& \left(\sum_{d\leq z^2}d\cdot 6^{\nu(d)}\right)^{\frac{1}{2}}\left(\sum_{d\leq z^2\atop (d,N)=1}\frac{|E(N;d,N)|^2}{d}\right)^{\frac{1}{2}}\\&\leq& \left(\sum_{d\leq z^2}d\cdot 6^{\nu(d)}\right)^{\frac{1}{2}}\left(\sum_{d\leq z^2\atop (d,N)=1}|E(N;d,N)|\right)^{\frac{1}{2}}.\end{eqnarray}\]
What the Bombieri-Vinogradov theorem says is that for every positive $M$ there exists a $c(M)$ such that
\[\begin{eqnarray}\sum_{1\leq q\leq \frac{\sqrt{x}}{\log^{c(M)}x}}\max_{y\leq x}\max_{a\in \mathbb{Z}_q^{\times}}|E(y;q,a)|\ll_{M} \frac{x}{\log^M x}.\end{eqnarray}\]
This inequality beats the bound following from the Brun-Titchmarsch inequality by an arbitrary power of $\log x$. The statement would follow from the generalized Riemann hypothesis, as then we had pointwise $|E(x;q,a)|\ll \sqrt{x}\log x\log q$, but the averaged version is something that can actually be proved and hence use as a replacement of the GRH in several situations. We choose $z^2=\frac{N}{\log^{c(12)}N}$, whence the error is bounded by
\[\begin{eqnarray}\left(\sum_{d\leq z^2}d\cdot 6^{\nu(d)}\right)^{\frac{1}{2}}\left(\sum_{d\leq z^2\atop (d,N)=1}|E(N;d,N)|\right)^{\frac{1}{2}}&\ll& \left(\sum_{d\leq z^2} 3^{\nu(d)}\right)^{\frac{1}{2}}\cdot \frac{\sqrt{N}}{\log^6 N}\\\end{eqnarray}\]
We finally calculate
\[\begin{eqnarray}\sum_{d\leq y}6^{\nu(d)}&\leq&y\sum_{d\leq y}\frac{6^{\nu(d)}}{d}\\&\leq& y\prod_{p\leq y}\left(1+\frac{6}{p}\right)\\
&\leq & y\prod_{p\leq y}\left(1+\frac{1}{p}\right)^6\\
&\ll& y\log^6 y.\end{eqnarray}\]
&\leq & y\prod_{p\leq y}\left(1+\frac{1}{p}\right)^6\\
&\ll& y\log^6 y.\end{eqnarray}\]
and setting $y=\frac{N}{\log^{c(12)}N},$ the error is at most
\[\begin{eqnarray}\ll N(\log N)^{3-\frac{1}{2}c(12)-6}\ll \frac{N}{\log^3 N}.\end{eqnarray}\]
Thus we obtain the following.
Theorem 3. If $N$ is an even positive integer, we have
\[\begin{eqnarray}|\{(p,p'):N=p+p'\}|\leq 8\mathcal{G}\prod_{2<p\mid N}\frac{p-1}{p-2}\cdot\frac{N}{\log^2 N}\cdot\left(1+O\left(\frac{\log \log N}{\log N}\right)\right).\end{eqnarray}\]
Proof. We choose $z^2=\frac{\sqrt{N}}{\log^{c(12)}N}$. Then $S(A,P,z)\geq |\{(p,p'):N=p+p'\}|+O(\sqrt{N})$. The error term in Selberg's sieve becomes irrelevant, and we have
Proof. We choose $z^2=\frac{\sqrt{N}}{\log^{c(12)}N}$. Then $S(A,P,z)\geq |\{(p,p'):N=p+p'\}|+O(\sqrt{N})$. The error term in Selberg's sieve becomes irrelevant, and we have
\[\begin{eqnarray}\frac{1}{\log \frac{\sqrt[4]{N}}{\log^{\frac{1}{4}c(12)}N}}=\frac{4}{\log N}\left(1+O\left(\frac{\log \log N}{\log N}\right)\right);\end{eqnarray}\]
the claim is proved. ■
One expects that in $(5)$ the error term should be negligible even if $z$ is very close to $x$, say $z=x^{1-\varepsilon}$ for any $\varepsilon>0$. Indeed, the claim that the Bombieri-Vinogradov estimate holds for the very long range $1\leq q\leq x^{1-\varepsilon}$ is the Elliott-Halberstam conjecture, which seems to be out of reach, as the Bombieri-Vinogradov theorem has not been proved even for averages over $1\leq q\leq \sqrt{x}$. However, Zhang was able to prove the desired average estimate for $1\leq q\leq x^{\frac{1}{2}+\delta}$, where $\delta>0$ is small enough, but with the condition that the average is over square-free and smooth $q$ (and with the residue class $a\pmod q$ being restricted), that is those $q$ with no ''large'' prime factors. This already was enough to prove the existence of infinitely many gaps between primes.
We turn to the problem of estimating the number of primes of the form $ap+b$, where $p$ is prime and $a$ and $b$ are coprime. We choose $A=\{p\leq N: ap+b\in \mathbb{P}\}$ and $P=\mathbb{P}\setminus\{p:p\mid ab\}$. Then $\omega(p)=\frac{p}{p-1}$ for $p$ coprime to $ab$. The exact same argument as for the Golbach problem shows
\[\begin{eqnarray}&S(A,P,z)\leq 2\mathcal{G}\prod_{2<p\mid N}\frac{p-1}{p-2}\cdot\frac{N}{\log N \log z}\left(1+O\left(\frac{1}{\log z}\right)\right)\\&+\sum_{d\leq z^2\atop (d,N)=1}\mu^2(d)3^{\nu(d)}\left|E(N;d,-ba^{-1}\hspace{-0.3cm}\pmod d)\right|,\end{eqnarray}\]
and we get with the same argument as before that the error sum is $\ll \frac{N}{\log^3 N}$, independently of $a$ and $b$. Therefore we obtain the following theorem.
Theorem 4. For any integers $a>0$ and $b$ that are coprime we have
\[\begin{eqnarray}|\{p\leq N:ap+b\in \mathbb{P}\}|\leq 8\mathcal{G}\prod_{2<p\mid ab}\frac{p-1}{p-2}\cdot\frac{N}{\log^2 N}\left(1+O\left(\frac{\log \log N}{\log N}\right)\right).\end{eqnarray}\]
In particular, we get (supposedly) very good estimates for the number of twin primes and the number of Sophie Germain primes, that is primes $p$ such that $2p+1$ is also prime. The Sophie Germain primes are for example important as secret keys in the RSA cryptosystem, since if $q=2p+1$ is a prime, then attacks for finding the secret key that are based on finding small primes $q-1$ will fail.
We have already seen several successful applications of the Selberg sieve, but now we want to show that the estimate $S(A,P,z)\leq \frac{x}{G(z)}+\sum_{d\leq z^2}3^{\nu(d)}|R_d|$ given by the Selberg sieve is $\ll xW(z)$, that is what we expect up to a constant, with mild assumptions. For later purposes, we estimate instead of $G(z)$ the more general quantity
\[\begin{eqnarray}G(x,z)=\sum_{d\leq x\atop d\mid d P(z)}g(d).\end{eqnarray}\]We prove the following.
Theorem 5. Suppose
\[\begin{eqnarray}\sum_{y\leq p<z}\frac{\omega(p)\log p}{p}\leq \kappa \log \frac{z}{y}+A\quad (7)\end{eqnarray}\]
for all $2\leq y<z\leq x$ for some constants $\kappa,A$, and $\frac{\omega(p)}{p}\leq \eta<1$ for all primes $p$. For any $\lambda>0$ and $z<x$, it holds that
\[\begin{eqnarray}&&\frac{1}{G(x,z)}\leq W(z)\left(1+O\left(\exp\left(-\lambda \frac{\log x}{\log z}+\left(\frac{2\kappa}{\lambda}+\frac{C}{\log z}\right)e^{\lambda}\right)\right)\right).\end{eqnarray}\]
Assuming additionally that $|R_d|\ll \omega(d)$, noticing $G(z)=G(z,z)$, and setting $\lambda=1$, Selberg's sieve gives for $z\leq x$ the result
\[\begin{eqnarray}S(A,P,z)\ll xW(z)+O\left(\frac{z^2}{W(z)^3}\right).\end{eqnarray}\]
We start with a couple of remarks. First of all, it suffices to prove the claim about $G(x,z)$; the latter claim follows then directly from the Selberg sieve. Second, the condition $(7)$ says that $\omega(p)$ is bounded on average by $\kappa$ in a weak sense (since by Mertens' theorem, $\omega(p)=\kappa$ for all primes $p$ satisfies $(7)$); conditions like this arise often in sieve theoretic results, for example because $W(z)$ can be estimated using partial summation if we know a bound for $(7)$; a pointwise bound is not necessary for estimating $W(z)$. If the left-hand side of $(7)$ is also at least $\kappa \log \frac{z}{y}-A$, the number $\kappa$ is called the sieve dimension, and it plays an important role when it comes to applying various sieves. Third, the situation of Theorems 3 and 4 is a good example where $\kappa=1$, so $(7)$ is on average at most $1$, while the best pointwise bound for $\omega(p)=\frac{p}{p-1}$is strictly greater than $1$. In order to see that $W(z)$ is controlled by $(7)$, we note the following lemma.
Lemma 6. For $2\leq y<z$, supposing $(7)$, we have
\[\begin{eqnarray}\frac{W(z)}{W(y)}\ll \left(\frac{\log z}{\log y}\right)^{\kappa}\end{eqnarray}.\]
Proof. Using $1+t\leq e^t$, we see that
\[\begin{eqnarray}\frac{W(z)}{W(y)}\leq \prod_{y<p\leq z}(1+g(p))\leq \exp\left(\sum_{y<p\leq z}g(p)\right).\end{eqnarray}\] The rest is just a rather boring calculation applying partial summation. ■
We proceed to the proof of Theorem 5.
Proof. Notice that
With Theorem 5, we can give an upper bound of the correct order of magnitude for $S(A,P,z)$ given almost any reasonable $A$ and $P$. In particular, we could bound from above the number of primes up to $x$ represented by a polynomial. Such applications usually reduce to a routine check of the condition $(7)$, and we do not address them $-$ also because this post is already rather lengthy.
A general upper bound estimate for the Selberg sieve
\[\begin{eqnarray}G(x,z)=\sum_{d\leq x\atop d\mid d P(z)}g(d).\end{eqnarray}\]We prove the following.
Theorem 5. Suppose
\[\begin{eqnarray}\sum_{y\leq p<z}\frac{\omega(p)\log p}{p}\leq \kappa \log \frac{z}{y}+A\quad (7)\end{eqnarray}\]
for all $2\leq y<z\leq x$ for some constants $\kappa,A$, and $\frac{\omega(p)}{p}\leq \eta<1$ for all primes $p$. For any $\lambda>0$ and $z<x$, it holds that
\[\begin{eqnarray}&&\frac{1}{G(x,z)}\leq W(z)\left(1+O\left(\exp\left(-\lambda \frac{\log x}{\log z}+\left(\frac{2\kappa}{\lambda}+\frac{C}{\log z}\right)e^{\lambda}\right)\right)\right).\end{eqnarray}\]
Assuming additionally that $|R_d|\ll \omega(d)$, noticing $G(z)=G(z,z)$, and setting $\lambda=1$, Selberg's sieve gives for $z\leq x$ the result
\[\begin{eqnarray}S(A,P,z)\ll xW(z)+O\left(\frac{z^2}{W(z)^3}\right).\end{eqnarray}\]
We start with a couple of remarks. First of all, it suffices to prove the claim about $G(x,z)$; the latter claim follows then directly from the Selberg sieve. Second, the condition $(7)$ says that $\omega(p)$ is bounded on average by $\kappa$ in a weak sense (since by Mertens' theorem, $\omega(p)=\kappa$ for all primes $p$ satisfies $(7)$); conditions like this arise often in sieve theoretic results, for example because $W(z)$ can be estimated using partial summation if we know a bound for $(7)$; a pointwise bound is not necessary for estimating $W(z)$. If the left-hand side of $(7)$ is also at least $\kappa \log \frac{z}{y}-A$, the number $\kappa$ is called the sieve dimension, and it plays an important role when it comes to applying various sieves. Third, the situation of Theorems 3 and 4 is a good example where $\kappa=1$, so $(7)$ is on average at most $1$, while the best pointwise bound for $\omega(p)=\frac{p}{p-1}$is strictly greater than $1$. In order to see that $W(z)$ is controlled by $(7)$, we note the following lemma.
Lemma 6. For $2\leq y<z$, supposing $(7)$, we have
\[\begin{eqnarray}\frac{W(z)}{W(y)}\ll \left(\frac{\log z}{\log y}\right)^{\kappa}\end{eqnarray}.\]
Proof. Using $1+t\leq e^t$, we see that
\[\begin{eqnarray}\frac{W(z)}{W(y)}\leq \prod_{y<p\leq z}(1+g(p))\leq \exp\left(\sum_{y<p\leq z}g(p)\right).\end{eqnarray}\] The rest is just a rather boring calculation applying partial summation. ■
We proceed to the proof of Theorem 5.
Proof. Notice that
\[\begin{eqnarray}\frac{1}{W(z)}-G(x,z)&=&\prod_{p\leq z\atop p\in P}\frac{p}{p-\omega(p)}-G(x,z)\\&=&\prod_{p\leq z \atop p \in P}(1+g(p))-G(x,z)\\&=&\sum_{d>x}g(d)\end{eqnarray}\]
To estimate this, we use Rankin's trick: we replace the summation over $d>x$ by a summation over all $d\geq 1$, weighing each term by $\left(\frac{d}{x}\right)^{1-s}$, where $s<1$ is to be optimized later. We obtain
\[\begin{eqnarray}\sum_{d>x}g(d)\leq x^{s-1}\sum_{d=1}^{\infty} \frac{dg(d)}{d^{s}}\leq x^{s-1}\prod_{p\leq z}\left(1+\frac{\omega(p)}{p^s(1-\frac{\omega(p)}{p})}\right)\end{eqnarray}\]
Therefore
\[\begin{eqnarray}1-W(z)G(x,z)&\leq& x^{s-1}\prod_{p\leq z}\left(1-\frac{\omega(p)}{p}+\frac{\omega(p)}{p^s}\right)\\&\leq& \exp\left((s-1)\log x+\sum_{p\leq z}\omega(p)\left(p^{-s}-p^{-1}\right)\right).\end{eqnarray}\]
Since $p^{-s}-p^{-1}=p^{-1}(\exp((1-s)\log p)-1)$, the Taylor series of the exponential gives
\[\begin{eqnarray}\sum_{p\leq z}\omega(p)(p^{-s}-p^{-1})=\sum_{n=1}^{\infty}\sum_{p\leq z}\frac{\omega(p)(1-s)^n\log^n p}{p\cdot n!}\quad (8).\end{eqnarray}\]
If we denote $F(t)=\sum_ {p\leq z}\frac{\omega(p)\log p}{p}$, then by partial summation
\[\begin{eqnarray}\sum_{p\leq z}\frac{\log^n p}{p}&=&F(z)\log^{n-1}z-(n-1)\int_{1}^z \frac{F(t)\log^{n-2}t}{t}dt\\&=&F(z)\log^{n-1}z+(n-1)\int_1^z \frac{(F(z)-F(t))\log^{n-2}t}{t}dt\\&-&(n-1)F(z)\int_{1}^z \frac{\log^{n-2}t}{t}dt\\&=&F(z)\left(\log^{n-1}z-(n-1)\int_1^z \frac{\log^{n-2}t}{t}dt\right)+(n-1)\int_1^z \frac{(F(z)-F(t))\log^{n-2}t}{t}dt\\&\leq& (n-1) \int_{1}^z\frac{\kappa \log \frac{z}{t}\cdot \log^{n-2}t}{t}dt+A(n-1)\int_{1}^z \frac{\log^{n-2}t}{t}dt\\&=&\kappa(n-1)\log z \int_{1}^z \frac{\log^{n-2}t}{t}dt-\kappa(n-1)\int_{1}^z \frac{\log^{n-1}t}{t}dt+O(\log^{n-1}z)\\&=&\left(\kappa-\kappa\frac{n-1}{n}\right)\log^n z+O(\log^{n-1}z)\\&=&\frac{\kappa}{n}\log^n z+O(\log^{n-1}z).\end{eqnarray}\]
Hence $(8)$ is at most
\[\begin{eqnarray}\sum_{n=1}^{\infty}\frac{((1-s)\log z)^n}{n!}\left(\frac{\kappa}{n}+\frac{A}{\log z}\right)\leq 2\kappa\sum_{n=1}^{\infty}\frac{((1-s)\log z)^n}{(n+1)!}+\frac{A}{\log z}\sum_{n=1}^{\infty}\frac{((1-s)\log z)^n }{n!}.\end{eqnarray}\]
Setting $1-s=\frac{\lambda}{\log z}$, this is further bounded by $2\kappa \frac{e^{\lambda}}{\lambda}+\frac{A}{\log z}e^{\lambda}$. We conclude
\[\begin{eqnarray}1-W(z)G(x,z)\leq \exp\left(-\lambda \frac{\log x}{\log z}+2\kappa\frac{e^{\lambda}}{\lambda}+\frac{A}{\log z}e^{\lambda}\right)\quad (9).\end{eqnarray}\]
Denoting the right-hand side of $(9)$ by $L$, we deduce
\[\begin{eqnarray}\frac{1}{G(x,z)}\leq W(z)\left(1+\frac{L}{W(z)G(x,z)}\right),\end{eqnarray}\]
so it reamains to show $W(z)G(x,z)\gg 1$. Choose $c \in (0,1)$ such that $L=\frac{1}{2}$ for $x=z^c$ and $\lambda=1$. Then $W(z^c)G(z,z^c)\geq\frac{1}{2}$, so
\[\begin{eqnarray}\frac{1}{W(z)G(x,z)}= \frac{W(z^c)}{W(z)}\cdot \frac{1}{W(z^c)G(x,z)}\leq \frac{W(z^c)}{W(z)}\cdot \frac{1}{W(z^c)G(z,z^c)}\ll 1\end{eqnarray}\]
because $\frac{W(y)}{W(z)}\ll \left(\frac{\log^{\kappa}y}{\log^{\kappa} z}\right)$. This completes the proof. ■
With Theorem 5, we can give an upper bound of the correct order of magnitude for $S(A,P,z)$ given almost any reasonable $A$ and $P$. In particular, we could bound from above the number of primes up to $x$ represented by a polynomial. Such applications usually reduce to a routine check of the condition $(7)$, and we do not address them $-$ also because this post is already rather lengthy.