In this post, we derive a simple form of Brun's combinatorial sieve that is already superior to the Eratosthenes-Legendre sieve and is sufficient to prove for example that the sum of the reciprocals of twin prime converges. We will see, however, that this sieve gives upper bounds that are slightly weaker than the optimal ones, such as $\pi(x)\ll x\cdot \frac{\log \log x}{\log x}$. In a later post, we will derive a more powerful Brun sieve that enables us to get rid of the extra factors $\log \log x$ and allows us to choose the sieve parameter $z$ to be a fixed power of $x$.

**Some notation**
We use the following notation. The symbols $c$ and $C$ represent positive constants whose values may vary on different occurrences (they could be computed explicitly, but that is unnecessary for us). The notation $f(x)\ll g(x)$ means that $\left|\frac{f(x)}{g(x)}\right|$ is bounded as $x\to \infty$. The quantities $\theta$ and $\theta'$ may depend on all the variables but satisfy $|\theta|,|\theta'|\leq 1$. The function $\nu(m)$ tells the number of distinct prime factors of $m$. Otherwise we use the notations of this previous post. In particular, we are sifting a set $A$ of size $x$ with the set primes $P\subset \mathbb{P}$; $S(A,P,z)$ is the number of elements of $A$ that are coprime to all primes of $P$ that are not greater than $z$. By $P(z)$ we mean the product of the primes of $P$ up to $z$. We denote $A_d=\{a\in A:a\equiv 0 \pmod d\}$ and work under the assumption that $|A_d|=\frac{\omega(d)}{d}|A|+\theta R_d$, where $|R_d|\leq \omega(d)$ and $\omega$ is a multiplicative function defined on the square-free positive integers. In many applications, this is satisfied by the Chinese remainder theorem (while in some other applications that are dealt with in later posts, the condition fails but remains true when averaged), and this says more or less that sifting different primes is independent. The product $W(z)=\prod_{p\in P, p\leq z}(1-\frac{\omega(p)}{p})$ arises naturally in our sieve estimates due to independence. We follow mostly the book of Halberstam and Richert.

**Combinatorial setting**
The starting point for developing the Brun sieve is to realize why the sieve of Eratosthenes-Legendre gives rather weak results. That sieve was based on the function $\sum_{d\mid n}\mu(d)$ being the characteristic function of $n=1$, so that it is able to isolate the numbers $a\in A$ satisfying $(a,P(z))=1$. But then our error term becomes $\sum_{d\mid P(z)}\mu(d)R_d$, which is a very long sum with $2^{\pi(z)}$ terms. Since the numbers $R_d$ could be almost anything small, we cannot expect much cancellation, so we are forced to use the triangle inequality to get the bound $O(2^{\pi(z)})$ for the error term. Even if we proved some cancellation in the sum, the ideal situation would probably be that $\sum_{d\mid P(z)}\mu(d)R_d$ would behave like a sum of $2^{\pi(z)}$ random variables with absolute values $c$, and then the sum would be approximately $2^{\frac{\pi(z)}{2}}$, which is not any better for us. For this reason, the Brun sieve uses a different function $\sigma(n)$ with $\sigma(1)=1$ that is not exactly the characteristic function of $n=1$ but nearly enough. It is reasonable to choose $\sigma(n)=\sum_{d\mid n}\mu(d)\chi(d)$ for a suitable function $\chi$ with $\chi(1)=1$. It seems that if $\chi$ is sometimes equal to $1$ and usually equal to $0$, our error term will involve a shorter sum and the approximation of $\sigma$ for the characteristic function can be quite good. Before choosing what $\sigma$ exactly is, let us see what this approximation generally gives us.

By Möbius inversion, $\mu(d)\chi(d)=\sum_{k\mid d}\mu\left(\frac{d}{k}\right)\sigma(k)$, so

\[\begin{eqnarray}\sum_{a\in A}\sigma((a,P(z)))&=&\sum_{a\in A}\sum_{d\mid (a,P(z))}\mu(d)\chi(d)\\&=&\sum_{a\in A}\sum_{d\mid (a,P(z))}\sum_{k\mid d} \mu\left(\frac{d}{k}\right)\sigma(k)\\&=&\sum_{a\in A}\sum_{d\mid (a,P(z))}\mu(d)+\sum_{a\in A}\sum_{d\mid (a,P(z))}\sum_{1<k\mid d} \mu\left(\frac{d}{k}\right)\sigma(k)\quad \quad\\&=&S(A,P,z)+\sum_{1<k\mid P(z)}\sum_{t\mid \frac{P(z)}{k}} \sum_{a\in A_{tk}} \mu(t)\sigma(k)\quad (\text{let}\quad d=tk)\\&=&S(A,P,z)+\sum_{1<k\mid P(z)}\sigma(k)S(A_k,P^{(k)},z) \quad \quad (1),\end{eqnarray}\]

where $P^{(k)}=\{p\in P: p\nmid k\}$, because $\sum_{t\mid Q(z)}\mu(t)|B_{t}|=S(B,Q,z)$ for any set $B$ any set set $Q$ of primes. In the computation above, $Q=P^{(k)}$.

On the other hand,

\[\begin{eqnarray}\sum_{a\in A}\sigma((a,P(z)))&=&\sum_{a\in A}\sum_{d\mid (a,P(z))}\mu(d)\chi(d)\\&=&\sum_{d\mid P(z)}\sum_{a\in A_d}\mu(d)\chi(d)\\&=&\sum_{d\mid P(z)}\mu(d)\chi(d)|A_d| \quad \quad (2).\end{eqnarray}\]

Equating $(1)$ and $(2)$, we get the following formula which is fundamental in deriving the Brun sieve.

\[\begin{eqnarray}S(A,P,z)=\sum_{d\mid P(z)}\mu(d)\chi(d)|A_d|-\sum_{1<k\mid P(z)}\sigma(k)S(A_k,P^{(k)},z)\quad \quad (3).\end{eqnarray}\]

This formula tells that if $\sigma(k)$ is zero or close to it for small values of $k>1$ (that is, those values for which $S(A,P^{(k)},z)$ is large), the first sum in $(3)$ is a good approximation for $S(A,P,z)$. Moreover, if we choose $\chi$ so that $\sigma(k)$ does not change its sign, the first sum in $(3)$ is an upper or lower bound for $S(A,P,z)$ and the more complicated sum can be ignored. Therefore, if we choose functions $\chi_1$ and $\chi_2$ in such a way that the corresponding functions $\sigma_j(n)=\sum_{d\mid n}\mu(d)\chi_j(d), j=1,2$ satisfy $\sigma_1(n)\leq 0\leq \sigma_2(n)$ for $n>1$, then

\[\begin{eqnarray}\sum_{d\mid P(z)}\mu(d)\chi_1(d)|A_d| \leq S(A,P,z)\leq \sum_{d\mid P(z)}\mu(d)\chi_2(d)|A_d|\quad \quad (4).\end{eqnarray}\]

Given our sieve assumption $|A_d|=\frac{\omega(d)}{d}|A|+R_d$, we get

\[\begin{eqnarray} S(A,P,z)= |A|\sum_{d\mid P(z)}\mu(d)\chi_j(d)\frac{\omega(d)}{d}+\theta\sum_{d\mid P(z)}|\chi_j(d)||R_d| \quad \quad (5),\end{eqnarray}\]

for $j=1,2$, where $|\theta|\leq 1$ but it may depend on all the variables. The reason we used triangle inequality to the error term is the same as before: We know little about $|R_d|$ and possible cancellation in the error term is not expected to be significant.

We say that the first sum in $(5)$ is $(5.1)$, while the second one is $(5.2)$. We will see that $(5.1)$ is the main term. Before choosing the functions $\chi_1,\chi_2$, let us express $(5.1)$ using $\sigma_j(n)$. By Möbius inversion,

\[\begin{eqnarray} \sum_{d\mid P(z)}\mu(d)\chi(d)\frac{\omega(d)}{d}&=&\sum_{d\mid P(z)}\sum_{k\mid d}\mu\left(\frac{d}{k}\right)\sigma_j(k)\frac{\omega(d)}{d}\\&=&\sum_{k\mid P(z)}\sum_{t\mid \frac{P(z)}{k}}\mu(k)\frac{\omega(k)}{k}\sigma_j(k)\mu(t)\frac{\omega(t)}{t}\\&=&\sum_{k\mid P(z)}\mu(k)\frac{\omega(k)}{k}\sigma_j(k)\frac{W(z)}{\prod_{p\mid k,p\in P}\left(1-\frac{\omega(p)}{p}\right)}\\&=&W(z)\left(1+\sum_{1<k\mid P(z)}\sigma(d)g(d)\right) \quad\quad (6),\end{eqnarray}\]

where $g(d)=\frac{\omega(d)}{d\prod_{p\mid d}(1-\frac{\omega(p)}{p})}$, $W(z)=\prod_{p\leq z,p\in P}(1-\frac{\omega(p)}{p})$, and we used the multiplicativity of $\frac{\omega(d)}{d}$ and the fact that

\[\begin{eqnarray}\sum_{t\mid Q(z)}\mu(t)\frac{\omega(t)}{t}=\prod_{p\leq z,p\in Q}\left(1-\frac{\omega(p)}{p}\right)\end{eqnarray}\]

for any set $Q$ of primes.

**Choosing the sieve functions**
Now we choose the functions $\chi_1$, $\chi_2$. As already mentioned, letting $\chi_1(d)$ and $\chi_2(d)$ to be $1$ for small values of $d$ and $0$ for the rest should give a good approximation functions $\sigma_1$ and $\sigma_2$ for the characteristic function. As the computations are going to be somewhat messy, we could take $\chi_j$ to be very simple:

\[\begin{eqnarray}\chi_j(d)=\begin{cases}1, \quad\text{if}\quad \nu(d)\leq r\\0, \quad \text{if}\quad \nu(d)>r,\end{cases}\end{eqnarray}\]

where $\nu(\cdot)$ is the number of prime factors and $r$ is to be chosen later; in particular $r$ depends on the index $j$. This choice seems fairly good, as now $\sigma_j(n)=0$ for all $n>1$ such that all the prime factors of $n$ are bounded by $r$. Now the question is whether the condition that $\sigma_j(n)$ does not change sign is satisfied. This is indeed the case because

\[\begin{eqnarray}\sigma_j(n)&=&\sum_{\nu(d)\leq r}\mu(d)\\&=&\sum_{k=0}^{r}(-1)^k\binom{\nu(d)}{k}\\&=&(-1)^r\binom{\nu(d)-1}{r},\end{eqnarray}\]

where the last step follows from a formula that was proved in the post about the Eratosthenes-Legnedre sieve. Hence for $j=1$, we can choose $r$ to be any odd positive integer and for $j=2$ any even integer. The approach with these $\chi_1,\chi_2$ is essentially based on the Bonferroni inequalities

which tell that the inclusion-exclusion principle yields alternatively too large and small bounds and whose proofs is based on the binomial identity above. Therefore, we sacrifice exactness in the inclusion-exclusion principle to shorten the sum in it significantly.

**Estimation of error terms**
We amke the assumption $|R_d|\leq \omega(d),$ which allows us to estimate the sum $(5.2)$. It turns out that we could use much cruder bounds in what follows, but as this is not clear a priori, we do not do so. We have

\[\begin{eqnarray}\sum_{d\mid P(z)}|\chi_j(d)||R_d|&\leq& \sum_{d\mid P(z)}\omega(d)\\&\leq& \sum_{d\mid P(z)\atop \nu(d)\leq r} \omega(d)\\&\leq& \frac{S_1^r}{r!}+\frac{S_1^{r-1}}{(r-1)!}+...+\frac{S_1^0}{0!},\\&\leq& \frac{S_1^r}{(r-1)!}\end{eqnarray}\]

where $S_1=\sum_{p\leq z,p\in P}\omega(p)$ and we observe that when $S_1^r$ is multiplied out, each term $\omega(d)$ with $\nu(d)=r$ occurs $r!$ times. Now we have expressed $(5.1)$ in a more useful form and estimated the error $(5.2)$. The expression $(6)$, which equals $(5.1)$, involves still one sum $\sum_{1<k\mid P(z)}g(d)\sigma(d)$ that we want to bound. The function $g$ is easily seen to be multiplicative, so

\[\begin{eqnarray}\sum_{1<k\mid P(z)}g(d)\sigma_j(d)&\leq& \sum_{m=r}^{\pi(z)}\sum_{\nu(d)=m\atop d\mid P(z)}g(d)\sigma_j(d)\\&\leq& \sum_{m=r}^{\pi(z)}\binom{m}{r}\cdot \frac{1}{m!}\left(\sum_{p\leq z,p\in P}g(p)\right)^{m}\quad \text{since}\quad |\sigma_j(d)|\leq \binom{\nu(d)}{r}\\&=&\sum_{m=r}^{\pi(z)}\binom{m}{r}\cdot \frac{1}{m!}S^m\\&\leq& \frac{S^r}{r!}\sum_{m=r}^{\infty} \frac{1}{(m-r)!}S^{m-r}\\&=& \frac{S^r}{r!}e^{S},
\end{eqnarray}\]

where $S=\sum_{p\leq z, p\in P}g(p)$. We replaced a finite sum by the exponential function, but we have $\sum_{m=0}^n\frac{x^m}{m!}\geq e^x-1$ when $n\geq 2x$ by comparison with a geometric series. This means that replacing the finite sum by an infinite one costs us almost nothing as long as $\pi(z)-r\geq 2S_1$. We will soon see that $S_1\ll \log \log z$ with a mild assumption, while it turns out that $r$ should be chosen to be as small as $c\log \log z$.

We make the assumption $1-\frac{\omega(p)}{p}\leq \frac{1}{A}$ (that is, the proportion of sifted residue classes $\pmod p$ is bounded by a constant less than $1$, which is satisfied in almost any sensible application), so that $g(p)\leq A\frac{\omega(p)}{p}$. This yields

\[\begin{eqnarray}\sum_{1<k\mid P(z)}g(d)\sigma(d)&\leq \frac{1}{r!}\left(A\sum_{p\leq z\atop p \in P}\frac{\omega(p)}{p}\right)^r\exp\left(A\sum_{p\leq z \atop p \in P}\frac{\omega(p)}{p}\right).
\end{eqnarray}\]

Finally, we get rid of the factorial with $\frac{1}{r!}\leq \left(\frac{e}{r}\right)^r,$ so that

\[\begin{eqnarray}\sum_{1<k\mid P(z)}g(d)\sigma(d)\leq \left(\frac{AeS_2}{r}\right)^re^{AS_2},\end{eqnarray}\]

where $S_2=\sum_{p\leq z,p\in P}\frac{\omega(p)}{p}.$ We have now estimated the main term and the error term in $(5)$ and $(6)$, so

\[\begin{eqnarray}S(A,P,z)\leq |A|W(z)\left(1+\theta\left(\frac{AeS_2}{r}\right)^{r}e^{AS_2}\right)+\theta' \frac{S_1^{r}}{(r-1)!}\end{eqnarray}\]

by $(3)$ for even $r$, and for odd $r$ the inequality is reversed. Combining the inequalities for $r$ and $r+1$ and using $S_2\leq z$, we obtain the following.

**Theorem 1 (Basic Brun sieve, minimal assumptions).**Assume $1-\frac{\omega(p)}{p}\geq \frac{1}{A}$ for some $A>0$ and all $p\in P$, and let $r$ be an even positive integer with $r\leq \pi(z)$. Then

\[\begin{eqnarray}&|A|W(z)\left(1-\left(\frac{AeS_2}{r-1}\right)^{r-1}e^{AS_2}\right)-\left(\frac{eS_1}{r-1}\right)^{r-1}\leq S(A,P,z),\\&S(A,P,z)\leq |A|W(z)\left(1+\left(\frac{AeS_2}{r}\right)^{r}e^{AS_2}\right)+\left(\frac{eS_1}{r}\right)^r,\end{eqnarray}\]

where $S_1=\sum_{p\leq z,P\in P}\omega(p),S_2=\sum_{p\leq z,P\in P}\frac{\omega(p)}{p}$.

**Proof.**Given above. ■

This form of the Brun sieve is very general, but unfortunately it is not very useful unless $\omega(p)$ is small as a function of $p$, at least on average. To see the weakness of this sieve for large $\omega(p),$ consider sifting the set $A=\{(n-1^2)(n-2^2))...(n-z^2):n\leq x\}$, which is closely related to quadratic nonresidues (if an element $a\in A$ is coprime to $P(z)$, then the corresponding $n$ is a quadratic nonresidue modulo all the odd primes up to $z$). Then $\omega(p)=\frac{p-1}{2}$ for odd primes $p$, $\omega(2)=0,$ and $S_1\leq z\pi(z),S_2\leq \frac{\pi(z)}{2}$ and $W(z)=\prod_{2<p\leq z}\frac{p+1}{2p}\leq c(\log z)2^{-\pi(z)}$ (the estimates are good enough). We obtain

\[\begin{eqnarray}S(A,P,z)\leq c x(\log z)\cdot 2^{-\pi(z)}\left(1+\left(\frac{Ae \pi(z)}{2r}\right)^r e^{\frac{A\pi(z)}{2}}\right)+z\left(\frac{ez\pi(z)}{r}\right)^r\end{eqnarray}\]

We see that for all $r<\frac{Ae\pi(z)}{2}$, the error is at least $c^{\pi(z)}$ for some $c>1$, which is essentially what the Eratosthenes-Legendre sieve gives, so we must choose $r$ differently. However, if $\frac{Ae\pi(z)}{2}<r\leq z$, then the second error term is at least $\left(\pi(z)\right)^{\frac{Ae\pi(z)}{2}}$, for some other constant $C$, and this is more or less the same error term as in the sieve of Eratosthenes (in fact, a bit worse). Hence we do not get anything new with this sieve if $\omega(p)$ is large (for $z\leq c\log x$ we do get the same as what Erastosthenes' sieve gives, that is $S(A,P,z)\sim x\cdot 2^{-\pi(z)}$). A suitable sieve for applications where $\omega(p)$ is at least a constant times $p$ is the large sieve, as the name indicates.

The example above suggests that we should assume that $\omega(p)$ is small on average, say

\[\begin{eqnarray}\sum_{p\leq z, P\in P}\frac{\omega(p)}{p}\ll \log \log z.\end{eqnarray}\]

For simplicity, we actually assume that $\omega(p)\leq B$ for some constant $B$ as interesting choices for $\omega(p)$ are rarely bounded on average without being bounded.

We have the estimates $S_1\leq B\pi(z)$, $S_2\leq B(\log \log z+1)$ for $z\geq C$ (the latter follows from Mertens' theorem proved in this post). Hence the Brun sieve becomes

\[\begin{eqnarray}S(A,P,z)=|A|W(z)\left(1+\theta\left(\frac{ABe(\log \log z+1)}{r}\right)^re^{\frac{AB}{2}(\log \log z+1)}\right)+\theta'\left(\frac{Be\pi(z)}{r}\right)^r.
\end{eqnarray}\]

if $r=f(z)$ where $f(z)\to \infty$ as $z\to \infty$ (then $r$ can be replaced by $r-1$ in the Brun sieve). To assure that the first error term is not terribly large, we must choose $r\gg \log \log z$. The second error term is almost $\left(\frac{z}{r}\right)^r=\exp(r\log \frac{z}{r})$, which is increasing for $r\leq \pi(z)$ (and on a wider range). Hence the optimal choice is $r=\lambda \log \log z$ for some constant $\lambda$ (originally $r$ was supposed to be an integer, but when we are left with nice functions with real variables, we can just interpolate from the case of integer $r$ to the general case). We want the first error term to be $o(1)$, which is equivalent to $\left(\frac{ABe}{\lambda}\right)^{\lambda \log \log z}=o(e^{-\frac{AB\log \log z}{2}})$, that is, $\left(\frac{ABe}{\lambda}\right)^{\lambda}<e^{-\frac{AB}{2}}$. Let us choose $\lambda=\lambda_0+1$, where $\lambda_0$ is the infimum of solutions to $\left(\frac{ABe}{\lambda}\right)^{\lambda}<e^{-\frac{AB}{2}}$. The first error is now $O((\log z)^{-c})$ for some $c>0$. The second error term is for $z\geq C$ at most $z^{\lambda \log \log z}.$ Therefore we have derived another form of the Brun sieve.

**Theorem 2 (Brun sieve, $\omega(p)$ bounded).**Let $\omega(p)\leq B<p$ for some constant $B$ and all primes $p$. Then there exist constants $c,C,\lambda>0$ such that

\[\begin{eqnarray}S(A,P,z)=|A|W(z)(1+(\log z)^{-c})+z^{\lambda \log \log z}\end{eqnarray}\]

for $z\geq C$. In particular, if $z=f(x)$ for any function $f$ with $f(x)\leq x^{\frac{1}{2\lambda \log \log x}},$ we have

\[\begin{eqnarray}S(A;P,z)\sim xW(z)\end{eqnarray}\]

as $|A|=x\to \infty.$

**Proof.**The first claim was already proved. The second one follows from the fact that $z^{\lambda \log \log z}\leq \sqrt{x}$ with the given assumptions, while

\[\begin{eqnarray}xW(z)\geq x\prod_{p\leq x}\left(1-\frac{B}{p}\right)&=&x\exp\left(\sum_{p\leq x}\log\left(1-\frac{B}{p}\right)\right)\\&\geq& x\exp\left(-\sum_{p\leq x}\frac{B}{p}+O(1)\right)\gg \frac{x}{(\log x)^B}.\end{eqnarray}\]

by the Taylor series of the logarithm and Mertens' theorem. ■

**Applications**
Clearly, the range $\log z\leq \frac{1}{2\lambda}\frac{\log x}{\log \log x}$, on which the Brun sieve gives asymptotic results, is far larger than the range $\log z\leq \log \log x$ allowed by the Eratosthenes-Legendre sieve. In fact, the range of reasonable values of $z$ in the Brun sieve is only slightly shorter than ranges of the form $\log z \leq \varepsilon \log x$, so we may expect results that are superior to the ones given by the Eratosthenes-Legendre sieve sieve but not quite enough to give lower bounds for the number of almost primes (this will be corrected in a later post by choosing more clever functions $\chi$ for the Brun sieve).

First we investigate the function $\pi(x,z)$. The choice $A=[1,x]\cap \mathbb{Z}, P=\mathbb{P},\omega(p)=1$ for all $p$ gives

\[\begin{eqnarray}\pi(x,z)\sim xW(z)\sim x\frac{e^{-\gamma}}{\log z}\end{eqnarray}\]

by Mertens' theorem for $z\leq x^{\frac{1}{2\lambda\log \log x}}$, so that in particular $\pi(x)\ll \frac{x}{\log x}\log \log x$, which falls just short from the Chebychev bound $\pi(x)\ll \frac{x}{\log x}$.

For the twin prime problem, we also get a non-trivial estimate, with $A=\{n(n+2):n\leq x\}, P=\mathbb{P}$ and $\omega(p)=2$ for odd primes and $\omega(2)=1$. The estimate is

\[\begin{eqnarray}\pi_2(x)\ll x\prod_{p\leq x^{\frac{1}{2\lambda \log \log x}}}\left(1-\frac{2}{p}\right)\ll \frac{x}{\log^2 x}(\log \log x)^2. \end{eqnarray}\]

This combined with the Chebychev's elementary result $\pi(x)\gg \frac{x}{\log x}$ shows that almost all primes are non-twin primes. Moreover, by partial summation,

\[\begin{eqnarray}\sum_{p\leq x\atop p\, \text{twin prime}}\frac{1}{p}&=&\frac{\pi_2(x)}{x}+\int_{2}^{x}\frac{\pi_2(t)}{t^2}dt\\&\ll& 1+\int_{2}^x \frac{1}{t(\log t)^{\frac{3}{2}}}dt\ll 1,\end{eqnarray}\]

so that the sum of the reciprocals of the twin primes converges. It is good to note that we get the identical upper bound for the number of Sophie Germain primes and many other special classes of primes, as then $\omega(p)=2$ for odd primes $p$. Despite the impressive results, the factor $(\log \log x)^2$ in these estimates is unnecessary and can be removed with a stronger form of the Brun sieve.