Choosing an auxiliary function
We mostly follow Halberstam and Richert's book, using the notation of the previous post about the Brun sieve. In that post, we saw that if the function $\sigma(n):=\sum_{d\mid n}\mu(d)\chi(d)$, which approximates the characteristic function of $n=1$, satisfies $\sigma(n)\geq 0$ for all $n$ or $\sigma(n)\leq 0$ for all $n$, then
\[\begin{eqnarray}S(A,P,z)&\leq&xW(z)\left(1+\sum_{1<d\mid P(z)}\sigma(d)g(d)\right)+\sum_{d\mid P(z)}|\chi(d)||R_d|\quad (1)\end{eqnarray}\]
or
\[\begin{eqnarray}S(A,P,z)\geq xW(z)\left(1+\sum_{1<d\mid P(z)}\sigma(d)g(d)\right)-\sum_{d\mid P(z)}|\chi(d)||R_d|,\quad (2)\end{eqnarray}\]
respectively. We noticed that the very simple auxiliary function $\chi(d)=1_{\{\nu(d)\leq r\}}$ gives rise to such a function $\sigma$, and from this we derived a combinatorial sieve with error term $z^{O(\log \log z)}$. It turns out, however, that this choice was not optimal, and a cleverer choice of $\chi(d)$ gives the desired error term $O(z^K)$ for some constant $K$.
Let us consider what would be a better auxiliary function $\chi$. In the basic Brun sieve, the first error term
\[\begin{eqnarray}\sum_{1<d\mid P(z)}\sigma(d)g(d)\end{eqnarray}\]
was $o(1)$ as long as $r>C\log \log z$ for large $C$, owing to the fact that we had $\sigma(n)=0$ if $0<\nu(n)\leq r$. If this additional condition is fulfilled, the problematic error term is the second one,
\[\begin{eqnarray}\sum_{d\mid P(z)}|\chi(d)||R_d|.\end{eqnarray}\]
Fortunately, this error can be estimated much more sharply if we choose a different function $\chi$ for which $\sigma(n)$ still has constant sign. We assume $|R_d|\leq \omega(d)$ (this is usually satisfied, but there are also important exceptions), and it is also reasonable to assume $\chi(d)\in \{0,1\}$. Now if $\chi(d)=0$ for all $d\mid P(z)$ except for those having at most $m$ prime factors above $z'$, where $m\in \mathbb{Z}_+$ and $z'<z$ are parameters, we find
\[\begin{eqnarray}\sum_{d\mid P(z)}|\chi(d)|\omega(d)\leq \left(\sum_{p\leq z'}\omega(p)\right)^{M-m}\left(\sum_{p\leq z}\omega(p)\right)^m,\end{eqnarray}\]
where $M$ is the maximal number of prime factors that $d$ can have while $\chi(d)\neq 0$. We suppose as before that $\omega(p)\leq A$ for some constant $A$, so the error is at most $(Az')^{M-m}(Az)^m$. If for example $M=r$ (but $\chi$ is no longer $1_{\nu(d)\leq r}$) and $m=1$, the error is at most $(Az')^{r-1}Az$. If $z'$ is a lot smaller than $z$, say $z'=\sqrt{z},$ then the error is at most $A^{r}z^{\frac{r+1}{2}}$. Our original error in the basic Brun sieve was $(Az)^r$, so in terms of the power of $z$ we have essentially cut the error to its square root (provided that a function $\chi$ with the required properties exists). The error is nevertheless too large, so we want to iterate the procedure. If we demand that $d$ have at most $k$ prime factors above $z^{\frac{1}{2^k}}$ for every $k$, and at most $r$ in total, the error is bounded by
\[\begin{eqnarray}\left(\sum_{p\leq z}\omega(p)\right)\left(\sum_{p\leq z^{\frac{1}{2}}}\omega(p)\right)\left(\sum_{p\leq z^{\frac{1}{4}}}\omega(p)\right)...\leq (Az)^{1+\frac{1}{2}+\frac{1}{4}+...}=(Az)^2,\end{eqnarray}\]
so we indeed get an error term that is only a fixed power of $z$. Our task now is to make the above argument rigorous, with suitable parameters (we do not necessarily want to take square roots).
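To get a feel for the saving, here is a small numerical experiment (the parameters $z=100$, $r=4$ and the choice $\omega(p)=2$ are purely illustrative, and the layering by powers $z^{1/2^k}$ is the one from the heuristic rather than the choice made below). It compares the total weight $\sum_{d}|\chi(d)|\omega(d)$ kept by the plain truncation $\nu(d)\leq r$ with the weight kept by the layered truncation.

```python
# Toy comparison of the two truncations in the heuristic above: the total weight
# sum_d |chi(d)| omega(d) kept by the plain cutoff nu(d) <= r versus the layered
# cutoff "at most k prime factors above z^(1/2^k) for every k, and at most r in total".
from itertools import combinations

z, r, A_const = 100, 4, 2                      # omega(p) = A_const = 2 throughout (toy choice)
primes = [p for p in range(2, z) if all(p % q for q in range(2, p))]

def weight(d):                                 # omega(d) = prod_p omega(p) = 2^nu(d)
    return A_const ** len(d)

def layered_ok(d):
    return all(sum(1 for p in d if p > z ** (0.5 ** k)) <= k for k in range(1, r + 1))

plain = sum(weight(d) for s in range(r + 1) for d in combinations(primes, s))
layered = sum(weight(d) for s in range(r + 1) for d in combinations(primes, s)
              if layered_ok(d))
print(f"plain cutoff: {plain}   layered cutoff: {layered}")
# -> with these toy parameters the layered cutoff keeps well under 1% of the weight
```

Already on these toy parameters the layered truncation keeps only a tiny fraction of the remainder weight, which is the phenomenon the rigorous argument below quantifies.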
We use the notation $\nu_{a}(d)$ for the number of prime factors of $d$ that are at least $a$. We define two functions $\chi_1,\chi_2$ in terms of the sequence $z_r,z_{r-1},\ldots,z_1,z_0$, where $z_k=z^{e^{-\lambda (r-k)}}$ for a parameter $\lambda \in (0,1)$; then $z_r=z$ and $z_{i+1}>z_i$, and we choose $r$ so that $z_0=2$. For $d\mid P(z)$ set
\[\begin{eqnarray}\chi_1(d)&=&1\quad \text{if}\quad \nu_{z_k}(d)\leq 2(r- k) \quad \text{for all}\quad 0\leq k\leq r-1\\\chi_2(d)&=&1\quad \text{if}\quad \nu_{z_k}(d)\leq 2(r-k)-1 \quad \text{for all}\quad 0\leq k\leq r-1\end{eqnarray}\]
and $\chi_1(d)=0,\chi_2(d)=0$ in all the other cases. This is almost what we did heuristically above. Now we prove a crucial lemma.
Lemma 1. If $\sigma_j(n)=\sum_{d\mid n}\mu(d)\chi_j(d)$ for $j=1,2$, then $\sigma_j(1)=1$ and $\sigma_1(n)\geq 0,\sigma_2(n)\leq 0$ for all square-free $n>1$.
Proof. Clearly $\sigma_j(1)=1$. We give a simple sufficient criterion for $\sigma(n)$ to have constant sign, in terms of $\chi$. If $p$ is a prime divisor of $n$, then
\[\begin{eqnarray}\sigma(n)&=&\sum_{d\mid \frac{n}{p}}\mu(d)\chi(d)+\sum_{d\mid \frac{n}{p}}\mu(pd)\chi(pd)\\&=&\sum_{d\mid \frac{n}{p}}\mu(d)(\chi(d)-\chi(p d)).\end{eqnarray}\]
In particular, we can choose $p=q(n)$, the smallest prime divisor of $n$. As $n$ is square-free, we have $q(d)>q(n)=p$ for every $d\mid \frac{n}{p}$ with $d>1$, so for $\sigma(n)\geq 0$ it is enough to have
\[\begin{eqnarray}\mu(d)(\chi(d)-\chi(pd))\geq 0\quad \text{for all square-free}\quad d\mid P(z)\quad \text{and primes}\quad p<q(d),\end{eqnarray}\]
and for $\sigma(n)\leq 0$ it is enough to have the same with $\leq$ in place of $\geq$.
We check this criterion for the functions $\chi_1$ and $\chi_2$. Trivially $\chi_j(d)\geq \chi_j(pd)$, so it is enough to show that if $\chi_j(d)=1$ and $\chi_j(pd)=0$, then $\mu(d)=(-1)^{j+1}$. Let $p\in [z_k,z_{k+1})$. As $p<q(d)$, we have $\nu_{z_i}(d)=\nu(d)$ for all $i\leq k$. Consider $j=1$. Since $\chi_1(pd)=0$ while $\chi_1(d)=1$, the violated condition must involve the new prime $p$, so $\nu_{z_i}(pd)=\nu(d)+1>2(r-i)\geq 2(r-k)$ for some $i\leq k$; on the other hand, $\chi_1(d)=1$ gives $\nu(d)=\nu_{z_k}(d)\leq 2(r-k)$. Hence $\nu(d)=2(r-k)$ is even and $\mu(d)=1$. Similarly, in the case of $\chi_2$ we get $\nu(d)=2(r-k)-1$, so $\mu(d)=-1$. Now the criterion is verified, so the lemma is proved. ■
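As a sanity check, the sign property of Lemma 1 can also be verified by brute force on toy data. In the sketch below the breakpoints are chosen by hand; as the proof shows, only their monotonicity and the fact that they cover all primes of $P(z)$ matter, not the exact values $z^{e^{-\lambda(r-k)}}$.

```python
# Brute-force check of Lemma 1: sigma_1(n) >= 0 and sigma_2(n) <= 0 for every
# square-free n > 1 built from the primes below z.  The breakpoints z_breaks are
# illustrative; the argument only needs them to be increasing and to start below
# the smallest prime used.
from itertools import combinations

primes = [3, 5, 7, 11, 13, 17, 19, 23, 29]    # the odd primes below z = 30
z_breaks = [2, 5, 11, 30]                     # z_0 < z_1 < ... < z_r with r = 3
r = len(z_breaks) - 1

def chi(d_primes, j):
    """chi_1 (j = 1) and chi_2 (j = 2): nu_{z_k}(d) <= 2(r-k) - (j-1) for all k < r."""
    return int(all(sum(1 for p in d_primes if p >= z_breaks[k]) <= 2 * (r - k) - (j - 1)
                   for k in range(r)))

def sigma(n_primes, j):
    """sigma_j(n) = sum_{d | n} mu(d) chi_j(d), for square-free n given by its primes."""
    return sum((-1) ** s * chi(d, j)
               for s in range(len(n_primes) + 1)
               for d in combinations(n_primes, s))

for s in range(1, len(primes) + 1):
    for n in combinations(primes, s):
        assert sigma(n, 1) >= 0 and sigma(n, 2) <= 0, n
print("sign property verified for all", 2 ** len(primes) - 1, "square-free n > 1")
```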
Estimating the error terms
Now we are allowed to use the formula $(1)$ or $(2)$, so we turn to error terms.
Lemma 2. Assume that $|R_d|\leq \omega(d)$ and $\omega(p)\leq A$. If the $z_k$ are as before (so that $\chi_1$ and $\chi_2$ are determined by them), we have
\[\begin{eqnarray}\sum_{d\mid P(z)}\chi_1(d)|R_d|\leq (Az)^{\frac{2}{1-e^{-\lambda}}},\end{eqnarray}\]
\[\begin{eqnarray}\sum_{d\mid P(z)}\chi_2(d)|R_d|\leq (Az)^{\frac{2}{1-e^{-\lambda}}-1}.\end{eqnarray}\]
Proof. This follows just as in our heuristic. Indeed,
\[\begin{eqnarray}\sum_{d\mid P(z)}\chi_1(d)|R_d|&\leq& \left(1+\sum_{p\leq z}\omega(p)\right)^2\left(1+\sum_{p\leq z^{e^{-\lambda}}}\omega(p)\right)^2\left(1+\sum_{p\leq z^{e^{-2\lambda}}}\omega(p)\right)^2...\\&\leq& (Az)^{2+2e^{-\lambda}+2e^{-2\lambda}+...}=(Az)^{\frac{2}{1-e^{-\lambda}}}.\end{eqnarray}\]
and the other estimate follows similarly. ■
Lemma 3. Assume $\omega(p)\leq A$, and let $\lambda>\frac{1}{Ae}$ be a number such that
\[\begin{eqnarray}\kappa_A:=(1+10^{-10})\frac{A}{2}\lambda e^{1+\frac{A}{2}\lambda}\in (0,1).\end{eqnarray}\]
Then, for large values of $z$, it holds that
\[\begin{eqnarray}\sum_{1<d\mid P(z)}|\sigma_j(d)|g(d)\ll_A \kappa_A^{\frac{2\log \log z}{\lambda}}\quad (3).\end{eqnarray}\]
Proof. Let $j=1;$ the other case is similar. We estimate the terms in the sum on the left of $(3)$ depending on $q(d)$. We do this because for those $d$ for which $\nu(d)$ is large (say $\geq r$), we can estimate essentially as we did in the previous post, but for small values of $\nu(d)$ we have to be careful so that our upper bound is not automatically $\gg 1$ (if $\nu(d)$ is small, we will see that $q(d)$ is large so that $g(d)$ is small). We find
\[\begin{eqnarray}\sum_{1<d\mid P(z)}|\sigma_1(d)|g(d)&=&\sum_{t=0}^{r-1}\sum_{d\mid P(z)\atop q(d)\in [z_t,z_{t+1})}|\sigma_1(d)|g(d)\\&=&\sum_{t=0}^{r-1}\sum_{d\mid P(z)\atop q(d)\in [z_t,z_{t+1})}\left|\sum_{k\mid \frac{d}{q(d)}}\mu(k)(\chi_1(k)-\chi_1(q(d)k))\right|g(d)\\&\leq&\sum_{t=0}^{r-1}\sum_{d\mid P(z)\atop q(d)\in [z_t,z_{t+1})}\left|\sum_{k\mid \frac{d}{q(d)}\atop \nu(k)=2(r-t)}\mu(k)\right|g(d) \quad \quad (4)\\&=&\sum_{t=0}^{r-1}\sum_{d\mid P(z)\atop q(d)\in [z_t,z_{t+1})}\binom{\nu(d)-1}{2(r-t)}g(d)\quad \quad \quad \quad (5)\\&\leq&\sum_{t=0}^{r-1}\sum_{d'\mid P(z)\atop q(d')>z_t}\binom{\nu(d')}{2(r-t)}g(d')\cdot\frac{(A+1)^2}{z_t}\quad (6),\end{eqnarray}\]
where we obtained $(4)$ from the proof of Lemma 1 (the nonzero terms there have $\nu(k)=2(r-t)$ and $\mu(k)=1$), $(5)$ as in the previous post, and $(6)$ by writing $d=q(d)d'$ with $q(d')>q(d)$ and applying the multiplicativity of $g$. We estimate $(6)$ further by
\[\begin{eqnarray}&\leq& \sum_{t=0}^{r-1} \frac{(A+1)^2}{z_t}\sum_{m=2(r-t)}^{\infty} \binom{m}{2(r-t)}\frac{1}{m!}\left(\sum_{z_t\leq p\leq z}g(p)\right)^m\\&=&\sum_{t=0}^{r-1}\frac{(A+1)^2}{z_t}\left(\sum_{z_t\leq p\leq z} g(p)\right)^{2(r-t)}\exp\left(\sum_{z_t\leq p\leq z}g(p)\right)\cdot \frac{1}{(2(r-t))!}\\&\ll_A& \sum_{t=0}^{r-1}\frac{1}{z_t}\left(\sum_{z_t\leq p\leq z}\frac{A}{p}+c_A\right)^{2(r-t)}\exp\left(\sum_{z_t\leq p\leq z}\frac{A}{p}\right)\cdot \frac{1}{(2(r-t))!}\quad (7)\\&\ll_A& \sum_{t=0}^{r-1}\frac{1}{z_t} \left(\frac{Ae(r-t)\lambda+c_A}{2(r-t)}\right)^{2(r-t)}e^{A(r-t)\lambda}\quad\quad \quad \quad \quad \quad \quad \quad \quad (8)\\&\ll_A& \max_{0\leq t\leq r-1} \frac{1}{z_t}\left(\frac{Ae\lambda}{2}+\frac{c_A}{2(r-t)}\right)^{2(r-t)}e^{A(r-t)\lambda}\quad\quad \quad \quad \quad \quad \quad \quad \,\,\, \,(9).\end{eqnarray}\]
The first intermediate steps are just as for the basic Brun sieve. For $(7)$ we used
\[\begin{eqnarray}\sum_{z_t\leq p\leq z}g(p)=\sum_{z_t\leq p\leq z}\left(\frac{\omega(p)}{p}+\frac{\omega(p)^2}{p(p-\omega(p))}\right)\leq \sum_{z_t\leq p\leq z}\frac{\omega(p)}{p}+c_A.\end{eqnarray}\]
For $(8)$ we used Mertens' estimates (see this post) and Stirling's approximation, and $(9)$ holds (with $\ll_A$ in place of $\leq$) because the exponent in the summands is a concave function of $t$, so the terms decay rapidly away from the largest one and their sum is comparable to its maximum.
When $r-t\leq 10^{10}\cdot\frac{c_A}{Ae}$, the corresponding term in $(9)$ is bounded by
\[\begin{eqnarray}\ll_A z^{-\exp(-\alpha_A \lambda)}\left(\frac{Ae\lambda}{2}+\frac{1}{2Ae}\right)^{2\beta_A}=o(z^{-\varepsilon_A}),\end{eqnarray}\]
where $\alpha_A,\beta_A,\varepsilon_A>0$ are suitable constants. For the other values of $r-t$, the corresponding term in $(9)$ is at most
\[\begin{eqnarray}\frac{1}{z_t}\kappa_A^{2(r-t)},\end{eqnarray}\]
where $\kappa_A$ is as in the statement of the lemma. Hence, writing $y=r-t$, we want to maximize over $[0,r]$ the function $f(y)=-e^{-\lambda y}\log z+2y\log \kappa_A$, where $\kappa_A\in (0,1)$ by assumption. We have $f(0)=-\log z$ and $f(r)=-\log 2+2r\log \kappa_A$. Also $f'(y)=0$ if and only if $y=\frac{1}{\lambda}\log\frac{\lambda\log z}{2\log \frac{1}{\kappa_A}}=\frac{1}{\lambda}\log \log z+O_A(1)$. For this value of $y=r-t$, the corresponding term in $(9)$ becomes
\[\begin{eqnarray}\ll_A \kappa_A^{\frac{2\log \log z}{\lambda}}\quad \quad (10),\end{eqnarray}\]
and since $2=z^{e^{-\lambda r}}$, we have $r=\frac{\log \log z}{\lambda}+O(1)$, so this critical point lies within $O_A(1)$ of the endpoint $r$. Hence the maximum of $f$ over $[0,r]$ is attained within $O_A(1)$ of the critical point, and the maximum in $(9)$ is bounded by $(10)$ up to a constant factor. This completes the proof. ■
Combining the estimates above, we have the refined Brun sieve.
Theorem 4 (refined Brun sieve). Let $\omega(p)\leq A$ and $|R_d|\leq \omega(d)$, and let $\lambda$ be as in Lemma 3, so that in particular $\kappa_A\in (0,1)$. Then, for some $C_A>0,$
\[\begin{eqnarray}S(A,P,z)&\leq& xW(z)(1+C_A\cdot \kappa_A^{\frac{2\log \log z}{\lambda}})+(Az)^{\frac{2}{1-e^{-\lambda}}},\\S(A,P,z)&\geq& xW(z)(1-C_A\cdot \kappa_A^{\frac{2\log \log z}{\lambda}})-(Az)^{\frac{2}{1-e^{-\lambda}}-1}.\end{eqnarray}\]
Proof. This follows by combining Lemmas 1, 2 and 3. ■
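Before turning to applications, here is a small numerical illustration of the mechanism behind Theorem 4 (a sketch: the breakpoints are again hand-picked rather than the $z^{e^{-\lambda(r-k)}}$ of the text, and the parameters are tiny). It checks, for $A=\{n(n+2):\,n\leq N\}$ and $P$ the odd primes below $z$, that the quantities $\sum_{d\mid P(z)}\mu(d)\chi_2(d)|A_d|$ and $\sum_{d\mid P(z)}\mu(d)\chi_1(d)|A_d|$ really do bracket the sifted count $S(A,P,z)$; this is the consequence of Lemma 1 from which formulas $(1)$ and $(2)$ are extracted.

```python
# Check  sum_d mu(d) chi_2(d) |A_d|  <=  S(A,P,z)  <=  sum_d mu(d) chi_1(d) |A_d|
# for A = {n(n+2) : n <= N} and P the odd primes below z (toy parameters).
from itertools import combinations
from math import gcd, prod

N, z = 5000, 30
primes = [p for p in range(3, z) if all(p % q for q in range(2, p))]   # odd primes < z
z_breaks = [2, 5, 11, z]                                               # z_0 < ... < z_r
r = len(z_breaks) - 1

def chi(d_primes, j):
    return int(all(sum(1 for p in d_primes if p >= z_breaks[k]) <= 2 * (r - k) - (j - 1)
                   for k in range(r)))

seq = [n * (n + 2) for n in range(1, N + 1)]
P_z = prod(primes)
S = sum(1 for a in seq if gcd(a, P_z) == 1)            # the sifted count S(A,P,z)

bounds = {}
for j in (1, 2):
    total = 0
    for s in range(len(primes) + 1):
        for d in combinations(primes, s):              # d runs over the divisors of P(z)
            if chi(d, j):
                dd = prod(d)                           # prod(()) == 1, so d = 1 is included
                total += (-1) ** s * sum(1 for a in seq if a % dd == 0)   # mu(d) |A_d|
    bounds[j] = total

print(bounds[2], "<=", S, "<=", bounds[1])
assert bounds[2] <= S <= bounds[1]
```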
Applications
As the error term in the sieve is a power of $z$, we may consider various problems about almost primes. We start by considering twin almost primes.
We choose $A=\{n(n+2):\, n\leq x,\ n \text{ odd}\}$, $P=\mathbb{P}\setminus\{2\}$, $\omega(p)=2$ (the restriction to odd $n$ ensures that the sifted values of $n$ and $n+2$ have no small prime factors at all, the prime $2$ included). Then we may take $A=2$ as the constant bounding $\omega(p)$, and we may choose $\lambda=0.2784$ (we want $\lambda$ to be as large as possible subject to $\kappa_A<1$). Then $\frac{2}{1-e^{-\lambda}}=8.2303...$, so
\[\begin{eqnarray}(1-o(1))\frac{x}{2}W(z)-O(z^{7.3})\leq S(A,P,z)\leq(1+o(1))\frac{x}{2}W(z)+O(z^{8.3}).\end{eqnarray}\]
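These constants are easy to confirm numerically; the following quick check (with the stated $\lambda=0.2784$ and $A=2$) is all that is being used.

```python
# Sanity check of the constants for A = 2, lambda = 0.2784.
from math import exp

A, lam = 2, 0.2784
kappa = (1 + 1e-10) * (A / 2) * lam * exp(1 + (A / 2) * lam)
upper_exp = 2 / (1 - exp(-lam))        # exponent of the upper-bound error term
print(kappa, upper_exp, upper_exp - 1)
# -> kappa_A ≈ 0.9997 < 1, and the error exponents are ≈ 8.2303 and ≈ 7.2303,
#    safely below the 8.3 and 7.3 used above.
```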
Choosing $z=2x^{\frac{1}{8}}$, we have $S(A,P,2x^{\frac{1}{8}})\gg\frac{x}{\log^2 x}$, and similarly $S(A,P,2x^{\frac{1}{9}})\ll \frac{x}{\log^2 x}$. The values of $n$ counted by $S(A,P,2x^{\frac{1}{8}})$ are such that $n$ and $n+2$ each have at most $7$ prime factors; otherwise $n$ or $n+2$ would have a prime factor below $2x^{\frac{1}{8}}$. Furthermore, $S(A,P,2x^{\frac{1}{9}})$ counts the twin primes in $(2x^{\frac{1}{9}},x]$ among some other numbers, so $\pi_2(x)\ll \frac{x}{\log^2 x}$, where $\pi_2$ is the counting function of twin primes. In conclusion, we have the following theorems.
Theorem 5. There are infinitely many integers $n$ such that $n$ and $n+2$ have at most $7$ prime factors, counted with multiplicities.
Theorem 6. We have $\pi_2(x)\ll \frac{x}{\log^2 x}$.
The latter estimate is believed to be optimal up to a constant factor (and it would be if we could have an error term of $o(z^2)$ in the computations above), but of course we do not even know whether $\pi_2(x)$ is unbounded.
Next we consider Goldbach's problem for almost primes. Let $x$ be any large even integer, and set $A=\{n(x-n):\,n\leq x\}$, $P=\mathbb{P}$, $\omega(p)=2$ if $p\nmid x$ and $\omega(p)=1$ if $p\mid x$. Then we may again take $A=2$ and choose $\lambda=0.2784$. We get, as before,
\[\begin{eqnarray}S(A,P,2x^{\frac{1}{8}})\gg\frac{x}{\log^2 x}, \quad S(A,P,2x^{\frac{1}{9}})\ll x\prod_{p\mid x\atop p\leq 2x^{1/9}}\left(1-\frac{1}{p}\right)\prod_{p\nmid x\atop p\leq 2x^{1/9}}\left(1-\frac{2}{p}\right).\end{eqnarray}\]
The quantity $S(A,P,2x^{\frac{1}{8}})$ counts integers $n\in [1,x]$ such that $n$ and $x-n$ have at most $7$ prime factors each. On the other hand, $S(A,P,2x^{\frac{1}{9}})$ is an upper bound for the number of representations of $x$ as a sum of two primes both exceeding $2x^{\frac{1}{9}}$, and the remaining representations number only $O(x^{\frac{1}{9}})$, so we also obtain a bound for $N(x)$, the total number of representations of $x$ as a sum of two primes. To summarize,
Theorem 7. Every large enough even integer is the sum of two numbers having at most $7$ prime factors.
Theorem 8. We have $N(x)\ll \frac{x}{\log^2 x}\prod_{p\mid x\atop p>2}\frac{p-1}{p-2}$, where $N(\cdot)$ is as before.
Next we consider primes represented by polynomials. Let $f\in \mathbb{Z}[x]$ be an irreducible polynomial with positive leading coefficient and degree $d\geq 2$. In what follows, implicit constants may depend on $f$. Let $s(f,p)$ be the number of solutions to $f(m)\equiv 0 \pmod p$, and suppose that $s(f,p)<p$ for all $p$; that is, $f$ has no fixed prime divisor. By Lagrange's theorem, $s(f,p)\leq d$. We take $A=\{f(n):\, n\leq x\}$, $P=\mathbb{P}$ and $\omega(p)=s(f,p)$; then $|A_p|=\frac{s(f,p)}{p}x+R_p$ with $|R_p|\leq \omega(p)$, and by the Chinese remainder theorem the analogous formula holds for any divisor of $P(z)$, with $\omega$ extended multiplicatively. We can choose $A=d$ as the constant bounding $\omega(p)$, so we need to choose $\lambda$ such that
\[\begin{eqnarray}(1+10^{-10})\frac{d}{2}\lambda e^{1+\frac{d}{2}\lambda}\in(0,1).\end{eqnarray}\]
If $u=\frac{d}{2}\lambda$, we may take $u$ to be the solution of $ue^{u}=\frac{1-10^{-10}}{e}$, which is $0.2784...$. Then $\lambda=\frac{0.5569...}{d}$ and $\frac{2}{1-e^{-\lambda}}\leq 4.31d$ (by applying $e^{-t}\leq 1-t+\frac{t^2}{2}$ for $t\geq 0$; this is where the assumption $d\geq 2$ is needed). Now the Brun sieve gives
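These numerical claims can be verified directly; the sketch below solves $ue^{u}=\frac{1-10^{-10}}{e}$ by Newton's method and checks the bound $\frac{2}{1-e^{-\lambda}}\leq 4.31d$ for small degrees $d\geq 2$.

```python
# Solve u e^u = (1 - 1e-10)/e and check 2/(1 - e^(-lambda)) <= 4.31 d for lambda = 2u/d.
from math import exp

target = (1 - 1e-10) / exp(1)
u = 0.3
for _ in range(50):                                   # Newton's method for u e^u = target
    u -= (u * exp(u) - target) / ((1 + u) * exp(u))
print(f"u = {u:.5f}")                                 # u = 0.27846..., i.e. 0.2784...

for d in range(2, 11):
    lam = 2 * u / d
    error_exp = 2 / (1 - exp(-lam))
    assert error_exp <= 4.31 * d                      # holds for d >= 2 (it fails for d = 1)
    print(d, round(error_exp, 3), "<=", round(4.31 * d, 2))
```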
\[\begin{eqnarray}S(A,P,x^{\frac{1}{4.31d-1.1}})\gg \frac{x}{\log^d x}, \quad S(A,P,x^{\frac{1}{4.31d-0.1}})\ll x\prod_{p\leq x\atop s(f,p)>0}\left(1-\frac{1}{p}\right)\end{eqnarray}\]
(the upper bound is rather rough). By the Chebotarev density theorem, the primes $p$ with $s(f,p)>0$ have a natural density, namely the proportion of elements of $G$ having at least one fixed point, where $G$ is the Galois group of the splitting field of $f$, viewed as a permutation group on the $d$ roots of $f$ (for all but finitely many $p$, $s(f,p)$ is the number of fixed points of the Frobenius element at $p$). Since $f$ is irreducible, $G$ acts transitively on the roots, so by Burnside's lemma the average number of fixed points over $G$ is exactly $1$; as no element has more than $d$ fixed points, at least a proportion $\frac{1}{d}$ of the elements of $G$ have a fixed point. Hence $s(f,p)>0$ for at least a density $\frac{1}{d}$ of the primes. By partial summation, we see that
\[\begin{eqnarray}\sum_{p\leq x\atop s(f,p)>0}\frac{1}{p}\geq \frac{1}{d}\log \log x+O_f(1).\end{eqnarray}\]
Therefore
\[\begin{eqnarray}S(A,P,x^{\frac{1}{4.31d-0.1}})\ll \frac{x}{\log^{\frac{1}{d}} x}.\end{eqnarray}\]
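The group-theoretic input used above (in a transitive permutation group on $d$ points, at least a proportion $\frac{1}{d}$ of the elements have a fixed point) can also be checked on small examples. The groups below are illustrative; the cyclic group shows that the bound can be attained.

```python
# For a few transitive permutation groups G on d points, check that the proportion
# of elements of G with at least one fixed point is at least 1/d.

def generate(gens, d):
    """Close a set of permutations (tuples g with g[i] = image of i) under composition."""
    group = {tuple(range(d))}
    frontier = list(group)
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = tuple(g[h[i]] for i in range(d))
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group

examples = {
    "S_5": (5, [(1, 0, 2, 3, 4), (1, 2, 3, 4, 0)]),  # symmetric group on 5 points
    "C_5": (5, [(1, 2, 3, 4, 0)]),                   # 5-cycle: proportion exactly 1/5
    "D_4": (4, [(1, 2, 3, 0), (3, 2, 1, 0)]),        # dihedral group on the 4 vertices of a square
}
for name, (d, gens) in examples.items():
    G = generate(gens, d)
    fixed = sum(1 for g in G if any(g[i] == i for i in range(d)))
    assert fixed * d >= len(G)
    print(f"{name}: {fixed}/{len(G)} elements have a fixed point  (1/d = 1/{d})")
```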
We obtained the following theorems.
Theorem 9. If $f$ is a polynomial as above, there are infinitely many positive integers $n$ such that $f(n)$ has at most $d(4.31d-1.1)$ prime factors, counted with multiplicities. In particular, if $f$ is a quadratic polynomial satisfying the conditions, there are infinitely many positive integers $n$ with $f(n)$ having at most $14$ prime factors with multiplicities (since then $\omega(p)\leq 2$, the error term is the same as in the twin prime problem, and we may sieve up to $z=x^{\frac{1}{7.3}}$).
Theorem 10. If $f$ is a polynomial as above, there are $\ll \frac{x}{\log^{\frac{1}{d}}x}$ primes of the form $f(n)$ with $n\in[1,x]$, where the implicit constant may depend on $f$.