Thursday, March 12, 2015

Hales-Jewett theorem

We consider some classical theorems in arithmetic Ramsey theory, where one wants to find some monochromatic arithmetic structure (an arithmetic progression, for example) when a large set of integers or lattice points in $\mathbb{Z}^d$ is colored with several colors. Many such results follow from the Hales-Jewett theorem from 1963, stating that given any $k$ and $r$, there is a dimension $N$ such that when the lattice points of a cube of side length $k$ and dimension $N$ are colored with $r$ colors, there is a monochromatic combinatorial line. A combinatorial line is a line of length $k$ which is constant in some coordinates and increases with slope $1$ in the others.

We will prove the Hales-Jewett theorem, as its corollary the van der Waerden theorem, and a generalization of the latter, the Gallai-Witt theorem. The van der Waerden theorem from 1927 states that, given $k$ and $r$, there is $N$ such that if $\{1,2,...,N\}$ is colored with $r$ colors, there is a monochromatic arithmetic progression of length $k$. The smallest such $N$ is the van der Waerden number $W(k,r)$, and although they seem to grow fairly rapidly, the bounds given by the traditional double induction proofs are unimaginably large compared to what is expected. However, in 1988, Shelah found a one variable induction proof of the Hales-Jewett theorem in Primitive Recursive Bounds for Van Der Waerden Numbers, and this implied a bound that is still large but nothing compared to the previous bounds, which were functions that grow faster than any function defined by primitive recursion. We present in this post a proof of the Hales-Jewett theorem similar to Shelah's proof, and consider its consequences to the van der Waerden theorem and related results.


Van der Waerden theorem and its generalizations

The Hales-Jewett theorem is the following assertion.

Theorem 1 (Hales-Jewett theorem). Let $k,r\geq 2$ be positive integers. Then there exists a positive integer $N$ such that whenever $\{1,2,...,k\}^N$ is colored with $r$ colors, there is a monochromatic combinatorial line, that is, a collection $P_1,...,P_k$of $k$ points such that in some coordinates the $P_i$ are equal, while all the other coordinates in $P_i$ have value $i$.

The smallest such $N$ is called the Hales-Jewett number $HJ(k,r)$. The combinatorial lines are very natural structures in the lattice cube, somewhat similarly to arithmetic progressions in the integers. As an example, the combinatorial lines in the $3\times 3$ square $\{1,2,3\}^2$ are
\[\begin{eqnarray}&&\{(1,1),(1,2),(1,3)\}, \{(1,1),(2,1),(3,1)\}, \{(2,1),(2,2),(3,2)\},\{(2,1),(2,2),(3,2)\},\\&&\{(3,1),(3,2),(3,3)\},\{(3,1),(3,2),(3,3)\},\{(1,1),(2,2),(3,3)\}.\end{eqnarray}\]
Note that these lines would count as a victory in a tic-tac-toe game on a $3\times 3$ board, and more generally combinatorial lines are always winning lines in a tic-tac-toe on some board. The additional requirement for a combinatorial line is that it should be increasing. Now the Hales-Jewett theorem immediately implies the following (not very practical) statement about tic-tac-toe. Given $k$ and $r$, if $r$ players are playing tic-tac-toe on a hypercube of side length $k$, then as long as the dimension is large enough, there will be no draw. We will see that the Hales-Jewett theorem, despite assuming almost no structure on the set of integers, has strong consequences in arithmetic Ramsey theory.

Before we prove the Hales-Jewett theorem, we consider some of its corollaries, which are by themselves important. We start by proving that the van der Waerden theorem follows from the Hales-Jewett theorem, which is stronger. For brevity, we sometimes denote $\{1,2,...,k\}$ by $[k]$ and $\{1,2,...,k\}^m$ by $[k]^m$.

Theorem 2 (van der Waerden). Let $k,r\geq 2$ be positive integers. Then there exists $N$ such that whenever $\{1,2,...,N\}$ is colored with $r$ colors, there is a monochromatic AP (arithmetic progression) of legth $k$. The smallest such $N$ is denoted by $W(k,r)$.

Proof. Let $N=HJ(k,r)$, so that any $r$-coloring of $[k]^N$ has a monochromatic combinatorial line. Let $\chi$ be any $r$-coloring of $\{1,2,...,k^N\}$. This induces an $r$-coloring of $[k]^r$ by interpreting the $k$-ary number
\[\begin{eqnarray}a_0k^0+a_1k^1+...+a_{N-1}k^{N-1},\quad 0\leq a_i\leq k-1\end{eqnarray}\]
as $(a_0+1,...,a_{N-1}+1)\in [k]^N$. By the definition of $N$, this coloring of $[k]^N$ has a monochromatic combinatorial line. Let the variable coordinates of this line be the coordinates on the places $d_1,...,d_j$. Returning to the situation of the $k$-ary numbers, all the numbers
\[\begin{eqnarray}n\cdot k^{d_1}+n\cdot k^{d_2}+...+n\cdot k^{d_j}+m,\,n=0,1,...,k-1\end{eqnarray}\]
have the same color for some $m$ whose $k$-ary representation has $0$ in the places $d_1,...,d_j.$ This gives us an arithmetic progression of length $k$. ■

The proof above actually gives a strengthening of the van der Waerden theorem, namely for large enough $M$ any $r$-coloring of $\{1,2,...,M\}$ contains an AP whose common difference is a number having only zeros and ones in its $k$-ary representation. Such numbers are rare; up to $x$ there are around $x^{\frac{\log 2}{\log k}}$ of them (but an asymptotic formula cannot be given because of fluctuations). Hence the van der Waerden theorem with the common difference of this form is much stronger than the original theorem. We also see that the van der Waerden numbers are bounded by
\[\begin{eqnarray}W(k,r)\leq k^{HJ(k,r)}.\end{eqnarray}\]
One could obtain better bounds, such as
\[\begin{eqnarray}W(k,r)\leq (k-1)HJ(k,r)+1,\end{eqnarray}\]
which is seen by mapping the points in $[k]^n$ to $\{1,2,...,(k-1)n+1\}$ by $x\mapsto x_1+...+x_n+1$. Nevertheless, this makes little difference, since the bounds we obtain for $HJ(k,r)$ grow faster than any power tower, so the bound for $W(k,r)$ is practically the same as for $HJ(k,r)$.

The classical proofs of the van der Waerden theorem are based on double induction on $k$ and $r$, where $W(k,r)$ is bounded in terms of $W(k,r-1)$ and $W(k-1,r')$ for some enormously large $r'$. For this reason, the bound obtained for $W(k,2)$ grows even faster than the Ackermann numbers, and this was a problem in all the known proofs until Shelah in 1988 gave a new proof of the Hales-Jewett theorem based on induction on a single variable, which implied that $W(k,2)$ is at most $f_5(10k)$, say, where
\[\begin{eqnarray}&&f_1(a,b)=a+b,\\&&f_2(a,0)=0\\&&f_n(a,0)=1\quad \text{for}\quad n\geq 3\\&&f_n(a,b)=f_{n-1}(a,f_n(a,b-1)))\quad \text{for}\quad b\geq 1\end{eqnarray}\]
and
\[\begin{eqnarray}f_n(a)=f_n(a,a).\end{eqnarray}\]
In other words, $f_1$ is the addition function, $f_2$ the multiplication function, $f_3$ the exponentiation function, and so forth. This is the type of bound we will prove later in this posty. In 2001, Gowers was able to prove Szemeredi's theorem, which generalizes the van der Waerden theorem, using methods from Fourier analysis, and this gave a bound for $W(k,r)$ that is much better than anything known previously:
\[\begin{eqnarray}W(k,r)\leq T(2,2,r,2,2,k+9),\end{eqnarray}\]
where $T$ denotes a tower of exponentiations, so that for example $T(a,b,c)=a^{b^c}$. This bound is still very far from the best known lower bound, which is approximately $2^k$ for $r=2$. As an illustartion of the probabilistic method that was already used in this post, we prove an exponential lower bound (with worse exponent).

Theorem 3. We have
\[\begin{eqnarray}W(k,2)\geq 2^{\frac{k-1}{2}}.\end{eqnarray}\]

Proof. Let $m\leq 2^{\frac{k-1}{2}}$. Color the integers in $\{1,2,...,m\}$ randomly with two colors, using each color with probability $\frac{1}{2}$. The probability that a fixed $k$-AP is monochromatic is $2^{1-k}$. The number of k-$AP$s in $\{1,2,...,m\}$ is $\leq m^2$ since if $a+dn,n=0,...,k-1$ is the progression, we have $a\leq m$ and $d\leq m$. Now the probability that there is a monochromatic $AP$ is at most
\[\begin{eqnarray}m^22^{1-k}:=p.\end{eqnarray}\]
We have $p<1$ for $m<2^{\frac{k-1}{2}}$, so the proof is complete. ■

It is believed that $W(k,2)<2^{k^2}$; Graham has offered $\$1000$ for proving it, so the knowledge about the van der Waerden numbers is even much more limited than knowledge on the Ramsey numbers $R(k,k)$.

Although little is known about bounds for the van der Waerden theorem, we can consider generalizations for it. As in the case of Ramsey's theorem, we could ask whether the whole $\mathbb{Z}_+$ actually contains an infinite monochromatic substructure, when it is colored with $r$ colors. Of course, Theorem 2 implies that if $\mathbb{Z}_+$ is colored with $r$ colors, there is a color having arbitrarily long APs (they are actually equivalent via a compactness argument). But does every $r$-coloring of $\mathbb{Z}_+$ contain not only APs of arbitrary length, but an AP of infinite length? It turns out that this is not the case.

Theorem 4. There is a $2$-coloring of $\{1,2,3,...\}$, which has no infinite $AP$.

Proof. Color the integers in $I_k=[3^k,3^{k+1})$ red if $k$ is even and blue if $k$ is odd. This gives a coloring of $\{1,2,3,...\}$. Let $A$ be an infinite $AP$ in this coloring, and let $a<b<c$ be such that $A$ intersects $I_a,I_b$ and $I_c$. Let $x$ be the largest element in $I_b\cap A$ and $y$ the smallest element in $I_c\cap A$. By construction, $b\leq c-2$, so $x\leq \frac{y}{3}$. But now any AP of positive integers counting $x$ and $y$ must start with $x$. This means that $I_a\cap A$ is empty, a contradiction. ■

Another possible direction for generalizing van der Waerden's theorem is increasing the dimension, a bit similarly to the hypergraph Ramsey theorem. The Hales-Jewett theorem is indeed strong enough to yield a $d$-dimensional version of the van der Waerden theorem. If we color $\mathbb{Z}^d$ with $r$ colors, we may ask, what kind of monochromatic configurations are guaranteed to be found. Naturally, we must identify two configurations that can be transformed into each other by translation and scaling by an integer factor. It is not even clear that we can find a monochromatic "square", that is, a set of four points parallel to the coordinate axes and with equal distances. However, the beautiful result for this problem is that we can find any configuration of fixed size.

Theorem 5 (Gallai-Witt). Let $U$ be a finite set of points in $\mathbb{Z}^d$. If $\mathbb{Z}^d$ is colored with $r$ colors, we can find a homothetic copy of $U$ (say $V=a+\lambda U$ with $a\in \mathbb{Z}^d$ and $\lambda \in \mathbb{Z}_+$) which is monochromatic.

Proof. Let $U=\{u_1,u_2,...,u_k\}$, and let $\mathbb{Z}^d$ be $r$-colored according to $\chi$. Consider $V^n$, where $n=HJ(k,r)$. Setting $f:V^n\mapsto \mathbb{Z}^d,f((x_1,...,x_n))=x_1+...+x_n$, we can define a coloring $\chi'$ on $V^n$ by $\chi'((x_1,...,x_n))=\chi(x_1+...+x_n)$. Since the Hales-Jewett theorem assumes no additional structure on the ground set, we may apply it, and find a combinatorial line $L\subset V^n$ which is monochromatic under $\chi'$. Let $J$ be the set of constant coordinates of $L$ and $\lambda$ the number of variable coordinates. Now $f(L)\subset \mathbb{Z}^d$ and
\[\begin{eqnarray}f(L)=\left\{\sum_{i\in J}v_i+\lambda v_1,\sum_{i\in J}v_i+\lambda v_2,...,\sum_{i\in J}v_i+\lambda v_k\right\}\end{eqnarray}\]
is monochromatic under $\chi.$ Defining $a=\sum_{i\in J}v_i$, we are done. ■

Proof of the Hales-Jewett theorem

We start by proving the case $k=2$ of the Hales-Jewett theorem, which is by far the easiest case. All we need for the upper bound is the pigeonhole principle, and the upper bound is easily seen to be exact.

Theorem 6. We have $HJ(2,r)=r$ for all $r\geq 2$.

Proof. We first show that $HJ(2,r)\leq r$. If $[2]^r$ is colored with $r$ colors, the pigeonhole principle tells that some two of the points
\[\begin{eqnarray}(1,1,...,1),\,(2,1,...,1),\,(1,2,...,1),\,(1,1,...,2)\end{eqnarray}\]
have the same color, and this gives a two point combinatorial line.

Next we show that $[2]^{r-1}$ can be colored with $r$ colors without producing any monochromatic combinatorial lines. Interpret $[2]^{r-1}$ as the binary numbers in $\{0,1,...,2^{r-1}-1\}$ by
\[\begin{eqnarray}(x_1,...,x_{r-1})\mapsto \sum_{i=0}^{r-2}f(x_i)2^i,\end{eqnarray}\]
where $f(1)=0,f(1)=1$. In $\{0,1,...,2^{r-1}-1\}$, define a coloring $\chi$ as follows:
\[\begin{eqnarray}\chi(0)=0,\, \chi(2^{a_1}+...+2^{a_k})=(a_1+...+a_k+k)\pmod r,\end{eqnarray}\]
when $0\leq a_1<...<a_k<r-1$, with $x\pmod r$ interpreted as the least nonnegative residue of $x$ modulo $r$. This gives an $r$-coloring of $\chi$. Notice that $\chi(0)\neq \chi(2^i)$ for all $i$ and if $0\leq a_1<...<a_k<r-1$ and $b\in \{0,1,...,r-2\}\setminus\{a_1,...,a_k\}$, then
\[\begin{eqnarray}\chi(2^{a_1}+....+a^{a_k})\neq \chi(2^{a_1}+....+2^{a_k}+2^b).\end{eqnarray}\]
Indeed, otherwise we would have $b\equiv r-1\pmod r$, which is impossible. Since binary numbers differing by one digit correspond to combinatorial lines in the original setting, we see that $\chi$ makes no combinatorial line monochromatic. ■

We turn to the general case of the Hales-Jewett theorem. Shelah's proof is by induction on $k$ only, and the idea is to reduce the situation to the case of the previous Hales-Jewett number $HJ(k-1,r)$ by finding a subspace of $[k]^N$, where the coloring is not altered by swapping some of the coordinates from $1$ to $2$. For this we need $HJ(2,r')$ for $r'$ much larger than $r$, but we know the exact value of $HJ(2,r')$, so the growth of the bound for $HJ(k,r)$ will not be even nearly as fast in the more classical proof, where $HJ(k-1,r')$ for some huge $r'$ is applied to bound $HJ(k,r)$. Shelah's proof gives us a bound that grows much slower than the Ackermann numbers, and in fact we can bound $HJ(k,r)$ by an iterated tetration operation, where tetration means iterated exponentiation.

Theorem 7 (Hales-Jewett theorem, primitive recursive bounds). Let $k, r\geq 2$ be positive integers. Then $HJ(k,r)$ exists, and moreover
\[\begin{eqnarray}HJ(k,r)\leq f_5(M,k),\end{eqnarray}\]
where $M=2\max\{k,r\}+1$ and $f_5$ is as above.

Proof.  Assume $HJ(k-1,r)$ exists (for $k=3$ this is known); we will bound $HJ(k,r)$ in terms of it. Let $N_1,...,N_m$ be a sequence, which will be fixed later. Consider an $r$-coloring $\chi$ of $[k]^n=[k]^{n-t_m}\times [k]^{N_m}$. We define a coloring $\chi'$ on the last $N_m$ coordinates, defining for $x\in [k]^{N_m}$ its color as
\[\begin{eqnarray}\chi'(x)=(\chi((y,x)):y\in [k]^{n-N_m}),\end{eqnarray}\]
where some ordering is fixed for the set $[k]^{n-N_m}$. Now $\chi'$ is a $r^{k^{n-N_m}}$-coloring of $[k]^{N_m}$, and if $N_m\geq HJ(2,r^{k^{N-N_m}})$, we can find a combinatorial line $L_m\subset [k]^{N_m}$ such that its first two points have the same color under $\chi'$ (because $[k]^{N_m}$ certainly contains $[2]^{N_m}$). We now restrict our attention to the subspace $[k]^{N-N_m}\times L_m$ of the whole space. For any $x$ in this subspace, any $1$s and $2$s among its last $N_m$ coordinates can be swapped. Indeed, if $y\neq x$ is obtained from $x$ by changing some of the $1$s and $2$s among the last $N_m$ coordinates, then $x$ and $y$ must be the first two points of $L_m$ (because they differ at some non-constant coordinate and have $1$ and $2$ as values of this coordinate). By the definition of $\chi'$, changing $x$ to $y$ alters nothing in the original coloring $\chi.$

We repeat the argument, considering now $[k]^{N-N_m-N_{m-1}}\times [k]^{N_{m-1}}\times L_m.$ Define a coloring $\chi''$ of $[k]^{N_{m-1}}$, defining for $x\in [k]^{N_{m-1}}$ its color as
\[\begin{eqnarray}\chi''(x)=(\chi((y,x,z)):y\in [k]^{n-N_m-N_{m-1}},z_0\in L_m),\end{eqnarray}\]
where some ordering is fixed for the set $[k]^{N-N_{m}-N_{m-1}}$ and $z_0\in L_m$ is fixed arbitrarily. This time $\chi''$ is a $r^{k^{N-N_m-N_{m-1}}}$-coloring of $[k]^{N_{m-1}}$, and we can find a combinatorial line $L_{m-1}\subset [k]^{N_{m-1}}$ such that its first two points have the same color under $\chi''$, provided that $N_{m-1}\geq HJ(2,r^{k^{N-N_m-N_{m-1}}})$. Again, by the definition of $\chi''$ and $\chi'$, if some of the last $N_m+N_{m-1}$ coordinates of an element from $[k]^{N-N_m-N_{m-1}}\times L_{m-1}\times L_m$ are swapped from $1$ to $2$, the color of the element does not change under $\chi$.

We continue in this way until we have found a subspace $L_1\times L_2\times ...\times L_m\subset [k]^{N}\subset [k]^{N_1}\times ...\times [k]^{N_m}$ such that if in any elment of it some coordinates are swapped from $1$ to $2$, the color under $\chi$ remains the same. Let $L_i'$ be the combinatorial line $L_i$ with its first point excluded. Then $L_1'\times...\times L_m'$ is a subspace of $\{2,...,k\}^{N}$ with at least $m$ independent variable coordinates (each line has some variable coordinate). This means that if we choose $m=HJ(k-1,r)$, the subspace contains a monochromatic (under $\chi$) combinatorial line, say $L$. Add to this line $L$ the point that was previously removed (the one where the variable coordinates are $1$), say $a$. Since $\chi$ does not distinguish between the values $1$ and $2$ for coordinates in $L_1\times...\times L_m$, $\chi$ gives the same color to $a$ as to $L$, so $L\cup\{a\}$ is a monochromatic combinatorial line in $[k]^{N}$.

We conclude that we may choose $N=N_1+...+N_m$ with
\[\begin{eqnarray}&&N_1=r,\\&&N_{i+1}=r^{k^{N_1+...+N_{i-1}}},\,i\leq m-1\\&&m=HJ(k-1,r).\end{eqnarray}\]
The reason for $N_1=r$ is that in $[k]^r$ we can find a combinatorial line $L_1$ such that its first two points have the same color under $\chi$. Clearly, $N_i$ grow so fast that $N_i\geq 2N_{i-1}$, and hence $N_{i+1}\leq r^{k^{2N_i}}$. If $K=\max{k,r}$, we obtain
\[\begin{eqnarray}N_m\leq (2K)^{(2K)^{...^{2K}}},\end{eqnarray}\]
where $2K$ occurs $2m$ times. In other words,
\[\begin{eqnarray}HJ(k,r)\leq f_4(2K,2\cdot HJ(k-1,r)),\end{eqnarray}\]
so defining $a_k=2\cdot HJ(k,r)$ we have
\[\begin{eqnarray}a_k\leq 2f_4(2K,a_{k-1})\leq f_4(M,a_{k-1}).\end{eqnarray}\]
Iterating this gives
\[\begin{eqnarray}HJ(k,r)\leq a_k\leq f_5(M,k),\end{eqnarray}\]
because $a_1<M.$ This completes the proof. ■

Although we have seen that the Hales-Jewett theorem is a very general result, we should mention the much deeper density Hales-Jewett theorem. It says that given any positive integers $k$ and real number $\delta>0$, there exists $N$ such that whenver we take at least the proportion $\delta$ of the lattic points in $[k]^N$, they contain a combinatorial line. This theorem generalizes even Szemeredi's theorem, and was proved first by Furstenberg and Katznelson in 1991 using methods from Ergodic theory. In 2009, a Polymath project succeeded in giving a different, simpler proof (though still 34 pages), with explicit bounds for the functions involved.