We will prove the Hales-Jewett theorem, as its corollary the van der Waerden theorem, and a generalization of the latter, the Gallai-Witt theorem. The van der Waerden theorem from 1927 states that, given k and r, there is N such that if \{1,2,...,N\} is colored with r colors, there is a monochromatic arithmetic progression of length k. The smallest such N is the van der Waerden number W(k,r), and although they seem to grow fairly rapidly, the bounds given by the traditional double induction proofs are unimaginably large compared to what is expected. However, in 1988, Shelah found a one variable induction proof of the Hales-Jewett theorem in Primitive Recursive Bounds for Van Der Waerden Numbers, and this implied a bound that is still large but nothing compared to the previous bounds, which were functions that grow faster than any function defined by primitive recursion. We present in this post a proof of the Hales-Jewett theorem similar to Shelah's proof, and consider its consequences to the van der Waerden theorem and related results.
The Hales-Jewett theorem is the following assertion.
Theorem 1 (Hales-Jewett theorem). Let k,r\geq 2 be positive integers. Then there exists a positive integer N such that whenever \{1,2,...,k\}^N is colored with r colors, there is a monochromatic combinatorial line, that is, a collection P_1,...,P_kof k points such that in some coordinates the P_i are equal, while all the other coordinates in P_i have value i.
The smallest such N is called the Hales-Jewett number HJ(k,r). The combinatorial lines are very natural structures in the lattice cube, somewhat similarly to arithmetic progressions in the integers. As an example, the combinatorial lines in the 3\times 3 square \{1,2,3\}^2 are
\begin{eqnarray}&&\{(1,1),(1,2),(1,3)\}, \{(1,1),(2,1),(3,1)\}, \{(2,1),(2,2),(3,2)\},\{(2,1),(2,2),(3,2)\},\\&&\{(3,1),(3,2),(3,3)\},\{(3,1),(3,2),(3,3)\},\{(1,1),(2,2),(3,3)\}.\end{eqnarray}
Note that these lines would count as a victory in a tic-tac-toe game on a 3\times 3 board, and more generally combinatorial lines are always winning lines in a tic-tac-toe on some board. The additional requirement for a combinatorial line is that it should be increasing. Now the Hales-Jewett theorem immediately implies the following (not very practical) statement about tic-tac-toe. Given k and r, if r players are playing tic-tac-toe on a hypercube of side length k, then as long as the dimension is large enough, there will be no draw. We will see that the Hales-Jewett theorem, despite assuming almost no structure on the set of integers, has strong consequences in arithmetic Ramsey theory.
Before we prove the Hales-Jewett theorem, we consider some of its corollaries, which are by themselves important. We start by proving that the van der Waerden theorem follows from the Hales-Jewett theorem, which is stronger. For brevity, we sometimes denote \{1,2,...,k\} by [k] and \{1,2,...,k\}^m by [k]^m.
Theorem 2 (van der Waerden). Let k,r\geq 2 be positive integers. Then there exists N such that whenever \{1,2,...,N\} is colored with r colors, there is a monochromatic AP (arithmetic progression) of legth k. The smallest such N is denoted by W(k,r).
Proof. Let N=HJ(k,r), so that any r-coloring of [k]^N has a monochromatic combinatorial line. Let \chi be any r-coloring of \{1,2,...,k^N\}. This induces an r-coloring of [k]^r by interpreting the k-ary number
\begin{eqnarray}a_0k^0+a_1k^1+...+a_{N-1}k^{N-1},\quad 0\leq a_i\leq k-1\end{eqnarray}
as (a_0+1,...,a_{N-1}+1)\in [k]^N. By the definition of N, this coloring of [k]^N has a monochromatic combinatorial line. Let the variable coordinates of this line be the coordinates on the places d_1,...,d_j. Returning to the situation of the k-ary numbers, all the numbers
\begin{eqnarray}n\cdot k^{d_1}+n\cdot k^{d_2}+...+n\cdot k^{d_j}+m,\,n=0,1,...,k-1\end{eqnarray}
have the same color for some m whose k-ary representation has 0 in the places d_1,...,d_j. This gives us an arithmetic progression of length k. ■
The proof above actually gives a strengthening of the van der Waerden theorem, namely for large enough M any r-coloring of \{1,2,...,M\} contains an AP whose common difference is a number having only zeros and ones in its k-ary representation. Such numbers are rare; up to x there are around x^{\frac{\log 2}{\log k}} of them (but an asymptotic formula cannot be given because of fluctuations). Hence the van der Waerden theorem with the common difference of this form is much stronger than the original theorem. We also see that the van der Waerden numbers are bounded by
\begin{eqnarray}W(k,r)\leq k^{HJ(k,r)}.\end{eqnarray}
One could obtain better bounds, such as
\begin{eqnarray}W(k,r)\leq (k-1)HJ(k,r)+1,\end{eqnarray}
which is seen by mapping the points in [k]^n to \{1,2,...,(k-1)n+1\} by x\mapsto x_1+...+x_n+1. Nevertheless, this makes little difference, since the bounds we obtain for HJ(k,r) grow faster than any power tower, so the bound for W(k,r) is practically the same as for HJ(k,r).
The classical proofs of the van der Waerden theorem are based on double induction on k and r, where W(k,r) is bounded in terms of W(k,r-1) and W(k-1,r') for some enormously large r'. For this reason, the bound obtained for W(k,2) grows even faster than the Ackermann numbers, and this was a problem in all the known proofs until Shelah in 1988 gave a new proof of the Hales-Jewett theorem based on induction on a single variable, which implied that W(k,2) is at most f_5(10k), say, where
\begin{eqnarray}&&f_1(a,b)=a+b,\\&&f_2(a,0)=0\\&&f_n(a,0)=1\quad \text{for}\quad n\geq 3\\&&f_n(a,b)=f_{n-1}(a,f_n(a,b-1)))\quad \text{for}\quad b\geq 1\end{eqnarray}
and
\begin{eqnarray}f_n(a)=f_n(a,a).\end{eqnarray}
In other words, f_1 is the addition function, f_2 the multiplication function, f_3 the exponentiation function, and so forth. This is the type of bound we will prove later in this posty. In 2001, Gowers was able to prove Szemeredi's theorem, which generalizes the van der Waerden theorem, using methods from Fourier analysis, and this gave a bound for W(k,r) that is much better than anything known previously:
\begin{eqnarray}W(k,r)\leq T(2,2,r,2,2,k+9),\end{eqnarray}
where T denotes a tower of exponentiations, so that for example T(a,b,c)=a^{b^c}. This bound is still very far from the best known lower bound, which is approximately 2^k for r=2. As an illustartion of the probabilistic method that was already used in this post, we prove an exponential lower bound (with worse exponent).
Theorem 3. We have
\begin{eqnarray}W(k,2)\geq 2^{\frac{k-1}{2}}.\end{eqnarray}
Proof. Let m\leq 2^{\frac{k-1}{2}}. Color the integers in \{1,2,...,m\} randomly with two colors, using each color with probability \frac{1}{2}. The probability that a fixed k-AP is monochromatic is 2^{1-k}. The number of k-APs in \{1,2,...,m\} is \leq m^2 since if a+dn,n=0,...,k-1 is the progression, we have a\leq m and d\leq m. Now the probability that there is a monochromatic AP is at most
\begin{eqnarray}m^22^{1-k}:=p.\end{eqnarray}
We have p<1 for m<2^{\frac{k-1}{2}}, so the proof is complete. ■
It is believed that W(k,2)<2^{k^2}; Graham has offered \$1000 for proving it, so the knowledge about the van der Waerden numbers is even much more limited than knowledge on the Ramsey numbers R(k,k).
Although little is known about bounds for the van der Waerden theorem, we can consider generalizations for it. As in the case of Ramsey's theorem, we could ask whether the whole \mathbb{Z}_+ actually contains an infinite monochromatic substructure, when it is colored with r colors. Of course, Theorem 2 implies that if \mathbb{Z}_+ is colored with r colors, there is a color having arbitrarily long APs (they are actually equivalent via a compactness argument). But does every r-coloring of \mathbb{Z}_+ contain not only APs of arbitrary length, but an AP of infinite length? It turns out that this is not the case.
Theorem 4. There is a 2-coloring of \{1,2,3,...\}, which has no infinite AP.
Proof. Color the integers in I_k=[3^k,3^{k+1}) red if k is even and blue if k is odd. This gives a coloring of \{1,2,3,...\}. Let A be an infinite AP in this coloring, and let a<b<c be such that A intersects I_a,I_b and I_c. Let x be the largest element in I_b\cap A and y the smallest element in I_c\cap A. By construction, b\leq c-2, so x\leq \frac{y}{3}. But now any AP of positive integers counting x and y must start with x. This means that I_a\cap A is empty, a contradiction. ■
Another possible direction for generalizing van der Waerden's theorem is increasing the dimension, a bit similarly to the hypergraph Ramsey theorem. The Hales-Jewett theorem is indeed strong enough to yield a d-dimensional version of the van der Waerden theorem. If we color \mathbb{Z}^d with r colors, we may ask, what kind of monochromatic configurations are guaranteed to be found. Naturally, we must identify two configurations that can be transformed into each other by translation and scaling by an integer factor. It is not even clear that we can find a monochromatic "square", that is, a set of four points parallel to the coordinate axes and with equal distances. However, the beautiful result for this problem is that we can find any configuration of fixed size.
Theorem 5 (Gallai-Witt). Let U be a finite set of points in \mathbb{Z}^d. If \mathbb{Z}^d is colored with r colors, we can find a homothetic copy of U (say V=a+\lambda U with a\in \mathbb{Z}^d and \lambda \in \mathbb{Z}_+) which is monochromatic.
Proof. Let U=\{u_1,u_2,...,u_k\}, and let \mathbb{Z}^d be r-colored according to \chi. Consider V^n, where n=HJ(k,r). Setting f:V^n\mapsto \mathbb{Z}^d,f((x_1,...,x_n))=x_1+...+x_n, we can define a coloring \chi' on V^n by \chi'((x_1,...,x_n))=\chi(x_1+...+x_n). Since the Hales-Jewett theorem assumes no additional structure on the ground set, we may apply it, and find a combinatorial line L\subset V^n which is monochromatic under \chi'. Let J be the set of constant coordinates of L and \lambda the number of variable coordinates. Now f(L)\subset \mathbb{Z}^d and
\begin{eqnarray}f(L)=\left\{\sum_{i\in J}v_i+\lambda v_1,\sum_{i\in J}v_i+\lambda v_2,...,\sum_{i\in J}v_i+\lambda v_k\right\}\end{eqnarray}
is monochromatic under \chi. Defining a=\sum_{i\in J}v_i, we are done. ■
Proof of the Hales-Jewett theorem
We start by proving the case k=2 of the Hales-Jewett theorem, which is by far the easiest case. All we need for the upper bound is the pigeonhole principle, and the upper bound is easily seen to be exact.
Theorem 6. We have HJ(2,r)=r for all r\geq 2.
Proof. We first show that HJ(2,r)\leq r. If [2]^r is colored with r colors, the pigeonhole principle tells that some two of the points
\begin{eqnarray}(1,1,...,1),\,(2,1,...,1),\,(1,2,...,1),\,(1,1,...,2)\end{eqnarray}
have the same color, and this gives a two point combinatorial line.
Next we show that [2]^{r-1} can be colored with r colors without producing any monochromatic combinatorial lines. Interpret [2]^{r-1} as the binary numbers in \{0,1,...,2^{r-1}-1\} by
\begin{eqnarray}(x_1,...,x_{r-1})\mapsto \sum_{i=0}^{r-2}f(x_i)2^i,\end{eqnarray}
where f(1)=0,f(1)=1. In \{0,1,...,2^{r-1}-1\}, define a coloring \chi as follows:
\begin{eqnarray}\chi(0)=0,\, \chi(2^{a_1}+...+2^{a_k})=(a_1+...+a_k+k)\pmod r,\end{eqnarray}
when 0\leq a_1<...<a_k<r-1, with x\pmod r interpreted as the least nonnegative residue of x modulo r. This gives an r-coloring of \chi. Notice that \chi(0)\neq \chi(2^i) for all i and if 0\leq a_1<...<a_k<r-1 and b\in \{0,1,...,r-2\}\setminus\{a_1,...,a_k\}, then
\begin{eqnarray}\chi(2^{a_1}+....+a^{a_k})\neq \chi(2^{a_1}+....+2^{a_k}+2^b).\end{eqnarray}
Indeed, otherwise we would have b\equiv r-1\pmod r, which is impossible. Since binary numbers differing by one digit correspond to combinatorial lines in the original setting, we see that \chi makes no combinatorial line monochromatic. ■
We turn to the general case of the Hales-Jewett theorem. Shelah's proof is by induction on k only, and the idea is to reduce the situation to the case of the previous Hales-Jewett number HJ(k-1,r) by finding a subspace of [k]^N, where the coloring is not altered by swapping some of the coordinates from 1 to 2. For this we need HJ(2,r') for r' much larger than r, but we know the exact value of HJ(2,r'), so the growth of the bound for HJ(k,r) will not be even nearly as fast in the more classical proof, where HJ(k-1,r') for some huge r' is applied to bound HJ(k,r). Shelah's proof gives us a bound that grows much slower than the Ackermann numbers, and in fact we can bound HJ(k,r) by an iterated tetration operation, where tetration means iterated exponentiation.
Theorem 7 (Hales-Jewett theorem, primitive recursive bounds). Let k, r\geq 2 be positive integers. Then HJ(k,r) exists, and moreover
\begin{eqnarray}HJ(k,r)\leq f_5(M,k),\end{eqnarray}
where M=2\max\{k,r\}+1 and f_5 is as above.
Proof. Assume HJ(k-1,r) exists (for k=3 this is known); we will bound HJ(k,r) in terms of it. Let N_1,...,N_m be a sequence, which will be fixed later. Consider an r-coloring \chi of [k]^n=[k]^{n-t_m}\times [k]^{N_m}. We define a coloring \chi' on the last N_m coordinates, defining for x\in [k]^{N_m} its color as
\begin{eqnarray}\chi'(x)=(\chi((y,x)):y\in [k]^{n-N_m}),\end{eqnarray}
where some ordering is fixed for the set [k]^{n-N_m}. Now \chi' is a r^{k^{n-N_m}}-coloring of [k]^{N_m}, and if N_m\geq HJ(2,r^{k^{N-N_m}}), we can find a combinatorial line L_m\subset [k]^{N_m} such that its first two points have the same color under \chi' (because [k]^{N_m} certainly contains [2]^{N_m}). We now restrict our attention to the subspace [k]^{N-N_m}\times L_m of the whole space. For any x in this subspace, any 1s and 2s among its last N_m coordinates can be swapped. Indeed, if y\neq x is obtained from x by changing some of the 1s and 2s among the last N_m coordinates, then x and y must be the first two points of L_m (because they differ at some non-constant coordinate and have 1 and 2 as values of this coordinate). By the definition of \chi', changing x to y alters nothing in the original coloring \chi.
We repeat the argument, considering now [k]^{N-N_m-N_{m-1}}\times [k]^{N_{m-1}}\times L_m. Define a coloring \chi'' of [k]^{N_{m-1}}, defining for x\in [k]^{N_{m-1}} its color as
\begin{eqnarray}\chi''(x)=(\chi((y,x,z)):y\in [k]^{n-N_m-N_{m-1}},z_0\in L_m),\end{eqnarray}
where some ordering is fixed for the set [k]^{N-N_{m}-N_{m-1}} and z_0\in L_m is fixed arbitrarily. This time \chi'' is a r^{k^{N-N_m-N_{m-1}}}-coloring of [k]^{N_{m-1}}, and we can find a combinatorial line L_{m-1}\subset [k]^{N_{m-1}} such that its first two points have the same color under \chi'', provided that N_{m-1}\geq HJ(2,r^{k^{N-N_m-N_{m-1}}}). Again, by the definition of \chi'' and \chi', if some of the last N_m+N_{m-1} coordinates of an element from [k]^{N-N_m-N_{m-1}}\times L_{m-1}\times L_m are swapped from 1 to 2, the color of the element does not change under \chi.
We continue in this way until we have found a subspace L_1\times L_2\times ...\times L_m\subset [k]^{N}\subset [k]^{N_1}\times ...\times [k]^{N_m} such that if in any elment of it some coordinates are swapped from 1 to 2, the color under \chi remains the same. Let L_i' be the combinatorial line L_i with its first point excluded. Then L_1'\times...\times L_m' is a subspace of \{2,...,k\}^{N} with at least m independent variable coordinates (each line has some variable coordinate). This means that if we choose m=HJ(k-1,r), the subspace contains a monochromatic (under \chi) combinatorial line, say L. Add to this line L the point that was previously removed (the one where the variable coordinates are 1), say a. Since \chi does not distinguish between the values 1 and 2 for coordinates in L_1\times...\times L_m, \chi gives the same color to a as to L, so L\cup\{a\} is a monochromatic combinatorial line in [k]^{N}.
We conclude that we may choose N=N_1+...+N_m with
\begin{eqnarray}&&N_1=r,\\&&N_{i+1}=r^{k^{N_1+...+N_{i-1}}},\,i\leq m-1\\&&m=HJ(k-1,r).\end{eqnarray}
The reason for N_1=r is that in [k]^r we can find a combinatorial line L_1 such that its first two points have the same color under \chi. Clearly, N_i grow so fast that N_i\geq 2N_{i-1}, and hence N_{i+1}\leq r^{k^{2N_i}}. If K=\max{k,r}, we obtain
\begin{eqnarray}N_m\leq (2K)^{(2K)^{...^{2K}}},\end{eqnarray}
where 2K occurs 2m times. In other words,
\begin{eqnarray}HJ(k,r)\leq f_4(2K,2\cdot HJ(k-1,r)),\end{eqnarray}
so defining a_k=2\cdot HJ(k,r) we have
\begin{eqnarray}a_k\leq 2f_4(2K,a_{k-1})\leq f_4(M,a_{k-1}).\end{eqnarray}
Iterating this gives
\begin{eqnarray}HJ(k,r)\leq a_k\leq f_5(M,k),\end{eqnarray}
because a_1<M. This completes the proof. ■
Although we have seen that the Hales-Jewett theorem is a very general result, we should mention the much deeper density Hales-Jewett theorem. It says that given any positive integers k and real number \delta>0, there exists N such that whenver we take at least the proportion \delta of the lattic points in [k]^N, they contain a combinatorial line. This theorem generalizes even Szemeredi's theorem, and was proved first by Furstenberg and Katznelson in 1991 using methods from Ergodic theory. In 2009, a Polymath project succeeded in giving a different, simpler proof (though still 34 pages), with explicit bounds for the functions involved.
Although we have seen that the Hales-Jewett theorem is a very general result, we should mention the much deeper density Hales-Jewett theorem. It says that given any positive integers k and real number \delta>0, there exists N such that whenver we take at least the proportion \delta of the lattic points in [k]^N, they contain a combinatorial line. This theorem generalizes even Szemeredi's theorem, and was proved first by Furstenberg and Katznelson in 1991 using methods from Ergodic theory. In 2009, a Polymath project succeeded in giving a different, simpler proof (though still 34 pages), with explicit bounds for the functions involved.