## Thursday, October 16, 2014

### Fourier transform and its mapping properties

We present some classical results about the Fourier transform in $\mathbb{R}^n$ in this post, and some of them will be applied in later posts to partial differential equations, additive number theory, and other topics. In particular, we consider the validity of the Fourier inversion theorem in various forms, the $L^2$ theory of the Fourier transform and the mapping properties of the Fourier transform in many spaces.

Introduction

The Fourier transform is the natural continuous analogue of Fourier series: we integrate over $\mathbb{R}$ or $\mathbb{R}^n$ instead of $[0,1]$ (or, equivalently, the reals modulo $1$, since in the case of Fourier series we study $1$-periodic functions). The earlier posts [1, 2, 3, 4, 5, 6, 7] deal with Fourier series and the discrete Fourier transform. Here
$\begin{eqnarray}\hat{f}(\xi):=\int_{\mathbb{R}^n}f(x)e^{-2\pi ix\cdot\xi} dx\end{eqnarray}$
is the Fourier transform of an integrable function $f:\mathbb{R}^n\to \mathbb{C}$ ($x\cdot \xi$ is the ordinary inner product of vectors, the natural analogue of multiplication in $\mathbb{R}^n$). By integrable functions we mean $L^1(\mathbb{R}^n)$, that is, Lebesgue integrable functions, since they are much more convenient to work with. As $|f(x)e^{-2\pi i x\cdot\xi}|=|f(x)|$, the integral defining $\hat f(\xi)$ exists as a Lebesgue integral if and only if $f\in L^1(\mathbb{R}^n)$, so this space is indeed a natural space for studying the Fourier transform (but not the only one, as we will see).

Similarly to the theory of Fourier series, a central problem regarding the Fourier transform is to determine when
$\begin{eqnarray}f(x)=\int_{\mathbb{R}^n}\hat{f}(\xi)e^{2\pi i x\cdot\xi}d\xi,\quad (1)\end{eqnarray}$
holds. This is the continuous version of the Fourier expansion. We are not only concerned with the validity of the above equality pointwise (in fact, if $f\in L^p$ for some $p>1$, the integral need not converge even if $\hat{f}$ is a well-defined function); it might as well hold in an average sense, in $L^p$, or in some other sense. It turns out that many theorems in the Fourier analysis of periodic functions have counterparts in continuous Fourier analysis, including the $L^2$ theory and conditions under which a function can be represented as a Fourier series or in the form $(1)$. Nevertheless, many new questions arise, such as how to define the Fourier transform on $L^p(\mathbb{R}^n)$ for $p>1$ and how it maps between different spaces (when does it map $L^p$ to $L^q$, and what can be said about the images of interesting spaces under the Fourier transform?). Since $L^p([0,1])\subset L^1([0,1])$ for every $p\geq 1$, the Fourier coefficients of an $L^p$ function on $[0,1]$ are always defined directly, whereas no such inclusion holds on $\mathbb{R}^n$; this makes the above questions much more interesting in the continuous case. In fact, for $p>2$ the Fourier transform of an $L^p(\mathbb{R}^n)$ function need not be a function at all, and to define it one has to develop the theory of distributions, which can be thought of as generalized functions. Another phenomenon, which occurs in $\mathbb{R}^n$ for $n>1$ but has no counterpart on $[0,1]$, is that the validity of $(1)$ in $L^p$ can be understood in several non-equivalent ways:
$\begin{eqnarray}f(x)&=&L^p-\lim_{R\to \infty} \int_{[-R,R]^n}\hat{f}(\xi)e^{2\pi i x\cdot \xi}d\xi,\quad (2)\\f(x)&=&L^p-\lim_{R\to \infty} \int_{B(0,R)}\hat{f}(\xi)e^{2\pi i x\cdot \xi}d\xi. \quad\,\, (3)\end{eqnarray}$
(the right-hand sides are initially defined only for sufficiently nice functions, but the definitions extend to $L^p$ functions by density). Rather surprisingly, $(2)$ holds for every $f\in L^p$ with $1<p<\infty$ (this is essentially the boundedness of the Hilbert transform), while for $n\geq 2$ and $p\neq 2$, $(3)$ fails in general (due to a famous counterexample of Fefferman).

There are a couple of reasons why we consider the Fourier transform in $\mathbb{R}^n$ and not just in $\mathbb{R}$. One is that this generality is required in many applications, for example in the study of PDEs or of the geometry of Euclidean spaces (such as the isoperimetric inequality). Another is that most arguments carry over easily to the general case. Finally, when an argument fails to generalize to $\mathbb{R}^n$, one can ask whether the result even holds in higher dimensions, and sometimes this leads to intriguing questions, such as the validity of $(2)$ and $(3)$. Of course, one can also study Fourier series on $[0,1]^n$, but this is more complicated than Fourier analysis on $[0,1]$, and for many applications Fourier analysis on $\mathbb{R}^n$ is more suitable.

Some useful properties of the Fourier transform

We derive the basic properties of the Fourier transform. Many of them say that the Fourier transform swaps certain important pairs of operations (for instance differentiation and multiplication by a polynomial, or convolution and pointwise multiplication). Therefore, in situations where these operations arise, it is sometimes possible to transform the problem into a simpler form by taking Fourier transforms, and then to solve the easier problem. By the Fourier inversion theorem, which will be formulated and proved below, we can then return from the transformed problem to the original one. For example, linear PDEs with constant coefficients become easy to solve after applying the Fourier transform, and inverting the transform gives a representation formula for the solution. We first show the most basic properties of the Fourier transform.

Proposition 1. Let $f,g:\mathbb{R}^n\to \mathbb{C}$ be integrable functions, and denote by $\mathcal{F}$ the Fourier transform (when using the hat notation would be inconvenient). Then
(i) $\mathcal{F}(\lambda f+\mu g)=\lambda \mathcal{F}(f)+\mu \mathcal{F}(g)$ for $\lambda,\mu\in \mathbb{C}$
(ii) $\mathcal{F}(f(x-a))(\xi)=e^{-2\pi ia\cdot \xi}\hat{f}(\xi)$ for $a\in \mathbb{R}^n$
(iii) $\mathcal{F}(e^{2\pi ia\cdot x}f(x))(\xi)=\hat{f}(\xi-a)$
(iv) $\mathcal{F}(f(\lambda x))(\xi)=\frac{1}{|\lambda|^n}\hat{f}\left(\frac{\xi}{\lambda}\right)$ for $\lambda\in \mathbb{R}\setminus\{0\}$
(v) $\|\hat{f}\|_{\infty}\leq \|f\|_1$

Proof. (i) This follows from the linearity of the integral.
(ii), (iii), (iv) These follow from linear changes of variables.
(v) This follows from the triangle inequality. ■
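The identities (ii), (iii) and (iv) can be sanity-checked numerically in one dimension. The sketch below assumes NumPy and approximates $\hat f(\xi)$ by a Riemann sum on $[-L,L]$; the helper `fourier_1d`, the test function `f` and the parameters `a`, `lam` are illustrative choices, not part of the post. Note that the translation rule is checked with the factor $e^{-2\pi i a\xi}$.

```python
import numpy as np

def fourier_1d(f, xi, L=12.0, n=24001):
    """Riemann-sum approximation of hat f(xi) = int f(x) exp(-2 pi i x xi) dx.

    Assumes f is negligible outside [-L, L], which holds for the test function below.
    """
    x = np.linspace(-L, L, n)
    dx = x[1] - x[0]
    return np.sum(f(x) * np.exp(-2j * np.pi * x * xi)) * dx

def f(x):
    # Hypothetical test function; it is not even, so the sign of lam in (iv) matters.
    return (1 + x) * np.exp(-np.pi * x**2)

def proposition1_errors(a=0.7, lam=-2.5, xis=(-1.3, 0.0, 0.4, 1.1)):
    errs = []
    for xi in xis:
        fhat = fourier_1d(f, xi)
        # (ii) translation: F(f(x - a))(xi) = exp(-2 pi i a xi) hat f(xi)
        shifted = fourier_1d(lambda x: f(x - a), xi)
        errs.append(abs(shifted - np.exp(-2j * np.pi * a * xi) * fhat))
        # (iii) modulation: F(exp(2 pi i a x) f(x))(xi) = hat f(xi - a)
        modulated = fourier_1d(lambda x: np.exp(2j * np.pi * a * x) * f(x), xi)
        errs.append(abs(modulated - fourier_1d(f, xi - a)))
        # (iv) dilation with n = 1: F(f(lam x))(xi) = hat f(xi / lam) / |lam|
        dilated = fourier_1d(lambda x: f(lam * x), xi)
        errs.append(abs(dilated - fourier_1d(f, xi / lam) / abs(lam)))
    return errs

print(max(proposition1_errors()))
```

The Riemann sum is used only because it is extremely accurate for rapidly decaying smooth functions; any quadrature would do.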

The proposition above shows that the Fourier transform $\mathcal{F}$ is a bounded linear operator from $L^1$ to $L^{\infty}$ that behaves nicely under translations, modulations and dilations. To obtain more interesting properties of the Fourier transform, we define a space on which it behaves very nicely, even when we differentiate functions or multiply them by polynomials.

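Definition. The Schwartz space $S(\mathbb{R}^n)$ consists of all $C^{\infty}$ functions $f:\mathbb{R}^n\to \mathbb{C}$ such that $f$ and all its derivatives decay faster than any power of $\|x\|$ as $\|x\|\to \infty$, that is, $\sup_{x\in\mathbb{R}^n}(1+\|x\|)^N|\partial^{\alpha}f(x)|<\infty$ for every $N\geq 0$ and every multi-index $\alpha$.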

The functions $x\mapsto e^{-a\|x\|^2}$ for $a>0$ are simple examples of Schwartz functions. Now we can state the following.

Proposition 2. (i) If $f\in S(\mathbb{R}^n)$, we have $\mathcal{F}(\partial_j^k f)(\xi)=(2\pi i \xi_j)^k\hat{f}(\xi)$, where $\xi=(\xi_1,...,\xi_n)$ and $\partial_j^k$ denotes the $k$th partial derivative with respect to the $j$th variable.
(ii) If $f\in S(\mathbb{R}^n)$, we have $\mathcal{F}(x_j^k f(x))(\xi)=(-2\pi i)^{-k}\partial_j^k\hat{f}(\xi)$.
(iii) Let
$\begin{eqnarray}f*g(y)=\int_{\mathbb{R}^n}f(x)g(y-x)dx\end{eqnarray}$
denote the convolution of integrable functions $f,g:\mathbb{R}^n\to \mathbb{C}$. We have $\mathcal{F}(f*g)(\xi)=\hat{f}(\xi)\hat{g}(\xi)$.
(iv) If $f,g\in L^1(\mathbb{R}^n)$ and $\hat f\in L^1(\mathbb{R}^n)$, then $fg$ is integrable and $\mathcal{F}(fg)(\xi)=\hat{f}*\hat{g}(\xi)$.

Proof. (i) Since $f$ and its derivatives decay faster than any polynomial at infinity, we may integrate by parts in the variable $x_j$ to find
$\begin{eqnarray}\widehat{\partial_j f}(\xi)=\int_{\mathbb{R}^n}\partial_j f(x)e^{-2\pi i x\cdot \xi}dx=-(-2\pi i\xi_j)\int_{\mathbb{R}^n}f(x)e^{-2\pi i x\cdot \xi}dx=2\pi i \xi_j \hat{f}(\xi),\end{eqnarray}$
and the general case follows by integrating by parts $k$ times.

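(ii) It suffices to prove the case $k=j=1$, since $x_1^{k-1}f$ is again a Schwartz function and the other coordinates are treated in the same way. For real $h\neq 0$ we have
$\begin{eqnarray}\frac{\hat{f}(\xi+he_1)-\hat{f}(\xi)}{h}- \mathcal{F}(-2\pi i x_1f(x))(\xi)&=&\int_{\mathbb{R}^n}f(x)e^{-2\pi i x\cdot \xi}\left(\frac{e^{-2\pi ix_1 h}-1}{h}+2\pi i x_1\right)dx.\end{eqnarray}$
Since $|e^{-i\theta}-1|\leq|\theta|$ for real $\theta$, the factor in parentheses is bounded by $4\pi|x_1|$. Let $\varepsilon>0$ be given, and choose $M$ so large that $4\pi\int_{\|x\|>M}|x_1||f(x)|dx<\varepsilon$, which is possible because $x_1f$ is integrable. On $B(0,M)$, the estimate $|e^{-i\theta}-1+i\theta|\leq\frac{\theta^2}{2}$ shows that the factor in parentheses is at most $2\pi^2M^2|h|$, so the integral over $B(0,M)$ contributes at most $2\pi^2M^2|h|\|f\|_1$. Letting $h\to 0$ and then $\varepsilon\to 0$ gives $\partial_1\hat f=\mathcal{F}(-2\pi i x_1f)$, which is the statement.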

(iii) By Fubini's theorem,
$\begin{eqnarray}\int_{\mathbb{R}^n}|f*g(x)|dx&\leq&\int_{\mathbb{R}^n}\int_{\mathbb{R}^n}|f(y)g(x-y)|dy\, dx\\&=&\int_{\mathbb{R}^n}\int_{\mathbb{R}^n}|g(x-y)|dx\,|f(y)|dy=\|f\|_1\|g\|_1,\end{eqnarray}$
so $f*g \in L^1$. We compute, again by Fubini's theorem,
$\begin{eqnarray}\widehat{f*g}(\xi)&=&\int_{\mathbb{R}^n}\int_{\mathbb{R}^n}f(y)g(x-y)dye^{-2\pi i x\cdot \xi} dx\\&=&\int_{\mathbb{R}^n}\int_{\mathbb{R}^n}g(x-y)e^{-2\pi i x\cdot \xi} dx f(y) dy\\&=&\int_{\mathbb{R}^n}\hat{g}(\xi)f(y)e^{-2\pi i y\cdot \xi}dy=\hat{f}(\xi)\hat{g}(\xi)\end{eqnarray}$
by part (ii) of the previous proposition (applied with $a=y$).

(iv) The proof is analogous to (iii). ■
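The convolution identity (iii) can be checked in the same spirit. This minimal sketch assumes NumPy, samples two hypothetical Gaussian-type functions on a symmetric grid, forms a discrete convolution with `np.convolve`, and compares Riemann-sum Fourier transforms of both sides; the grid size and the test functions are arbitrary choices.

```python
import numpy as np

L, n = 8.0, 4001                  # uniform grid on [-L, L]; n odd so that x = 0 is a node
x = np.linspace(-L, L, n)
dx = x[1] - x[0]

def ft(values, xi):
    """Riemann-sum Fourier transform of samples on the grid x, evaluated at xi."""
    return np.sum(values * np.exp(-2j * np.pi * x * xi)) * dx

f = np.exp(-np.pi * x**2)                          # hypothetical test functions
g = x * np.exp(-2 * np.pi * (x - 0.3) ** 2)

# (f*g)(x_k) is approximated by sum_j f(x_j) g(x_k - x_j) dx.  Because n is odd and
# the grid is symmetric, np.convolve(..., mode="same") returns exactly these samples.
conv = np.convolve(f, g, mode="same") * dx

def convolution_theorem_error(xis=(-0.8, 0.0, 0.5, 1.2)):
    # compare F(f*g)(xi) with hat f(xi) * hat g(xi)
    return max(abs(ft(conv, xi) - ft(f, xi) * ft(g, xi)) for xi in xis)

print(convolution_theorem_error())
```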

Property (i) is obviously useful for differential equations, while (iii) is important when one studies the distribution of a sum of independent random variables, since the density of such a sum is the convolution of the individual densities. The following properties were already encountered in the study of Fourier series, and they are also of fundamental importance.

Theorem 3. (Plancherel's formula for $L^1$ functions). Let $f,g\in L^1(\mathbb{R}^n)$. We have
$\begin{eqnarray}\int_{\mathbb{R}^n}\hat{f}(x)g(x)dx=\int_{\mathbb{R}^n}f(x)\hat{g}(x)dx.\end{eqnarray}$
Proof. The function $(x,\xi)\mapsto f(\xi)g(x)e^{-2\pi ix\cdot \xi}$ is integrable on $\mathbb{R}^{2n}$, the integral of its absolute value being $\|f\|_1\|g\|_1$. Hence Fubini's theorem gives
$\begin{eqnarray}\int_{\mathbb{R}^n}\hat{f}(x)g(x)dx&=&\int_{\mathbb{R}^n}g(x)\int_{\mathbb{R}^n}f(\xi)e^{-2\pi ix\cdot \xi}d\xi dx\\&=&\int_{\mathbb{R}^n}\int_{\mathbb{R}^n}f(\xi)g(x)e^{-2\pi ix\cdot \xi}dx d\xi\\&=&\int_{\mathbb{R}^n}f(\xi)\hat{g}(\xi)d\xi,\end{eqnarray}$
which was to be shown. ■

Theorem 4. (Poisson summation). Let $f:\mathbb{R}^n\to \mathbb{C}$ be continuous, and suppose that there are constants $C,\delta>0$ such that $|f(x)|\leq C(1+\|x\|)^{-n-\delta}$ and $|\hat{f}(\xi)|\leq C(1+\|\xi\|)^{-n-\delta}$ for all $x,\xi\in\mathbb{R}^n$. Then
$\begin{eqnarray}\sum_{m\in \mathbb{Z}^n}f(m)=\sum_{m\in \mathbb{Z}^n}\hat{f}(m),\end{eqnarray}$
where both series converge absolutely.

Proof. The formula was proved in this earlier post in the case $n=1$. The case of a general $n$ is similar. ■
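For Gaussians, Poisson summation becomes a classical identity for theta functions: with $f(x)=e^{-\pi t\|x\|^2}$, Proposition 5 below (with $a=\pi t$) gives $\hat f(\xi)=t^{-n/2}e^{-\pi\|\xi\|^2/t}$, so the formula reads $\sum_{m}e^{-\pi t\|m\|^2}=t^{-n/2}\sum_m e^{-\pi\|m\|^2/t}$. The following plain-Python sketch checks this numerically by truncating the lattice sums at a hypothetical cutoff `M`; since both sums factor over coordinates, only one-dimensional sums are needed.

```python
import math

def theta(t, M=60):
    """Truncated one-dimensional lattice sum: sum over |m| <= M of exp(-pi t m^2)."""
    return sum(math.exp(-math.pi * t * m * m) for m in range(-M, M + 1))

def poisson_relative_gap(t, n):
    """Relative difference of the two sides of Poisson summation for f(x) = exp(-pi t |x|^2).

    By Proposition 5 with a = pi t, hat f(xi) = t^(-n/2) exp(-pi |xi|^2 / t).
    Both sums over Z^n factor into n copies of a one-dimensional sum.
    """
    lhs = theta(t) ** n                          # sum of f(m) over m in Z^n
    rhs = t ** (-n / 2) * theta(1.0 / t) ** n    # sum of hat f(m) over m in Z^n
    return abs(lhs - rhs) / lhs

for t in (0.2, 1.0, 3.7):
    print(t, [poisson_relative_gap(t, n) for n in (1, 2, 3)])
```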

We give one more very useful fact, namely that the Gaussian function is essentially its own Fourier transform. This is crucial for example when one uses the Fourier transform to prove the central limit theorem from probability.

Proposition 5. The Fourier transform of $f:\mathbb{R}^n\to \mathbb{R}$, $f(x)=e^{-a\|x\|^2}$, where $a>0$, is $\hat f(\xi)=\left(\frac{\pi}{a}\right)^{\frac{n}{2}}e^{-\frac{\pi^2\|\xi\|^2}{a}}$. In particular, if $a=\pi$, then $\hat{f}(\xi)=f(\xi)$.

Proof. First suppose $n=1$. We evaluate
$\begin{eqnarray}\int_{-\infty}^{\infty}e^{-ax^2-2\pi i x \xi}dx&=&e^{-\frac{\pi^2 \xi^2}{a}}\int_{-\infty}^{\infty}e^{-\left(\sqrt{a}x+\frac{\pi i \xi}{\sqrt{a}}\right)^2}dx\\&=&\frac{1}{\sqrt{a}}e^{-\frac{\pi^2 \xi^2}{a}}\int_{-\infty+\frac{\pi i \xi}{\sqrt{a}}}^{\infty+\frac{\pi i\xi}{\sqrt{a}}}e^{-y^2}dy.\end{eqnarray}$
If $\xi$ is purely imaginary, then $\frac{\pi i\xi}{\sqrt{a}}$ is real, so the path of integration is the real line and the result is
$\begin{eqnarray}\frac{1}{\sqrt{a}}e^{-\frac{\pi^2 \xi^2}{a}}\int_{-\infty}^{\infty}e^{-y^2}dy=\sqrt{\frac{\pi}{a}}e^{-\frac{\pi^2 \xi^2}{a}}.\end{eqnarray}$
Since $x\mapsto e^{-ax^2-2\pi i x \xi}$ decays faster than any exponential, locally uniformly in $\xi$, and is entire as a function of $\xi$, its integral over $\mathbb{R}$ is also an entire function of $\xi$ (differentiate under the integral sign, using dominated convergence). The integral equals $\sqrt{\frac{\pi}{a}}e^{-\frac{\pi^2 \xi^2}{a}}$ on the imaginary axis, so by the identity theorem for holomorphic functions it equals this expression for all complex $\xi$, in particular for all real $\xi$.

Finally, consider the higher dimensional case. We have
$\begin{eqnarray}\int_{\mathbb{R}^n}e^{-a\|x\|^2-2\pi i x\cdot \xi}dx&=&\prod_{k=1}^{n}\int_{\mathbb{R}}e^{-ax_k^2-2\pi i x_k \xi_k}dx_k\\&=&\prod_{k=1}^{n}\sqrt{\frac{\pi}{a}}e^{-\frac{\pi^2 \xi_k^2}{a}}\\&=&\left(\frac{\pi}{a}\right)^{\frac{n}{2}}e^{-\frac{\pi^2 \|\xi\|^2}{a}},\end{eqnarray}$
as wanted. ■
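Proposition 5 is easy to test numerically, including for complex $\xi$, where the proof relied on analytic continuation. The sketch below assumes NumPy; `gaussian_ft_numeric` is a plain Riemann sum on $[-L,L]$ (an illustrative choice that happens to be extremely accurate for Gaussians), and the sample points and the value $a=1.3$ are arbitrary.

```python
import numpy as np

def gaussian_ft_numeric(a, xi, L=10.0, n=20001):
    """Riemann sum for int exp(-a x^2 - 2 pi i x xi) dx over [-L, L]; xi may be complex."""
    x = np.linspace(-L, L, n)
    dx = x[1] - x[0]
    return np.sum(np.exp(-a * x**2 - 2j * np.pi * x * xi)) * dx

def gaussian_ft_formula(a, xi):
    # Proposition 5 with n = 1, valid for all complex xi by analytic continuation
    return np.sqrt(np.pi / a) * np.exp(-np.pi**2 * xi**2 / a)

def max_relative_error(a, xis=(0.0, 0.45, -1.2, 0.5j, 0.3 - 0.2j)):
    return max(abs(gaussian_ft_numeric(a, xi) - gaussian_ft_formula(a, xi))
               / abs(gaussian_ft_formula(a, xi)) for xi in xis)

# a = pi is the self-dual case hat f = f
print(max_relative_error(1.3), max_relative_error(np.pi))
```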

Convergence results for Fourier integrals

Now that the basic properties of the Fourier transform are under control, we can prove some classical results concerning the convergence of the Fourier integral representation
$\begin{eqnarray}\int_{\mathbb{R}^n}\hat {f}(\xi)e^{2\pi i x\cdot \xi}d\xi,\end{eqnarray}$
where the integral might even fail to exist without further assumptions. In this post about Fourier series, we proved Fejér's theorem on the mean (Cesàro) convergence of Fourier series, and it turns out that the same result holds here. However, we must define convergence in the mean a bit differently, as there are infinitely many ways to take an average over $\mathbb{R}^n$ for $n>1$, even when using rectangular averages.

Given $n$ nonzero real numbers $r_1,...,r_n$, we define
$\begin{eqnarray}[[r_1,...,r_n]]:=[-|r_1|,|r_1|]\times...\times [-|r_n|,|r_n|].\end{eqnarray}$
Then we have

Theorem 6 (mean convergence of Fourier integrals). Let $f\in L^1(\mathbb{R}^n)$ be any function, and for any rectangle $\mathcal{R}\subset \mathbb{R}^n$, define the partial Fourier integral over $\mathcal{R}$ as
$\begin{eqnarray}f_{\mathcal{R}}(x):=\int_{\mathcal{R}}\hat{f}(\xi)e^{2\pi i x\cdot \xi}d\xi.\end{eqnarray}$
Define the rectangular average of the partial Fourier integrals over $[-R,R]^n$ as
$\begin{eqnarray}\sigma_{R}(f,x):=\frac{1}{(2R)^n}\int_{[-R,R]^n}f_{[[r_1,...,r_n]]}(x)dr_1...dr_n.\end{eqnarray}$
Then it holds that
$\begin{eqnarray}f(x)=L^1-\lim_{R\to \infty}\sigma_{R}(f,x),\end{eqnarray}$
and for bounded continuous $f\in L^1(\mathbb{R}^n)$ also
$\begin{eqnarray}f(x)=\lim_{R\to \infty}\sigma_R(f,x)\end{eqnarray}$
for every $x$, with uniform convergence on compact sets.

Proof. The reason we use rectangular averages is that by Fubini's theorem we can split integrals over rectangles into one-dimensional integrals and thus reduce the statement to the case $n=1$. We see that
$\begin{eqnarray}\sigma_{R}(f,x)&=&\frac{1}{(2R)^n}\int_{[-R,R]^n}f_{[[r_1,...,r_n]]}(x)dr_1...dr_n\\&=&\frac{1}{(2R)^n}\int_{[-R,R]^n}\hat{f}(\xi)e^{2\pi i x\cdot\xi}\int_{[-R,R]^n}1_{[[r_1,...,r_n]]}(\xi)dr_1...dr_n\,d\xi\\&=&\frac{1}{(2R)^n}\int_{[-R,R]^n} \hat{f}(\xi)e^{2\pi i x\cdot \xi}\prod_{j=1}^n\int_{[-R,R]\setminus [-|\xi_j|,|\xi_j|]}dr_j\,d\xi\\&=&\int_{[-R,R]^n}\hat{f}(\xi)e^{2\pi i x\cdot \xi}\prod_{j=1}^n\left(1-\frac{|\xi_j|}{R}\right)d\xi\\&=&\int_{[-R,R]^n}\prod_{j=1}^n\left(1-\frac{|\xi_j|}{R}\right) e^{2\pi i x\cdot \xi}\int_{\mathbb{R}^n}f(y)e^{-2\pi i y\cdot \xi}dy\,d\xi\\&=&\int_{\mathbb {R}^n}f(y)\int_{[-R,R]^n}\prod_{j=1}^n\left(1-\frac{|\xi_j|}{R}\right) e^{2\pi i (x-y)\cdot \xi}d\xi\, dy\\&=&\int_{\mathbb {R}^n}f(y)\mathcal{K}_R(x-y)dy\\&=&f*\mathcal{K}_R(x),\end{eqnarray}$
where
$\begin{eqnarray}\mathcal{K}_R(x):=K_R(x_1)...K_R(x_n)\end{eqnarray}$
and
$\begin{eqnarray}K_R(x):=\int_{-R}^R \left(1-\frac{|\xi|}{R}\right)e^{2\pi i x \xi}d\xi.\end{eqnarray}$
As in the proof of the corresponding theorem for Fourier series, we want to show that $(\mathcal{K}_{R})_{R>0}$ is a good family of kernels as $R\to\infty$ (for a definition, see Stein and Shakarchi, Fourier Analysis, page 48). In the post about convergence of Fourier series, we defined what it means for a sequence of functions on $[0,1]$ to be a good family of kernels, and the definition here is analogous. The required properties are
(i) $\int_{\mathbb{R}^n}\mathcal{K}_R(x)dx=1$ for all $R>0$
(ii) $\int_{\mathbb{R}^n}|\mathcal{K}_R(x)|dx\leq C$ for some constant $C$, independent of $R$
(iii) for every $\delta>0,$ $\int_{\mathbb{R}^n\setminus [-\delta,\delta]^n}|\mathcal{K}_R(x)|dx\xrightarrow{R\to \infty} 0.$
These properties imply $\|f-f*\mathcal{K}_R\|_1\xrightarrow{R\to \infty}0$ for $f\in L^1$, and $\sup_{x\in C}|f(x)-f*\mathcal{K}_R(x)|\xrightarrow{R\to \infty}0$ for bounded continuous $f$ and compact $C\subset \mathbb{R}^n$. The proof of these implications is similar to the one for good families of kernels on $[0,1]$. Therefore, it suffices to verify properties (i), (ii) and (iii) for our specific kernels.

We start with the case $n=1$ and evaluate the kernels $K_R(x)$ for $x\neq 0$, checking that they satisfy (i), (ii) and (iii). Integrating by parts (the boundary terms vanish because $1-\frac{|\xi|}{R}=0$ at $\xi=\pm R$), we find
$\begin{eqnarray}K_R(x)&=&\int_{-R}^R \left(1-\frac{|\xi|}{R}\right)e^{2\pi i x \xi}d\xi\\&=&\int_{-R}^R \frac{1}{2\pi i R x}\text{sgn}(\xi)e^{2\pi i x \xi}d\xi\\&=&\frac{1}{(2\pi i x)^2 R}(e^{2\pi i Rx}+e^{-2\pi i Rx}-2)\\&=&\frac{\sin^2(\pi R x)}{\pi^2 R x^2}\end{eqnarray}$
by the identity $\cos(2y)-1=-2\sin^2 y$.

We observe now that
$\begin{eqnarray}\int_{\mathbb{R}}\frac{\sin^2(\pi Rx)}{\pi^2 Rx^2}dx=\int_{\mathbb{R}}\frac{\sin^2(\pi y)}{\pi^2 y^2}dy=1,\end{eqnarray}$
where the last integral can be evaluated in many ways, for example by the residue theorem, or by applying the Fourier inversion theorem (Theorem 7 below) to the triangle function $\max(1-|\xi|,0)$, whose Fourier transform is $\frac{\sin^2(\pi x)}{\pi^2 x^2}$ by the computation above with $R=1$. Hence (i) holds for the family $(K_R)_{R>0}$, and (ii) follows immediately from (i) since $K_R\geq 0$. Since
$\begin{eqnarray}\int_{\mathbb{R}\setminus [-\delta,\delta]}\frac{\sin^2(\pi Rx)}{\pi^2Rx^2}dx=\int_{\mathbb{R}\setminus [-R\delta,R\delta]}\frac{\sin^2(\pi y)}{\pi^2 y^2}dy\to 0\end{eqnarray}$
as $R\to \infty$, also (iii) is true for the kernels $(K_R)_{R>0}$.
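The closed form of $K_R$ and the tail estimate behind (iii) can be checked numerically. In this sketch (assuming NumPy), `fejer_kernel_numeric` applies the trapezoid rule to the defining integral, `fejer_kernel_closed` is the formula $\frac{\sin^2(\pi Rx)}{\pi^2Rx^2}$, and `tail_bound` is the elementary bound $\int_{|x|>\delta}K_R\leq\frac{2}{\pi^2R\delta}$ obtained from $\sin^2\leq 1$; all names and grid sizes are hypothetical.

```python
import numpy as np

def fejer_kernel_numeric(R, x, n=200001):
    """Trapezoid rule for K_R(x) = int_{-R}^{R} (1 - |xi|/R) exp(2 pi i x xi) d xi."""
    xi = np.linspace(-R, R, n)              # n odd: the kink of 1 - |xi|/R at 0 is a node
    vals = (1 - np.abs(xi) / R) * np.exp(2j * np.pi * x * xi)
    return (np.sum(vals) - (vals[0] + vals[-1]) / 2) * (xi[1] - xi[0])

def fejer_kernel_closed(R, x):
    # sin^2(pi R x) / (pi^2 R x^2), written with np.sinc(t) = sin(pi t)/(pi t) so x = 0 works
    return R * np.sinc(R * x) ** 2

def tail_mass(R, delta, X=200.0, n=400001):
    """Riemann sum for int_{delta < |x| < X} K_R(x) dx, using that K_R is even."""
    x = np.linspace(delta, X, n)
    return 2 * np.sum(fejer_kernel_closed(R, x)) * (x[1] - x[0])

def tail_bound(R, delta):
    # sin^2 <= 1 gives int_{|x| > delta} K_R(x) dx <= 2 / (pi^2 R delta), which tends to 0
    return 2 / (np.pi**2 * R * delta)

for R in (1.0, 3.0):
    for x0 in (0.0, 0.1, 0.37, 1.25):
        print(R, x0, abs(fejer_kernel_numeric(R, x0) - fejer_kernel_closed(R, x0)))
print(tail_mass(10, 0.5), tail_bound(10, 0.5), tail_mass(40, 0.5))
```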

Now we return to the general case. Since (i) has been proved for the one-dimensional kernels, Fubini's theorem tells
$\begin{eqnarray}\int_{\mathbb{R}^n}\mathcal{K}_{R}(x)dx=\prod_{j=1}^n \int_{\mathbb{R}}K_R(x_j)dx_j=1.\end{eqnarray}$
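Condition (ii) again follows from (i) by nonnegativity. For (iii), note that $\mathbb{R}^n\setminus[-\delta,\delta]^n$ is covered by the sets $\{x:|x_j|>\delta\}$, $j=1,\dots,n$. By Fubini's theorem and (i) for the one-dimensional kernels,
$\begin{eqnarray}\int_{\{|x_j|>\delta\}}\mathcal{K}_R(x)dx=\int_{\mathbb{R}\setminus[-\delta,\delta]}K_R(t)dt\xrightarrow{R\to\infty}0\end{eqnarray}$
for each $j$, by the one-dimensional case of (iii). Hence (iii) holds for $\mathcal{K}_R$, and the proof is finished. ■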
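In one dimension, Theorem 6 says that the means $\sigma_R(f,\cdot)=f*K_R$ converge to $f$ in $L^1$. The sketch below (assuming NumPy) illustrates this for the hypothetical example $f=1_{[-1,1]}$, for which $\sigma_R(f,x)=\int_{x-1}^{x+1}K_R(y)dy$. The antiderivative of $K_R$ is tabulated numerically and the $L^1$ error is approximated on a truncated grid, so the printed errors are estimates rather than exact values.

```python
import numpy as np

def fejer_kernel(R, y):
    # K_R(y) = sin^2(pi R y) / (pi^2 R y^2); np.sinc(t) = sin(pi t)/(pi t) handles y = 0
    return R * np.sinc(R * y) ** 2

def fejer_mean_of_box(R, x, Y=60.0, n=600001):
    """sigma_R(f, x) = (f * K_R)(x) for f = 1_[-1,1], i.e. the integral of K_R over [x-1, x+1].

    The antiderivative of K_R is tabulated on [-Y, Y] with the cumulative trapezoid
    rule and then interpolated; Y must exceed max|x| + 1.
    """
    y = np.linspace(-Y, Y, n)
    k = fejer_kernel(R, y)
    cum = np.concatenate(([0.0], np.cumsum((k[1:] + k[:-1]) / 2) * (y[1] - y[0])))
    return np.interp(x + 1, y, cum) - np.interp(x - 1, y, cum)

def l1_error(R, X=40.0, m=16001):
    """Riemann-sum estimate of ||sigma_R(f, .) - f||_1 over [-X, X] for f = 1_[-1,1]."""
    x = np.linspace(-X, X, m)
    box = (np.abs(x) <= 1).astype(float)
    return np.sum(np.abs(fejer_mean_of_box(R, x) - box)) * (x[1] - x[0])

for R in (2, 8, 32):
    print(R, l1_error(R))
```

The box is a deliberately rough example: the errors decay slowly (roughly like $\frac{\log R}{R}$) because of the jumps at $\pm 1$.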

The theorem above illustrates well how complications sometimes arise in proving properties of the Fourier transform on $\mathbb{R}^n$ compared to proving them on $\mathbb{R}$. Nevertheless, Theorem 6 is worth the computations, because it gives a rather simple proof that for every $f\in L^1(\mathbb{R}^n)$ the Fourier integrals converge to $f$ at least in some sense. The following Fourier inversion theorem is simpler to state and gives a nicer result, but we have to assume that $\hat{f}$ is integrable. We will see in a later section that this is a surprisingly restrictive assumption: it implies that $f$ coincides almost everywhere with a continuous function that tends to $0$ at infinity, and these conditions are not even sufficient. Moreover, we will see later in this post that for $p\in [1,2]$ every $L^p$ function $f$ has a Fourier transform $\hat{f}$ that is locally integrable, and then the proof of Theorem 6 gives
$\begin{eqnarray}f(x)=L^p-\lim_{R\to \infty}\sigma_R(f,x),\end{eqnarray}$
while the more elegant result
$\begin{eqnarray}f(x)=L^p-\lim_{R\to \infty}\int_{B(0,R)}\hat{f}(\xi)e^{2\pi i x\cdot \xi}d\xi\end{eqnarray}$
fails to be true for $n>1$ and $p\neq 2$, as was mentioned in the introduction. When it comes to applications, though, the following result is of fundamental importance.

Theorem 7 (Fourier inversion theorem). Let $f\in L^1(\mathbb{R}^n)$ be such that $\hat{f}\in L^1(\mathbb{R}^n)$. Then for almost all $x$
$\begin{eqnarray}f(x)=\int_{\mathbb{R}^n}\hat{f}(\xi)e^{2\pi i x\cdot \xi}d\xi.\end{eqnarray}$

Proof. For $\varepsilon>0$, define $K_{\varepsilon}(\xi)=\varepsilon^{-\frac{n}{2}}e^{-\frac{\pi \|\xi\|^2}{\varepsilon}}$. Then $(K_{\varepsilon})_{\varepsilon>0}$ is a good family of kernels as $\varepsilon\to 0$: we have $K_{\varepsilon}\geq 0$, $\int_{\mathbb{R}^n}K_{\varepsilon}(\xi)d\xi=1$ (after the change of variables $\eta=\xi/\sqrt{\varepsilon}$ this is the Gaussian integral $\int_{\mathbb{R}^n}e^{-\pi\|\eta\|^2}d\eta=1$, see Proposition 5), and for any $\delta>0$, $\int_{\|\xi\|\geq \delta}K_{\varepsilon}(\xi)d\xi=\int_{\|\eta\|\geq\delta/\sqrt{\varepsilon}}e^{-\pi\|\eta\|^2}d\eta\to 0$ as $\varepsilon\to 0$.

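By Proposition 5 with $a=\pi\varepsilon$, $K_{\varepsilon}$ is the Fourier transform of $\xi\mapsto e^{-\pi\varepsilon\|\xi\|^2}$. Hence, by Fubini's theorem (as in the proof of Theorem 3), for every $x$
$\begin{eqnarray}f*K_{\varepsilon}(x)&=&\int_{\mathbb{R}^n}f(y)K_{\varepsilon}(x-y)dy\\&=&\int_{\mathbb{R}^n}f(y)\int_{\mathbb{R}^n}e^{-\pi\varepsilon\|\xi\|^2}e^{-2\pi i (x-y)\cdot \xi}d\xi\, dy\\&=&\int_{\mathbb{R}^n}e^{-\pi\varepsilon\|\xi\|^2}e^{-2\pi i x\cdot \xi}\hat{f}(-\xi)d\xi\\&=&\int_{\mathbb{R}^n}\hat{f}(\xi)e^{2\pi i x\cdot\xi}e^{-\pi\varepsilon\|\xi\|^2}d\xi\\&\xrightarrow{\varepsilon\to 0}&\int_{\mathbb{R}^n}\hat{f}(\xi)e^{2\pi i x\cdot \xi}d\xi\end{eqnarray}$
by dominated convergence, since $\hat f\in L^1(\mathbb{R}^n)$. On the other hand, $f*K_{\varepsilon}\to f$ in $L^1$ as $\varepsilon\to 0$ because $(K_{\varepsilon})_{\varepsilon>0}$ is a good family of kernels, so $f*K_{\varepsilon_k}\to f$ almost everywhere along some sequence $\varepsilon_k\to 0$. Comparing the two limits proves the statement. ■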
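The proof can be watched numerically. For the illustrative choice $f(x)=e^{-|x|}$ on $\mathbb{R}$, a direct computation gives $\hat f(\xi)=\frac{2}{1+4\pi^2\xi^2}$, which is integrable, and the regularized integrals $\int\hat f(\xi)e^{2\pi ix\xi}e^{-\pi\varepsilon\xi^2}d\xi=f*K_\varepsilon(x)$ should approach $f(x)$ as $\varepsilon\to0$. This sketch assumes NumPy and a truncated Riemann sum; the point $x_0=0.5$ and the values of $\varepsilon$ are arbitrary.

```python
import numpy as np

def f(x):
    return np.exp(-np.abs(x))

def f_hat(xi):
    # direct computation: int exp(-|x|) exp(-2 pi i x xi) dx = 2 / (1 + 4 pi^2 xi^2)
    return 2.0 / (1.0 + 4.0 * np.pi**2 * xi**2)

def regularized_inverse(x, eps, Xi=400.0, n=160001):
    """Riemann sum for int hat f(xi) exp(2 pi i x xi) exp(-pi eps xi^2) d xi, i.e. (f * K_eps)(x)."""
    xi = np.linspace(-Xi, Xi, n)
    vals = f_hat(xi) * np.exp(2j * np.pi * x * xi - np.pi * eps * xi**2)
    return np.sum(vals).real * (xi[1] - xi[0])

x0 = 0.5
errors = [abs(regularized_inverse(x0, eps) - f(x0)) for eps in (1e-1, 1e-2, 1e-3, 1e-4)]
print(errors)
```

At a point where $f$ is smooth, the error behaves roughly like a constant times $\varepsilon$, since $K_\varepsilon$ has variance $\frac{\varepsilon}{2\pi}$.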

$L^2$ theory of the Fourier transform

Theorem 8. The Fourier transform $\mathcal{F}:L^1(\mathbb{R}^n)\to L^{\infty}(\mathbb{R}^n)$ has a unique extension to a linear operator on $L^1(\mathbb{R}^n)+L^2(\mathbb{R}^n)$ (the space of sums of an $L^1$ and an $L^2$ function) whose restriction to $L^2(\mathbb{R}^n)$ is a bounded operator $L^2(\mathbb{R}^n)\to L^2(\mathbb{R}^n)$. In particular, it extends uniquely to $L^p(\mathbb{R}^n)$ for $1\leq p\leq 2$.

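Proof. The last statement is an immediate corollary of the first, because every $f\in L^p(\mathbb{R}^n)$, $p\in [1,2]$, can be written as $f=f1_{\{|f|\leq 1\}}+f1_{\{|f|>1\}}$, where the first term is in $L^2$ (since $|f|^2\leq |f|^p$ where $|f|\leq 1$) and the second is in $L^1$ (since $|f|\leq |f|^p$ where $|f|>1$). It remains to prove the first statement. It suffices to extend $\mathcal{F}$ uniquely to $L^1(\mathbb{R}^n)\cup L^2(\mathbb{R}^n)$, as it is then easy to extend it uniquely to the sum space by $\mathcal{F}(g+h):=\mathcal{F}(g)+\mathcal{F}(h)$; this is well defined because two such decompositions of the same function differ by an element of $L^1\cap L^2$. The space $L^2(\mathbb{R}^n)$ has the dense subspace $L^1(\mathbb{R}^n)\cap L^2(\mathbb{R}^n)$, on which $\|\mathcal{F}(f)\|_2=\|f\|_2$. Indeed, for $f\in S(\mathbb{R}^n)$ we have $\hat f\in L^1$ by Propositions 1 and 2, so Theorem 3 applied with $g=\overline{\hat f}$, for which $\hat g=\bar f$ by Fourier inversion (Theorem 7), gives $\|\hat f\|_2^2=\|f\|_2^2$; a general $f\in L^1\cap L^2$ can be approximated by Schwartz functions simultaneously in $L^1$ and in $L^2$, and the identity passes to the limit. Hence $\mathcal{F}$ is continuous on this subspace in the $L^2$ norm, so the only possible definition of $\mathcal{F}$ for $L^2$ functions is given by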
$\begin{eqnarray}\mathcal{F}(f)=L^2-\lim_{k\to \infty}\mathcal{F}(f_k),\end{eqnarray}$
where $(f_k)$ is an arbitrary sequence in $L^1\cap L^2$ converging to $f$ in the $L^2$ norm. We must show that this limit exists and is independent of the sequence. It exists because $(\mathcal{F}(f_k))$ is a Cauchy sequence in $L^2$: given $\varepsilon>0$,
$\begin{eqnarray}\|\mathcal{F}(f_k)-\mathcal{F}(f_m)\|_2=\|f_k-f_m\|_2<\varepsilon,\quad k,m>N_{\varepsilon}.\end{eqnarray}$
If $(g_k)$ is another sequence in $L^1\cap L^2$ converging to $f$ in $L^2$, and $F$ and $G$ denote the $L^2$ limits of $\mathcal{F}(f_k)$ and $\mathcal{F}(g_k)$, then by the continuity of the norm
$\begin{eqnarray}\|F-G\|_2=\lim_{k\to \infty}\|\mathcal{F}(f_k)-\mathcal{F}(g_k)\|_2=\lim_{k\to\infty}\|f_k-g_k\|_2=\|f-f\|_2=0,\end{eqnarray}$
so $F=G$ and the limit is independent of the sequence. Taking $f_k=f$ for $f\in L^1\cap L^2$ shows that the new definition agrees with the old one there. Finally, the limit operator is clearly linear, and it is bounded (in fact an isometry) because
$\begin{eqnarray}\|\mathcal{F}(f)\|_2=\lim_{k\to \infty}\|\mathcal{F}(f_k)\|_2=\lim_{k\to\infty}\|f_k\|_2=\|f\|_2,\end{eqnarray}$
so the claim is proved. ■

Theorem 9. For $f\in L^2(\mathbb{R}^n)$, we have
$\begin{eqnarray}\mathcal{F}(f)(\xi)=L^2-\lim_{R\to\infty}\int_{B(0,R)}f(x)e^{-2\pi i x\cdot \xi}dx,\quad (4)\end{eqnarray}$
where the Fourier transform $\mathcal{F}$ is defined by Theorem 8.

Proof. By the Cauchy-Schwarz inequality, $f1_{B(0,R)}\in L^1\cap L^2$, so the integrals in $(4)$ are defined for every $\xi$. We first show that the sequence $\int_{B(0,N)}f(x)e^{-2\pi i x\cdot \xi}dx$, $N=1,2,\dots$, is Cauchy in $L^2$. For $N>M$ we have
$\begin{eqnarray}\left\|\int_{B(0,N)}f(x)e^{-2\pi i x\cdot \xi}dx-\int_{B(0,M)}f(x)e^{-2\pi i x\cdot \xi}dx\right\|_2&=&\left\|\int_{M<\|x\|\leq N}f(x)e^{-2\pi i x\cdot \xi}dx\right\|_2\\&=&\|\mathcal{F}(f1_{\{M<\|x\|\leq N\}})\|_2\\&=&\|f1_{\{M<\|x\|\leq N\}}\|_2\\&=&\left(\int_{M<\|x\|\leq N}|f(x)|^2dx\right)^{1/2}\\&\leq& \left(\int_{\|x\|>M}|f(x)|^2dx\right)^{1/2}<\varepsilon\end{eqnarray}$
when $M,N>N_{\varepsilon}$; here we used that $f1_{\{M<\|x\|\leq N\}}$ lies in $L^1\cap L^2$, where $\|\mathcal{F}(g)\|_2=\|g\|_2$ by the proof of Theorem 8. Therefore the sequence under consideration is Cauchy, and the limit on the right-hand side of $(4)$ exists (it suffices to take the limit over integer values of $R$, since the above argument goes through for any sequences of $M$ and $N$ tending to infinity).

It has now been proved that
$\begin{eqnarray}\mathcal{G}(f)(\xi):=L^2-\lim_{R\to\infty}\int_{B(0,R)}f(x)e^{-2\pi i x\cdot \xi}dx\end{eqnarray}$
is a well-defined linear operator $L^2(\mathbb{R}^n)\to L^2(\mathbb{R}^n)$, and the computation above (with $M=0$) shows that $\|\mathcal{G}(f)\|_2=\|f\|_2$. If $f\in L^1(\mathbb{R}^n)\cap L^2(\mathbb{R}^n)$, then
$\begin{eqnarray}\lim_{R\to\infty}\int_{B(0,R)}f(x)e^{-2\pi i x\cdot \xi}dx=\hat{f}(\xi)\end{eqnarray}$
uniformly in $\xi$, since the difference is bounded by $\int_{\|x\|>R}|f(x)|dx$. A sequence converging in $L^2$ has a subsequence converging almost everywhere, so the $L^2$ limit coincides almost everywhere with the uniform limit, and hence $\mathcal{F}=\mathcal{G}$ on $L^1\cap L^2$. Since both operators are bounded on $L^2$ and $L^1\cap L^2$ is dense in $L^2$, $\mathcal{G}$ is identical to $\mathcal{F}$ on $L^2$. The proof is now finished. ■

Theorem 10 (Plancherel's formula for $L^2$ functions). For $f,g\in L^2(\mathbb{R}^n)$ we have
$\begin{eqnarray}\int_{\mathbb{R}^n}f(x)\overline{g(x)}dx=\int_{\mathbb{R}^n}\hat{f}(\xi)\overline{\hat{g}(\xi)}d\xi.\end{eqnarray}$
Proof. If $(\cdot,\cdot)$ denotes the natural inner product on $L^2(\mathbb{R}^n)$, we must show that $(f,g)=(\mathcal{F}(f),\mathcal{F}(g))$. By the continuity of the inner product and the definition of $\mathcal{F}$ given in the proof of Theorem 8, we obtain
$\begin{eqnarray}(\mathcal{F}(f),\mathcal{F}(g))&=&\lim_{k\to \infty} (\mathcal{F}(f_k),\mathcal{F}(g_k))\\&=&\lim_{k\to \infty}(f_k,g_k)\\&=&(f,g)\end{eqnarray}$
for any $f_k,g_k\in L^1\cap L^2$ converging in $L^2$ to $f$ and $g$, respectively; the middle equality follows from the identity $\|\mathcal{F}(h)\|_2=\|h\|_2$ on $L^1\cap L^2$ by the polarization identity. ■
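As a numerical illustration of Plancherel's formula with $g=f$, take $f=1_{[-1,1]}$, whose Fourier transform is $\frac{\sin(2\pi\xi)}{\pi\xi}$ by direct integration, so that $\|f\|_2^2=2$. The sketch below (assuming NumPy) integrates $|\hat f|^2$ over $[-\Xi,\Xi]$; the tail correction $\frac{1}{\pi^2\Xi}$ is our own estimate, based on $\sin^2$ averaging to $\frac12$.

```python
import numpy as np

def box_hat(xi):
    # Fourier transform of 1_[-1,1]: sin(2 pi xi)/(pi xi) = 2 sinc(2 xi), np.sinc(t) = sin(pi t)/(pi t)
    return 2 * np.sinc(2 * xi)

def truncated_energy(Xi=500.0, n=2_000_001):
    """Riemann sum for int_{-Xi}^{Xi} |hat f(xi)|^2 d xi with f = 1_[-1,1]."""
    xi = np.linspace(-Xi, Xi, n)
    return np.sum(box_hat(xi) ** 2) * (xi[1] - xi[0])

Xi = 500.0
# Plancherel predicts ||hat f||_2^2 = ||f||_2^2 = 2; the discarded tail is about
# 1/(pi^2 Xi), because sin^2 averages to 1/2.
print(truncated_energy(Xi), 2 - 1 / (np.pi**2 * Xi))
```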

Theorem 11 (Fourier inversion theorem in $L^2$). Let $f\in L^2(\mathbb{R}^n)$. Then
$\begin{eqnarray}f(x)=L^2-\lim_{R\to \infty}\int_{B(0,R)}\hat{f}(\xi)e^{2\pi i x\cdot \xi}d\xi.\end{eqnarray}$

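Proof. Since the balls $B(0,R)$ are symmetric, the substitution $\xi\mapsto-\xi$ gives
$\begin{eqnarray}\int_{B(0,R)}\hat{f}(\xi)e^{2\pi i x\cdot \xi}d\xi=\int_{B(0,R)}\hat{f}(-\xi)e^{-2\pi i x\cdot \xi}d\xi.\end{eqnarray}$
By Theorem 9 applied to the $L^2$ function $\xi\mapsto \hat{f}(-\xi)$, the right-hand side converges in $L^2$ as $R\to\infty$ to the Fourier transform of that function. Since $f\mapsto \hat f$ and $\hat f\mapsto \hat f(-\cdot)$ are isometries of $L^2(\mathbb{R}^n)$ (Theorem 10), the operator
$\begin{eqnarray}f\mapsto L^2-\lim_{R\to \infty}\int_{B(0,R)}\hat{f}(\xi)e^{2\pi i x\cdot \xi}d\xi\end{eqnarray}$
is a bounded linear operator from $L^2(\mathbb{R}^n)$ to $L^2(\mathbb{R}^n)$. For $f\in S(\mathbb{R}^n)$, Proposition 2 (i) and Proposition 1 (v) show that $\hat f$ decays faster than any polynomial, so $\hat f\in L^1(\mathbb{R}^n)$, and the Fourier inversion theorem for $L^1$ functions shows that the operator is the identity on $S(\mathbb{R}^n)$. This subspace is dense in $L^2(\mathbb{R}^n)$, so the operator must be the identity on all of $L^2(\mathbb{R}^n)$. ■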

Theorem 12. The Fourier transform is a bijection $L^2(\mathbb{R}^n)\to L^2(\mathbb{R}^n)$.

Proof. We have already shown that $\mathcal{F}$ maps $L^2(\mathbb{R}^n)$ into $L^2(\mathbb{R}^n)$, and it is injective since it is an isometry (Theorem 10). Given $f\in L^2(\mathbb{R}^n)$, Theorems 9 and 11 show that $f$ is the Fourier transform of $\xi\mapsto \hat{f}(-\xi)$, which is again an $L^2$ function, so $\mathcal{F}$ is also surjective. ■

Mapping properties of the Fourier transform between different spaces

It is interesting to know how the Fourier transform maps different spaces, that is, what can be said about the images of various spaces under the Fourier transform. For some spaces, the image is not hard to characterize, at least partially. In the previous section, we already showed that $L^2(\mathbb{R}^n)$ behaves ''as nicely as possible'' under the Fourier transform: the transform is a bijection there, and by Plancherel's formula it is even an isometry (in particular a bounded linear operator). For $L^1(\mathbb{R}^n)$, we also have a simple partial description of the image, but a complete description seems hopeless. Already the example $f=1_{[-1,1]}$, with $\hat{f}(\xi)=\frac{\sin(2\pi \xi)}{\pi \xi}$, shows that the Fourier transform of an integrable function need not be integrable. The following property is necessary, but not sufficient, for a function to be the Fourier transform of an integrable function.
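Both claims about this example are easy to check numerically. The sketch below (assuming NumPy; the helper names are hypothetical) compares a midpoint-rule evaluation of $\int_{-1}^{1}e^{-2\pi ix\xi}dx$ with $\frac{\sin(2\pi\xi)}{\pi\xi}$, and then shows that $\int_{-\Xi}^{\Xi}|\hat f|$ keeps growing like $\frac{4}{\pi^2}\log\Xi$, so $\hat f\notin L^1(\mathbb{R})$.

```python
import numpy as np

def box_hat(xi):
    # closed form sin(2 pi xi)/(pi xi) = 2 sinc(2 xi), with np.sinc(t) = sin(pi t)/(pi t)
    return 2 * np.sinc(2 * xi)

def box_hat_numeric(xi, cells=20000):
    """Midpoint rule for int_{-1}^{1} exp(-2 pi i x xi) dx."""
    h = 2.0 / cells
    x = -1.0 + h * (np.arange(cells) + 0.5)
    return np.sum(np.exp(-2j * np.pi * x * xi)) * h

def l1_mass(Xi, step=1e-3):
    """Midpoint rule for int_{-Xi}^{Xi} |hat f(xi)| d xi (even integrand: twice the half-line)."""
    m = int(round(Xi / step))
    xi = (np.arange(m) + 0.5) * step
    return 2 * np.sum(np.abs(box_hat(xi))) * step

print(max(abs(box_hat_numeric(s) - box_hat(s)) for s in (0.0, 0.3, 1.7, 3.0)))
# |sin| averages 2/pi, so each tenfold increase of Xi adds about (4/pi^2) log(10), i.e. about 0.93
print(l1_mass(100.0), l1_mass(1000.0))
```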

Theorem 13. (Riemann-Lebesgue). Let $f:\mathbb{R}^n\to \mathbb{C}$ be integrable. Then $\hat{f}(\xi)\to 0$ as $\|\xi\|\to \infty$ and $\hat{f}$ is continuous.

Proof. First assume $f\in S(\mathbb{R}^n)$. By Proposition 2 (i) and Proposition 1 (v), $2\pi|\xi_i||\hat{f}(\xi)|=|\widehat{\partial_i f}(\xi)|\leq\|\partial_i f\|_1$ for every $i\in\{1,2,\dots,n\}$. Choosing $i$ with $|\xi_i|=\max_j|\xi_j|\geq\|\xi\|/\sqrt{n}$, we get $|\hat{f}(\xi)|\leq \frac{\sqrt{n}}{2\pi\|\xi\|}\max_j\|\partial_j f\|_1\to 0$ as $\|\xi\|\to \infty$. Now if $f$ is a general $L^1$ function, it can be approximated by Schwartz functions $f_m$ so that $\|f-f_m\|_1\to 0$ as $m\to \infty$ (indeed, $C_0^{\infty}(\mathbb{R}^n)\subset S(\mathbb{R}^n)$ is dense in $L^1(\mathbb{R}^n)$). Therefore $|\hat{f}(\xi)|\leq |\hat{f}(\xi)-\hat{f}_m(\xi)|+|\hat{f}_m(\xi)|\leq \|f-f_m\|_1+|\hat{f}_m(\xi)|$, so $\limsup_{\|\xi\|\to\infty}|\hat{f}(\xi)|\leq \|f-f_m\|_1$ for every $m$. Letting $m\to \infty$ proves the claim.

To see that $\hat{f}$ is continuous, notice that
$\begin{eqnarray}|\hat{f}(\xi+h)-\hat{f}(\xi)|\leq \int_{\mathbb{R}^n}|f(x)||e^{-2\pi i x\cdot h}-1|dx\to 0\end{eqnarray}$
as $h\to 0$ by dominated convergence. ■

There are several theorems stating that the smoothness of $f$ and the decay of $\hat{f}$ are closely related; for instance, the following holds.

Proposition 14. If $f\in C^k(\mathbb{R}^n)$ and all partial derivatives of $f$ of order at most $k$ are integrable, then $|\hat{f}(\xi)|=o(\|\xi\|^{-k})$ as $\|\xi\|\to\infty$.

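Proof. By Proposition 2 (i), whose proof by integration by parts only uses that $f$ and its partial derivatives up to order $k$ are continuous and integrable, we have $\widehat{\partial_j^k f}(\xi)=(2\pi i \xi_j)^k \hat{f}(\xi)$ for each $j$. The Riemann-Lebesgue lemma applied to $\partial_j^k f$ gives $|\xi_j|^k|\hat f(\xi)|\to 0$ as $\|\xi\|\to\infty$ for each $j$, and since $\max_j|\xi_j|\geq\|\xi\|/\sqrt{n}$, the claim follows. ■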

Conversely, if $f\in L^1(\mathbb{R}^n)$ and $\|\xi\|^k\hat{f}(\xi)$ is integrable, then differentiating the inversion formula of Theorem 7 under the integral sign shows that $f$ coincides almost everywhere with a $C^k$ function. The function $f$ itself need not be differentiable everywhere, since the hypothesis only depends on the values of $f$ almost everywhere.

Besides $L^2(\mathbb{R}^n)$, also $S(\mathbb{R}^n)$ has a very simple behavior under the Fourier transform.

Theorem 15. For $f\in L^1(\mathbb{R}^n)$, we have $f\in S(\mathbb{R}^n)$ (after modification on a null set) if and only if $\hat{f}\in S(\mathbb{R}^n)$. Moreover, the Fourier transform is a bijection on $S(\mathbb{R}^n)$.

Proof. Let us first show that the Fourier transform of a Schwartz function is a Schwartz function. For $f\in S(\mathbb{R}^n)$, the definition of the Fourier transform can be differentiated under the integral sign arbitrarily many times to see that $\hat{f}$ is smooth. By Proposition 2, we have $\partial_j^k \hat{f}=(-2\pi i)^k \mathcal{F}(x_j^kf(x))(\xi)$, so it is enough to show that $\mathcal{F}(x_j^kf(x))(\xi)$ decays faster than polynomially, which follows directly from Proposition 14.

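Now suppose that $\hat{f}\in S(\mathbb{R}^n)$. Then $\hat{\hat{f}}\in S(\mathbb{R}^n)$ by what we just proved, and since $\hat f\in L^1$, Fourier inversion (Theorem 7) gives $f(x)=\hat{\hat{f}}(-x)$ for almost every $x$, so $f\in S(\mathbb{R}^n)$ after modification on a null set. Finally, the Fourier transform is a bijection on $S(\mathbb{R}^n)$: it is injective by Fourier inversion, and every $f\in S(\mathbb{R}^n)$ is the Fourier transform of the Schwartz function $\xi\mapsto\hat{f}(-\xi)$. ■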
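Proposition 2 and Theorem 15 can be illustrated with the Schwartz function $h(x)=xe^{-\pi x^2}$: by Proposition 2 (ii) with $k=1$ and Proposition 5, $\hat h(\xi)=-i\xi e^{-\pi\xi^2}$, and transforming once more gives $h(-x)=-h(x)$, in line with Fourier inversion on $S(\mathbb{R})$. The sketch below checks both statements with Riemann sums, assuming NumPy; the grid is an arbitrary choice.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]

def ft(values, xi):
    """Riemann sum for int g(x) exp(-2 pi i x xi) dx, with g sampled on the grid x."""
    return np.sum(values * np.exp(-2j * np.pi * x * xi)) * dx

gauss = np.exp(-np.pi * x**2)
h = x * gauss                              # h(x) = x exp(-pi x^2), a Schwartz function
xis = np.linspace(-3.0, 3.0, 61)

def schwartz_errors():
    # Proposition 2 (ii) with k = 1 and Proposition 5 (a = pi): hat h(xi) = -i xi exp(-pi xi^2)
    h_hat = np.array([ft(h, s) for s in xis])
    err_once = np.max(np.abs(h_hat - (-1j) * xis * np.exp(-np.pi * xis**2)))
    # Transforming the (again Schwartz) function -i x exp(-pi x^2) should give h(-x) = -h(x)
    hh = np.array([ft(-1j * h, s) for s in xis])
    err_twice = np.max(np.abs(hh - (-xis) * np.exp(-np.pi * xis**2)))
    return err_once, err_twice

print(schwartz_errors())
```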

We now turn to results about the analyticity of $\hat{f}$. The Fourier transform was only defined for real $\xi_1,...,\xi_n$, but sometimes it extends to complex values as well, becoming an entire function.

Proposition 16. If for every $a>0$ there is a constant $C_a$ such that $|f(x)|\leq C_a e^{-a\|x\|}$ for all $x$, then $\hat{f}$ extends to an entire function from $\mathbb{C}^n$ to $\mathbb{C}$ (meaning that it is entire in each of its complex variables).

Proof. For $\xi\in\mathbb{C}^n$ we have $|e^{-2\pi i x\cdot\xi}|=e^{2\pi x\cdot\Im(\xi)}\leq e^{2\pi\|x\|\|\Im(\xi)\|}$, which grows at most exponentially in $x$. Since $f$ decays faster than any exponential, the integral
$\begin{eqnarray}\int_{\mathbb{R}^n}f(x)e^{-2\pi i x\cdot \xi}dx\end{eqnarray}$
converges absolutely for every $\xi \in \mathbb{C}^n$, locally uniformly in $\xi$. As $\xi \mapsto e^{-2\pi i x\cdot \xi}$ is entire, we can differentiate under the integral sign (by dominated convergence) to see that $\hat{f}$ extends to an entire function on $\mathbb{C}^n$, that is, entire in each of its variables separately. ■
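For compactly supported $f$, Proposition 16 applies, and a slightly sharper form of the estimate used in the proof of Theorem 17 below can be seen numerically. The sketch assumes NumPy and uses the standard bump function $f(x)=e^{-1/(1-x^2)}$ on $(-1,1)$ (an illustrative choice, so $M=1$); it evaluates $\hat f$ at complex points by the midpoint rule and compares $|\hat f(\xi)|$ with $\|f\|_1e^{2\pi|\Im\xi|}$, which follows from $|e^{-2\pi ix\xi}|=e^{2\pi x\Im\xi}$.

```python
import numpy as np

def bump(x):
    """exp(-1/(1 - x^2)) for |x| < 1 and 0 otherwise: smooth, supported in [-1, 1] (M = 1)."""
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

cells = 20000
x = -1.0 + (2.0 / cells) * (np.arange(cells) + 0.5)   # midpoints of [-1, 1]
weights = bump(x) * (2.0 / cells)                        # midpoint-rule weights f(x_k) h
l1_norm = np.sum(weights)

def bump_hat(xi):
    """Midpoint rule for int f(x) exp(-2 pi i x xi) dx; xi may be any complex number."""
    return np.sum(weights * np.exp(-2j * np.pi * x * xi))

def growth_bound(xi):
    # |exp(-2 pi i x xi)| = exp(2 pi x Im xi) <= exp(2 pi |Im xi|) for |x| <= 1
    return l1_norm * np.exp(2 * np.pi * abs(complex(xi).imag))

for xi in (0.5, 2.0 + 0.5j, -1.0 - 1.5j, 3.0j):
    print(xi, abs(bump_hat(xi)), growth_bound(xi))
```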

We can even characterize the Fourier transforms of smooth compactly supported functions in terms of entire functions of exponential growth. By entire functions on $\mathbb{C}^n$ we mean functions of $n$ complex variables that are analytic in the whole plane in each variable when the other $n-1$ variables are fixed. By exponential growth we mean that there is a constant $A$ such that for every $m\geq 0$ there is a constant $C_m$ with $|g(z)|\leq C_m \frac{e^{A\|\Im(z)\|}}{1+\|z\|^m}$ for all $z\in \mathbb{C}^n$ (this is not the standard definition). Here $\|z\|=\sqrt{|z_1|^2+...+|z_n|^2}$ and $\Im(z)=(\Im(z_1),...,\Im(z_n))$.

Theorem 17 (Paley-Wiener). The image of the space $C_0^{\infty}(\mathbb{R}^n)$ under the Fourier transform is precisely the set of entire functions $g:\mathbb{C}^n\to \mathbb{C}$ of exponential growth.

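Proof. Let $f\in C_0^{\infty}(\mathbb{R}^n)$ be supported in the ball $B(0,M)$. A bounded compactly supported function trivially satisfies the decay assumption of Proposition 16, so $\hat{f}$ extends analytically from $\mathbb{R}^n$ to $\mathbb{C}^n$. Moreover, it grows at most exponentially, because for $\xi\in\mathbb{C}^n$
$\begin{eqnarray}|\hat{f}(\xi)|=\left|\int_{B(0,M)}f(x)e^{-2\pi i x\cdot \xi}dx\right|&\leq& \|f\|_{\infty}\cdot c_nM^n\cdot \max_{\|x\|\leq M}|e^{-2\pi i x\cdot \xi}|\\&=& \|f\|_{\infty}\cdot c_nM^n\cdot \max_{\|x\|\leq M}e^{2\pi x\cdot \Im(\xi)}\\&\leq& \|f\|_{\infty}\cdot c_nM^n\cdot e^{2\pi M\|\Im(\xi)\|},\end{eqnarray}$
where $c_n$ is the volume of the unit ball. To obtain the factor $\frac{1}{1+\|\xi\|^m}$, apply the same bound to $\partial^{\alpha}f$: integration by parts (with no boundary terms, since $f$ has compact support) gives $\widehat{\partial^{\alpha}f}(\xi)=(2\pi i\xi)^{\alpha}\hat{f}(\xi)$ also for complex $\xi$, so $|\xi^{\alpha}\hat{f}(\xi)|\leq C_{\alpha}e^{2\pi M\|\Im(\xi)\|}$ for every multi-index $\alpha$. Since $\|\xi\|^m\leq n^{m/2}\max_j|\xi_j|^m$, this shows that $\hat{f}$ has exponential growth with $A=2\pi M$.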
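The exponential bound $|\hat{f}(\xi)|\leq \|f\|_{\infty}c_nM^ne^{2\pi M\|\Im(\xi)\|}$ can be checked numerically in one dimension. This is a minimal sketch assuming the standard bump $e^{-1/(1-x^2)}$ on $(-1,1)$ as a hypothetical test function, so $n=1$, $M=1$, $c_1M=2$ and $\|f\|_{\infty}=e^{-1}$; the integral is approximated by a midpoint rule.

```python
import cmath
import math


# Hypothetical test function: the standard bump, supported in [-1, 1] (so M = 1, n = 1).
def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0


N = 4000
H = 2.0 / N
MIDPOINTS = [-1.0 + (k + 0.5) * H for k in range(N)]
SUP_NORM = math.exp(-1.0)  # max of bump, attained at x = 0
BALL_VOLUME = 2.0          # c_1 * M^1 = length of [-1, 1]


def bump_hat(zeta):
    """Midpoint-rule approximation of the Fourier transform at complex zeta."""
    return H * sum(bump(x) * cmath.exp(-2j * math.pi * x * zeta) for x in MIDPOINTS)


def growth_bound(zeta):
    # ||f||_inf * c_n * M^n * exp(2*pi*M*|Im zeta|) with M = 1.
    return SUP_NORM * BALL_VOLUME * math.exp(2 * math.pi * abs(zeta.imag))


for zeta in (0.0 + 0.0j, 3.0 + 0.0j, 0.0 + 2.0j, 5.0 - 3.0j):
    value = abs(bump_hat(zeta))
    print(zeta, value, growth_bound(zeta), value <= growth_bound(zeta))
```

The bound is far from sharp on the real axis, where $\hat{f}$ in fact decays faster than any polynomial, as the integration by parts argument shows.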
To prove the other direction of the theorem, let $g$ be an entire function of exponential growth, with constants $A$ and $C_m$ as in the definition. Taking $\Im(z)=0$ shows that the restriction of $g$ to $\mathbb{R}^n$ decays faster than any polynomial, and by the Cauchy estimates on polydiscs of radius $1$ the same holds for all its derivatives, so $g|_{\mathbb{R}^n}$ is a Schwartz function. Define
$\begin{eqnarray}\check{g}(y)=\int_{\mathbb{R}^n}g(x)e^{2\pi i x\cdot y}dx,\quad y\in \mathbb{R}^n.\end{eqnarray}$
Since the Fourier transform, and hence also this inverse transform, maps Schwartz functions to Schwartz functions, $\check{g}$ is a Schwartz function (in particular smooth), and Fourier inversion gives $\hat{\check{g}}=g$ on $\mathbb{R}^n$. It remains to show that $\check{g}$ is compactly supported. By shifting the contour of integration, we can write
$\begin{eqnarray}\check{g}(y)=\int_{\mathbb{R}^n}g(x+it)e^{2\pi i (x+it)\cdot y}dx=e^{-2\pi t\cdot y}\int_{\mathbb{R}^n}g(x+it)e^{2\pi i x\cdot y}dx\end{eqnarray}$
for any $t\in \mathbb{R}^n$. This is allowed because $g$ is entire: we shift one coordinate at a time, and for fixed values of the other variables the integrand is an entire function of the remaining variable $w$, so its integral over the boundary of the rectangle with vertices $-T,T,T+it_j,-T+it_j$ is zero; the integrals over the vertical sides $\Re(w)=\pm T$ tend to zero as $T\to \infty$ by the growth assumption, so the integrals over the lines $\Im(w)=0$ and $\Im(w)=t_j$ agree. Now we use the growth assumption with $m=n+1$, together with $\|\Im(x+it)\|=\|t\|$ and $\|x+it\|\geq \|x\|$ for real $x,t$:
$\begin{eqnarray}|\check{g}(y)|&\leq& e^{-2\pi t\cdot y}\int_{\mathbb{R}^n}C_{n+1}\frac{e^{A\|\Im(x+it)\|}}{1+\|x+it\|^{n+1}}dx\\&\leq&e^{A\|t\|-2\pi t\cdot y}\int_{\mathbb{R}^n}\frac{C_{n+1}}{1+\|x\|^{n+1}}dx\\&=& C'e^{A\|t\|-2\pi t\cdot y},\end{eqnarray}$
where $C'$ is finite because $\frac{1}{1+\|x\|^{n+1}}$ is integrable on $\mathbb{R}^n$. Let $t=(t_1,0,...,0)$. For $t_1>0$ the exponent above equals $(A-2\pi y_1)t_1$, so if $y_1>\frac{A}{2\pi}$, letting $t_1\to \infty$ shows that $\check{g}(y)=0$. Similarly, for $t_1<0$ the exponent equals $(A+2\pi y_1)|t_1|$, so if $y_1<-\frac{A}{2\pi}$, letting $t_1\to -\infty$ yields $\check{g}(y)=0$. The same argument applies to the other coordinates, so $\check{g}$ vanishes outside the cube $[-\frac{A}{2\pi},\frac{A}{2\pi}]^n$. Thus $f:=\check{g}\in C_0^{\infty}(\mathbb{R}^n)$ and $\hat{f}=g$ on $\mathbb{R}^n$. Finally, $\hat{f}$ is entire by Proposition 16, and two entire functions that agree on $\mathbb{R}^n$ agree on $\mathbb{C}^n$ (apply the identity theorem in one variable at a time), so $\hat{f}=g$ on $\mathbb{C}^n$. ■
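The contour shift says that $e^{-2\pi t y}\int_{\mathbb{R}}g(x+it)e^{2\pi i x y}dx$ does not depend on $t$. A minimal one-dimensional sketch of this, assuming the hypothetical choice $g(w)=e^{-\pi w^2}$: this $g$ is entire and decays along horizontal lines, so the shift is legitimate, but $|g(x+it)|=e^{-\pi x^2+\pi t^2}$ grows faster than $e^{A|t|}$. Letting $|t|\to\infty$ therefore yields no support bound, consistent with its inverse transform $e^{-\pi y^2}$ not being compactly supported. Integrals are replaced by Riemann sums on $[-10,10]$.

```python
import cmath
import math

# Assumption: Riemann sums with step H on [-10, 10] replace integrals over R.
H = 0.01
GRID = [-10.0 + k * H for k in range(2001)]


def g(w):
    # Entire and decaying along horizontal lines, but |g(x + i t)| = exp(-pi x^2 + pi t^2):
    # the growth in Im(w) is NOT of the form exp(A |t|).
    return cmath.exp(-math.pi * w * w)


def shifted_integral(y, t):
    """exp(-2*pi*t*y) times the Riemann sum of g(x + i t) * exp(2*pi*i*x*y) over x."""
    s = H * sum(g(x + 1j * t) * cmath.exp(2j * math.pi * x * y) for x in GRID)
    return cmath.exp(-2 * math.pi * t * y) * s


for y in (0.3, 1.0):
    values = [shifted_integral(y, t) for t in (-1.0, 0.0, 0.5, 1.0)]
    print(y, [round(v.real, 10) for v in values], math.exp(-math.pi * y * y))
```

Every shifted value reproduces $e^{-\pi y^2}$, which is exactly the $t$-independence used in the proof; what fails for this $g$ is only the growth estimate that turns the identity into a support bound.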

We have now analyzed how assumptions on integrability, square integrability, smoothness and analyticity affect the Fourier transform. There is one more important aspect to consider, namely the behavior of $\hat{f}$ if $f\in L^p(\mathbb{R}^n)$ for $p\in [1,2]$. As already mentioned, each $f\in L^p(\mathbb{R}^n)$ for $p\in [1,2]$ can be written as a sum of an $L^1$ function and an $L^2$ function, so the Fourier transform is well-defined on these spaces. It does not map $L^p$ to $L^p$ for $p\neq 2$, but it does map $L^p$ boundedly into $L^q$ with $q=\frac{p}{p-1}$; this is the Hausdorff-Young inequality, which we are going to prove. To prove it, we need a result that is generally useful when one knows the boundedness of an operator in two pairs of spaces (here $L^1\to L^{\infty}$ and $L^2\to L^2$) and wants to prove boundedness in intermediate spaces.

Theorem 18 (Riesz-Thorin interpolation). A linear operator $T$ between two measure spaces $X$ and $Y$ is said to be of type $(p,q)$ if $T:L^p(X)\to L^q(Y)$ is bounded. Let $1\leq p_0,p_1,q_0,q_1\leq \infty$, and let $T$ be of types $(p_0,q_0)$ and $(p_1,q_1)$ (the underlying measure spaces are the same for both types). Define
$\begin{eqnarray}p_{\theta}=\left(\frac{1-\theta}{p_0}+\frac{\theta}{p_1}\right)^{-1},\quad q_{\theta}=\left(\frac{1-\theta}{q_0}+\frac{\theta}{q_1}\right)^{-1},\quad \theta\in (0,1).\end{eqnarray}$
Then $T$ is also of type $(p_{\theta},q_{\theta})$, and the operator norms obey the inequality
$\begin{eqnarray}\|T\|_{L^{p_{\theta}}\to L^{q_{\theta}}}\leq \|T\|_{L^{p_0}\to L^{q_0}}^{1-\theta}\|T\|_{L^{p_1}\to L^{q_1}}^{\theta}.\end{eqnarray}$
We remark that although $T$ is initially defined only on $L^{p_0}$ and $L^{p_1}$ (consistently on their intersection), it extends linearly to the sum space $L^{p_0}+L^{p_1}$, which is well known to contain $L^{p_{\theta}}$. The proof of the Riesz-Thorin interpolation theorem is based on complex analysis, namely on Hadamard's three lines lemma: if $F$ is bounded and continuous on the strip $0\leq \Re(z)\leq 1$ and analytic in its interior, with $|F|\leq M_0$ on the line $\Re(z)=0$ and $|F|\leq M_1$ on the line $\Re(z)=1$, then $|F|\leq M_0^{1-\theta}M_1^{\theta}$ on the line $\Re(z)=\theta$. This is the analogue of the inequality for operator norms above. We skip the details for brevity and because the proof is not related to Fourier analysis. We are now in a position to prove the following.
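The exponents $p_{\theta},q_{\theta}$ and the norm bound of Theorem 18 are simple to compute. The helper below is a hypothetical sketch (its names are not from any library) that allows infinite exponents through `math.inf`; the usage lines apply it to the two types $(1,\infty)$ and $(2,2)$ of the Fourier transform used in the next proof.

```python
import math


def interpolated_exponents(p0, q0, p1, q1, theta):
    """Return (p_theta, q_theta) with 1/p_theta = (1-theta)/p0 + theta/p1, same for q.

    Exponents may be math.inf; in Python floats 1/math.inf == 0.0.
    """
    if not 0 < theta < 1:
        raise ValueError("theta must lie in (0, 1)")

    def combine(a, b):
        reciprocal = (1 - theta) / a + theta / b
        return math.inf if reciprocal == 0 else 1 / reciprocal

    return combine(p0, p1), combine(q0, q1)


def interpolated_norm_bound(norm0, norm1, theta):
    # Riesz-Thorin: ||T||_{p_theta -> q_theta} <= norm0^(1 - theta) * norm1^theta.
    return norm0 ** (1 - theta) * norm1 ** theta


# The Fourier transform: type (1, inf) and type (2, 2), both with norm at most 1.
for theta in (0.25, 0.5, 0.75):
    p, q = interpolated_exponents(1, math.inf, 2, 2, theta)
    print(theta, p, q, 1 / p + 1 / q, interpolated_norm_bound(1.0, 1.0, theta))
```

For $\theta=\frac{1}{2}$ this gives $p_{\theta}=\frac{4}{3}$ and $q_{\theta}=4$, and in every case $\frac{1}{p_{\theta}}+\frac{1}{q_{\theta}}=1$.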

Theorem 19 (Hausdorff-Young inequality). Let $1\leq p\leq 2$ and $q=\frac{p}{p-1}$ (with $q=\infty$ when $p=1$). We have the inequality
$\begin{eqnarray}\|\hat{f}\|_q\leq \|f\|_p\end{eqnarray}$
for all $f\in L^1(\mathbb{R}^n)\cap L^2(\mathbb{R}^n)$.

Proof. The Fourier transform $\mathcal{F}$ is a linear operator that maps $L^{1}(\mathbb{R}^n)$ to $L^{\infty}(\mathbb{R}^n)$ and $L^2(\mathbb{R}^n)$ to $L^2(\mathbb{R}^n)$, with operator norms at most $1$ (in fact both equal $1$: for nonnegative $f$ we have $\hat{f}(0)=\|f\|_1$, and on $L^2$ the transform is an isometry by Plancherel). Thus $\mathcal{F}$ is of types $(1,\infty)$ and $(2,2)$, and $p_{\theta}=(1-\frac{\theta}{2})^{-1}$, $q_{\theta}=\frac{2}{\theta}$. Clearly $p_{\theta}$ and $q_{\theta}$ are Hölder conjugates (that is, $q_{\theta}=\frac{p_{\theta}}{p_{\theta}-1}$), and $p_{\theta}$ takes every value in $(1,2)$ as $\theta$ ranges over $(0,1)$. Therefore Riesz-Thorin interpolation yields
$\begin{eqnarray}\|\mathcal{F}\|_{L^p\to L^q}\leq 1\end{eqnarray}$
for $1<p<2$ and $q=\frac{p}{p-1}$, while the cases $p=1$ and $p=2$ are the two bounds we started from. Writing out the definition of the operator norm produces $\|\hat{f}\|_q\leq \|f\|_p$, as desired. ■
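As a numerical illustration of Theorem 19 (not a proof), take the hypothetical example $f(x)=e^{-|x|}$ on $\mathbb{R}$. With the convention $e^{-2\pi ix\xi}$ one has $\hat{f}(\xi)=\frac{2}{1+4\pi^2\xi^2}$ and $\|f\|_p=(2/p)^{1/p}$. The sketch computes $\|\hat{f}\|_q$ by a midpoint rule on $[-40,40]$ (an assumption; the neglected tail is tiny) and compares it with $\|f\|_p$.

```python
import math


def f_norm(p):
    # ||e^{-|x|}||_p = (2/p)^(1/p), since the integral of e^{-p|x|} over R is 2/p.
    return (2 / p) ** (1 / p)


def f_hat(xi):
    # Fourier transform of e^{-|x|} with the convention exp(-2*pi*i*x*xi).
    return 2 / (1 + 4 * math.pi ** 2 * xi * xi)


def f_hat_norm(q, L=40.0, h=2e-3):
    """Midpoint rule for ||f_hat||_q on [-L, L]; the neglected tail is O(L^(1 - 2q))."""
    n = int(round(2 * L / h))
    total = sum(f_hat(-L + (k + 0.5) * h) ** q for k in range(n))
    return (h * total) ** (1 / q)


for p in (1.1, 1.25, 4 / 3, 1.5, 1.75, 1.9, 2.0):
    q = p / (p - 1)
    lhs, rhs = f_hat_norm(q), f_norm(p)
    print(f"p={p:.3f} q={q:.3f}  ||f_hat||_q={lhs:.6f} <= ||f||_p={rhs:.6f}")
```

For example, at $p=\frac{4}{3}$ one can compute exactly $\|\hat{f}\|_4=(5/2)^{1/4}\approx 1.257$ against $\|f\|_{4/3}=(3/2)^{3/4}\approx 1.355$. At $p=2$ the two sides coincide, as Plancherel predicts.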