*A function $f$ and its Fourier transform $\hat{f}$ cannot both be sharply ''localized''*. This of course allows many interpretations, and by interpreting localization in various ways, we get theorems (usually inequalities) about the simultaneous behavior of $f$ and $\hat{f}$. We will formalize and prove the following uncertainty principles.

- $|f|^2$ and $|\hat{f}|^2$ cannot both have small variance (Heisenberg uncertainty principle)
- Two self-adjoint operators cannot both have small norms compared to the norm of their commutator (Heisenberg uncertainty principle for operators)
- $f$ and $\hat{f}$ cannot both decay faster than gaussian functions (Hardy's uncertainty principle)
- $f$ and $\hat{f}$ cannot both be nonzero only on sets of finite measure (Benedicks's inequality)
- The entropies of $|f|^2$ and $|\hat{f}|^2$ cannot both be positive (Hirschman's inequality)

The first hint that such principles might be true is that the more a function is concentrated, the more its Fourier transform is ''spread out'': for $t>0$, we have $\mathcal{F}(f(tx))(\xi)=\frac{1}{t}\hat{f}(\frac{\xi}{t})$. We now proceed to formalize and prove the principles.
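The scaling identity can be checked numerically. The following is a minimal sketch, assuming the normalization $\hat{f}(\xi)=\int f(x)e^{-2\pi i x\xi}dx$ used in this post; the helper `fourier`, the test function `gauss` and the dilation factor `t = 3` are illustrative choices, not part of the text above.

```python
import cmath
import math

def fourier(f, xi, half_width=10.0, n=20001):
    """Riemann-sum approximation of the integral of f(x) e^{-2 pi i x xi} dx."""
    dx = 2.0 * half_width / (n - 1)
    total = 0j
    for k in range(n):
        x = -half_width + k * dx
        total += f(x) * cmath.exp(-2j * math.pi * x * xi)
    return total * dx

def gauss(x):
    # e^{-pi x^2} is its own Fourier transform under this normalization
    return math.exp(-math.pi * x * x)

t = 3.0  # dilation factor: f(tx) is three times narrower than f

def squeezed(x):
    return gauss(t * x)

for xi in (0.0, 0.5, 2.0):
    lhs = fourier(squeezed, xi)   # transform of the concentrated function f(tx)
    rhs = gauss(xi / t) / t       # (1/t) f_hat(xi/t), three times wider
    print(xi, abs(lhs - rhs))     # the difference should be at rounding level
```

Squeezing the function by the factor $t$ stretches its transform by the same factor, which is exactly the trade-off that the theorems below quantify.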

**Heisenberg's uncertainty principle and related results**
We follow the article *The Uncertainty Principle: A Mathematical Survey* by G. B. Folland and A. Sitaram (The Journal of Fourier Analysis and Applications 3 (1997): 207-238) as well as the book *Uncertainty Principles* by J. G. Christensen.
We start with the most familiar uncertainty principle, the Heisenberg uncertainty principle, which has fundamental significance in quantum mechanics. Heisenberg did not actually prove the mathematically precise version of the uncertainty principle; this was first done by Kennard in the same year that Heisenberg published his idea.

**Theorem 1 (Heisenberg uncertainty principle).** Let $f\in \mathcal{S}(\mathbb{R})$ (that is, $f$ is a Schwartz function). Then we have

\[\begin{eqnarray}\|xf\|_2\|\xi\hat{f}\|_2\geq \frac{\|f\|_2^2}{4\pi},\end{eqnarray}\]

and equality holds if and only if $f(x)=ae^{-bx^2}$ for some constants $a$ and $b>0$.

**Proof.** Since $\widehat{f'}(\xi)=2\pi i\xi\hat{f}(\xi)$, that is, $\xi\hat{f}(\xi)=\frac{1}{2\pi i}\widehat{f'}(\xi)$, the left-hand side of the inequality is

\[\begin{eqnarray}\frac{1}{2\pi}\|xf\|_2\|\widehat{f'}\|_2=\frac{1}{2\pi}\|xf\|_2\|f'\|_2\end{eqnarray}\]

by Parseval's identity. By Cauchy-Schwarz, this is at least

\[\begin{eqnarray}\frac{1}{2\pi}\int_{\mathbb{R}}|xf(x)f'(x)|dx&\geq& \frac{1}{2\pi}\int_{\mathbb{R}}|x||\text{Re}(\overline{f(x)}f'(x))|dx\\&\geq& -\frac{1}{2\pi}\int_{\mathbb{R}}x\text{Re}(\overline{f(x)}f'(x))dx.\end{eqnarray}\]

The functions $f$ and $f'$ are Schwartz functions, so we may integrate by parts using $\frac{d}{dx}|f(x)|^2=2\text{Re}(\overline{f(x)}f'(x))$ to get

\[\begin{eqnarray}-\frac{1}{2\pi}\int_{\mathbb{R}}x\text{Re}(\overline{f(x)}f'(x))dx=-\frac{1}{4\pi}\int_{\mathbb{R}}x\frac{d}{dx}|f(x)|^2 dx=\frac{1}{4\pi}\int_{\mathbb{R}}|f(x)|^2 dx,\end{eqnarray}\]

which is the right-hand side of the desired inequality. We see that equality holds if and only if $f'(x)=cxf(x)$ for some $c<0$, that is, $f$ is a constant times a gaussian function. ■
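As a numerical sanity check of Theorem 1, the sketch below computes the ratio of the two sides for a gaussian (where equality should hold) and for $f(x)=1/\cosh x$ (where the inequality should be strict). It relies on the identity $\|\xi\hat{f}\|_2=\frac{1}{2\pi}\|f'\|_2$ from the proof, so no Fourier transform has to be computed. The helper names `l2_norm` and `heisenberg_ratio` and the two test functions are assumptions made for illustration.

```python
import math

def l2_norm(g, half_width=30.0, n=60001):
    """Riemann-sum approximation of the L^2 norm of a real-valued function g."""
    dx = 2.0 * half_width / (n - 1)
    return math.sqrt(sum(g(-half_width + k * dx) ** 2 for k in range(n)) * dx)

def heisenberg_ratio(f, df):
    """||x f||_2 ||xi f_hat||_2 divided by ||f||_2^2 / (4 pi); df is the derivative of f."""
    x_f = l2_norm(lambda x: x * f(x))
    xi_fhat = l2_norm(df) / (2.0 * math.pi)   # Parseval step from the proof
    return x_f * xi_fhat / (l2_norm(f) ** 2 / (4.0 * math.pi))

def gauss(x):
    return math.exp(-math.pi * x * x)

def gauss_prime(x):
    return -2.0 * math.pi * x * gauss(x)

def sech(x):
    return 1.0 / math.cosh(x)

def sech_prime(x):
    return -math.tanh(x) / math.cosh(x)

ratio_gauss = heisenberg_ratio(gauss, gauss_prime)
ratio_sech = heisenberg_ratio(sech, sech_prime)
print(ratio_gauss, ratio_sech)
```

For $1/\cosh x$ the three integrals have the closed forms $\|xf\|_2^2=\frac{\pi^2}{6}$, $\|f'\|_2^2=\frac{2}{3}$ and $\|f\|_2^2=2$, so the ratio equals $\frac{\pi}{3}$, slightly above the gaussian value $1$.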

An immediate corollary of the theorem, obtained by applying it to the function $g(x)=e^{-2\pi i\xi_0 x}f(x+x_0)$, is the following.

**Corollary.** With the assumptions of Theorem 1, and in addition with $x_0,\xi_0\in \mathbb{R}$, we have

\[\begin{eqnarray}\|(x-x_0)f\|_2\|(\xi-\xi_0)\hat{f}\|_2\geq \frac{\|f\|_2^2}{4\pi}.\end{eqnarray}\]

A couple of remarks about the Heisenberg uncertainty principle are in order. First, the theorem holds for all $f\in L^2(\mathbb{R})$, but the proof is more intricate in this case, since above we needed $f$ to be absolutely continuous (so that $f'$ exists almost everywhere) in order to integrate by parts. One can deduce this from the assumption that the left-hand side of the Heisenberg uncertainty principle is finite, but we omit the case of general $L^2$-functions in this post to avoid further complications. Second, we could also prove the Heisenberg uncertainty principle and other principles in $\mathbb{R}^n$, but we usually work in $\mathbb{R}$, since the higher-dimensional cases require a lot more care without giving much more insight into the results.

One should of course say a few words about the quantum mechanical interpretation of the Heisenberg uncertainty principle. The quantum mechanical situation is the following: A particle in space is represented by its wave function $\psi$, which is complex-valued (and satisfies Schrödinger's equation). The function $|\psi|^2$ is the probability density function of the particle, that is, the probability of finding the particle in a set $A$ is $\int_A|\psi(x)|^2 dx$, and $\psi$ is normalized to have $L^2$-norm equal to $1$. Then the expected position of the particle is $x_0:=\int_{\mathbb{R}} x|\psi(x)|^2 dx$, and the spread around $x_0$ is measured by the variance $\sigma^2:=\int_{\mathbb{R}} (x-x_0)^2|\psi(x)|^2 dx$. It turns out that the momentum of the particle is represented by $\hat{\psi}$, so the variance, or uncertainty, in momentum is $\int_{\mathbb{R}} (\xi-\xi_0)^2|\hat{\psi}(\xi)|^2 d\xi$, where $\xi_0:=\int_{\mathbb{R}} \xi|\hat{\psi}(\xi)|^2 d\xi$ is the expected momentum. Now the Heisenberg uncertainty principle states that the product of the standard deviations of position and momentum is at least a constant, namely $\frac{h}{4\pi}$ in physical units, where $h$ is Planck's constant. In particular, we can never know both of them simultaneously with arbitrary accuracy. Time and energy also form a Fourier transform pair, so they obey an analogous law.

Using $L^2$-norms is not essential in the Heisenberg uncertainty principle. Indeed, it generalizes to other norms.

**Theorem 2 (Generalized Heisenberg uncertainty principle).** Let $f\in \mathcal{S}(\mathbb{R})$ and $1\leq p\leq 2$. Then

\[\begin{eqnarray}\|xf\|_p\|\xi\hat{f}\|_p\geq \frac{\|f\|_2^2}{4\pi}.\end{eqnarray}\]

**Proof.** From the proof of the Heisenberg uncertainty principle (integration by parts followed by the triangle inequality), we know

\[\begin{eqnarray}\|f\|_2^2\leq 2\|xf\bar{f'}\|_1.\end{eqnarray}\]

Hölder's inequality, denoting $q=\frac{p}{p-1}$, gives us

\[\begin{eqnarray}2\|xf\bar{f'}\|_1\leq 2\|xf\|_p\|f'\|_q.\end{eqnarray}\]

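Furthermore, $f'$ is the inverse Fourier transform of $\widehat{f'}$, so applying the Hausdorff-Young inequality $\|\hat{g}\|_q\leq \|g\|_p$ (see this post), which holds equally for the inverse transform, we get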

\[\begin{eqnarray}2\|xf\|_p\|f'\|_q\leq 2\|xf\|_p\|\widehat{f'}\|_p=4\pi \|xf\|_p\|\xi \hat{f}\|_p,\end{eqnarray}\]

so the inequality is proved. ■
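To see how the left-hand side of Theorem 2 behaves as $p$ varies, here is a small numerical sketch for $f(x)=e^{-\pi x^2}$, for which $\hat{f}=f$, so that the left-hand side is simply $\|xf\|_p^2$. The helper `lp_norm` and the sampled values of $p$ are illustrative assumptions.

```python
import math

def lp_norm(g, p, half_width=10.0, n=20001):
    """Riemann-sum approximation of the L^p norm of g on the real line."""
    dx = 2.0 * half_width / (n - 1)
    total = sum(abs(g(-half_width + k * dx)) ** p for k in range(n)) * dx
    return total ** (1.0 / p)

def gauss(x):
    return math.exp(-math.pi * x * x)

# Right-hand side of Theorem 2: ||f||_2^2 / (4 pi)
bound = lp_norm(gauss, 2.0) ** 2 / (4.0 * math.pi)

def left_side(p):
    # f_hat = f for this gaussian, so ||x f||_p ||xi f_hat||_p = ||x f||_p^2
    moment = lp_norm(lambda x: x * gauss(x), p)
    return moment * moment

for p in (1.0, 1.5, 2.0):
    print(p, left_side(p), bound)
```

For this particular $f$ the case $p=2$ gives equality, while $p=1$ gives $\|xf\|_1^2=\frac{1}{\pi^2}$, which is well above the bound.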

As the proof of the Heisenberg uncertainty principle involves rather simple ideas, it is perhaps not very surprising that it generalizes to a functional analytic inequality between operators.

**Theorem 3 (Heisenberg uncertainty principle for operators).** Let $V$ be an inner product space and $A:D(A)\to V$ and $B:D(B)\to V$ self-adjoint operators (where $D(A),D(B)\subset V$ are their domains). Define the commutator of $A$ and $B$ as

\[\begin{eqnarray}[A,B]:=AB-BA.\end{eqnarray}\]

Then for any $a,b\in \mathbb{R}$, we have the inequality

\[\begin{eqnarray}\|(A+aI)u\|\|(B+bI)u\|\geq \frac{1}{2}|\langle [A,B]u,u \rangle|\end{eqnarray}\]

for all $u\in D([A,B])=D(AB)\cap D(BA)$.

**Proof.** Notice that $[A+aI,B+bI]=AB-BA=[A,B]$. Also note that the adjoint of $A+aI$ is $A+\bar{a}I$. When $u\in D([A,B])$, also $u\in D(A+aI)\cap D(B+bI)$, so we may compute

\[\begin{eqnarray}|\langle [A+aI,B+bI]u,u \rangle|&=&|\langle (A+aI)(B+bI)u,u\rangle-\langle(B+bI)(A+aI)u,u \rangle|\\&=&|\langle (B+bI)u,(A+\bar{a}I)u\rangle-\langle(A+aI)u,(B+\bar{b}I)u \rangle|\\&\leq& |\langle (B+bI)u,(A+\bar{a}I)u\rangle|+|\langle(A+aI)u,(B+\bar{b}I)u \rangle|\\&\leq& \|(B+bI)u\|\|(A+\bar{a}I)u\|+\|(A+aI)u\|\|(B+\bar{b}I)u\|,\end{eqnarray}\]

so it is enough to show that $\|(A+\bar{a}I)u\|=\|(A+aI)u\|$ (and likewise for $B$ and $b$). To see this, we calculate

\[\begin{eqnarray}\|(A+\bar{a}I)u\|^2&=&\|Au\|^2+|a|^2\|u\|^2+\langle \bar{a}u,Au\rangle +\langle Au, \bar{a}u\rangle\\&=&\|Au\|^2+|a|^2\|u\|^2+\bar{a}\langle u,Au\rangle +a\langle Au, u\rangle\\&=&\|Au\|^2+|a|^2\|u\|^2+\langle Au,au\rangle +\langle au, Au\rangle\\&=&\|(A+aI)u\|^2,\end{eqnarray}\]

where the third equality uses $\langle u,Au\rangle=\langle Au,u\rangle$. This proves the claim. ■

In quantum mechanics, operators represent quantities, so this inequality can be used to derive uncertainty principles between different quantities. Moreover, the case $(Af)(x)=if'(x),(Bf)(x)=xf(x)$ is the Heisenberg uncertainty principle for Schwartz functions: here $[A,B]f=if$, so the inequality reads $\|f'\|_2\|xf\|_2\geq \frac{1}{2}\|f\|_2^2$, and $\|f'\|_2=2\pi\|\xi\hat{f}\|_2$. However, the theorem above is not mathematically as strong as one would hope. Namely, the domain of $[A,B]$ may be a smaller space than what would be needed. We can certainly extend $[A,B]$ from a dense subset of a vector space to the entire space, but the closure operator $\overline{[A,B]}$ is no longer forced to satisfy the inequality.
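In finite dimensions, self-adjoint operators are Hermitian matrices and there are no domain issues, so Theorem 3 can be tested directly. The sketch below uses the Pauli matrices $\sigma_x,\sigma_y$ as a hypothetical example (they do not appear in the text above), with the inner product taken linear in the first argument as in the proof; all function names are illustrative.

```python
import math
import random

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inner(u, v):
    # linear in the first argument, conjugate-linear in the second
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm(v):
    return math.sqrt(inner(v, v).real)

def shifted(M, a, v):
    """Return (M + a I) v."""
    return [w + a * x for w, x in zip(matvec(M, v), v)]

def both_sides(A, B, a, b, u):
    """Left- and right-hand side of the operator inequality for the vector u."""
    lhs = norm(shifted(A, a, u)) * norm(shifted(B, b, u))
    ABu = matvec(A, matvec(B, u))
    BAu = matvec(B, matvec(A, u))
    commutator_u = [p - q for p, q in zip(ABu, BAu)]
    rhs = 0.5 * abs(inner(commutator_u, u))
    return lhs, rhs

sigma_x = [[0, 1], [1, 0]]
sigma_y = [[0, -1j], [1j, 0]]

# With a = b = 0 and u = (1, 0) both sides equal 1, so the constant 1/2 is sharp here.
print(both_sides(sigma_x, sigma_y, 0.0, 0.0, [1, 0]))

# Random vectors and real shifts: the smallest gap lhs - rhs should never be negative.
random.seed(0)
smallest_gap = float("inf")
for _ in range(200):
    u = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2)]
    a, b = random.uniform(-2, 2), random.uniform(-2, 2)
    lhs, rhs = both_sides(sigma_x, sigma_y, a, b, u)
    smallest_gap = min(smallest_gap, lhs - rhs)
print(smallest_gap)
```

The shifts $a,b$ do not change the commutator, which is why the right-hand side is the same for every choice of them.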

**Uncertainty principles via complex analysis**
We now present Hardy's uncertainty principle and Benedicks's uncertainty principle. Both of them are qualitative principles, and as complex analysis possesses many qualitative results about analytic functions, it is quite natural to approach these principles with complex analysis. Hardy's uncertainty principle is very instructive in the sense that it shows that a (very) good estimate for $f$ does not imply a (very) good estimate for $\hat{f}$. Evidently, knowing the size of $f$ well could not as such imply an estimate for $\hat{f}$, as the decay of $\hat{f}$ depends crucially on the smoothness of $f$, but Hardy's uncertainty principle expresses that even smoothness assumptions on $f$ cannot guarantee an estimate for $\hat{f}$.

**Theorem 4 (Hardy's uncertainty principle).** Let $f:\mathbb{R}\to \mathbb{C}$ be an integrable function satisfying

\[\begin{eqnarray}|f(x)|&\ll& e^{-\pi ax^2}\\|\hat{f}(\xi)|&\ll& e^{-\frac{\pi}{a}\xi^2}\end{eqnarray}\]

for some positive constant $a$ and all $x,\xi \in \mathbb{R}$. Then $f(x)=ce^{-\pi ax^2}$ for some constant $c$.

**Proof.** Firstly notice that we can assume $a=1$. Indeed, assume that this case has been proved, and consider $f_1(x)=f(\frac{x}{\sqrt{a}})$. Then $|f_1(x)|\ll e^{-\pi x^2}$ and $|\hat{f_1}(\xi)|=|\sqrt{a}\hat{f}(\sqrt{a}\xi)|\ll e^{-\pi \xi^2}$, and hence $f_1(x)=ce^{-\pi x^2}$, meaning that $f(x)=f_1(\sqrt{a}x)=ce^{-\pi a x^2}$, so the general case follows.

The idea of the proof is the following: First consider the case of an even $f$. The function $\hat{f}(\sqrt{\xi})$ turns out to be an entire function for which $|e^{\pi \xi}\hat{f}(\sqrt{\xi})|$ is bounded, so $e^{\pi \xi}\hat{f}(\sqrt{\xi})$ equals a constant by Liouville's theorem. This settles the case of an even $f$, and the case of an odd $f$ is similar, but we consider $\frac{1}{\sqrt{\xi}}\hat{f}(\sqrt{\xi})$ instead. In this case, $f$ must be the zero function. Finally, we write $f$ as the sum of an odd and an even function to prove the general case.

Assume that $f$ is even. Since $f$ decays like a gaussian, the integrals

\[\begin{eqnarray}\int_{-R}^R e^{-2\pi i x \xi}f(x)dx\end{eqnarray}\]

converge to $\hat{f}(\xi)$ as $R\to \infty$, uniformly on compact subsets of $\mathbb{C}$. The truncated integrals are clearly analytic in $\xi\in \mathbb{C}$, so a theorem of Weierstrass tells us that their locally uniform limit $\hat{f}$ is also entire. This is an even function, too, so it can be written as

\[\begin{eqnarray}\hat{f}(\xi)=\sum_{n=0}^{\infty}c_n \xi^{2n}, \xi \in \mathbb{C}.\end{eqnarray}\]

Now the function

\[\begin{eqnarray}h(\xi):=\sum_{n=0}^{\infty}c_n \xi^n, \xi \in \mathbb{C}\end{eqnarray}\]

is entire as well. Let $0<\delta<\pi$. We study this function in the sector

\[\begin{eqnarray}D_{\delta}=\left\{Re^{it}:R\geq 0, \,\,0\leq t\leq \delta\right\}.\end{eqnarray}\]

We use our assumption to estimate

\[\begin{eqnarray}|\hat{f}(\xi)|&\leq& \int_{\mathbb{R}}|f(x)||e^{-2\pi i x \xi}|dx\\&\ll& \int_{\mathbb{R}} e^{-\pi x^2}e^{2\pi \text{Im}(\xi)x}dx\\&=&\int_{\mathbb{R}} e^{-\pi x^2}e^{-2\pi i x(i\text{Im}(\xi))}dx\\&=&\hat{g}(i\text{Im}(\xi)),\end{eqnarray}\]

where $g(x)=e^{-\pi x^2}$. Since $\hat{g}=g$, this is further equal to $g(i\text{Im}(\xi))=e^{\pi\text{Im}(\xi)^2}$. Recall that $h(\xi^2)=\hat{f}(\xi)$, so $h(Re^{it})=\hat{f}(\sqrt{R}e^{\frac{it}{2}})$ and $\text{Im}(\sqrt{R}e^{\frac{it}{2}})=\sqrt{R}\sin \frac{t}{2}$. Therefore, using the assumption on $\hat{f}$ on the real axis for the first estimate, we get the following bounds for $h$:

\[\begin{eqnarray}|h(R)|&\ll& e^{-\pi R},\\|h(Re^{it})| &\ll& e^{\pi R\sin^2 \frac{t}{2}}.\end{eqnarray}\]

Let $M$ be a constant that is sufficiently large for both of these estimates.

Next, consider the auxiliary function $g_{\delta}(\xi)=\exp\left(\frac{i\pi \xi e^{-\frac {i\delta}{2}}}{\sin \frac{\delta}{2}}\right)$. We have

\[\begin{eqnarray}|g_{\delta}(Re^{it})|&=&\exp\left(-\text{Im}\left(\frac{\pi Re^{i(t-\frac{\delta}{2})}}{\sin \frac{\delta}{2}}\right)\right)\\&=&\exp\left(\frac{-\pi R\sin(t-\frac{\delta}{2})}{\sin \frac{\delta}{2}}\right).\end{eqnarray}\]

In particular, $|g_{\delta}(R)|= e^{\pi R}, |g_{\delta}(Re^{i\delta})|=e^{-\pi R}$, and therefore the entire function $g_{\delta}h$ is bounded by $M$ on the boundary of $D_{\delta}$. By the Phragmén-Lindelöf principle (which asserts that an analytic function that grows at most exponentially in $D_{\delta}$ and is bounded by a constant on the boundary, is actually bounded by the same constant in the interior), $|g_{\delta}h|\leq M$ in the whole sector $D_{\delta}$. Since $0<\delta<\pi$ was arbitrary, letting $\delta\to \pi$ shows that $|g_{\pi}h|\leq M$ in the upper half-plane. By an analogous argument in the sectors with $-\pi<\delta<0$, the same bound holds in the lower half-plane (note that $g_{-\pi}=g_{\pi}$), so $|g_{\pi}h|\leq M$ in the whole complex plane. Since $g_{\pi}(z)=e^{\pi z}$ and hence $|g_{\pi}(Re^{it})|= e^{\pi R \cos t}$, we get for $z=Re^{it}$ that

\[\begin{eqnarray}|e^{\pi z}h(z)|=|e^{\pi R\cos t}h(Re^{it})|\leq M.\end{eqnarray}\]

But then $e^{\pi z}h(z)$ is a bounded entire function, so $h(z)=ce^{-\pi z}$ by Liouville's theorem. This means that $\hat{f}(\xi)=h(\xi^2)=ce^{-\pi \xi^2}$, that is, $f(x)=ce^{-\pi x^2}$, so the case of an even function is finished.

In the case of an odd function, we proceed entirely similarly, but this time we consider the entire function $\frac{1}{\sqrt{\xi}}\hat{f}(\sqrt{\xi})$. We see that it is of the form $ce^{-\pi \xi}$, so $\hat{f}(\xi)=c\xi e^{-\pi \xi^2}$, but this contradicts the decay condition on $\hat{f}$ unless $c=0$, that is, unless $f$ is the zero function.

Lastly, write $f=f_{e}+f_{o},$ where the even part is $f_e(x)=\frac{f(x)+f(-x)}{2}$ and the odd part is $f_o(x)=\frac{f(x)-f(-x)}{2}$. Then $f_e$ and $f_o$, together with their Fourier transforms, satisfy the same decay conditions as $f$ and $\hat{f}$, so the former is a gaussian and the latter is zero. This leads to $f(x)=ce^{-\pi x^2}$, so the proof is complete. ■

Benedicks's inequality, stated next, is also very instructive: one of the simplest formulations of localization is that a function has bounded support, or support of finite measure, and Benedicks's inequality says that at most one of $f$ and $\hat{f}$ is localized in this sense.

**Theorem 5 (Benedicks's inequality).** Let $f\in L^1(\mathbb{R}^n)$, and denote $\Sigma(f)=\{x\in \mathbb{R}^n:f(x)\neq 0\}.$ Then $m(\Sigma(f))m(\Sigma(\hat{f}))=\infty$ unless $f$ is the zero function.

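**Proof.** Suppose that $m(\Sigma(f))$ and $m(\Sigma(\hat{f}))$ are both finite; we show that $f$ is the zero function. We may assume that $m(\Sigma(f))<1$. Indeed, if this case has been proved, and $c=m(\Sigma(f))>0$, we can apply the known case to $f(\lambda x)$ with $\lambda=(2c)^{\frac{1}{n}}$, since the set where this function is nonzero is $\lambda^{-1}\Sigma(f)$, which has measure $\lambda^{-n}c=\frac{1}{2}$.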

We calculate

\[\begin{eqnarray}\int_{[0,1]^n}\sum_{k\in \mathbb{Z}^n}\chi_{\Sigma(\hat{f})}(x+k)dx=\int_{\mathbb{R}^n}\chi_{\Sigma(\hat{f})}dx=m(\Sigma(\hat{f}))<\infty\\\int_{[0,1]^n}\sum_{k\in \mathbb{Z}^n}\chi_{\Sigma(f)}(x+k)dx=\int_{\mathbb{R}^n}\chi_{\Sigma(f)}dx=m(\Sigma(f))<1.\end{eqnarray}\]

The first of these formulas tells us that $\sum_{k\in \mathbb{Z}^n}\chi_{\Sigma(\hat{f})}(x+k)<\infty$ for $x\in E\subset [0,1]^n,$ where $E$ has measure $1$. Thus for all $x\in E$, $\hat{f}(x+k)\neq 0$ for only finitely many $k$. The second formula tells us that there exists a set $F\subset[0,1]^n$ of positive measure such that $\sum_{k\in \mathbb{Z}^n}\chi_{\Sigma(f)}(x+k)<1$ for $x\in F$. Since the sum takes integer values, this means that $f(x+k)=0$ for all $x\in F$ and $k\in \mathbb{Z}^n$.

For $a\in E,$ we consider the series $F_a(x)=\sum_{k\in \mathbb{Z}^n}f(x+k)e^{-2\pi i a\cdot(x+k)}$. From the proof of the Poisson summation formula (see this post), we know that $F_a\in L^1([0,1]^n)$ and that the Fourier series of $F_a$ is $\sum_{k\in \mathbb{Z}^n}\hat{f}(a+k)e^{2\pi i k\cdot x}$. Because $a\in E$, only finitely many of the coefficients $\hat{f}(a+k)$ are nonzero, so the Fourier series is just a trigonometric polynomial. Hence $F_a$ is equal (almost everywhere) to its Fourier series, which extends to an entire function of $x\in \mathbb{C}^n$ (this just means that it is entire in each of its variables separately). Therefore, from complex analysis we know that unless $F_a$ is identically zero, its zeros form a set of measure zero in $\mathbb{R}^n\subset \mathbb{C}^n$ (in one dimension, this is a classical result, and in $n$ dimensions it follows by noticing that if this set $A\subset \mathbb{R}^n$ has positive measure, it intersects some 'vertical' line in a set of positive measure, so we can apply the one-dimensional case). However, for $x\in F$,

\[\begin{eqnarray}|F_a(x)|\leq \sum_{k\in \mathbb{Z}^n}|f(x+k)|=0,\end{eqnarray}\]

so $F_a$ is identically zero for any $a\in E$. But then $\hat{f}(a+k)=0$ for all $k\in \mathbb{Z}^n$ whenever $a\in E$. Thus $\hat{f}=0$ almost everywhere, so $f$ is the zero function. ■

**An uncertainty principle for entropy**
There is an interesting inequality for entropies of $f$ and $\hat{f}$, known as Hirschman's uncertainty principle.

**Theorem 6 (Hirschman).** For a nonnegative measurable function $g$, define its entropy as $H(g)=\int_{\mathbb{R}}g(x)\log g(x)dx$ whenever the integral exists. For $f \in L^1\cap L^2$ with $\|f\|_2=1$ we have $H(|f|^2)+H(|\hat{f}|^2)\leq 0$.

**Proof.** We may assume that $H(|f|^2)$ and $H(|\hat{f}|^2)$ are finite. Our starting point is the Hausdorff-Young inequality $\|\hat{f}\|_q\leq \|f\|_p$ for $1\leq p\leq 2$ and $q=\frac{p}{p-1}$ (see this post). Taking logarithms, we see that $\frac{1}{q}\log\|\hat{f}\|_q^q-\frac{1}{p}\log\|f\|_p^p\leq 0$. As $f\in L^p$ and $\hat{f}\in L^q$, we may define the functions

\[\begin{eqnarray}A(p)=\int_{\mathbb{R}}|f(x)|^p dx\quad B(q)=\int_{\mathbb{R}}|\hat{f}(\xi)|^q d\xi.\end{eqnarray}\]

Then for $0<h<1$

\[\begin{eqnarray}\frac{A(2)-A(2-h)}{h}=\int_{\mathbb{R}}\frac{|f(x)|^2-|f(x)|^{2-h}}{h}dx\to \int_{\mathbb{R}}|f(x)|^2\log |f(x)|dx\end{eqnarray}\]

as $h\to 0+$ by dominated convergence, as $a^h=1+h\log a+O(h^2)$ for fixed $a>0$. Similarly, we obtain

\[\begin{eqnarray}B'(2+)= \int_{\mathbb{R}}|\hat{f}(\xi)|^2\log |\hat{f}(\xi)|d\xi.\end{eqnarray}\]

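On the other hand, differentiating $C(p):=\frac{1}{q}\log B(q)-\frac{1}{p}\log A(p)$ in $p$ (where $q=\frac{p}{p-1}$) gives

\[\begin{eqnarray}C'(p)=-q^{-2}q'\log B(q)+q^{-1}q'\frac{B'(q)}{B(q)}+p^{-2}\log A(p)-p^{-1}\frac{A'(p)}{A(p)}.\end{eqnarray}\]

As $p\to 2-$, we have $q\to 2+$ and $q'=-\frac{1}{(p-1)^2}\to -1$, so using $A(2)=B(2)=1,$ we get $C'(2-)=-\frac{1}{2}A'(2-)-\frac{1}{2}B'(2+)=-\frac{H(|f|^2)+H(|\hat{f}|^2)}{4}$, where we used $\int_{\mathbb{R}}|f(x)|^2\log|f(x)|dx=\frac{1}{2}H(|f|^2)$ and the analogous identity for $\hat{f}$. Hence we must prove that $C'(2-)\geq 0$. Since $C(p)\leq 0$ for $1\leq p\leq 2$ by the Hausdorff-Young inequality and $C(2)=0,$ the difference quotient $\frac{C(2)-C(p)}{2-p}$ is nonnegative for $p<2$, so this is true and the claim is proved. ■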

The entropy inequality indicates for instance that $f$ and $\hat{f}$ cannot both be large (say $>2$) on an interval outside which they decay quickly. It turns out that the constant $0$ in the inequality is not optimal; instead the constant $-\log(\frac{e}{2})$ is. However, proving this is significantly harder since it requires the optimal constant $p^{\frac{1}{2p}}q^{-\frac{1}{2q}}$ in the Hausdorff-Young inequality, which was obtained by Beckner in 1975 (who also proved the strong entropy inequality). If this inequality is assumed, the proof of the stronger inequality follows along the same lines as above.
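The optimal constant can be observed numerically on a gaussian. The sketch below, with the sign convention $H(g)=\int g\log g$ of Theorem 6, takes $f(x)=2^{\frac{1}{4}}e^{-\pi x^2}$, which has $\|f\|_2=1$ and $\hat{f}=f$; the helper `entropy` and the choice of this particular $f$ are assumptions made for illustration.

```python
import math

def entropy(g, half_width=10.0, n=20001):
    """Riemann-sum approximation of H(g) = integral of g log g (sign convention of Theorem 6)."""
    dx = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        value = g(-half_width + k * dx)
        if value > 0.0:          # interpret 0 * log 0 as 0
            total += value * math.log(value)
    return total * dx

def density(x):
    # |f(x)|^2 for f(x) = 2^(1/4) e^{-pi x^2}
    return math.sqrt(2.0) * math.exp(-2.0 * math.pi * x * x)

# f_hat = f for this gaussian, so both entropies coincide
entropy_sum = 2.0 * entropy(density)
print(entropy_sum, -math.log(math.e / 2.0))
```

A direct computation gives $H(|f|^2)=\frac{1}{2}\log 2-\frac{1}{2}$ for this $f$, so the sum equals $\log 2-1=-\log(\frac{e}{2})$, which matches the optimal constant mentioned above.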