The Fourier transform is important in
mathematics, engineering, and the physical sciences. Its discrete
counterpart, the Discrete Fourier Transform (DFT), which is normally
computed using the so-called Fast Fourier Transform (FFT), has
revolutionized modern society, as it is ubiquitous in digital
electronics and signal processing. Radio astronomers are
particularly avid users of Fourier transforms, which are key
components of both data processing (e.g., periodicity searches) and
instruments (e.g., antennas, receivers, spectrometers), and which are
the cornerstones of
interferometry and aperture synthesis.

The Fourier transform is a reversible,
linear transform with many important properties. For any function
$f(x)$ (which in astronomy is usually real-valued, but $f(x)$ may be
complex), the Fourier transform can be denoted
$F(s)$, where the product of $x$ and $s$ is dimensionless. Often
$x$ is a measure of time $t$ (i.e., the time-domain signal) and so $s$
corresponds to inverse time, or frequency $\nu$ (i.e., the frequency-domain signal).

The Fourier
transform is defined by

$$\bbox[border:3px blue solid,7pt]{F(s) \equiv
\int^{\infty}_{-\infty}f(x)\,e^{-2\pi i s x}\,dx~,}\rlap{\quad \rm
{(SF1)}}$$

which is usually
known as the forward
transform,
and

$$\bbox[border:3px blue solid,7pt]{f(x) \equiv
\int^{\infty}_{-\infty}F(s)\,e^{2\pi i s x}\,ds~,}\rlap{\quad \rm {(SF2)}}$$

which is
the inverse
transform.
In both cases, $i \equiv \sqrt{-1}$. Alternative definitions of
the Fourier transform use angular frequency ($\omega = 2\pi \nu$),
different normalizations, or the opposite sign convention in the
complex exponential. Since Fourier transformation is reversible,
the symmetric symbol $\Leftrightarrow$ is often used to mean "is the
Fourier transform of"; e.g., $F(s) \Leftrightarrow f(x)$.

The complex exponential is the heart
of
the transform. A complex exponential is simply a complex-valued
function whose real and imaginary parts are sinusoids. The exact
relation is called Euler's
formula

$$\bbox[border:3px blue solid,7pt]{e^{i\phi} = \cos\phi +
i\sin\phi,}\rlap{\quad \rm {(SF3)}}$$

which leads to the
famous (and beautiful) identity $e^{i\pi}+1=0$ that relates five of the
most important numbers in mathematics.
Complex exponentials are much easier to manipulate than trigonometric
functions, and they provide a compact notation for
dealing with sinusoids of arbitrary phase, which form the basis of the
Fourier transform.
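Euler's formula and the identity above are easy to verify numerically. A minimal check (using NumPy, our choice of tool here, not anything the text prescribes):

```python
import numpy as np

# Euler's formula: e^{i*phi} = cos(phi) + i*sin(phi), checked at an arbitrary phase
phi = 0.7  # radians
lhs = np.exp(1j * phi)
rhs = np.cos(phi) + 1j * np.sin(phi)
assert np.isclose(lhs, rhs)

# The identity e^{i*pi} + 1 = 0, true to floating-point precision
print(np.exp(1j * np.pi) + 1)  # ~1.2e-16j, i.e., zero to machine precision
```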

Complex exponentials (or sines and cosines) are periodic functions, and the set of complex exponentials is complete and orthogonal. Thus the Fourier transform can represent any piecewise continuous function and minimizes the least-squares error between the function and its representation. There exist other complete and orthogonal sets of periodic functions; for example, Walsh functions (square waves) are useful for digital electronics.

Why do we always encounter complex exponentials when solving physical problems? Why are monochromatic waves sinusoidal, and not periodic trains of square waves or triangular waves? The reason is that the derivatives of complex exponentials are just rescaled complex exponentials. In other words, the complex exponentials are the eigenfunctions of the differential operator. Most physical systems obey linear differential equations. Thus an analog electronic filter will convert a sine wave into another sine wave having the same frequency (but not necessarily the same amplitude and phase), while a filtered square wave will not be a square wave. This property of complex exponentials makes the Fourier transform uniquely useful in fields ranging from radio propagation to quantum mechanics.

The continuous Fourier transform
converts a time-domain signal of infinite duration into a
continuous spectrum
composed of an infinite number of sinusoids. In astronomical
observations we deal with signals that are discretely sampled, usually
at constant intervals, and of finite duration or periodic. For
such data, only a finite number of sinusoids is needed and the
Discrete
Fourier Transform
(DFT) is appropriate. For almost every Fourier
transform theorem or property, there is a related theorem or property
for the
DFT. The DFT of $N$ uniformly sampled data points $x_j$ (where $j
=
0,\dots,N-1$) and its inverse are defined by

$$\bbox[border:3px blue
solid,7pt]{X_k =
\sum_{j=0}^{N-1}x_j\,e^{-2\pi i j k/N}}\rlap{\quad \rm {(SF4)}}$$ and

$$\bbox[border:3px blue solid,7pt]{x_j =
\frac{1}{N}\sum_{k=0}^{N-1}X_k\,e^{2\pi i j k/N}~.}\rlap{\quad \rm
{(SF5)}}$$

Once again, sign and
normalization conventions may vary, but our definition is the most
common. The continuous variable $s$ has been replaced by the
discrete variable (usually an integer) $k$.
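Equations SF4 and SF5 translate directly into code. The sketch below evaluates the direct $O(N^2)$ sum in NumPy and compares it against `np.fft.fft`, which happens to share the sign and normalization conventions used here:

```python
import numpy as np

def dft(x):
    """Direct O(N^2) evaluation of SF4: X_k = sum_j x_j e^{-2*pi*i*j*k/N}."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    j = np.arange(N)            # sample index, shape (N,)
    k = j.reshape(-1, 1)        # frequency index as a column, shape (N, 1)
    return np.exp(-2j * np.pi * j * k / N) @ x

rng = np.random.default_rng(0)
x = rng.standard_normal(8)

# The direct sum matches NumPy's FFT (same conventions as SF4)
assert np.allclose(dft(x), np.fft.fft(x))

# The inverse (SF5), including its 1/N normalization, recovers the input
assert np.allclose(np.fft.ifft(dft(x)), x)
```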

The DFT of an $N$-point
input time series is an $N$-point frequency spectrum, with Fourier
frequencies $k$ ranging from $-(N/2-1)$, through the zero-frequency or
so-called "DC" component, up to the highest Fourier frequency
$N/2$. Each bin number represents the integer number of sinusoidal
periods present in the time series. The amplitudes and phases
represent the amplitudes $A_k$ and phases $\phi_k$ of those
sinusoids. In summary, each bin can be described by $X_k =
A_k\,e^{i\phi_k}$.

For real-valued input data, however,
the resulting DFT is hermitian: the
real part of the spectrum is an even function and the
imaginary part is odd, such that $X_{-k} = \overline{X_k}$,
where the bar represents complex conjugation. This means that all
of the "negative" Fourier frequencies provide no new information.
Both the $k=0$ and $k=N/2$ bins are real-valued, and there
are $N/2+1$ Fourier bins in total, so the total number of independent
pieces of information (i.e., real and imaginary parts) is $N$, just as for
the input time series. No information is created or
destroyed by the DFT.
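These symmetries are easy to see numerically. The NumPy sketch below checks the hermitian relation and counts the $N/2+1$ non-redundant bins for a real input (`rfft` is NumPy's real-input transform, which returns only those bins):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 16
x = rng.standard_normal(N)      # real-valued time series
X = np.fft.fft(x)

# Hermitian symmetry: X_{-k} = conj(X_k), with X_{-k} stored in bin N-k
for k in range(1, N // 2):
    assert np.isclose(X[N - k], np.conj(X[k]))

# The k=0 (DC) and k=N/2 bins are purely real
assert abs(X[0].imag) < 1e-12 and abs(X[N // 2].imag) < 1e-12

# rfft keeps only the N/2 + 1 non-redundant bins
assert len(np.fft.rfft(x)) == N // 2 + 1
```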

Other symmetries exist between time-
and frequency-domain signals as well:

| Time Domain | Frequency Domain |
| --- | --- |
| real | hermitian (real part even, imaginary part odd) |
| imaginary | anti-hermitian (real part odd, imaginary part even) |
| even | even |
| odd | odd |
| real and even | real and even (i.e., cosine transform) |
| real and odd | imaginary and odd (i.e., sine transform) |
| imaginary and even | imaginary and even |
| imaginary and odd | real and odd |

Usually the DFT is computed by a very
clever (and truly revolutionary) algorithm known as the Fast
Fourier Transform or
FFT. The FFT was discovered by Gauss in 1805 and re-discovered
many times since, but most people attribute its
modern incarnation to James W. Cooley and John W. Tukey ("An algorithm
for the machine calculation of complex Fourier series," *Math.
Comput.* **19**, 297–301) in
1965. The key advantage of the FFT over the DFT is that the
operational complexity decreases from $O(N^2)$ for a DFT to
$O(N\log_2(N))$ for the FFT. Modern implementations of the FFT
(such as FFTW) allow $O(N\log_2(N))$
complexity for any value of
$N$, not just those that are powers of two or the products of only
small primes.

Example: Estimate the speed increases obtained by computing FFTs instead of direct DFTs for transforms of length $10^3$, $10^6$, and $10^9$ points.

$$\text{speed improvement}\,(N=10^3) \propto \frac{N^2}{N\log_2(N)} = \frac{N}{\log_2(N)} \sim \frac{10^3}{10} \sim 100$$ $$\text{speed improvement}\,(N=10^6) \propto \frac{N}{\log_2(N)} \sim \frac{10^6}{20} \sim 5\times 10^4$$ $$\text{speed improvement}\,(N=10^9) \propto \frac{N}{\log_2(N)} \sim \frac{10^9}{30} \sim 3\times 10^7$$

For a DFT to represent a function
accurately, the original function must be sampled at a sufficiently
high rate. The appropriate rate for a uniformly sampled time
series is determined by the Nyquist-Shannon
Theorem or Sampling
Theorem. This theorem
states that any continuous baseband signal (signal extending down to
zero frequency) may be identically reconstructed if the signal is
bandwidth limited and the sampling frequency is at least twice the
bandwidth of the signal (i.e. the highest frequency of a baseband
signal). That critical sampling rate, $1/\Delta t$, where $\Delta
t$ is the time between successive
samples, is known as
the Nyquist
rate, and it is a property of the time-domain signal based on
its frequency content. Somewhat confusingly, if a time-domain
signal is sampled uniformly, then the frequency corresponding to
one-half that rate is called the Nyquist
frequency,

$$\bbox[border:3px blue solid,7pt]{\nu_{\rm N/2} =
1/(2\,\Delta t)~.}\rlap{\quad \rm {(SF6)}}$$

The Nyquist frequency describes the high-frequency cut-off of the
system doing the sampling, and is therefore a property of that
system. Any frequencies present in the original signal that are
higher than the Nyquist frequency will be aliased
to lower frequencies in the sampled band, as described
below. If the signal is band-limited and then sampled at the
Nyquist rate, in accordance with the Sampling Theorem, no aliasing will
occur.

In a DFT, where there are $N$ samples
spanning a total time $T = N\,\Delta t$, the frequency resolution is
$1/T$. Each Fourier bin number $k$ represents exactly $k$
sinusoidal oscillations in the original data $x_j$, and therefore a
frequency $\nu = k/T$ in Hz. The Nyquist frequency corresponds
to bin $k = \nu_{\rm N/2}\,T = T/(2\,\Delta t) = N\,\Delta t/(2\,\Delta t) =
N/2$. If the signal is not bandwidth limited and
higher-frequency components [with $k > N/2$ or $\nu > N/(2T)$
Hz] exist, those frequencies will show up in the DFT aliased back into lower
frequencies $\nu_{\rm a} = N/T -
\nu$, assuming that $N/(2T) < \nu < N/T$. Such aliasing
can be avoided by filtering the input data to ensure that it is
properly band-limited.

Examples:

The (young and undamaged) human ear can hear sounds with frequency components up to around 20 kHz. Therefore, near-perfect audio recording systems must sample audio signals at a rate of at least 40 kHz to be Nyquist sampled. Because of the practical nature of real-life audio filters (which do not have infinitely sharp cutoffs), audio CDs are sampled at 44100 Hz. That provides a ~2 kHz buffer, which allows imperfect lowpass audio filters to remove higher frequencies that would otherwise be aliased into the audible band.

A visual example of an aliased signal often occurs in western movies, where the 24 frame-per-second rate of the movie camera performs "stroboscopic" sampling of a rapidly spinning wagon wheel. When the rotation rate of the wheel is below the Nyquist frequency (12 Hz), it appears to be spinning at the correct rate and in the correct direction. When it is rotating faster than 12 Hz but slower than 24 Hz, it appears to be rotating in the opposite direction, and at a slower rate. As the rotation rate approaches 24 Hz, the wheel apparently slows down, then stops when the rotation rate equals the frame rate (twice the Nyquist frequency). (Note: Because of the symmetry of wagon wheels, this is a slightly simplified picture. Aliasing actually occurs at wheel rotation rates exceeding 12 Hz divided by the number of spokes.)

A useful quantity in astronomy is the
power spectrum $
\overline{F(s)}F(s) = \left|F(s)\right|^2$. The power spectrum
contains no phase information from the original function. Rayleigh's
Theorem (sometimes called
Plancherel's Theorem and related to Parseval's Theorem for Fourier
series) shows that the integral of the power spectrum equals the
integral of the squared modulus of the function (i.e. the energies
in the frequency and time domains are equal):

$$\bbox[border:3px blue
solid,7pt]{\int^{\infty}_{-\infty}\left|f(x)\right|^2\,dx
= \int^{\infty}_{-\infty}\left|F(s)\right|^2\,ds}\rlap{\quad \rm {(SF7)}}$$
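The discrete analog of Rayleigh's Theorem, $\sum_j |x_j|^2 = \frac{1}{N}\sum_k |X_k|^2$ (where the $1/N$ matches the normalization convention of SF4 and SF5), can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(100) + 1j * rng.standard_normal(100)
X = np.fft.fft(x)

# "Energies" in the time and frequency domains agree,
# up to the 1/N from our DFT normalization convention
assert np.isclose(np.sum(np.abs(x)**2), np.sum(np.abs(X)**2) / len(x))
```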

Very much related to the convolution theorem, the cross-correlation theorem states that the Fourier transform of the cross-correlation of two functions is equal to the product of the individual Fourier transforms, where one of them has been complex conjugated:

$$\bbox[border:3px blue solid,7pt]{f\star g \Leftrightarrow \overline{F}\cdot G}\rlap{\quad \rm {(SF16)}}$$

Autocorrelation is a special case of cross-correlation with $f\star f$. The related autocorrelation theorem is also known as the Wiener-Khinchin theorem and states

$$\bbox[border:3px blue solid,7pt]{f\star f \Leftrightarrow \overline{F}\cdot F = \left|F\right|^2}\rlap{\quad \rm {(SF17)}}$$

In words, the Fourier transform of an autocorrelation function is the power spectrum, or equivalently, the autocorrelation is the inverse Fourier transform of the power spectrum. Many radio-astronomy instruments compute power spectra using autocorrelations and this theorem. The following table summarizes the relations between a function, its Fourier transform, its autocorrelation, and its power spectrum:

| $x_j$ (function) | $\Leftrightarrow$ DFT | $X_k$ (transform) |
| --- | --- | --- |
| $\Downarrow$ | | $\Downarrow$ |
| $x_j \star x_j$ (autocorrelation) | $\Leftrightarrow$ DFT | $\vert X_k\vert^2$ (power spectrum) |
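The Wiener-Khinchin theorem suggests an efficient recipe: compute the autocorrelation as the inverse FFT of the power spectrum. A NumPy sketch comparing that route with the direct (circular) sum:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(32)
X = np.fft.fft(x)

# Wiener-Khinchin: inverse FFT of the power spectrum = (circular) autocorrelation
acf_wk = np.fft.ifft(np.abs(X)**2).real

# Direct circular autocorrelation: sum_j x_j * x_{(j+lag) mod N}
acf_direct = np.array([np.sum(x * np.roll(x, -lag)) for lag in range(len(x))])

assert np.allclose(acf_wk, acf_direct)
```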

One important thing to remember about
convolution and correlation using DFTs is that they are cyclic with a period corresponding
to the length of the longest component of the convolution or
correlation. Unless this periodicity is taken into account
(usually by zero-padding one of the input functions), the convolution
or correlation will wrap around the ends and possibly "contaminate" the
resulting function.
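The wrap-around, and the zero-padding that avoids it, can be seen in a short NumPy example (the lengths here are purely illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 1.0, 1.0])

# Multiplying DFTs gives *circular* convolution: the tail wraps around
circ = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, n=len(x))).real

# Zero-padding both inputs to len(x) + len(h) - 1 prevents the wrap-around
n = len(x) + len(h) - 1
lin = np.fft.ifft(np.fft.fft(x, n=n) * np.fft.fft(h, n=n)).real

assert np.allclose(lin, np.convolve(x, h))            # ordinary linear convolution
assert not np.allclose(circ, np.convolve(x, h)[:len(x)])  # wrapped result differs
```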
