The More Realistic Fourier Transform

Previously, we’ve taken a look at the continuous Fourier transform (FT), which is a powerful tool for decomposing a signal into its constituent frequencies. However, as we’ve briefly mentioned in the conclusion of that article, in practice, we never actually observe a continuous signal. Therefore, the tool is useless and we end our discussion here. Just kidding.

This is where the discrete Fourier transform (DFT) comes in. Essentially, the DFT is a discretized version of the continuous FT, which provides a much more realistic and applicable approach for applying the transformation to discrete data (although popular Fourier transform packages don’t evaluate its defining sum directly; they compute the same result with a faster algorithm).

***

If you came here from the article on the continuous FT, note the change in notation from $F(\omega)$ to $X(\omega)$ and from $f(t)$ to $x(t)$ in the equations that follow. I made this change to avoid confusion, since I’ll be separately defining a variable $f$ for frequency.

Recall that the continuous FT of a function $x(t)$ is defined as

X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-i\omega t}\, dt,

where $\omega$ is a continuous frequency variable. This integral computes the projection of $x(t)$ onto the complex exponential $e^{i\omega t}$ (the inner product conjugates the basis function, which gives the $e^{-i\omega t}$ in the integrand), and by the orthogonality of these basis functions, $X(\omega)$ tells us the contribution of frequency $\omega$ to the overall signal.

But what if we don’t have $x(t)$ for all $t \in \mathbb{R}$, and instead have a finite sequence of $N$ equally spaced observations $x(t_0), x(t_1), \dots, x(t_{N-1})$, sampled at times $t_n = n \Delta t$ for some sampling interval $\Delta t$? To work with this data, we must first derive the DFT from the continuous case. There are different ways to achieve this, but a simple one is to use a Riemann sum.

Essentially, by restricting the integral to the observed window and replacing $t$ with $n \Delta t$ and $dt$ with $\Delta t$, the integral becomes

X(\omega) \approx \sum_{n=0}^{N-1} x(n\Delta t)\, e^{-i\omega n \Delta t}\, \Delta t.

Now, we simply evaluate this at a discrete set of frequencies $\omega_k = \frac{2\pi k}{N \Delta t}$ for $k=0,1,\dots,N-1$. Since $\omega_k n \Delta t = 2\pi k n / N$, writing $x[n] = x(n\Delta t)$ and absorbing the constant factor $\Delta t$ into the definition, we arrive at the standard DFT:

X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-i 2\pi k n / N}, \quad k = 0, 1, \dots, N-1,

where each output $X[k]$ is a complex number that encodes the contribution of frequency $k$ to the series. Keep in mind that because we started by discretizing and truncating an integral, we’ve committed to a set of assumptions whose consequences are worth examining carefully.
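As a sanity check on the derivation, here is a minimal NumPy sketch (NumPy and the helper names `dft` and `exact_ft` are our own illustrative choices). It evaluates the DFT sum directly, confirms that it matches `np.fft.fft` (which uses the same sign convention and no normalization), and checks that $\Delta t \cdot X[k]$ approximates the continuous transform $X(\omega_k)$ for a hypothetical Gaussian test signal whose FT is known in closed form.

```python
import numpy as np

def dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] exp(-i 2 pi k n / N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)                      # output index as a column
    return np.exp(-2j * np.pi * k * n / N) @ x

# Hypothetical test signal: a Gaussian bump centred inside the window,
# chosen because its continuous FT has a closed form.
dt, N, c = 0.05, 400, 10.0                    # window covers t in [0, 20)
t = np.arange(N) * dt
x = np.exp(-(t - c) ** 2 / 2)

def exact_ft(omega):
    # X(omega) = sqrt(2 pi) exp(-omega^2 / 2) exp(-i omega c) for this x(t)
    return np.sqrt(2 * np.pi) * np.exp(-omega ** 2 / 2) * np.exp(-1j * omega * c)

X = dft(x)
k = np.arange(6)                              # a few low-frequency bins
omega_k = 2 * np.pi * k / (N * dt)

print(np.allclose(X, np.fft.fft(x)))                    # same definition as NumPy
print(np.max(np.abs(dt * X[k] - exact_ft(omega_k))))    # tiny: Riemann sum ≈ integral
```

The direct sum costs $O(N^2)$ operations, which is fine for a check like this but is why libraries compute the same quantity with faster algorithms.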

The Discreteness of Frequency

In the continuous FT, $\omega$ can take any real value, and $X(\omega)$ is a continuous function of frequency. In the DFT, however, frequencies are restricted to a finite grid, where the $k$-th output corresponds to a physical frequency of

f_k = \frac{k}{N \Delta t}

measured in cycles per unit time. These discrete frequencies are called frequency bins. There are exactly $N$ of them, matching the $N$ observations, which directly reflects the invertibility of the transformation.

As a consequence of this discreteness, the DFT can cleanly represent only periodicities that fall exactly on one of its $N$ bins. This doesn’t mean that a sinusoidal component whose frequency falls between two bins is invisible. Rather, its energy bleeds across multiple bins, a phenomenon called leakage that we’ll discuss later on.

Speaking of “between two bins,” we define the spacing between adjacent frequency bins as

\Delta f = \frac{1}{N \Delta t}.

This spacing defines the frequency resolution of the DFT: roughly, the minimum separation between two frequencies that can be distinguished. A longer record (larger $N$ at a fixed $\Delta t$) gives finer resolution, meaning the bins are more tightly packed and we can resolve nearby frequencies, while a shorter record gives coarser resolution. This trade-off between the length of the data and the granularity of the frequency domain is fundamental (it is a manifestation of the time-frequency uncertainty principle), and managing it will always be one of our objectives.
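To make the bin grid concrete, the sketch below assumes a hypothetical record of 252 daily observations ($\Delta t = 1$ day) and computes $f_k$ and $\Delta f$ directly from the formulas above. It also compares the grid with NumPy’s `np.fft.fftfreq`, which reports the bins with $k \ge N/2$ as negative frequencies $f_k - 1/\Delta t$ rather than as $f_k$.

```python
import numpy as np

# Hypothetical setup: roughly one trading year of daily observations.
N, dt = 252, 1.0                  # Delta t = 1 day
k = np.arange(N)
f = k / (N * dt)                  # f_k, in cycles per day
delta_f = 1 / (N * dt)            # spacing between adjacent bins

print(delta_f)                    # about 0.00397 cycles/day, i.e. one cycle per 252 days
print(1 / (2 * N * dt))           # doubling the record length halves the spacing

# NumPy's helper returns the same grid for k < N/2 and reports the upper
# half as negative frequencies, f_k - 1/dt.
f_np = np.fft.fftfreq(N, d=dt)
print(np.allclose(f_np[: N // 2], f[: N // 2]),
      np.allclose(f_np[N // 2 :], f[N // 2 :] - 1 / dt))
```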

Amplitude as Variance and Phase as Timing

So anyway, back to the output $X[k]$. Because the Fourier transform is most familiar from engineering disciplines, its audience readily accepts the concepts of amplitude and phase. However, I’d like to take the time to break down precisely what these two concepts mean to us as statisticians.

The modulus $|X[k]|$ is often called the amplitude of frequency $k$, but from a statistical perspective, it’s more illuminating to think about $|X[k]|^2$.

Consider again what the DFT does: It projects some series $x[n]$ onto complex sinusoidal basis functions. $|X[k]|^2$ here basically measures how much of the total “energy” in the series is attributable to frequency $k$ . In time series analysis, when the series is mean-centered, energy and variance are directly related concepts; both measure the aggregate squared deviation from the mean.

This connection is made precise by Parseval’s theorem, which states that

\sum_{n=0}^{N-1} |x[n]|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X[k]|^2,

where, for a mean-centered series, the left-hand side is proportional to the total variance and the right-hand side is a sum over all frequency bins of their squared amplitudes. In other words, Parseval’s theorem tells us that the DFT is a lossless decomposition of variance: the total variance (sum of squares) of the series is exactly partitioned across frequency bins, with $|X[k]|^2 / N$ being the contribution of frequency $k$. Suitably normalized, this quantity is the periodogram, the basic estimate of the power spectral density, and it’s the DFT’s answer to the question: “At which frequencies does my series fluctuate, and by how much?”
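Parseval’s theorem is easy to verify numerically. The sketch below uses a hypothetical random series with a nonzero mean (generated with NumPy) and checks two things: the identity as stated, and the variance partition. Because $X[0] = N\bar{x}$, dropping bin 0 removes the mean, and dividing the remaining $|X[k]|^2$ by $N^2$ (rather than $N$) turns contributions to the sum of squares into contributions to the population variance.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200) + 3.0      # hypothetical series with nonzero mean
N = len(x)
X = np.fft.fft(x)

# Parseval: sum |x[n]|^2 == (1/N) sum |X[k]|^2
lhs = np.sum(np.abs(x) ** 2)
rhs = np.sum(np.abs(X) ** 2) / N
print(np.isclose(lhs, rhs))

# X[0] = N * mean(x), so bins k >= 1 carry the fluctuations around the mean;
# their shares add up to the population variance (np.var uses ddof=0).
contrib = np.abs(X[1:]) ** 2 / N ** 2
print(np.isclose(contrib.sum(), np.var(x)))
```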

$X[k]$ also has a phase, its argument $\phi_k = \angle X[k]$. While amplitude tells you how much a frequency contributes to variance, phase tells you where in its cycle that frequency’s component is. More precisely, it encodes the timing offset of the sinusoidal component relative to the start of the observation window.
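For a cosine that completes exactly $k_0$ cycles in the window, $x[n] = A\cos(2\pi k_0 n/N + \phi)$ with $0 < k_0 < N/2$, the DFT gives $X[k_0] = \tfrac{AN}{2} e^{i\phi}$, so both the amplitude and the phase can be read off directly. A minimal NumPy sketch with hypothetical values of $A$, $\phi$, and $k_0$:

```python
import numpy as np

N, k0 = 128, 5
A, phi = 1.5, 0.7                       # hypothetical amplitude and phase
n = np.arange(N)
x = A * np.cos(2 * np.pi * k0 * n / N + phi)

X = np.fft.fft(x)
# A real cosine splits its amplitude between bins k0 and N - k0, hence the 2.
amp_hat = 2 * np.abs(X[k0]) / N
phase_hat = np.angle(X[k0])
print(amp_hat, phase_hat)               # recovers A and phi up to rounding
```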

Why does this matter to us? Consider two financial return series that share a common cyclical component at frequency $f_k$, perhaps driven by a shared macroeconomic factor, but where one series responds one trading day before the other. This lag would manifest as a difference in the phase of $X[k]$ between the two series. Specifically, if series $y[n]$ lags series $x[n]$ by $d$ periods at frequency $k$, then, up to a multiple of $2\pi$,

\angle Y[k] - \angle X[k] \approx -\frac{2\pi k d}{N}.

This phase difference is a direct, interpretable measure of the lead-lag relationship between the two series at that specific frequency. Because it is computed bin by bin, it can reveal, for example, that two series move contemporaneously at low frequencies (the business cycle) but with a lag at higher frequencies (weekly oscillations). The DFT gives us the resolution to distinguish these cases, whereas a time-domain cross-correlation aggregates over all frequencies into a single picture.
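The phase relation can be checked with a small simulation. This sketch assumes a hypothetical random series and builds the lagged copy with a circular shift (`np.roll`), for which the relation holds exactly; with a genuine, non-circular lag the window edges differ and the relation only holds approximately, which is why the text uses $\approx$. The helper `wrap` is our own, and the recovered lag is only unambiguous while $|2\pi k d / N| < \pi$.

```python
import numpy as np

def wrap(a):
    """Map angles to (-pi, pi] so phase differences compare modulo 2*pi."""
    return np.angle(np.exp(1j * a))

N, d, k = 256, 3, 10                 # hypothetical length, lag, and bin
rng = np.random.default_rng(2)
x = rng.standard_normal(N)
y = np.roll(x, d)                    # circular shift: y[n] = x[n - d]

X, Y = np.fft.fft(x), np.fft.fft(y)
phase_diff = wrap(np.angle(Y[k]) - np.angle(X[k]))
theory = wrap(-2 * np.pi * k * d / N)
d_hat = -phase_diff * N / (2 * np.pi * k)   # invert the relation to estimate the lag

print(np.isclose(phase_diff, theory), d_hat)
```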

By now, we should be interpreting amplitude as variance and phase as timing, which lets us better appreciate the DFT as a statistical analysis tool.

The Nyquist Frequency and Aliasing

Earlier, we noted that the DFT produces $N$ frequency bins for $N$ observations. As it turns out, not all of these bins carry distinct information: for a real-valued series, $X[N-k]$ is the complex conjugate of $X[k]$, so the bins above $N/2$ simply mirror those below it. More fundamentally, because our data is sampled at discrete intervals of $\Delta t$, there’s a hard upper limit on the frequencies we can distinguish, called the Nyquist frequency, which we define as

f_{\text{Nyq}} = \frac{1}{2\Delta t}.

This is the highest frequency that can be unambiguously represented given the sampling rate. The intuition is straightforward: to resolve a sinusoid, we need at least two observations per cycle, roughly one to capture the peak and another to capture the trough. Any frequency above this limit becomes indistinguishable from a lower one after sampling: a sinusoid at frequency $f$ masquerades as one at $|f - m/\Delta t|$ for whichever integer $m$ brings it into $[0, f_{\text{Nyq}}]$. We call this distortion aliasing.

That’s why, if our DFT output shows suspicious energy at very high frequencies, we should consider whether aliasing (energy folded down from above the Nyquist frequency), rather than a genuine high-frequency phenomenon, is the source.
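A quick way to see aliasing: with daily sampling ($\Delta t = 1$, so $f_{\text{Nyq}} = 0.5$ cycles per day), a hypothetical cycle at 0.9 cycles per day produces exactly the same samples as one at 0.1 cycles per day, and the DFT duly reports the lower frequency. The folding rule in the code, $|f - m/\Delta t|$ with $m$ the nearest integer to $f\Delta t$, is the standard one; the specific values are illustrative.

```python
import numpy as np

dt, N = 1.0, 100                  # hypothetical: daily sampling, 100 observations
f_nyq = 1 / (2 * dt)              # 0.5 cycles per day
n = np.arange(N)

f_true = 0.9                      # a cycle faster than the Nyquist frequency
f_alias = abs(f_true - round(f_true * dt) / dt)   # folded frequency, about 0.1

x_high = np.cos(2 * np.pi * f_true * n * dt)
x_low = np.cos(2 * np.pi * f_alias * n * dt)
print(np.allclose(x_high, x_low))  # the sampled values are identical

# The DFT therefore places the energy at the aliased frequency.
freqs = np.fft.rfftfreq(N, d=dt)
peak = freqs[np.argmax(np.abs(np.fft.rfft(x_high)))]
print(peak)                        # about 0.1, not 0.9
```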

Leakage and Windowing

We also mentioned that leakage is an important consequence of working with a finite, discretized record. The DFT implicitly treats the observed window of $N$ samples as one period of a signal with period $N$ that repeats indefinitely. This is almost never true in practice, because real series don’t typically loop; they begin and end, and the values at the start and end of the window are generally not equal.

Now, when the series doesn’t “close” cleanly, the abrupt jump at the boundary of the observation window injects high-frequency content into the DFT that wasn’t in the original signal. More generally, if the true signal contains a sinusoidal component at a frequency that does not fall exactly on one of the DFT’s frequency bins, that component cannot be represented cleanly by a single bin. Instead, its energy leaks across adjacent bins, sometimes spreading quite broadly depending on the severity of the mismatch. This isn’t a computational error, but rather a fundamental consequence of observing a finite stretch of data.

Leakage can obscure weak frequency components that are located near a dominant one, and it distorts the apparent shape of the power spectrum. Of course, any DFT output we compute from real data is, to some degree, affected by it.
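The sketch below illustrates leakage with NumPy, using a hypothetical helper `peak_share` that reports what fraction of a unit cosine’s energy lands in its single strongest bin. With 10 whole cycles in the window, essentially all of it does; with 10.5 cycles (halfway between bins), well under half stays in the strongest bin and the rest leaks into others. Only the positive-frequency half of the spectrum (`np.fft.rfft`) is used, since the other half mirrors it for a real series.

```python
import numpy as np

N = 128
n = np.arange(N)

def peak_share(cycles):
    """Fraction of positive-frequency energy in the largest bin for a unit
    cosine completing `cycles` cycles in the N-sample window."""
    x = np.cos(2 * np.pi * cycles * n / N)
    P = np.abs(np.fft.rfft(x)) ** 2
    return P.max() / P.sum()

print(peak_share(10.0))   # on a bin: essentially all energy in one bin
print(peak_share(10.5))   # halfway between bins: energy leaks out
```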

The standard approach to mitigating leakage is windowing: before applying the DFT, we multiply the series $x[n]$ by a smooth, tapering function $w[n]$ that gradually brings the series to zero at both ends of the observation window. The intuition is that the sharp edges at the boundaries of the raw series (i.e., where the signal jumps abruptly) are the main source of leakage. By tapering the edges, we smooth out the discontinuity and reduce the spread of energy across the spectrum.

There are many window functions, such as the Hann, Hamming, and Blackman windows, each representing a different engineering choice about how aggressively to taper. The details of each are beyond the scope of this article, but they all face the same fundamental trade-off, one with the same no-free-lunch flavor as bias versus variance: leakage versus resolution.

Essentially, more aggressive tapers suppress leakage (less contamination from distant frequencies) but also widen the main lobe of the window’s frequency response, meaning the window itself blurs frequency components that are close together, making them harder to distinguish (lower resolution). Strictly speaking, both effects are forms of bias in the spectral estimate: leakage smears power in from distant frequencies, while main-lobe widening smears it locally.
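A minimal sketch of the trade-off, using NumPy’s `np.hanning` (the symmetric Hann window) on hypothetical unit cosines. The first part measures how much of an off-bin tone’s energy ends up 40 or more bins away from it, with and without the taper; the second counts how many bins an on-bin tone occupies above 1% of its peak power, showing the wider main lobe. The 40-bin and 1% thresholds, and the helpers `power` and `lobe_width`, are arbitrary illustrative choices.

```python
import numpy as np

N = 256
n = np.arange(N)
w = np.hanning(N)                          # Hann taper, zero at both ends

def power(x):
    """Positive-frequency power spectrum of a real series."""
    return np.abs(np.fft.rfft(x)) ** 2

def lobe_width(p):
    """Number of bins whose power exceeds 1% of the peak."""
    return int(np.sum(p > 0.01 * p.max()))

# 1) Leakage: an off-bin tone (20.5 cycles per window).
x_off = np.cos(2 * np.pi * 20.5 * n / N)
far = np.arange(61, N // 2 + 1)            # bins at least 40 away from 20.5
leak_raw = power(x_off)[far].sum() / power(x_off).sum()
leak_hann = power(x_off * w)[far].sum() / power(x_off * w).sum()

# 2) Resolution: an on-bin tone (20 cycles) spreads over more bins once tapered.
x_on = np.cos(2 * np.pi * 20 * n / N)

print(f"far-bin energy share: raw {leak_raw:.1e}, Hann {leak_hann:.1e}")
print("bins occupied:", lobe_width(power(x_on)), "raw vs",
      lobe_width(power(x_on * w)), "Hann")
```

The taper cuts the far-away leakage by orders of magnitude, at the price of smearing a perfectly on-bin tone over its neighbors.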

The Inverse DFT

Before concluding, note that, just as the continuous FT is invertible, so is the DFT. Given the $N$ complex outputs $X[0], X[1], \ldots, X[N-1]$, the original series can be exactly recovered via the inverse DFT (IDFT):

x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{i 2\pi k n / N}, \quad n = 0, 1, \ldots, N-1.

As we can see from above, the IDFT is simply a weighted sum of complex sinusoids, one at each frequency bin $k$, each with complex amplitude $X[k]$. The factor $1/N$ is a normalization constant ensuring that the transform and its inverse compose to the identity.
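The inverse can be sketched the same way as the forward transform: a direct $O(N^2)$ sum with the sign of the exponent flipped and the $1/N$ factor applied (the helper name `idft` is ours). Applied to the `np.fft.fft` output of a hypothetical random series, it recovers the series up to floating-point error, and it agrees with NumPy’s `np.fft.ifft`, which applies the same $1/N$ factor by default.

```python
import numpy as np

def idft(X):
    """Direct O(N^2) inverse: x[n] = (1/N) sum_k X[k] exp(+i 2 pi k n / N)."""
    X = np.asarray(X, dtype=complex)
    N = len(X)
    k = np.arange(N)
    n = k.reshape(-1, 1)                     # output (time) index as a column
    return np.exp(2j * np.pi * n * k / N) @ X / N

rng = np.random.default_rng(3)
x = rng.standard_normal(50)                  # hypothetical real series
X = np.fft.fft(x)
x_rec = idft(X)

print(np.allclose(x_rec, x))                 # exact reconstruction up to rounding
print(np.allclose(x_rec, np.fft.ifft(X)))    # matches NumPy's inverse
```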

So far, we’ve covered the fundamental limitation of the continuous FT in practice and introduced the DFT as its more applicable counterpart. We then took two important engineering terms, amplitude and phase, and defined them from the perspective of a statistician. Afterwards, we saw how discretizing an originally continuous transform comes with its own set of drawbacks, including aliasing and leakage, and that the remedy for leakage, windowing, trades off leakage suppression against frequency resolution. Finally, we rounded off with the invertibility of the DFT by introducing the IDFT, which perfectly reconstructs the time series from the complex outputs, confirming once again that the Fourier transform is indeed lossless.
