Beyond Constant Volatility

In our Black-Scholes post, we saw that the model assumes the log-return over an interval of length $T$ is normally distributed with variance $\sigma^2 T$, where $\sigma$ is a constant. This implies three properties that empirical data consistently contradict: no volatility clustering (in real markets, large moves tend to follow large moves), no mean-reversion in volatility (in reality, volatility spikes during crises and subsides afterward), and symmetric returns (real equity returns exhibit negative skew).

These limitations motivate the Heston model. Unlike Black-Scholes, Heston lets the variance $v_t = \sigma_t^2$ evolve as its own stochastic process, correlated with the stock price process. The correlation parameter $\rho$ is the key to generating skew: when $\rho < 0$, a falling stock price tends to coincide with rising volatility, which fattens the left tail of the return distribution, exactly as observed.

***

The Heston model is defined by a pair of coupled SDEs. Under the risk-neutral measure $\mathbb{Q}$,

dS_t = r S_t \, dt + \sqrt{v_t} \, S_t \, dW_t^S

dv_t = \kappa(\theta - v_t) \, dt + \xi \sqrt{v_t} \, dW_t^v

where the two Brownian motions have instantaneous correlation $\rho$, i.e., $dW_t^S \, dW_t^v = \rho \, dt$.

The first equation is simply GBM with a time-varying volatility $\sqrt{v_t}$ replacing the constant $\sigma$. The second equation is the Cox-Ingersoll-Ross (CIR) process, originally developed for interest rate modeling, where

$\kappa > 0$ is the mean-reversion speed (large $\kappa$ means variance snaps back quickly to its long-run level)
$\theta > 0$ is the long-run variance (the level that $v_t$ gravitates toward over time)
$\xi > 0$ is the vol-of-vol (governs how erratically variance itself moves)
$\rho \in (-1, 1)$ is the correlation between the two Brownian motions ($\rho < 0$ is the empirically relevant regime and generates the observed negative skew)
$v_0 > 0$ is the initial variance (must be estimated or inferred from the market)
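To make these dynamics concrete, here is a minimal simulation sketch in Python (standard library only). It assumes an Euler discretization with the "full truncation" fix, in which any negative variance value is replaced by zero wherever it enters the drift or diffusion; this scheme is our choice for illustration, not part of the model itself. It also checks numerically that $\rho < 0$ produces negatively skewed log-returns. All function names and parameter values are hypothetical.

```python
import math
import random


def simulate_heston_log_returns(v0, r, kappa, theta, xi, rho, T,
                                n_steps, n_paths, seed=0):
    """Simulate log(S_T / S_0) under Heston using full-truncation Euler steps."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_perp = math.sqrt(1.0 - rho * rho)
    returns = []
    for _ in range(n_paths):
        x, v = 0.0, v0  # x tracks log(S_t / S_0)
        for _ in range(n_steps):
            # Correlated standard normal shocks with corr(z_s, z_v) = rho
            z_s = rng.gauss(0.0, 1.0)
            z_v = rho * z_s + rho_perp * rng.gauss(0.0, 1.0)
            # Full truncation: use max(v, 0) wherever the variance enters
            v_pos = max(v, 0.0)
            x += (r - 0.5 * v_pos) * dt + math.sqrt(v_pos) * sqrt_dt * z_s
            v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos) * sqrt_dt * z_v
        returns.append(x)
    return returns


def sample_skewness(xs):
    """Sample skewness m3 / m2^(3/2) (biased moment estimator)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical parameters with a strongly negative correlation
rets = simulate_heston_log_returns(v0=0.04, r=0.0, kappa=2.0, theta=0.04, xi=0.6,
                                   rho=-0.7, T=1.0, n_steps=50, n_paths=5000)
print(f"skewness of simulated log-returns: {sample_skewness(rets):.3f}")
```

Full truncation is one of several common discretization fixes; it is shown here only because it is short and keeps the simulated variance usable even when parameters violate the Feller condition.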

We won’t discuss the CIR process in depth, as it falls outside the scope of this article. However, it’s worth noting that the process is appealing for variance modeling for two reasons:

Its mean-reverting drift $\kappa(\theta - v_t)$ encodes the empirical regularity that volatility doesn’t drift to infinity (or collapse to zero permanently)
The $\sqrt{v_t}$ diffusion term shrinks to zero as $v_t$ approaches zero, so near zero the positive drift $\kappa\theta$ takes over and the process never becomes negative, which is a necessary property for a variance. The technical condition that guarantees $v_t > 0$ almost surely (i.e., that zero is never reached) is the Feller condition, which says that

2\kappa\theta > \xi^2.

Intuitively, the condition requires the mean-reverting push toward $\theta$ to be strong enough relative to the random fluctuations to keep the process from touching zero. Unfortunately, calibrated Heston parameters frequently violate the Feller condition in practice. The variance can then hit zero, which causes numerical difficulties (for example, naively discretized simulations can produce negative variances) and is one of the model’s known limitations.
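Checking the Feller condition for a candidate parameter set takes one line. The sketch below also reports the ratio $2\kappa\theta / \xi^2$, which exceeds 1 exactly when the condition holds; both parameter sets are hypothetical, chosen only to show one pass and one failure.

```python
def feller_ratio(kappa, theta, xi):
    """2*kappa*theta / xi^2; the Feller condition holds when this exceeds 1."""
    return 2.0 * kappa * theta / xi ** 2


def feller_satisfied(kappa, theta, xi):
    """True if 2*kappa*theta > xi^2, i.e. zero is unattainable for v_t."""
    return 2.0 * kappa * theta > xi ** 2


# Hypothetical parameter sets, for illustration only
print(feller_satisfied(2.0, 0.04, 0.3), round(feller_ratio(2.0, 0.04, 0.3), 3))  # True 1.778
print(feller_satisfied(1.5, 0.04, 0.6), round(feller_ratio(1.5, 0.04, 0.6), 3))  # False 0.333
```

The ratio is more informative than the boolean alone: a value far below 1 signals that the variance will spend noticeable time near zero.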

Volatility as a Latent Process

Just as $\sigma$ is in Black-Scholes, $v_t$ is unobservable. We can observe stock prices $S_t$ (or, equivalently, log-returns), but the variance is a hidden state that drives those observations. If you’ve read our posts on HMMs and Kalman filters, you’ll recognize the defining structure of a state-space model.

Formally, let $x_t = \log(S_t)$ be the observed process and $v_t$ be the latent state. Then, the system can be written schematically as:

\text{Observation: } dx_t = \left(r - \frac{v_t}{2}\right)dt + \sqrt{v_t}\, dW_t^S

\text{State: } dv_t = \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\, dW_t^v

This is a continuous-time (we model infinitesimal changes), non-linear (through the $\sqrt{v_t}$ terms) state-space model with correlated noise (since $dW_t^S \, dW_t^v = \rho \, dt$).

The statistical goal is therefore to estimate the filtered distribution $p(v_t \mid x_0, x_1, \ldots, x_t)$, i.e., our belief about the current variance given all prices observed up to today. If the model were linear and Gaussian, this would reduce to the Kalman filter, with closed-form updates. Because the model is non-linear, we must instead resort to approximations. Two common approaches are:

Extended Kalman filters (EKFs), which linearize the dynamics around the current estimate via a first-order Taylor expansion. They are fast but biased when the non-linearity is severe.
Particle filters (Sequential Monte Carlo), which represent the filtering distribution as a cloud of weighted samples that are propagated and reweighted at each observation. They are asymptotically exact as the number of particles grows, but computationally expensive.
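To illustrate the second approach, here is a minimal bootstrap particle filter sketch for $v_t$ (standard library only). It assumes fixed, known parameters, an Euler discretization of the state-space model with step `dt`, and multinomial resampling at every step; all function names and parameter values are hypothetical. Because the two shocks are correlated, the log-return increment is Gaussian conditional on the variance shock, with its mean shifted by $\rho$ times that shock and its variance scaled by $1 - \rho^2$. That conditional density supplies the particle weights.

```python
import math
import random


def simulate_path(n_steps, dt, r, kappa, theta, xi, rho, v0, seed=1):
    """Simulate (x_t, v_t) with the same Euler scheme the filter assumes."""
    rng = random.Random(seed)
    x, v = [0.0], [v0]
    for _ in range(n_steps):
        z_s = rng.gauss(0.0, 1.0)
        z_v = rho * z_s + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        v_pos = max(v[-1], 0.0)
        x.append(x[-1] + (r - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z_s)
        v.append(max(v[-1] + kappa * (theta - v_pos) * dt
                     + xi * math.sqrt(v_pos * dt) * z_v, 0.0))
    return x, v


def particle_filter(x, dt, r, kappa, theta, xi, rho, v0, n_particles=500, seed=0):
    """Bootstrap particle filter: returns the filtered mean of v_t at each step."""
    rng = random.Random(seed)
    particles = [v0] * n_particles
    means = [v0]
    for k in range(1, len(x)):
        dx = x[k] - x[k - 1]
        proposed, log_w = [], []
        for v in particles:
            v_pos = max(v, 1e-10)  # floor avoids a zero observation variance
            sd = math.sqrt(v_pos * dt)
            # Propagate the particle through the state equation
            z_v = rng.gauss(0.0, 1.0)
            proposed.append(max(v + kappa * (theta - v_pos) * dt + xi * sd * z_v, 0.0))
            # Given z_v, dx is Gaussian: mean shifted by rho, variance scaled by 1 - rho^2
            mean = (r - 0.5 * v_pos) * dt + rho * sd * z_v
            var = (1.0 - rho * rho) * v_pos * dt
            # Gaussian log-likelihood of dx, up to an additive constant
            log_w.append(-0.5 * math.log(var) - 0.5 * (dx - mean) ** 2 / var)
        # Normalize the weights in log space for numerical stability
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * vi for wi, vi in zip(w, proposed)))
        # Multinomial resampling to counter weight degeneracy
        particles = rng.choices(proposed, weights=w, k=n_particles)
    return means


params = dict(r=0.0, kappa=3.0, theta=0.04, xi=0.5, rho=-0.7, v0=0.04)
dt = 1.0 / 252  # daily observations
x, v_true = simulate_path(500, dt, **params)
filtered = particle_filter(x, dt, **params)
print(f"last true v: {v_true[-1]:.4f}, last filtered mean: {filtered[-1]:.4f}")
```

Resampling at every step is the simplest choice; practical filters often resample only when the effective sample size falls below a threshold.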

It’s important to note that this filtering problem is distinct from calibration (fitting parameters to option prices), which we’ll discuss later. Filtering asks: given fixed parameters, what is $v_t$ right now? Calibration asks: what parameters $(\kappa, \theta, \xi, \rho, v_0)$ make the model best fit observed prices? In practice, both problems must be solved jointly, and their interaction is a source of significant statistical difficulty.

Pricing via the Characteristic Function

So far, we’ve only described what the Heston model is. More importantly, how do we use it to price options? Specifically, given the joint dynamics of $S_t$ and $v_t$, how do we price a European call option? The Black-Scholes route of computing the discounted expectation $e^{-rT}\,\mathbb{E}^\mathbb{Q}[\max(S_T - K, 0)]$ by integrating against a log-normal density is no longer available: under Heston, $\log S_T$ is not normally distributed (so $S_T$ is not log-normal), and its density has no simple closed form.

Instead, we use the characteristic function (CF). For a random variable $X$, its CF is given by

\phi_X(u) = \mathbb{E}\left[e^{iuX}\right], \quad u \in \mathbb{R},

which is simply the Fourier transform of the p.d.f. of $X$. Crucially, even when the p.d.f. itself is intractable, the CF may have a closed form, and this is exactly the case for Heston.
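As a quick illustration of the definition (not of Heston itself), the sketch below estimates $\phi_X(u)$ for a standard normal $X$ by averaging $e^{iuX}$ over simulated draws and compares the estimate with the known closed form $e^{-u^2/2}$. The sample size and the values of $u$ are arbitrary choices.

```python
import cmath
import math
import random


def empirical_cf(samples, u):
    """Monte Carlo estimate of phi_X(u) = E[exp(i u X)] from i.i.d. samples."""
    return sum(cmath.exp(1j * u * x) for x in samples) / len(samples)


rng = random.Random(42)
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

for u in (0.5, 1.0, 2.0):
    exact = math.exp(-0.5 * u * u)  # closed-form CF of N(0, 1)
    est = empirical_cf(samples, u)
    print(f"u = {u}: estimate = {est.real:.4f} {est.imag:+.4f}i, exact = {exact:.4f}")
```

The imaginary part of the estimate hovers near zero because the standard normal is symmetric, so its CF is real.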

The reason traces back to the affine structure of the model. A model is affine if the drift vector and the instantaneous covariance matrix (the squared diffusion) of the state vector are affine (linear plus constant) functions of the state. The Heston model is affine in the state $(x_t, v_t)$: the drifts $r - v_t/2$ and $\kappa(\theta - v_t)$, as well as every entry of the covariance matrix ($v_t$, $\rho\xi v_t$, and $\xi^2 v_t$), are affine in $v_t$ and do not depend on $x_t$. This structure implies that the log-price characteristic function takes an exponential-affine form:

\phi(u, \tau) = \mathbb{E}^\mathbb{Q}\!\left[e^{iu\log S_T} \mid S_t, v_t\right] = \exp\!\left(A(u, \tau) + B(u, \tau)\, v_t + iu\log S_t\right)

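where $\tau = T - t$ is the time to maturity, and $A(u, \tau)$ and $B(u, \tau)$ are deterministic functions satisfying a system of Riccati ODEs (first-order ODEs with a quadratic non-linearity). Here the coefficients are constant, so $B$ has a closed-form solution and $A$ follows by direct integration. The derivation applies Itô’s lemma to the conjectured exponential-affine form, requires the resulting drift to vanish (the conditional expectation is a martingale in $t$), and matches coefficients of $v_t$; the affine structure of the model guarantees that the ansatz is self-consistent.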
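For reference, here is a sketch of the resulting closed-form characteristic function, evaluated at $t = 0$ so that $\tau = T$ and $v_t = v_0$. We assume the widely used rearrangement of the Riccati solution often called the "little Heston trap" formulation, which is algebraically equivalent to Heston’s original expression but avoids a discontinuity in the complex logarithm at long maturities. The function name and parameter values are our own.

```python
import cmath
import math


def heston_cf(u, tau, s0, v0, r, kappa, theta, xi, rho):
    """E^Q[exp(i u log S_T)] under Heston, for real or complex u.

    A and B below play the roles of A(u, tau) and B(u, tau) in the text.
    """
    iu = 1j * u
    b = kappa - rho * xi * iu
    d = cmath.sqrt(b * b + xi * xi * (iu + u * u))
    g = (b - d) / (b + d)
    exp_dt = cmath.exp(-d * tau)
    # Closed-form solution of the Riccati ODE for B
    B = (b - d) / xi ** 2 * (1 - exp_dt) / (1 - g * exp_dt)
    # A follows by integrating its (linear) ODE
    A = r * iu * tau + kappa * theta / xi ** 2 * (
        (b - d) * tau - 2 * cmath.log((1 - g * exp_dt) / (1 - g))
    )
    return cmath.exp(A + B * v0 + iu * math.log(s0))


p = dict(tau=1.0, s0=100.0, v0=0.04, r=0.05, kappa=2.0, theta=0.04, xi=0.5, rho=-0.7)
print(heston_cf(0.0, **p))                                # phi(0) should equal 1
print(heston_cf(-1j, **p).real, 100.0 * math.exp(0.05))   # E[S_T] = S0 * e^{rT}
```

Both printed checks follow from the definition: $\phi(0) = \mathbb{E}[1] = 1$, and evaluating at $u = -i$ gives $\mathbb{E}^\mathbb{Q}[S_T]$, which must equal the forward price $S_0 e^{rT}$.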

Once we have $\phi(u, \tau)$, option prices follow via the Carr-Madan formula, which expresses the call price as a Fourier integral:

C(K) = \frac{e^{-\alpha \log K}}{\pi} \int_0^\infty \text{Re}\!\left[e^{-iu\log K} \psi(u)\right] du

where $\alpha > 0$ is a damping parameter that ensures integrability, and $\psi(u)$ is a simple algebraic transform of $\phi$, namely $\psi(u) = \frac{e^{-rT}\,\phi(u - (\alpha + 1)i,\, T)}{\alpha^2 + \alpha - u^2 + i(2\alpha + 1)u}$. We then evaluate this integral numerically using the Fast Fourier Transform (FFT), which recovers option prices across an entire grid of strikes simultaneously in $\mathcal{O}(n \log n)$ time. The efficiency of this approach is one of the main reasons Heston remains the industry workhorse for stochastic volatility pricing.
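Below is a minimal sketch of the Carr-Madan pricer. To keep it short and dependency-free, we swap the FFT for plain trapezoidal quadrature at a single strike; the FFT evaluates the same integral on a whole strike grid at once. The pricer accepts any characteristic function of $\log S_T$, so as a sanity check we feed it the Black-Scholes CF, whose exact price is known; passing a Heston CF instead prices Heston calls. The damping $\alpha = 1.5$, the truncation point `u_max`, and all names are hypothetical choices.

```python
import cmath
import math


def carr_madan_call(cf, K, T, r, alpha=1.5, u_max=200.0, n=4000):
    """European call price from the CF of log(S_T) via the Carr-Madan integral.

    cf(u) must return E^Q[exp(i u log S_T)] and accept complex u. The integral
    over [0, u_max] is approximated with the trapezoidal rule (n intervals).
    """
    k = math.log(K)
    h = u_max / n

    def integrand(u):
        # psi(u) = e^{-rT} phi(u - (alpha+1) i) / (alpha^2 + alpha - u^2 + i (2 alpha + 1) u)
        psi = cmath.exp(-r * T) * cf(u - (alpha + 1) * 1j) / (
            alpha ** 2 + alpha - u ** 2 + 1j * (2 * alpha + 1) * u
        )
        return (cmath.exp(-1j * u * k) * psi).real

    total = 0.5 * (integrand(0.0) + integrand(u_max))
    total += sum(integrand(j * h) for j in range(1, n))
    return math.exp(-alpha * k) / math.pi * h * total


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(s0, K, T, r, sigma):
    """Closed-form Black-Scholes call price, used only as a reference value."""
    d1 = (math.log(s0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return s0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


s0, K, T, r, sigma = 100.0, 100.0, 1.0, 0.05, 0.2


def bs_cf(u):
    """CF of log(S_T) under Black-Scholes: log S_T ~ N(m, sigma^2 T)."""
    m = math.log(s0) + (r - 0.5 * sigma ** 2) * T
    return cmath.exp(1j * u * m - 0.5 * sigma ** 2 * T * u ** 2)


print(carr_madan_call(bs_cf, K, T, r), bs_call(s0, K, T, r, sigma))
```

The trapezoidal rule works well here because the integrand is smooth and decays quickly; the FFT version trades this per-strike loop for a single transform over many strikes.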

Calibrating Heston

As with Black-Scholes, we frame calibration, i.e., finding the parameter vector $\boldsymbol{\theta} = (\kappa, \theta, \xi, \rho, v_0)$ that best fits a panel of observed market option prices, as a non-linear least squares problem (the bold $\boldsymbol{\theta}$ denotes the full parameter vector, not the long-run variance $\theta$):

\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \sum_{i} w_i \left(C_{\text{market}}(K_i, T_i) - C_{\text{Heston}}(K_i, T_i;\boldsymbol{\theta})\right)^2

where $w_i$ are weights (often inverse bid-ask spreads, to down-weight illiquid options) and $C_{\text{Heston}}$ is computed via the aforementioned Carr-Madan FFT. The objective is smooth in $\boldsymbol{\theta}$, and gradients can be computed analytically or via automatic differentiation, enabling efficient gradient-based optimizers.
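The objective itself is straightforward to code. The sketch below assumes inverse bid-ask spread weights and takes the pricer as a plug-in callable; the stand-in `toy_pricer` and the quotes are made up purely to show the mechanics and are not a Heston model. In a real calibration, `model_price` would wrap a Heston pricer and the objective would be passed to a gradient-based optimizer.

```python
def inverse_spread_weights(bids, asks):
    """w_i = 1 / (ask_i - bid_i): wide, illiquid quotes get less weight."""
    return [1.0 / (ask - bid) for bid, ask in zip(bids, asks)]


def calibration_objective(params, quotes, weights, model_price):
    """Weighted sum of squared pricing errors.

    quotes: list of (K, T, market_price) tuples.
    model_price: callable (K, T, params) -> model call price.
    """
    return sum(
        w * (c_mkt - model_price(K, T, params)) ** 2
        for w, (K, T, c_mkt) in zip(weights, quotes)
    )


def toy_pricer(K, T, params):
    """Hypothetical stand-in pricer (linear in strike), NOT a Heston model."""
    level, slope = params
    return level - slope * (K - 100.0)


quotes = [(100.0, 1.0, 10.0), (110.0, 1.0, 5.0)]  # made-up market quotes
weights = inverse_spread_weights(bids=[9.9, 4.8], asks=[10.1, 5.2])
print(calibration_objective((10.5, 0.5), quotes, weights, toy_pricer))
```

Keeping the pricer as an argument separates the statistical objective from the pricing engine, so the same function works whether prices come from an FFT grid, quadrature, or Monte Carlo.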

In practice, this approach faces several challenges:

The five parameters are not all separately identifiable from a finite option panel. For example, $\kappa$ and $\xi$ often have nearly collinear effects on the shape of the smile, which leads to flat regions in the objective and unstable estimates.
The objective surface is non-convex, with multiple local minima: different initializations can converge to qualitatively different parameter sets that achieve a similar in-sample fit but behave very differently when extrapolating to unobserved strikes or maturities.
Calibrated parameters $\hat{\boldsymbol{\theta}}$ change from day to day as the market moves, so the model must be recalibrated daily. A model whose supposedly constant parameters shift every day is not a stable generative model of the data. This instability is not unique to Heston, however; it reflects a fundamental tension in derivatives modeling.

As we’ve seen, the Heston model is one of the natural extensions of Black-Scholes, obtained by promoting volatility from a fixed constant to a mean-reverting stochastic process. It’s essentially a correlated, non-linear state-space model with a latent variance process, priced via Fourier inversion of a closed-form characteristic function and calibrated by non-linear least squares on option prices.

Unfortunately, it’s still an imperfect model: its assumptions remain too well-behaved for real markets (for instance, its continuous price paths struggle to reproduce the steep skew of short-dated options). It’s best viewed not as the final answer, but as the essential stepping stone between Black-Scholes and the richer world of jump-diffusion, rough volatility, and beyond.
