by Jonathan Widarsa

How Black-Scholes Came to Be


The Black-Scholes model is inarguably one of the most important formulas in finance. Lots of people have seen it and memorized it, and some have applied it to derivatives pricing to actually accumulate wealth. Personally, I’m deeply interested in how it came to be, simply because fully understanding the system gives a level of intuition that no one who merely memorizes the formula can have.

So, unlike my other posts, I’ve decided to comprehensively show the derivation (abstracting non-trivial calculations) of the Black-Scholes formula, and only later explain the intuition of the model that many know.

***

How should we model a stock price $S_t$ over time? A naive approach would be to write

$$S_t = S_0 + \mu t + \sigma W_t,$$

where $W_t$ is a standard Brownian motion (i.e., a continuous-time random walk with $W_t \sim \mathcal{N}(0, t)$). This is a simple linear model with a drift $\mu$ and noise $\sigma$. But there’s an obvious problem: this model allows $S_t$ to go negative, which a stock price cannot.

How about a more sensible model, which says that what’s random is not the absolute change in price, but the percentage change? This gives us the Geometric Brownian Motion (GBM), expressed as the stochastic differential equation (SDE),

$$dS_t = \mu S_t \, dt + \sigma S_t \, dW_t.$$

This equation states that the infinitesimal change in price $dS_t$ is proportional to the current price $S_t$, with a deterministic drift $\mu$ and random noise $\sigma \, dW_t$. The fact that both terms are multiplied by $S_t$ is key: a \$1 move when the stock is at \$10 is very different from a \$1 move when it’s at \$1000.

Dividing both sides by $S_t$, we get

$$\frac{dS_t}{S_t} = \mu \, dt + \sigma \, dW_t.$$

This becomes a random walk on returns, not on prices. The drift $\mu$ now represents the expected annualized return, and $\sigma$ is the volatility (i.e., the standard deviation of those returns). As we can see, this form defines GBM as simply a linear model applied to instantaneous returns, making it a lot less exotic than it first appears.

At this point, we understand that the SDE tells us how prices evolve infinitesimally. Next, let’s take a look at how we can determine the distribution of the price at some future time $T$, $S_T$. To do this, we need to “integrate” the SDE. But the catch is that we can’t use ordinary calculus because the $dW_t$ term has a stochastic coefficient $\sigma S_t$, making the integral non-trivial. This is where Itô’s lemma, the chain rule of stochastic calculus, comes in.

For a smooth function $f(S_t) = \log(S_t)$, applying Itô’s lemma gives us

$$d\log(S_t) = \frac{1}{S_t} \, dS_t - \frac{1}{2} \cdot \frac{1}{S_t^2} \, (dS_t)^2.$$

Note the extra $-(dS_t)^2 / (2S_t^2)$ term. This is the hallmark of stochastic calculus: it has no counterpart in ordinary calculus and arises because Brownian increments satisfy $(dW_t)^2 = dt$, not zero. Substituting the GBM expression for $dS_t$ and using $(dW_t)^2 = dt$, we get $(dS_t)^2 = \sigma^2 S_t^2 \, dt$ (the $dt^2$ and $dt \, dW_t$ terms vanish), so the second-order term contributes $-\sigma^2 \, dt / 2$. After simplifying and grouping the $dt$ (deterministic) and $dW_t$ (random) terms separately,

$$d\log(S_t) = \left(\mu - \frac{\sigma^2}{2}\right) dt + \sigma \, dW_t.$$

Now, integrating from $0$ to $T$ is straightforward ($dW_t$ has a constant coefficient $\sigma$). Since $\int_0^T dW_t \sim \mathcal{N}(0, T)$, we get

$$\log(S_T) \sim \mathcal{N}\!\left(\log(S_0) + \left(\mu - \frac{\sigma^2}{2}\right)T,\; \sigma^2 T\right).$$

So far, we’ve proven that log-prices are normally distributed, and equivalently, stock prices are log-normally distributed. This means that stock prices can never go negative, and they have a right-skewed shape that looks qualitatively like what we actually observe in markets.
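As a quick sanity check on this result, here is a minimal simulation sketch. It sums the increments of the log-price equation over small time steps and compares the sample mean and variance of $\log(S_T)$ against the values derived above. The function name, parameter values, and seed are made up for illustration.

```python
import math
import random
import statistics

def sample_log_terminal_prices(s0, mu, sigma, T, n_paths, n_steps, seed=0):
    """Simulate log(S_T) by summing increments of d log(S_t) over n_steps."""
    rng = random.Random(seed)
    dt = T / n_steps
    drift = (mu - 0.5 * sigma**2) * dt   # deterministic part of each increment
    vol = sigma * math.sqrt(dt)          # scale of each Brownian increment
    samples = []
    for _ in range(n_paths):
        log_s = math.log(s0)
        for _ in range(n_steps):
            log_s += drift + vol * rng.gauss(0.0, 1.0)
        samples.append(log_s)
    return samples

# Illustrative parameters (assumptions, not taken from market data).
s0, mu, sigma, T = 100.0, 0.08, 0.2, 1.0
logs = sample_log_terminal_prices(s0, mu, sigma, T, n_paths=20000, n_steps=20)

theory_mean = math.log(s0) + (mu - 0.5 * sigma**2) * T
theory_var = sigma**2 * T
print("mean:", statistics.fmean(logs), "vs theory", theory_mean)
print("var: ", statistics.variance(logs), "vs theory", theory_var)
```

The two pairs of numbers should agree up to sampling noise, and since every simulated price is $e^{\log S_T}$, none of them can be negative.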

Black-Scholes PDE

We now know a great deal about the stock price path, specifically its evolution over time and its distribution. Naturally, we ask: what, then, is a fair price for a derivative, i.e., a contract whose payoff depends on $S_T$?

Let $C(S, t)$ denote the price of an option as a function of the current stock price $S$ and time $t$. Our goal is to pin down what functional form $C$ must take.

Before moving on, let’s talk about the principle of no arbitrage. In an efficient market, there should be no way to make a risk-free profit. Mathematically, this is enforced by requiring that discounted asset prices are martingales, which are processes with no predictable drift, i.e., “fair games” where the best forecast of the future is the present value. Intuitively, of course, if prices had a predictable drift after discounting, traders would exploit it until the drift disappeared.

Now, for $C(S_t, t)$ to be consistent with the Markov property and for its discounted value to be a martingale under the risk-neutral measure (the probability measure that makes discounted prices martingales), we apply Itô’s lemma to $C(S_t, t)$ and require its drift to equal $rC$, i.e., growth at the risk-free rate $r$. This gives the Black-Scholes PDE:

$$\frac{\partial C}{\partial t} + r S \, \frac{\partial C}{\partial S} + \frac{1}{2}\,\sigma^2 S^2 \,\frac{\partial^2 C}{\partial S^2} = r C$$
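A sketch of that step, assuming (as is standard, though not spelled out above) that under the risk-neutral measure the stock follows $dS_t = r S_t \, dt + \sigma S_t \, dW_t^{\mathbb{Q}}$, where $W^{\mathbb{Q}}$ is notation introduced here for a Brownian motion under that measure. Itô’s lemma applied to $C(S_t, t)$ gives

$$dC = \left(\frac{\partial C}{\partial t} + r S \, \frac{\partial C}{\partial S} + \frac{1}{2}\,\sigma^2 S^2 \,\frac{\partial^2 C}{\partial S^2}\right) dt + \sigma S \, \frac{\partial C}{\partial S} \, dW_t^{\mathbb{Q}},$$

and therefore the discounted price satisfies

$$d\left(e^{-rt} C\right) = e^{-rt}\left(\frac{\partial C}{\partial t} + r S \, \frac{\partial C}{\partial S} + \frac{1}{2}\,\sigma^2 S^2 \,\frac{\partial^2 C}{\partial S^2} - r C\right) dt + e^{-rt} \sigma S \, \frac{\partial C}{\partial S} \, dW_t^{\mathbb{Q}}.$$

For $e^{-rt} C$ to be a martingale, the $dt$ coefficient must vanish, which is exactly the PDE.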

This is a constraint that any arbitrage-free price surface $C(S, t)$ must satisfy. We can further decode each term using the language of the Greeks:

  • $\Theta = \frac{\partial C}{\partial t}$: how the option price changes as time passes (Theta)
  • $\Delta = \frac{\partial C}{\partial S}$: sensitivity to the stock price (Delta)
  • $\Gamma = \frac{\partial^2 C}{\partial S^2}$: curvature of the price surface with respect to $S$ (Gamma)

Rearranging and substituting, the PDE becomes

$$\Theta = r(C - S\Delta) - \frac{1}{2}\,\sigma^2 S^2 \Gamma$$

Based on this equation, the PDE says that the rate of time decay $\Theta$ must be exactly offset by the curvature $\Gamma$ (scaled by volatility and price level) plus a discounting adjustment. In other words, we conclude that time decay and curvature are two sides of the same coin.
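To see the Theta-Gamma relation numerically, here is a rough sketch that estimates the three Greeks by central finite differences and checks that the PDE residual is close to zero. It uses the closed-form European call price (the formula this post arrives at in the next section) as the test surface $C(S, t)$; the constants, step sizes, and function names are illustrative assumptions.

```python
import math

# Illustrative contract and market parameters (assumptions).
K, T, R, SIGMA = 100.0, 1.0, 0.05, 0.2

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_price(s, t):
    """Black-Scholes call value C(S, t) at calendar time t < T."""
    tau = T - t  # time remaining to maturity
    d1 = (math.log(s / K) + (R + 0.5 * SIGMA**2) * tau) / (SIGMA * math.sqrt(tau))
    d2 = d1 - SIGMA * math.sqrt(tau)
    return s * norm_cdf(d1) - K * math.exp(-R * tau) * norm_cdf(d2)

def pde_residual(s, t, ds=0.01, dt=1e-4):
    """Theta - [r(C - S*Delta) - 0.5*sigma^2*S^2*Gamma], via central differences."""
    c = call_price(s, t)
    theta = (call_price(s, t + dt) - call_price(s, t - dt)) / (2 * dt)
    delta = (call_price(s + ds, t) - call_price(s - ds, t)) / (2 * ds)
    gamma = (call_price(s + ds, t) - 2 * c + call_price(s - ds, t)) / ds**2
    return theta - (R * (c - s * delta) - 0.5 * SIGMA**2 * s**2 * gamma)

residual = pde_residual(s=105.0, t=0.25)
print("PDE residual:", residual)  # should be near zero, up to discretization error
```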

Discounted Expected Future Payoff

As mentioned, the PDE gives the pricing constraint satisfied by any derivative on $S$. Its solution for a particular contract is simply the discounted expected future payoff under the risk-neutral measure.

Consider a European call option: at time $T$, the holder receives $\max(S_T - K, 0)$, where $K$ is the strike price. They earn $S_T - K$ if the stock finishes above $K$, and nothing otherwise. Under the risk-neutral measure $\mathbb{Q}$, the fair price of this contract is simply the discounted expected payoff:

$$C = e^{-rT} \, \mathbb{E}^{\mathbb{Q}}\left[\max(S_T - K, 0)\right].$$
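As a rough numerical illustration of this expectation, here is a minimal Monte Carlo sketch. It assumes that under the risk-neutral measure the drift $\mu$ in the log-normal result is replaced by the risk-free rate $r$ (the standard construction, though not spelled out above). The function name `mc_call_price`, the sample parameters, and the seed are made up for illustration.

```python
import math
import random

def mc_call_price(s0, k, T, r, sigma, n_paths=200_000, seed=0):
    """Estimate e^{-rT} * E[max(S_T - K, 0)] by sampling S_T directly."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * T   # risk-neutral mean shift of log(S_T)
    vol = sigma * math.sqrt(T)         # standard deviation of log(S_T)
    total_payoff = 0.0
    for _ in range(n_paths):
        s_T = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total_payoff += max(s_T - k, 0.0)
    return math.exp(-r * T) * total_payoff / n_paths

mc_price = mc_call_price(s0=100.0, k=100.0, T=1.0, r=0.05, sigma=0.2)
print("Monte Carlo call price:", mc_price)
```

The estimate carries sampling error that shrinks like $1/\sqrt{n}$ in the number of paths, which is one reason a closed-form solution is so convenient.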

Since we know that $\log(S_T)$ is normally distributed, this expectation is a tractable integral over a log-normal distribution. Working through the integral, i.e., splitting $\max(S_T - K, 0)$ into two expectations and completing the square, yields the famous Black-Scholes formula:

$$C = S_0 \Phi(d_1) - K e^{-rT} \Phi(d_2),$$

where $\Phi$ is the standard normal CDF and

$$d_1 = \frac{\log(S_0/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T}.$$

Intuitively, $\Phi(d_2)$ is the risk-neutral probability that the option expires in-the-money (i.e., $S_T > K$), and $\Phi(d_1)$ is a similar probability adjusted for the stock’s expected growth.
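The formula translates almost line by line into code. Below is a minimal sketch using only the standard library; the function names and the sample inputs are illustrative choices.

```python
import math

def norm_cdf(x):
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s0, k, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return s0 * norm_cdf(d1) - k * math.exp(-r * T) * norm_cdf(d2)

# At-the-money example: d1 = 0.35 and d2 = 0.15 for these inputs.
price = bs_call(s0=100.0, k=100.0, T=1.0, r=0.05, sigma=0.2)
print(round(price, 4))  # about 10.45
```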

Now here’s where things get statistical. Given a dataset of observed market prices $C_{\text{market}}$, we can treat the Black-Scholes formula as a parametric model:

$$C_{\text{market}} = f(S, K, T, r, \sigma) + \varepsilon$$

where $f$ is the Black-Scholes formula and $\varepsilon$ is the model error. Interestingly, all inputs except $\sigma$ are directly observable, meaning Black-Scholes is actually a one-parameter model in practice, and the question immediately becomes: what is $\sigma$?

Volatility

As mentioned, volatility $\sigma$ is the only unobservable input to the formula, and it is also by far the most important one. In practice, Black-Scholes is more of a volatility quoting convention than a pricing model. Instead of quoting an option price in dollars, traders quote the value of $\sigma$ that makes the formula match the observed price. We call this the implied volatility $\hat{\sigma}$:

$$C_{\text{market}} = f(S, K, T, r, \hat{\sigma})$$

Solving this equation for $\hat{\sigma}$ is a simple one-dimensional root-finding problem (since $f$ is monotone in $\sigma$).
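Because the call price increases with $\sigma$, plain bisection is enough to recover the implied volatility. Here is a minimal sketch; the bracket `[1e-6, 5.0]`, the tolerance, and the function names are assumptions chosen for illustration, and production code would typically use a faster solver.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s0, k, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return s0 * norm_cdf(d1) - k * math.exp(-r * T) * norm_cdf(d2)

def implied_vol(c_market, s0, k, T, r, lo=1e-6, hi=5.0, tol=1e-10):
    """Find sigma with bs_call(sigma) == c_market by bisection on [lo, hi]."""
    if not bs_call(s0, k, T, r, lo) <= c_market <= bs_call(s0, k, T, r, hi):
        raise ValueError("market price is outside the range attainable on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(s0, k, T, r, mid) < c_market:
            lo = mid   # model price too low, so volatility must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip: price an option at sigma = 0.30, then recover sigma from the price.
quoted = bs_call(100.0, 110.0, 0.5, 0.02, 0.30)
iv = implied_vol(quoted, 100.0, 110.0, 0.5, 0.02)
print(iv)
```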

But there’s a twist: if Black-Scholes were perfect, then all options on the same underlying, regardless of strike $K$ or maturity $T$, should imply the same $\hat{\sigma}$, because the model assumes a single constant volatility. Interestingly, in practice, $\hat{\sigma}$ isn’t constant. Instead, by plotting $\hat{\sigma}$ against strike $K$, we can see a U-shaped or skewed curve known as the volatility smile.

[Figure: Volatility smile plot, by Moomoo]

The volatility smile in this plot reveals that the Black-Scholes model is misspecified. Specifically, it tells us that:

  • Deep out-of-the-money and in-the-money options are systematically mispriced by constant-volatility GBM.
  • Returns are not truly log-normal, i.e., real markets exhibit fat tails (crash risk) and skewness (downside moves are larger than upside moves), neither of which is captured by a simple normal distribution.

When we plot across multiple maturities $T$, we get a full volatility surface $\hat{\sigma}(K, T)$. This surface is a map of the model’s failures and serves as the starting point for more sophisticated models that allow volatility itself to be stochastic or to vary with the stock price level.

Model Calibration

Now consider a panel of observed market prices $\{C_{\text{market},i}\}$ for different strikes and maturities. How do we find the best $\sigma$ (or, in a richer model, a full parameter vector $\boldsymbol{\theta}$)? This is called a calibration problem, which is fundamentally a statistical estimation problem. To solve this, let’s consider two methods: the MLE perspective vs. the Bayesian perspective.

From an MLE perspective, if we assume the pricing errors $\varepsilon_i$ are i.i.d. Gaussian, maximizing the likelihood is equivalent to minimizing the sum of squared errors:

$$\hat{\boldsymbol{\theta}}_{\text{MLE}} = \arg\min_{\boldsymbol{\theta}} \sum_{i=1}^{n} \left(C_{\text{market},i} - f_i(\boldsymbol{\theta})\right)^2.$$

This is basically a non-linear least squares problem. Thankfully, since the Black-Scholes formula has a closed form, its gradient with respect to $\sigma$ (i.e., Vega $\nu = \frac{\partial C}{\partial \sigma}$) is also available analytically, making gradient-based optimization efficient.
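To make this concrete, here is a small sketch that fits a single $\sigma$ to several quotes with Gauss-Newton steps, using the analytic Vega $\nu = S_0 \, \varphi(d_1) \sqrt{T}$ (with $\varphi$ the standard normal density) as the gradient. Gauss-Newton is one reasonable choice among many, not necessarily what a given desk uses; the quotes are synthetic, generated from a known $\sigma = 0.25$, and all names are illustrative.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bs_call_and_vega(s0, k, T, r, sigma):
    """Return the Black-Scholes call price and its Vega."""
    sqrt_T = math.sqrt(T)
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt_T)
    d2 = d1 - sigma * sqrt_T
    price = s0 * norm_cdf(d1) - k * math.exp(-r * T) * norm_cdf(d2)
    return price, s0 * norm_pdf(d1) * sqrt_T

def calibrate_sigma(quotes, s0, r, sigma0=0.5, n_iter=50, tol=1e-12):
    """Least-squares fit of one sigma to (strike, maturity, price) quotes."""
    sigma = sigma0
    for _ in range(n_iter):
        num = den = 0.0
        for k, T, c_mkt in quotes:
            price, vega = bs_call_and_vega(s0, k, T, r, sigma)
            num += vega * (c_mkt - price)   # gradient direction
            den += vega * vega              # Gauss-Newton curvature
        step = num / den
        sigma = max(sigma + step, 1e-4)     # keep sigma strictly positive
        if abs(step) < tol:
            break
    return sigma

# Synthetic quotes generated from a "true" sigma of 0.25 (for illustration only).
s0, r, true_sigma = 100.0, 0.02, 0.25
grid = [(90.0, 0.5), (100.0, 0.5), (110.0, 0.5), (100.0, 1.0)]
quotes = [(k, T, bs_call_and_vega(s0, k, T, r, true_sigma)[0]) for k, T in grid]
sigma_hat = calibrate_sigma(quotes, s0, r)
print(sigma_hat)
```

With noise-free synthetic quotes the fit recovers the generating volatility; with real quotes the residuals would be non-zero, which is exactly what the smile reflects.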

From a Bayesian perspective, we start by having prior beliefs about volatility, perhaps from historical data, or from the belief that, say, $\sigma$ should not jump dramatically overnight. The Bayesian approach then incorporates these beliefs directly:

$$\Pr(\boldsymbol{\theta} \mid \text{data}) \propto \Pr(\text{data} \mid \boldsymbol{\theta}) \cdot \Pr(\boldsymbol{\theta}).$$

A prior $\Pr(\boldsymbol{\theta})$ that concentrates around historically reasonable volatility levels acts as a regularizer, essentially preventing the calibrated surface from overfitting to noisy or illiquid option prices. This is especially valuable at the wings of the volatility surface, where market data is sparse and individual prices can be unreliable.
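A minimal way to see the regularizing effect is a maximum a posteriori (MAP) estimate for a single $\sigma$. The sketch below assumes Gaussian pricing errors and a Gaussian prior on $\sigma$ (both are modelling assumptions made here for illustration), so the negative log-posterior is a sum of squared errors plus a quadratic penalty. It is minimized by a plain grid search; all names and numbers are hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s0, k, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return s0 * norm_cdf(d1) - k * math.exp(-r * T) * norm_cdf(d2)

def map_sigma(quotes, s0, r, noise_sd, prior_mean, prior_sd):
    """Grid-search MAP estimate of sigma on {0.010, 0.011, ..., 1.000}."""
    best_sigma, best_obj = None, float("inf")
    for i in range(10, 1001):
        sigma = i / 1000.0
        sse = sum((c - bs_call(s0, k, T, r, sigma)) ** 2 for k, T, c in quotes)
        # Negative log-posterior up to a constant: likelihood term + prior term.
        obj = sse / (2 * noise_sd**2) + (sigma - prior_mean) ** 2 / (2 * prior_sd**2)
        if obj < best_obj:
            best_sigma, best_obj = sigma, obj
    return best_sigma

# One quote consistent with sigma = 0.35, and a prior centred at 0.20.
s0, r = 100.0, 0.02
quotes = [(100.0, 1.0, bs_call(s0, 100.0, 1.0, r, 0.35))]
sigma_map = map_sigma(quotes, s0, r, noise_sd=0.5, prior_mean=0.20, prior_sd=0.05)
sigma_mle = map_sigma(quotes, s0, r, noise_sd=0.5, prior_mean=0.20, prior_sd=1e6)
print("MAP:", sigma_map, "near-flat prior:", sigma_mle)
```

The MAP estimate lands between the prior mean and the value implied by the quote, and a nearly flat prior recovers the least-squares answer. The noisier the quote (larger `noise_sd`), the harder the prior pulls.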

When we upscale to more complex models such as stochastic volatility models (e.g., Heston) or local volatility models, the parameter space $\boldsymbol{\theta}$ grows, and calibration evolves into a higher-dimensional optimization problem. However, the statistical framing remains exactly the same: we’re simply fitting a model to data, and we choose our estimator based on our assumptions about the error structure and our prior information.

From a statistical lens, we see that the same principles underlying regression, hypothesis testing, and Bayesian inference apply just as naturally to derivative pricing. The takeaway here isn’t that Black-Scholes is wrong (though it is, strictly speaking), but that it’s a good starting point, whose diagnostics can guide us toward more realistic models better suited to the data.

