How Black-Scholes Came to Be

The Black-Scholes model is arguably one of the most important formulas in finance. Lots of people have seen it and memorized it, and some have applied it to derivatives pricing to actually accumulate wealth. Personally, I’m deeply interested in how it came to be, because fully understanding the system gives a level of intuition that rote memorization of the formula cannot.

So, unlike my other posts, I’ve decided to show the full derivation of the Black-Scholes formula (skipping the lengthier calculations), and only later explain the intuition of the model that many know.

***

How should we model a stock price $S_t$ over time? A naive approach would be to write

S_t = S_0 + \mu t + \sigma W_t,

where $W_t$ is a standard Brownian motion (i.e., a continuous-time random walk with $W_t \sim \mathcal{N}(0, t)$). This is a simple linear model with drift $\mu$ and noise scale $\sigma$. But there’s an obvious problem: this model allows $S_t$ to go negative, which a stock price cannot.
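To put a number on that problem, here is a minimal sketch. Under the naive model, $S_t \sim \mathcal{N}(S_0 + \mu t, \sigma^2 t)$, so the chance of a negative price is $\Phi\!\left(-(S_0 + \mu t)/(\sigma\sqrt{t})\right)$, where $\Phi$ is the standard normal CDF. The function names and the parameter values are made up for illustration.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def prob_negative_price(s0, mu, sigma, t):
    """P(S_t < 0) when S_t = S_0 + mu*t + sigma*W_t, i.e. S_t ~ N(S_0 + mu*t, sigma^2 * t)."""
    return normal_cdf(-(s0 + mu * t) / (sigma * sqrt(t)))


# Hypothetical numbers: a $10 stock, drift of $1 per year, noise of $5 per sqrt(year).
p_1y = prob_negative_price(10.0, 1.0, 5.0, 1.0)
p_10y = prob_negative_price(10.0, 1.0, 5.0, 10.0)

print(f"P(S_1  < 0) = {p_1y:.4f}")
print(f"P(S_10 < 0) = {p_10y:.4f}")
```

With these inputs the probability of a negative price is small after one year but grows with the horizon, which is exactly the defect described above.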

How about a more sensible model, which says that what’s random is not the absolute change in price, but the percentage change? This gives us the Geometric Brownian Motion (GBM), expressed as the stochastic differential equation (SDE),

dS_t = \mu S_t \, dt + \sigma S_t \, dW_t.

This equation states that the infinitesimal change in price $dS_t$ is proportional to the current price $S_t$, with a deterministic drift $\mu$ and random noise $\sigma \, dW_t$. The fact that both terms are multiplied by $S_t$ is key: a $\$1$ move when the stock is at $\$10$ (a 10% change) is very different from a $\$1$ move when it’s at $\$1000$ (a 0.1% change).

Dividing both sides by $S_t$, we get

\frac{dS_t}{S_t} = \mu \, dt + \sigma \, dW_t.

This becomes a random walk on returns, not on prices. The drift $\mu$ now represents the expected annualized return, and $\sigma$ is the volatility (i.e., the standard deviation of those returns). As we can see, this form defines GBM as simply a linear model applied to instantaneous returns, making it a lot less exotic than it first appears.

At this point, we understand that the SDE tells us how prices evolve infinitesimally. Next, let’s take a look at how we can determine the distribution of the price at some future time $T$, namely $S_T$. To do this, we need to “integrate” the SDE. But the catch is that we can’t use ordinary calculus because the $dW_t$ term has a stochastic coefficient $\sigma S_t$, making the integral non-trivial. This is where Itô’s lemma, the chain rule of stochastic calculus, comes in.

For the smooth function $f(S_t) = \log(S_t)$, applying Itô’s lemma gives us

d\log(S_t) = \frac{1}{S_t} \, dS_t - \frac{1}{2} \cdot \frac{1}{S_t^2} \, (dS_t)^2.

Note the extra $-(dS_t)^2 / (2S_t^2)$ term. This is the hallmark of stochastic calculus: it has no counterpart in ordinary calculus and arises because Brownian increments satisfy $(dW_t)^2 = dt$, not zero. Substituting the GBM expression for $dS_t$ and using $(dW_t)^2 = dt$ (terms in $dt^2$ and $dt \, dW_t$ vanish) gives $(dS_t)^2 = \sigma^2 S_t^2 \, dt$, so the second-order term contributes $-\sigma^2 dt / 2$. After simplifying and grouping the $dt$ (deterministic) and $dW_t$ (random) terms separately,

d\log(S_t) = \left(\mu - \frac{\sigma^2}{2}\right) dt + \sigma \, dW_t.

Now, integrating both sides from $0$ to $T$ is straightforward, because $dW_t$ has a constant coefficient $\sigma$. Since $\int_0^T dW_t = W_T \sim \mathcal{N}(0, T)$, we get

\log(S_T) \sim \mathcal{N}\!\left(\log(S_0) + \left(\mu - \frac{\sigma^2}{2}\right)T,\; \sigma^2 T\right)

So far, we’ve proven that log-prices are normally distributed, and equivalently, stock prices are log-normally distributed. This means that stock prices can never go negative, and they have a right-skewed shape that looks qualitatively like what we actually observe in markets.
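As a sanity check on this result, the sketch below discretizes the GBM equation $dS_t = \mu S_t \, dt + \sigma S_t \, dW_t$ with small Euler steps and compares the sample mean and variance of $\log(S_T)$ with $\log(S_0) + (\mu - \sigma^2/2)T$ and $\sigma^2 T$. The Euler scheme, the parameter values, the seed, and the function name are my own choices for illustration, not part of the derivation.

```python
import math
import random
import statistics


def simulate_log_terminal_prices(s0, mu, sigma, T, n_steps, n_paths, seed=0):
    """Euler steps of dS = mu*S*dt + sigma*S*dW; returns log(S_T) for each path."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    logs = []
    for _ in range(n_paths):
        s = s0
        for _ in range(n_steps):
            # A Brownian increment over dt is sqrt(dt) times a standard normal draw.
            s += mu * s * dt + sigma * s * sqrt_dt * rng.gauss(0.0, 1.0)
        logs.append(math.log(s))
    return logs


# Hypothetical parameters: 8% drift, 20% volatility, one-year horizon.
s0, mu, sigma, T = 100.0, 0.08, 0.2, 1.0
logs = simulate_log_terminal_prices(s0, mu, sigma, T, n_steps=100, n_paths=5000)

sample_mean = statistics.fmean(logs)
sample_var = statistics.variance(logs)
theory_mean = math.log(s0) + (mu - 0.5 * sigma**2) * T
theory_var = sigma**2 * T

print(f"mean of log(S_T): simulated {sample_mean:.4f}, theory {theory_mean:.4f}")
print(f"var  of log(S_T): simulated {sample_var:.4f}, theory {theory_var:.4f}")
```

Note that the simulated mean sits near $\log(S_0) + (\mu - \sigma^2/2)T$ rather than $\log(S_0) + \mu T$; the $-\sigma^2/2$ correction from Itô’s lemma shows up directly in the samples.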

Black-Scholes PDE

We now know a great deal about the stock price path, specifically its evolution over time and its distribution. Naturally, we ask: what, then, is a fair price for a derivative, i.e., a contract whose payoff depends on $S_T$?

Let $C(S, t)$ denote the price of an option as a function of the current stock price $S$ and time $t$. Our goal is to pin down what functional form $C$ must take.

Before moving on, let’s talk about the principle of no arbitrage. In an efficient market, there should be no way to make a risk-free profit. Mathematically, this is enforced by requiring that discounted asset prices are martingales, which are processes with no predictable drift, i.e., “fair games” where the best forecast of the future is the present value. Intuitively, of course, if prices had a predictable drift after discounting, traders would exploit it until the drift disappeared.

Now, for $C(S_t, t)$ to be consistent with the Markov property and for its discounted value $e^{-rt} C(S_t, t)$ to be a martingale under the risk-neutral measure (the probability measure that makes discounted prices martingales, and under which the stock itself drifts at the risk-free rate $r$ rather than $\mu$), we apply Itô’s lemma to $C(S_t, t)$ and require its drift to equal $rC$. This gives the Black-Scholes PDE:

\frac{\partial C}{\partial t} + r S \, \frac{\partial C}{\partial S} + \frac{1}{2}\,\sigma^2 S^2 \,\frac{\partial^2 C}{\partial S^2} = r C

This is a constraint that any arbitrage-free price surface $C(S, t)$ must satisfy. We can further decode each term using the language of the Greeks:

$\Theta = \frac{\partial C}{\partial t}$: how the option price changes as time passes (Theta)
$\Delta = \frac{\partial C}{\partial S}$: sensitivity to the stock price (Delta)
$\Gamma = \frac{\partial^2 C}{\partial S^2}$: curvature of the price surface with respect to $S$ (Gamma)

Rearranging and substituting, the PDE becomes

\Theta = r(C - S\Delta) - \frac{1}{2}\,\sigma^2 S^2 \Gamma

This form of the PDE says that the rate of time decay $\Theta$ must be exactly offset by the curvature $\Gamma$ (scaled by volatility and price level) plus a discounting adjustment. In other words, time decay and curvature are two sides of the same coin.
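One way to make this relation concrete is to check it numerically. The sketch below borrows the closed-form call price given later in the post, estimates $\Theta$, $\Delta$, and $\Gamma$ by central finite differences, and evaluates the residual $\Theta - \left[r(C - S\Delta) - \tfrac{1}{2}\sigma^2 S^2 \Gamma\right]$, which should be close to zero. The parameter values, step sizes, and helper names are assumptions made for illustration.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(S, K, r, sigma, tau):
    """Black-Scholes call price, with tau = T - t the time left to maturity."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)


# Hypothetical at-the-money option with one year to maturity.
S, K, r, sigma, tau = 100.0, 100.0, 0.05, 0.2, 1.0
h_s, h_t = 1e-2, 1e-4  # finite-difference steps in price and in time

C = bs_call(S, K, r, sigma, tau)
up, down = bs_call(S + h_s, K, r, sigma, tau), bs_call(S - h_s, K, r, sigma, tau)
delta = (up - down) / (2 * h_s)
gamma = (up - 2 * C + down) / h_s**2
# Calendar time t moves forward as tau shrinks, so dC/dt = -dC/dtau.
theta = -(bs_call(S, K, r, sigma, tau + h_t) - bs_call(S, K, r, sigma, tau - h_t)) / (2 * h_t)

residual = theta - (r * (C - S * delta) - 0.5 * sigma**2 * S**2 * gamma)
print(f"Theta = {theta:.4f}, Delta = {delta:.4f}, Gamma = {gamma:.5f}")
print(f"PDE residual = {residual:.2e}")
```

For a call, $\Theta$ comes out negative and $\Gamma$ positive, which is the trade-off the rearranged PDE describes.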

Discounted Expected Future Payoff

As mentioned, the PDE is the pricing constraint satisfied by any derivative on $S$. Its solution for a particular contract, with that contract’s payoff as the terminal condition at time $T$, is simply the discounted expected future payoff under the risk-neutral measure.

Consider a European call option: at time $T$, the holder receives $\max(S_T - K, 0)$, where $K$ is the strike price. The holder earns $S_T - K$ if the stock finishes above $K$, and nothing otherwise. Under the risk-neutral measure, the fair price of this contract is simply the discounted expected payoff:

C = e^{-rT} \mathbb{E}^{\mathbb{Q}}\left[\max(S_T - K, 0)\right].

Since we know that $\log(S_T)$ is normally distributed (with the drift $\mu$ replaced by $r$ under the risk-neutral measure), this expectation is a tractable integral over a log-normal distribution. Working through the integral, i.e., splitting $\max(S_T - K, 0)$ into two expectations and completing the square, yields the famous Black-Scholes formula:

C = S_0 \Phi(d_1) - K e^{-rT} \Phi(d_2),

where $\Phi$ is the standard normal CDF and

d_1 = \frac{\log(S_0/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T}.

Intuitively, $\Phi(d_2)$ is the risk-neutral probability that the option expires in-the-money (i.e., $S_T > K$), and $\Phi(d_1)$ is the analogous probability under the measure that uses the stock as numeraire; it also equals the option’s Delta.
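To see the formula and the expectation agree, here is a minimal sketch that prices the same call both ways: once with the closed form, and once by averaging discounted payoffs over risk-neutral draws of $S_T$ (drift $r$, not $\mu$). It also compares the simulated frequency of $S_T > K$ with $\Phi(d_2)$. The parameter values, sample size, seed, and helper names are illustrative assumptions.

```python
import random
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def d1_d2(S0, K, r, sigma, T):
    d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return d1, d1 - sigma * sqrt(T)


def bs_call(S0, K, r, sigma, T):
    """Closed-form Black-Scholes price of a European call."""
    d1, d2 = d1_d2(S0, K, r, sigma, T)
    return S0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)


# Hypothetical contract: at-the-money, one year, 5% rate, 20% volatility.
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
closed_form = bs_call(S0, K, r, sigma, T)

# Monte Carlo under the risk-neutral measure: log(S_T) is normal with drift r - sigma^2/2.
rng = random.Random(42)
n = 200_000
drift, vol = (r - 0.5 * sigma**2) * T, sigma * sqrt(T)
payoff_sum, n_itm = 0.0, 0
for _ in range(n):
    s_T = S0 * exp(drift + vol * rng.gauss(0.0, 1.0))
    if s_T > K:
        n_itm += 1
        payoff_sum += s_T - K

mc_price = exp(-r * T) * payoff_sum / n
itm_freq = n_itm / n
_, d2 = d1_d2(S0, K, r, sigma, T)

print(f"closed form: {closed_form:.4f}, Monte Carlo: {mc_price:.4f}")
print(f"Phi(d2): {norm_cdf(d2):.4f}, simulated P(S_T > K): {itm_freq:.4f}")
```

The two prices differ only by Monte Carlo noise, and the simulated in-the-money frequency lines up with $\Phi(d_2)$.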

Now here’s where things get statistical. Given a dataset of observed market prices $C_{\text{market}}$ , we can treat the Black-Scholes formula as a parametric model:

C_{\text{market}} = f(S, K, T, r, \sigma) + \varepsilon

where $f$ is the Black-Scholes formula and $\varepsilon$ is the model error. Interestingly, all inputs except $\sigma$ are directly observable, meaning Black-Scholes is actually a one-parameter model in practice, and the question immediately becomes: what is $\sigma$?

Volatility

As mentioned, volatility $\sigma$ is the only unobservable input to the formula, and it’s also by far the most important one. In practice, Black-Scholes is more of a volatility quoting convention than a pricing model. Instead of quoting an option price in dollars, traders quote the value of $\sigma$ that makes the formula match the observed price. We call this the implied volatility $\hat{\sigma}$:

C_{\text{market}} = f(S, K, T, r, \hat{\sigma})

Solving this equation for $\hat{\sigma}$ is a simple one-dimensional root-finding problem: $f$ is strictly increasing in $\sigma$ (Vega is positive), so the solution is unique whenever it exists.
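As a rough sketch of that root-finding step, the code below recovers $\hat{\sigma}$ by bisection, which relies only on the price being increasing in $\sigma$. Bisection is my choice for robustness, not necessarily what a trading desk would use; the bracket `[lo, hi]`, the tolerance, the function names, and the example quote are all assumptions.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(S, K, r, sigma, T):
    """Closed-form Black-Scholes price of a European call."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)


def implied_vol(price, S, K, r, T, lo=1e-4, hi=5.0, tol=1e-10):
    """Bisection for the sigma at which bs_call matches the quoted price."""
    if not (bs_call(S, K, r, lo, T) <= price <= bs_call(S, K, r, hi, T)):
        raise ValueError("price is outside the range attainable on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The price is increasing in sigma, so a too-low model price means sigma is too low.
        if bs_call(S, K, r, mid, T) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip on a made-up quote: price an option at 25% vol, then recover the 25%.
S, K, r, T = 100.0, 110.0, 0.03, 0.5
quote = bs_call(S, K, r, 0.25, T)
iv = implied_vol(quote, S, K, r, T)
print(f"quote = {quote:.4f}, implied vol = {iv:.6f}")
```

A Newton iteration using Vega would typically converge in fewer steps, but it can misbehave for deep out-of-the-money options where Vega is tiny, which is why the sketch uses bisection.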

But there’s a twist: if Black-Scholes were perfect, then all options on the same underlying, regardless of strike $K$ or maturity $T$, should imply the same $\hat{\sigma}$, because the model assumes a single constant volatility. In practice, $\hat{\sigma}$ isn’t constant. Instead, plotting $\hat{\sigma}$ against strike $K$ reveals a U-shaped or skewed curve known as the volatility smile.

[Figure: volatility smile plot, by Moomoo]

The volatility smile in this plot reveals that the Black-Scholes model is misspecified. Specifically, it tells us that:

Deep out-of-the-money and in-the-money options are systematically mispriced by constant-volatility GBM.
Log-returns are not truly normal (equivalently, prices are not truly log-normal), i.e., real markets exhibit fat tails (crash risk) and skewness (downside moves tend to be larger than upside moves), neither of which is captured by a simple normal distribution.

When we plot across multiple maturities $T$, we get a full volatility surface $\hat{\sigma}(K, T)$. This surface is a map of the model’s failures and serves as the starting point for more sophisticated models that allow volatility itself to be stochastic or to vary with the stock price level.

Model Calibration

Now consider a panel of observed market prices $\{C_{\text{market},i}\}$ for different strikes and maturities. How do we find the best $\sigma$ (or, in a richer model, a full parameter vector $\boldsymbol{\theta}$)? This is called a calibration problem, which is fundamentally a statistical estimation problem. Let’s consider two perspectives on it: maximum likelihood estimation (MLE) and Bayesian inference.

From an MLE perspective, if we assume the pricing errors $\varepsilon_i$ are i.i.d. Gaussian, maximizing the likelihood is equivalent to minimizing the sum of squared errors:

\hat{\boldsymbol{\theta}}_{\text{MLE}} = \arg\min_{\boldsymbol{\theta}} \sum_{i=1}^{n} \left(C_{\text{market},i} - f_i(\boldsymbol{\theta})\right)^2.

This is basically a non-linear least squares problem. Thankfully, since the Black-Scholes formula has a closed form, its gradient with respect to $\sigma$ (i.e., Vega $\nu = \frac{\partial C}{\partial \sigma}$) is also available analytically, making gradient-based optimization efficient.
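For the single-parameter case, the least squares fit can be sketched as a Gauss-Newton iteration that uses the analytic Vega $\nu_i = S\,\varphi(d_1)\sqrt{T_i}$, with $\varphi$ the standard normal density: each step moves $\sigma$ by $\sum_i \nu_i \left(C_{\text{market},i} - f_i\right) / \sum_i \nu_i^2$. The quotes below are synthetic (generated at 22% volatility and then nudged by a few cents); they and the function names are invented purely for illustration.

```python
from math import erf, exp, log, pi, sqrt


def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def norm_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def bs_call_and_vega(S, K, r, sigma, T):
    """Black-Scholes call price and its derivative with respect to sigma (Vega)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    price = S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)
    return price, S * norm_pdf(d1) * sqrt(T)


def calibrate_sigma(quotes, S, r, sigma0=0.3, n_iter=50):
    """Gauss-Newton for min over sigma of sum_i (C_i - f_i(sigma))^2; quotes are (K, T, price)."""
    sigma = sigma0
    for _ in range(n_iter):
        num = den = 0.0
        for K, T, c_mkt in quotes:
            price, vega = bs_call_and_vega(S, K, r, sigma, T)
            num += vega * (c_mkt - price)
            den += vega * vega
        step = num / den
        sigma += step
        if abs(step) < 1e-12:
            break
    return sigma


# Synthetic panel: prices generated at sigma = 0.22, then perturbed by made-up pricing errors.
S, r, true_sigma = 100.0, 0.03, 0.22
contracts = [(90.0, 0.5), (100.0, 0.5), (110.0, 0.5), (100.0, 1.0), (110.0, 1.0)]
errors = [0.02, -0.01, 0.01, -0.02, 0.0]
quotes = [(K, T, bs_call_and_vega(S, K, r, true_sigma, T)[0] + e)
          for (K, T), e in zip(contracts, errors)]

sigma_hat = calibrate_sigma(quotes, S, r)
print(f"calibrated sigma = {sigma_hat:.5f}")
```

With a richer parameter vector $\boldsymbol{\theta}$, the same idea carries over with a Jacobian in place of the scalar Vega, usually delegated to a library optimizer.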

From a Bayesian perspective, we start by having prior beliefs about volatility—perhaps from historical data, or from the belief that, say, $\sigma$ should not jump dramatically overnight. The Bayesian approach then incorporates these beliefs directly:

\Pr(\boldsymbol{\theta} \mid \text{data}) \propto \Pr(\text{data} \mid \boldsymbol{\theta}) \cdot \Pr(\boldsymbol{\theta}).

A prior $\Pr(\boldsymbol{\theta})$ that concentrates around historically reasonable volatility levels acts as a regularizer, essentially preventing the calibrated surface from overfitting to noisy or illiquid option prices. This is especially valuable at the wings of the volatility surface, where market data is sparse and individual prices can be unreliable.
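Here is a small sketch of that regularizing effect for a single out-of-the-money quote. It assumes a Gaussian pricing error and a Gaussian prior on $\sigma$, so the negative log-posterior is a squared misfit plus a squared distance from the prior mean, and it finds the minimizer by brute-force grid search (chosen for transparency, not speed). The quote, noise level, prior, and function names are all hypothetical.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(S, K, r, sigma, T):
    """Closed-form Black-Scholes price of a European call."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)


def neg_log_posterior(sigma, quote, S, K, r, T, noise_sd, prior_mean=None, prior_sd=None):
    """Gaussian misfit term, plus a Gaussian prior penalty on sigma if a prior is supplied."""
    misfit = 0.5 * ((quote - bs_call(S, K, r, sigma, T)) / noise_sd) ** 2
    if prior_sd is None:
        return misfit  # likelihood only, so the minimizer is the MLE
    return misfit + 0.5 * ((sigma - prior_mean) / prior_sd) ** 2


# Hypothetical wing quote: an illiquid out-of-the-money call whose price implies 30% vol.
S, K, r, T = 100.0, 120.0, 0.02, 0.25
quote = bs_call(S, K, r, 0.30, T)
noise_sd = 0.25                     # assumed dollar standard deviation of the quote error
prior_mean, prior_sd = 0.20, 0.03   # assumed prior belief: volatility near 20%

grid = [0.05 + 0.001 * i for i in range(951)]  # candidate sigmas from 5% to 100%
sigma_mle = min(grid, key=lambda s: neg_log_posterior(s, quote, S, K, r, T, noise_sd))
sigma_map = min(grid, key=lambda s: neg_log_posterior(s, quote, S, K, r, T, noise_sd,
                                                      prior_mean, prior_sd))
print(f"MLE: {sigma_mle:.3f}, MAP: {sigma_map:.3f}, prior mean: {prior_mean:.3f}")
```

The MAP estimate lands between the prior mean and the MLE: the noisier the quote is assumed to be relative to the prior, the further it is pulled back toward the prior.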

When we upscale to more complex models such as stochastic volatility models (e.g., Heston) or local volatility models, the parameter space $\boldsymbol{\theta}$ grows, and calibration evolves into a higher-dimensional optimization problem. However, the statistical framing remains exactly the same: we’re simply fitting a model to data, and we choose our estimator based on our assumptions about the error structure and our prior information.

From a statistical lens, we see that the same principles underlying regression, hypothesis testing, and Bayesian inference apply just as naturally to derivative pricing. The takeaway here isn’t that Black-Scholes is wrong (though it is, strictly speaking), but that it’s a good starting point, whose diagnostics can guide us toward more realistic models better suited to the data.
