No Distribution Indescribable

The irony of the random variable (r.v.) is that although it takes on an “unpredictable” value every time, it’s not exactly random if we understand the shape of its distribution. This is why descriptive statistics matter so much: they summarize where the values of an r.v. concentrate and how they spread out, otherwise known as, again, the shape of its distribution.

There isn’t just one descriptive statistic, so it sure would be nice if we had a way to define them systematically instead of applying a different formula for each one. Well. Actually, there is. They’re called moments.

***

Before we even delve into moments and all other cool stuff, a quick disclaimer: not every distribution has moments.

Let $X$ be an r.v. whose $n$-th moment exists. Then, the $n$-th moment is

\mathbb{E}\left[X^n\right]

for any positive integer $n$. Additionally, if $X$ has mean $\mu$ and standard deviation $\sigma$, then the $n$-th central moment is

\mathbb{E}\left[\left(X-\mu\right)^n\right]

and the $n$-th standardized moment is

\mathbb{E}\left[\left(\frac{X-\mu}{\sigma}\right)^n\right].

With the definitions above, we’re fit to derive four important descriptive statistics (mean, variance, skewness, and kurtosis) to describe the distribution of $X$. As it turns out, the mean is the first moment,

\mathbb{E}\left[X\right],

variance is the second central moment,

\mathbb{E}\left[\left(X-\mu\right)^2\right],

skewness is the third standardized moment,

\mathbb{E}\left[\left(\frac{X-\mu}{\sigma}\right)^3\right],

and kurtosis is the fourth standardized moment,

\mathbb{E}\left[\left(\frac{X-\mu}{\sigma}\right)^4\right].

Excess kurtosis is simply kurtosis minus three, where three is the kurtosis of the normal distribution.
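To make the definitions concrete, here is a minimal sketch that computes all four statistics exactly for a discrete r.v. The fair six-sided die and the helper names (`raw_moment`, `central_moment`, `standardized_moment`) are assumptions made for illustration only. Kurtosis here means the fourth standardized moment, and excess kurtosis is that value minus three.

```python
def raw_moment(pmf, n):
    """n-th moment E[X^n] of a discrete r.v. given as {value: probability}."""
    return sum(p * x**n for x, p in pmf.items())


def central_moment(pmf, n):
    """n-th central moment E[(X - mu)^n]."""
    mu = raw_moment(pmf, 1)
    return sum(p * (x - mu) ** n for x, p in pmf.items())


def standardized_moment(pmf, n):
    """n-th standardized moment E[((X - mu) / sigma)^n]."""
    sigma = central_moment(pmf, 2) ** 0.5
    return central_moment(pmf, n) / sigma**n


# Hypothetical example distribution: a fair six-sided die.
die = {face: 1 / 6 for face in range(1, 7)}

mean = raw_moment(die, 1)               # first moment
variance = central_moment(die, 2)       # second central moment
skewness = standardized_moment(die, 3)  # third standardized moment
kurtosis = standardized_moment(die, 4)  # fourth standardized moment
excess_kurtosis = kurtosis - 3          # relative to the normal distribution

print(mean, variance, skewness, kurtosis, excess_kurtosis)
```

Because the die is symmetric around its mean, the skewness comes out as zero up to floating-point error, and its flat shape gives it a negative excess kurtosis.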

So far, everything has been pretty straightforward. Using moments and functions of moments, we can provide powerful summaries of a distribution. However, as we consider higher-order moments, computing each one directly from its definition becomes increasingly tedious. We therefore shift our attention to the moment generating function (MGF). What’s really neat about the MGF is that it’s a single function that encodes all of these moments at once. Mathematically, the MGF of an r.v. $X$ is

M_X(t) = \mathbb{E}\left[e^{tX}\right],

where $t$ is a bookkeeping variable that we set to zero after differentiating to extract the $n$-th moment of interest. If we expand $e^{tX}$ using its Taylor series and take the expectation term by term, we get

M_X(t) = \sum_{n=0}^{\infty} \mathbb{E}\left[X^n\right] \frac{t^n}{n!}.

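From this, we can easily see that the $n$-th moment of $X$ can be obtained by taking the $n$-th derivative of the MGF and evaluating it at $t=0$.¹ Differentiating $n$ times wipes out every term of order below $n$, and setting $t=0$ wipes out every term of order above $n$. Mathematically,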

M_X^{(n)}(0) = \mathbb{E}\left[X^n\right].
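As a quick sanity check of the series form, here is a small sketch that rebuilds an MGF from its moments. It assumes $X \sim \text{Expo}(1)$, which is not discussed in this post and is chosen only because its moments, $\mathbb{E}\left[X^n\right] = n!$, and its MGF, $1/(1-t)$ for $t<1$, are both known in closed form. The function names are made up for illustration.

```python
import math


def raw_moment_exponential(n):
    """E[X^n] for X ~ Expo(1); the known closed form is n!."""
    return math.factorial(n)


def mgf_closed_form(t):
    """MGF of Expo(1): 1 / (1 - t), valid only for t < 1."""
    return 1.0 / (1.0 - t)


def mgf_from_moments(t, n_terms=60):
    """Partial sum of sum_n E[X^n] * t^n / n!."""
    return sum(
        raw_moment_exponential(n) * t**n / math.factorial(n)
        for n in range(n_terms)
    )


t = 0.3  # must satisfy |t| < 1 for this particular series to converge
series_value = mgf_from_moments(t)
closed_value = mgf_closed_form(t)
print(series_value, closed_value)
```

The two values agree only for $|t|<1$ in this example, which is a reminder that the series is a statement about a neighborhood of zero rather than about every $t$.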

To convince you why this is elegant, let’s consider an r.v. $Y \sim \text{Pois}(\lambda).$ Using LOTUS, we can obtain its MGF as

M_Y(t) = \sum_{k=0}^{\infty} e^{tk} \cdot \frac{e^{-\lambda}\lambda^k}{k!} = \exp\left(\lambda\left(e^t-1\right)\right).

Then, the first moment (mean) is simply

M_Y'(0) = \lambda e^0 \cdot \exp\left(\lambda\left(e^0-1\right)\right) = \lambda.

Now, its second moment is

M_Y''(0) = \lambda e^0 \cdot \exp\left(\lambda\left(e^0-1\right)\right) + \lambda e^0 \cdot \lambda e^0 \cdot \exp\left(\lambda\left(e^0-1\right)\right) = \lambda + \lambda^2,

and hence the second central moment (variance) is

\text{Var}(Y) = M_Y''(0) - \left[M_Y'(0)\right]^2 = \lambda.

For higher moments, where summations and integrals get messy, calculating descriptive statistics using MGFs remains simple.
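The Poisson results above can also be checked numerically. The sketch below assumes $\lambda = 3$ (an arbitrary choice) and approximates the derivatives at $t=0$ with central finite differences rather than symbolic differentiation, so the answers are only accurate to a few decimal places.

```python
import math

lam = 3.0  # hypothetical rate, chosen only for illustration


def mgf_by_lotus(t, n_terms=100):
    """Truncated LOTUS sum: sum_k e^{tk} * e^{-lam} * lam^k / k!."""
    return sum(
        math.exp(t * k) * math.exp(-lam) * lam**k / math.factorial(k)
        for k in range(n_terms)
    )


def mgf_closed_form(t):
    """Closed form exp(lam * (e^t - 1)) derived in the text."""
    return math.exp(lam * (math.exp(t) - 1.0))


h = 1e-3  # finite-difference step

# Central differences approximate M'(0) and M''(0).
first_moment = (mgf_closed_form(h) - mgf_closed_form(-h)) / (2 * h)
second_moment = (
    mgf_closed_form(h) - 2 * mgf_closed_form(0.0) + mgf_closed_form(-h)
) / h**2
variance = second_moment - first_moment**2

print(mgf_by_lotus(0.5), mgf_closed_form(0.5))
print(first_moment, second_moment, variance)
```

The first line of output compares the truncated sum with the closed form at $t=0.5$. The second should land close to $\lambda$, $\lambda + \lambda^2$, and $\lambda$.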

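So far, we’ve taken the existence of MGFs for granted. Again, not all distributions have moments, and not all distributions have MGFs. To be more specific, if there is no open interval around zero on which $\mathbb{E}\left[e^{tX}\right]$ is finite, then the MGF for that distribution doesn’t exist. For example, the Cauchy distribution has tails so heavy that none of its moments are finite, so its MGF doesn’t exist. The log-normal distribution is a subtler case: all of its moments are finite, yet $\mathbb{E}\left[e^{tX}\right]$ is infinite for every $t>0$, so it has no MGF either.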
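To see what a missing moment looks like in practice, here is a rough sketch for the standard Cauchy distribution. It integrates $|x| f(x)$ over $[-R, R]$ with a simple midpoint rule and compares the result with the closed form $\ln(1+R^2)/\pi$. The cutoffs and step count are arbitrary choices for illustration.

```python
import math


def cauchy_pdf(x):
    """Density of the standard Cauchy distribution."""
    return 1.0 / (math.pi * (1.0 + x * x))


def truncated_abs_mean(R, n_steps=200_000):
    """Midpoint-rule estimate of the integral of |x| f(x) over [-R, R]."""
    h = R / n_steps
    half = sum(
        (i + 0.5) * h * cauchy_pdf((i + 0.5) * h) for i in range(n_steps)
    )
    return 2.0 * half * h  # the integrand is symmetric around zero


cutoffs = [10.0, 100.0, 1000.0]
estimates = [truncated_abs_mean(R) for R in cutoffs]
exact = [math.log(1.0 + R * R) / math.pi for R in cutoffs]

for R, est, ex in zip(cutoffs, estimates, exact):
    print(R, est, ex)
```

Each tenfold increase in the cutoff adds a roughly constant amount to the integral, so it grows like a logarithm and never settles. That is exactly what it means for the mean not to exist.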

Fortunately, there’s actually a more general tool than MGF that always exists for any probability distribution: the characteristic function (CF). Defined as

\phi_X(t) = \mathbb{E}\left[e^{itX}\right],

the CF plays a very similar role to the MGF but with one key difference: it uses a complex exponential. Because $\left|e^{itX}\right| = 1$, the expectation is always finite. The awe-striking generality of the CF is that if moments exist, then its derivatives at $t=0$ also generate them, just like MGFs, up to a factor of $i^n$: $\phi_X^{(n)}(0) = i^n\,\mathbb{E}\left[X^n\right]$. And even if moments don’t exist, the CF still determines the distribution uniquely, meaning knowing $\phi_X(t)$ for all $t$ is equivalent to knowing the distribution of $X$.
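Both halves of that claim can be poked at numerically. The sketch below makes two assumptions that are not part of this post: a Poisson rate of $\lambda = 3$, and the known CF of the standard Cauchy distribution, $e^{-|t|}$. For the Poisson case, finite-difference derivatives at zero are divided by $i^n$ to recover moments. For the Cauchy case, the one-sided slopes at zero are compared.

```python
import cmath
import math

lam = 3.0  # hypothetical Poisson rate, for illustration only


def cf_poisson(t):
    """CF of Pois(lam): the MGF's closed form with t replaced by i*t."""
    return cmath.exp(lam * (cmath.exp(1j * t) - 1.0))


def cf_cauchy(t):
    """CF of the standard Cauchy (a known result, not derived in this post)."""
    return math.exp(-abs(t))


h = 1e-3  # finite-difference step

# Poisson: derivatives at 0 equal i^n * E[X^n], so divide by i^n.
d1 = (cf_poisson(h) - cf_poisson(-h)) / (2 * h)
d2 = (cf_poisson(h) - 2 * cf_poisson(0.0) + cf_poisson(-h)) / h**2
first_moment = (d1 / 1j).real
second_moment = (d2 / 1j**2).real

# Cauchy: the CF is finite everywhere but has a kink at t = 0.
right_slope = (cf_cauchy(h) - cf_cauchy(0.0)) / h
left_slope = (cf_cauchy(0.0) - cf_cauchy(-h)) / h

print(first_moment, second_moment)
print(left_slope, right_slope)
```

The Poisson moments should come out near $\lambda$ and $\lambda + \lambda^2$. The Cauchy slopes disagree in sign, so the CF is not differentiable at zero, which is consistent with the Cauchy distribution having no mean even though its CF exists for every $t$.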

Since CFs can do everything MGFs do, technically, there’s no strict need for MGFs anymore. However, when the MGF exists, it’s often neater to use it, just to save ourselves from the sorrow-inducing complex arithmetic.

¹ Although we evaluate the MGF at $t=0$ to extract moments, the function must be finite (and hence differentiable) on a small open interval $(-a,a)$ around zero.
