The irony of the random variable (r.v.) is that although it takes on an “unpredictable” value every time, it’s not so unpredictable after all once we understand the shape of its distribution. This is why descriptive statistics matter so much: they summarize where an r.v.’s values tend to fall and how they spread out, or, in other words, the shape of its distribution.
There isn’t just one descriptive statistic, so it sure would be nice if we had a systematic way to derive them instead of applying a different formula for each one. Well. Actually, there is. They’re called moments.
***
Before we even delve into moments and all the other cool stuff, a quick disclaimer: not every distribution has (finite) moments.
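Let $X$ be an r.v. from a distribution whose $k$-th moment exists. Then the $k$-th moment of $X$ is
$$\mathbb{E}\left[X^k\right]$$
for any such positive integer $k$. Additionally, if $X$ has mean $\mu$ and standard deviation $\sigma$, then the $k$-th central moment is
$$\mathbb{E}\left[(X - \mu)^k\right]$$
and the $k$-th standardized moment is
$$\mathbb{E}\left[\left(\frac{X - \mu}{\sigma}\right)^k\right].$$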
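To make the three definitions concrete, here is a minimal sketch for a discrete r.v. Representing the distribution as a `{value: probability}` dictionary, and the helper names `raw_moment`, `central_moment`, and `standardized_moment`, are our own illustrative choices; exact `Fraction` probabilities keep the raw and central moments exact.

```python
from fractions import Fraction

# A discrete r.v. is represented as {value: probability} (an illustrative encoding).


def raw_moment(pmf, k):
    """k-th moment E[X^k]."""
    return sum(p * x**k for x, p in pmf.items())


def central_moment(pmf, k):
    """k-th central moment E[(X - mu)^k], with mu = E[X]."""
    mu = raw_moment(pmf, 1)
    return sum(p * (x - mu) ** k for x, p in pmf.items())


def standardized_moment(pmf, k):
    """k-th standardized moment E[((X - mu) / sigma)^k] = central moment / sigma^k."""
    sigma = float(central_moment(pmf, 2)) ** 0.5
    return float(central_moment(pmf, k)) / sigma**k


if __name__ == "__main__":
    die = {x: Fraction(1, 6) for x in range(1, 7)}  # a fair six-sided die
    print(raw_moment(die, 1))           # 7/2
    print(central_moment(die, 2))       # 35/12
    print(standardized_moment(die, 3))  # 0.0 (the die is symmetric about 7/2)
```

The second standardized moment is always 1 by construction, which is why the interesting standardized moments start at $k = 3$.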
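With the definitions above, we’re ready to derive four important descriptive statistics (mean, variance, skewness, and kurtosis) that describe the distribution of $X$. As it turns out, the mean is the first moment,
$$\mu = \mathbb{E}[X],$$
variance is the second central moment,
$$\sigma^2 = \operatorname{Var}(X) = \mathbb{E}\left[(X - \mu)^2\right],$$
skewness is the third standardized moment,
$$\operatorname{Skew}(X) = \mathbb{E}\left[\left(\frac{X - \mu}{\sigma}\right)^3\right],$$
and kurtosis is the fourth standardized moment,
$$\operatorname{Kurt}(X) = \mathbb{E}\left[\left(\frac{X - \mu}{\sigma}\right)^4\right].$$
Excess kurtosis is simply kurtosis minus three, $\operatorname{Kurt}(X) - 3$. The offset of three is chosen because every normal distribution has kurtosis exactly $3$, so excess kurtosis measures tail heaviness relative to the normal.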
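As a sanity check on these four definitions, the sketch below computes them straight from a discrete pmf (again encoded as `{value: probability}`, our own choice) and compares them against the standard closed forms for a Bernoulli($p$) r.v.: mean $p$, variance $pq$, skewness $(q - p)/\sqrt{pq}$, and excess kurtosis $(1 - 6pq)/(pq)$, with $q = 1 - p$. The function name `describe` is hypothetical.

```python
import math


def describe(pmf):
    """Mean, variance, skewness and excess kurtosis of a discrete r.v. given as {value: prob}."""
    mean = sum(p * x for x, p in pmf.items())                       # first moment
    var = sum(p * (x - mean) ** 2 for x, p in pmf.items())          # second central moment
    sd = math.sqrt(var)
    skew = sum(p * ((x - mean) / sd) ** 3 for x, p in pmf.items())  # third standardized moment
    kurt = sum(p * ((x - mean) / sd) ** 4 for x, p in pmf.items())  # fourth standardized moment
    return mean, var, skew, kurt - 3.0                              # excess kurtosis = kurtosis - 3


if __name__ == "__main__":
    p = 0.3
    q = 1 - p
    print(describe({0: q, 1: p}))
    # Closed forms for Bernoulli(p), for comparison:
    print(p, p * q, (q - p) / math.sqrt(p * q), (1 - 6 * p * q) / (p * q))
```

At $p = 0.5$ the Bernoulli r.v. takes the standardized values $\pm 1$ with equal probability, so its kurtosis is $1$ and its excess kurtosis is $-2$: lighter tails than any normal distribution.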
So far, everything has been pretty straightforward. Using moments and functions of moments, we can provide powerful summaries of a distribution. However, as we move to higher-order moments, computing them directly from sums or integrals becomes increasingly tedious. We therefore shift our attention to the moment generating function (MGF). What’s really neat about the MGF is that it’s a single tool that encodes all of these moments at once. Mathematically, the MGF of an r.v. $X$ is
$$M_X(t) = \mathbb{E}\left[e^{tX}\right],$$
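where $t$ is a bookkeeping variable that we set to zero when we actually extract the $k$-th moment of interest. Expanding $e^{tX}$ as a Taylor series, we get
$$M_X(t) = \mathbb{E}\left[\sum_{k=0}^{\infty} \frac{(tX)^k}{k!}\right] = \sum_{k=0}^{\infty} \frac{\mathbb{E}\left[X^k\right]}{k!}\,t^k = 1 + \mathbb{E}[X]\,t + \frac{\mathbb{E}\left[X^2\right]}{2!}\,t^2 + \frac{\mathbb{E}\left[X^3\right]}{3!}\,t^3 + \cdots$$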
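From this, we can see that the $k$-th moment of $X$ is obtained by taking the $k$-th derivative of the MGF and substituting $t = 0$,¹ since after differentiating the series $k$ times, $\mathbb{E}\left[X^k\right]$ is the only term left without a factor of $t$. Mathematically,
$$\mathbb{E}\left[X^k\right] = M_X^{(k)}(0) = \left.\frac{d^k}{dt^k} M_X(t)\right|_{t=0}.$$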
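The derivative rule can be checked numerically without any symbolic algebra. This is a sketch under our own assumptions: the distribution is a fair die, and $M_X'(0)$ and $M_X''(0)$ are approximated with central finite differences of step `h`. The helper names `mgf` and `moments_from_mgf` are hypothetical.

```python
import math


def mgf(pmf):
    """Return t -> M_X(t) = E[e^{tX}] for a discrete r.v. given as {value: prob}."""
    return lambda t: sum(p * math.exp(t * x) for x, p in pmf.items())


def moments_from_mgf(M, h=1e-4):
    """Approximate M'(0) = E[X] and M''(0) = E[X^2] with central finite differences."""
    m1 = (M(h) - M(-h)) / (2 * h)
    m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2
    return m1, m2


if __name__ == "__main__":
    die = {x: 1 / 6 for x in range(1, 7)}
    m1, m2 = moments_from_mgf(mgf(die))
    # Direct computation for comparison: E[X] = 3.5 and E[X^2] = 91/6
    print(m1, m2)
```

Finite differences lose accuracy quickly for higher derivatives, so this is only a check of the identity; in practice one differentiates a closed-form MGF by hand.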
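To convince you that this is elegant, let’s consider an r.v. $X$ with PDF $f_X$. Using LOTUS (the law of the unconscious statistician), we can obtain its MGF as
$$M_X(t) = \mathbb{E}\left[e^{tX}\right] = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx.$$
Then, the first moment (mean) is simply
$$\mathbb{E}[X] = M_X'(0).$$
Now, its second moment is
$$\mathbb{E}\left[X^2\right] = M_X''(0),$$
and hence the second central moment (variance) is
$$\operatorname{Var}(X) = \mathbb{E}\left[X^2\right] - \left(\mathbb{E}[X]\right)^2 = M_X''(0) - \left(M_X'(0)\right)^2.$$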
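For a concrete instance of this computation, the sketch below assumes $X \sim \text{Exponential}(\lambda)$ (our choice of distribution for illustration), with density $\lambda e^{-\lambda x}$ on $x \ge 0$ and MGF $M_X(t) = \lambda / (\lambda - t)$ for $t < \lambda$. Its derivatives are $M_X^{(k)}(t) = k!\,\lambda / (\lambda - t)^{k+1}$, and we compare them against the LOTUS integral $\int_0^\infty x^k \lambda e^{-\lambda x}\,dx$ evaluated with Simpson’s rule. The names `expo_mgf_derivative` and `lotus_moment`, and the truncation point and grid size, are hypothetical choices.

```python
import math

# Illustrative assumption: X ~ Exponential(lam), density lam * e^{-lam x} on x >= 0,
# whose MGF is M(t) = lam / (lam - t) for t < lam.


def expo_mgf_derivative(k, lam, t=0.0):
    """k-th derivative of M(t) = lam / (lam - t): k! * lam / (lam - t)^(k+1), valid for t < lam."""
    return math.factorial(k) * lam / (lam - t) ** (k + 1)


def lotus_moment(k, lam, upper=60.0, n=20_000):
    """E[X^k] as the integral of x^k * lam * e^{-lam x} over [0, upper/lam] (composite Simpson).

    The tail beyond upper/lam carries negligible mass for small k; n must be even.
    """
    b = upper / lam
    h = b / n
    f = lambda x: x**k * lam * math.exp(-lam * x)
    total = f(0.0) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3


if __name__ == "__main__":
    lam = 2.0
    mean = expo_mgf_derivative(1, lam)    # M'(0) = 1/lam
    second = expo_mgf_derivative(2, lam)  # M''(0) = 2/lam^2
    print(mean, second, second - mean**2)              # mean, E[X^2], variance = 1/lam^2
    print(lotus_moment(1, lam), lotus_moment(2, lam))  # direct integrals, for comparison
```

The MGF route needs one line of calculus per moment, while the integral route needs a fresh integration (here, numerical quadrature) for every $k$.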
For higher moments, where direct summations and integrals get messy, calculating descriptive statistics through the MGF remains simple: we just differentiate more times before setting $t = 0$.
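To illustrate higher moments, this sketch again assumes $X \sim \text{Exponential}(\lambda)$ (an illustrative choice), for which differentiating $M_X(t) = \lambda/(\lambda - t)$ gives $\mathbb{E}[X^k] = M_X^{(k)}(0) = k!/\lambda^k$. Skewness and excess kurtosis then follow from the first four raw moments via the binomial expansions $\mathbb{E}[(X-\mu)^3] = m_3 - 3\mu m_2 + 2\mu^3$ and $\mathbb{E}[(X-\mu)^4] = m_4 - 4\mu m_3 + 6\mu^2 m_2 - 3\mu^4$. The function names are hypothetical.

```python
import math

# Assumes X ~ Exponential(lam), so M^{(k)}(0) = k! / lam^k for every k.


def expo_raw_moment(k, lam):
    """E[X^k], read off from the k-th derivative of M(t) = lam / (lam - t) at t = 0."""
    return math.factorial(k) / lam**k


def skew_and_excess_kurtosis(m1, m2, m3, m4):
    """Skewness and excess kurtosis from the first four raw moments m_k = E[X^k].

    The central moments come from expanding (X - mu)^3 and (X - mu)^4 binomially.
    """
    var = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return mu3 / var**1.5, mu4 / var**2 - 3


if __name__ == "__main__":
    for lam in (0.5, 1.0, 3.0):
        m = [expo_raw_moment(k, lam) for k in range(1, 5)]
        print(lam, skew_and_excess_kurtosis(*m))  # close to (2, 6) for every lam
```

Skewness $2$ and excess kurtosis $6$ do not depend on $\lambda$, because $\lambda$ only rescales $X$ and standardized moments are scale-free.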
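So far, we’ve taken MGFs for granted. Again, not all distributions have moments, and the MGF asks for even more: it exists only if $\mathbb{E}\left[e^{tX}\right]$ is finite for every $t$ in some open interval around $0$, in which case all moments of $X$ are finite. If no such interval exists, the distribution has no MGF. For example, the heavy-tailed Cauchy distribution doesn’t even have a finite mean, so its MGF doesn’t exist. The log-normal distribution is a subtler case: all of its moments are finite, yet $\mathbb{E}\left[e^{tX}\right] = \infty$ for every $t > 0$, so it has no MGF either.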
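The log-normal case can be seen directly from the Taylor series. Assuming $X = e^Y$ with $Y \sim N(\mu, \sigma^2)$, its moments are $\mathbb{E}[X^k] = e^{k\mu + k^2\sigma^2/2}$, all finite. Since $X \ge 0$, every term $t^k\,\mathbb{E}[X^k]/k!$ is nonnegative for $t > 0$, so $\mathbb{E}[e^{tX}]$ equals the sum of the series, and the sketch below shows those terms eventually blowing up instead of shrinking. It works in log space (using `math.lgamma` for $\log k!$) to avoid overflow; the function names and the choice $t = 0.01$ are ours.

```python
import math

# Log-normal: X = e^Y with Y ~ Normal(mu, sigma^2). Everything is kept in log space
# because E[X^k] overflows a float long before the divergence becomes obvious.


def log_raw_moment(k, mu=0.0, sigma=1.0):
    """log E[X^k] = k*mu + (k*sigma)^2 / 2, finite for every k."""
    return k * mu + 0.5 * (k * sigma) ** 2


def log_taylor_term(k, t, mu=0.0, sigma=1.0):
    """log of the k-th MGF Taylor term t^k E[X^k] / k!, for t > 0."""
    return k * math.log(t) + log_raw_moment(k, mu, sigma) - math.lgamma(k + 1)


if __name__ == "__main__":
    t = 0.01  # even a tiny positive t fails
    for k in (1, 5, 10, 20, 50, 100):
        print(k, log_taylor_term(k, t))
```

The $k^2\sigma^2/2$ term eventually dominates both $k \log t$ and $\log k! \approx k \log k$, so the terms grow without bound for any $t > 0$ and the series, and hence $\mathbb{E}[e^{tX}]$, is infinite.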
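Fortunately, there’s a more general tool than the MGF that exists for every probability distribution: the characteristic function (CF). Defined as
$$\varphi_X(t) = \mathbb{E}\left[e^{itX}\right], \qquad t \in \mathbb{R},$$
the CF plays a very similar role to the MGF, with one key difference: it uses a complex exponential. Since $\left|e^{itX}\right| = 1$ for every real $t$, the expectation is always finite, which is exactly why the CF always exists. The awe-striking generality of the CF is that if the $k$-th moment exists, then the derivatives of $\varphi_X$ at $t = 0$ generate it too, just like with MGFs, up to a factor of $i^k$:
$$\varphi_X^{(k)}(0) = i^k\,\mathbb{E}\left[X^k\right].$$
And even when moments don’t exist, the CF still encodes the distribution in a one-to-one correspondence, meaning that knowing $\varphi_X(t)$ for all $t$ is equivalent to knowing the distribution of $X$.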
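Both properties of the CF can be checked with a short sketch. First, a Monte Carlo estimate of $\varphi_X(1)$ for a standard Cauchy r.v. converges nicely, because $|e^{itX}| = 1$, and it matches the known Cauchy CF $e^{-|t|}$ even though the Cauchy distribution has no mean. Second, for $X \sim \text{Exponential}(\lambda)$ (an illustrative choice, with CF $\lambda/(\lambda - it)$), $\varphi_X'(0)/i$ recovers $\mathbb{E}[X] = 1/\lambda$. The sample size, seed, step `h`, and all function names are our own assumptions.

```python
import cmath
import math
import random


def empirical_cf(samples, t):
    """Monte Carlo estimate of phi(t) = E[e^{itX}]; each term has modulus 1, so the mean is well behaved."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)


def cauchy_samples(n, seed=0):
    """Standard Cauchy draws via the inverse CDF, tan(pi * (U - 1/2))."""
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def expo_cf(t, lam):
    """CF of Exponential(lam): lam / (lam - i t)."""
    return lam / (lam - 1j * t)


def mean_from_cf(phi, h=1e-5):
    """E[X] = phi'(0) / i, with phi'(0) approximated by a central difference."""
    return ((phi(h) - phi(-h)) / (2 * h) / 1j).real


if __name__ == "__main__":
    xs = cauchy_samples(100_000)
    print(empirical_cf(xs, 1.0), math.exp(-1.0))      # estimate vs. e^{-|t|} at t = 1
    print(mean_from_cf(lambda t: expo_cf(t, 2.0)))    # approximately 1/lam = 0.5
```

A sample mean of Cauchy draws never settles down, yet the sample average of $e^{itX}$ does; that contrast is the practical payoff of the bounded complex exponential.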
Since CFs generalize MGFs (whenever the MGF exists, $\varphi_X(t) = M_X(it)$), technically, we wouldn’t need MGFs anymore. However, for distributions whose MGF does exist, it’s often neater to use MGFs, if only to save ourselves from sorrow-inducing complex arithmetic.

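- Although we evaluate the MGF at $t = 0$ to extract moments, it must be finite on some small open interval around $t = 0$, not just at $t = 0$ itself (where $M_X(0) = 1$ always holds). That condition is what makes the MGF infinitely differentiable at $t = 0$ and justifies the term-by-term Taylor expansion.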
