by Jonathan Widarsa

A Tale of Gender Bias from Berkeley

June 21, 2025

During my early days of learning statistics, I encountered a pretty interesting phenomenon while reading a (relatively) ancient article. The story goes like this:

In the fall of 1973, a study on gender bias in graduate admissions at the University of California, Berkeley made headlines. The admission figures showed that, out of almost 13,000 applicants, men had a 44% chance of being admitted while women had only a 35% chance. This stark difference seemed to send a clear message to the masses: Berkeley was discriminating against women and prioritizing education for men.

Or was it?

To save Berkeley’s reputation, let’s understand the situation in math-speak. First, we define several terms as follows:

$A$ = applicant is admitted
$M$ = applicant is male
$W$ = applicant is female
$D_i$ = applicant applied to department $i$

Based on the aggregated data, we observe that the overall admission probability for women is

$$P(A \mid W) = 0.35$$

and similarly for men,

$$P(A \mid M) = 0.44.$$

Given this, it’s obvious that

$$P(A \mid W) < P(A \mid M)$$

which does suggest bias.

Before we dismantle this argument, it helps to introduce the concept behind it: Simpson’s paradox. This phenomenon occurs when a trend observed in several separate groups of data disappears or reverses when those groups are merged. It applies directly to our situation because the conditional probabilities defined above, $P(A \mid W)$ and $P(A \mid M)$, are already aggregated over all the departments applicants applied to. These two probabilities are the “when those groups are merged” part of the definition.

To break these aggregate probabilities down by department, we apply the law of total probability. First, for the women,

$$P(A \mid W) = \sum_i P(A \mid W, D_i) \; P(D_i \mid W),$$

and for the men,

$$P(A \mid M) = \sum_i P(A \mid M, D_i) \; P(D_i \mid M),$$

where the first factor in each sum, $P(A \mid \text{gender}, D_i)$, is the “trend observed in several separate groups of data.” Interestingly, the study’s chi-square tests found that, out of 85 departments, only four were significantly biased against women and six were significantly biased against men. The remaining 75 departments showed no statistically significant bias. How could that be?
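As an aside, a per-department test of this kind can be sketched as a Pearson chi-square test on a 2×2 table of gender by admission outcome. The function name `chi_square_2x2` and the counts below are hypothetical, chosen only for illustration; they are not the study's data, and the study's exact procedure may have differed.

```python
import math

def chi_square_2x2(admit_m, reject_m, admit_w, reject_w):
    """Pearson chi-square statistic and p-value for a 2x2 table.

    Uses 1 degree of freedom and no continuity correction.
    """
    observed = [[admit_m, reject_m], [admit_w, reject_w]]
    row_totals = [sum(row) for row in observed]
    col_totals = [admit_m + admit_w, reject_m + reject_w]
    n = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if admission were independent of gender
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical department: 60 of 100 men and 45 of 100 women admitted
stat, p_value = chi_square_2x2(60, 40, 45, 55)
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```

A small p-value for a department would flag a gender gap in that department alone, which is a different question from whether the campus-wide totals differ.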

To understand why, we look at the second factor in each sum, $P(D_i \mid W)$ and $P(D_i \mid M)$: the probability that a woman (respectively, a man) applied to department $i$. As the law of total probability shows, each admission probability given gender and department, $P(A \mid \text{gender}, D_i)$, is weighted by the probability that an applicant of that gender applied to that department, $P(D_i \mid \text{gender})$, before summing across all departments.
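To see the weighting at work, here is a minimal Python sketch of the sum above. The two departments, their admission rates, and the application shares are made-up numbers (not the Berkeley data), and `aggregate_admission_rate` is a name introduced only for this illustration. Both genders are given identical per-department admission rates, so any gap in the aggregate comes purely from the weights.

```python
def aggregate_admission_rate(rate_by_dept, share_by_dept):
    """Law of total probability: sum_i P(A | g, D_i) * P(D_i | g)."""
    if abs(sum(share_by_dept.values()) - 1.0) > 1e-9:
        raise ValueError("application shares must sum to 1")
    return sum(rate_by_dept[d] * share_by_dept[d] for d in rate_by_dept)

# Hypothetical numbers, chosen only for illustration.
# P(A | gender, D_i): the same for men and women in each department.
rate = {"easy": 0.80, "hard": 0.20}

# P(D_i | gender): where each gender tends to apply.
share_women = {"easy": 0.25, "hard": 0.75}
share_men = {"easy": 0.75, "hard": 0.25}

p_women = aggregate_admission_rate(rate, share_women)
p_men = aggregate_admission_rate(rate, share_men)

print(f"P(A | W) = {p_women:.2f}")  # 0.35
print(f"P(A | M) = {p_men:.2f}")    # 0.65
```

Even with no department treating the genders differently, the aggregate rates differ because the two groups apply to different mixes of departments.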

Acknowledging the weights is crucial, because neglecting them is precisely how one arrives at erroneous conclusions from aggregate data. In this study, the researchers observed that:

Departments with high acceptance rates had low $P(D_i \mid W)$ (few women applied)
Departments with low acceptance rates had high $P(D_i \mid W)$ (many women applied)

In other words, the larger weights were attached to the low $P(A \mid W, D_i)$ terms, which pulled down the weighted average $P(A \mid W)$ even though, within departments, women often did as well as or better than men.
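The reversal itself can be reproduced from raw counts. The sketch below uses a hypothetical two-department table (invented numbers, not the 1973 figures) in which women are admitted at a higher rate than men in each department, yet at a lower rate overall. The helper names `dept_rate` and `overall_rate` are likewise assumptions made for this example.

```python
# Hypothetical (applied, admitted) counts per department and gender.
counts = {
    "easy": {"men": (800, 480), "women": (100, 70)},
    "hard": {"men": (200, 40), "women": (900, 225)},
}

def dept_rate(dept, gender):
    """Admission rate within one department: P(A | gender, D_i)."""
    applied, admitted = counts[dept][gender]
    return admitted / applied

def overall_rate(gender):
    """Admission rate pooled over all departments: P(A | gender)."""
    applied = sum(counts[d][gender][0] for d in counts)
    admitted = sum(counts[d][gender][1] for d in counts)
    return admitted / applied

for dept in counts:
    print(f"{dept}: men {dept_rate(dept, 'men'):.1%}, "
          f"women {dept_rate(dept, 'women'):.1%}")
print(f"overall: men {overall_rate('men'):.1%}, "
      f"women {overall_rate('women'):.1%}")
```

In this toy table most women apply to the department that admits few applicants of either gender, which is the pattern described above.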

As a closing note: be wary of drawing conclusions from aggregate statistics, which can mislead when a relevant conditioning variable is omitted.
