You are presented with the following statement:

University X accepts:
– 75% of all women who apply
– 69% of all men who apply

Based on the above statement, is it fair to say that University X is not biased against women? Do you think the answer is obvious? (It isn’t.)

Allow me to introduce you to an elegant statistical paradox!

Part 1: More Data

Let’s put aside sociological factors, and examine bias from a purely statistical point of view.

At face value, it might seem like the university above is not biased against women. In fact, you might even veer in the direction of thinking that the university is biased against men.

I’d now like to deepen the data, and present you with what I hope will be a truly trippy set of numbers.

Department	Men	Women
Law	18/20 = 90%	80/100 = 80%
Mathematics	12/20 = 60%	2/5 = 40%
Engineering	15/25 = 60%	4/10 = 40%
Overall	45/65 ≈ 69%	86/115 ≈ 75%

This is a hypothetical set of data, but I think it does a very nice job of illustrating the point I’m trying to make.

The university accepts 75% of the women who apply, compared to only 69% of the men.
However, every single department in the university accepts a higher proportion of men than women, and by a wide margin (the differences are 10, 20 and 20 percentage points respectively).
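If you'd like to check the arithmetic yourself, here is a minimal Python sketch. The counts are copied from the table above; the names `admissions`, `rate` and `overall_rate` are my own, chosen purely for illustration.

```python
from fractions import Fraction

# (accepted, applied) per department, copied from the table above.
admissions = {
    "Law":         {"men": (18, 20), "women": (80, 100)},
    "Mathematics": {"men": (12, 20), "women": (2, 5)},
    "Engineering": {"men": (15, 25), "women": (4, 10)},
}

def rate(accepted, applied):
    """Acceptance rate as an exact fraction."""
    return Fraction(accepted, applied)

def overall_rate(group):
    """Pool every department: total accepted divided by total applied."""
    accepted = sum(dept[group][0] for dept in admissions.values())
    applied = sum(dept[group][1] for dept in admissions.values())
    return rate(accepted, applied)

for name, dept in admissions.items():
    men, women = rate(*dept["men"]), rate(*dept["women"])
    print(f"{name:12} men {float(men):.0%}  women {float(women):.0%}")

print(f"{'Overall':12} men {float(overall_rate('men')):.0%}  "
      f"women {float(overall_rate('women')):.0%}")
```

Men come out ahead on every department row, yet the pooled row puts women ahead.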

How on earth can it be that men ‘defeat’ women in every single category, but still lose out overall?

We’ve got ourselves into a nice little tangle, haven’t we?

So now what?

Part 2: The Paradox

The data above fall victim to what is known in statistics and probability as Simpson’s Paradox, or the Yule-Simpson effect (I somewhat modeled the data on the infamous UC Berkeley example). It describes a situation where the groups of data demonstrate one trend/conclusion separately, but display the complete opposite when combined.

To put it in the context of the data I gave you above:
– When examined separately, each department accepts a higher % of men than women.
– When the numbers for all departments are combined, the university as a whole accepts a higher % of women than men.

I’d like to delve into the data to explain why the paradox occurs, and show you what you might have expected to see.

Before anything else, I’ll explain what the term ‘weightage’ means, since I’ll be using it a few times in this piece.
In statistics, it’s used to signify the relative importance of a category of data.
How do we assign a weightage?
By calculating how big that category is with respect to the entire population.
A quick example:
From the above data, Law has a very high weightage for women.
The total number of women who applied to university was 115.
Of these 115 women, 100 of them applied to study Law.
Therefore, the weightage of Law for women is 100/115.
Simple! Now, let’s get into the logic.
The first thing to note is that the number of men applying to each course was roughly the same, whereas a majority of the women applied to Law, rather than Mathematics or Engineering.
The second thing to note is that the acceptance rates for Law, as a whole, are higher than the acceptance rates for Mathematics and Engineering. (No offense to any lawyers reading this; I’m not claiming Law to be an easier course).
When these two bits of data are put together, we have an explanation for what caused the paradox: A high proportion of women applied to the course with a high acceptance rate (Law).
In fact, if you break down female applications:
– 100/115 = 87% applied for Law
– 5/115 = 4% applied for Mathematics
– 10/115 = 9% applied for Engineering.
This meant that the overall statistic for women was skewed towards the acceptance rate for Law, since that category held the highest weightage. Note that the overall acceptance rate for women (75%) was very close to the acceptance rate for women in Law (80%).
On the other hand, the overall acceptance rate for men (69%) was roughly equal to the simple average of the three departmental rates (the average of 90%, 60% and 60% is 70%).
If you break down male applications:
– 20/65 = 31% applied for Law
– 20/65 = 31% applied for Mathematics
– 25/65 = 38% applied for Engineering.
All 3 categories had roughly the same weightage, so the high acceptance rate for Law wasn’t able to ‘rescue’ the lower rates for Mathematics and Engineering.
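To make the role of weightage concrete, here is a rough Python sketch showing that each overall rate is just the departmental rates averaged with the weightages as weights. The numbers come from the tables above; the names `applicants`, `acceptance`, `weightages` and `weighted_rate` are assumptions of mine for illustration.

```python
# Applicants and acceptance rates per course, taken from the tables above.
applicants = {
    "men":   {"Law": 20,  "Mathematics": 20, "Engineering": 25},
    "women": {"Law": 100, "Mathematics": 5,  "Engineering": 10},
}
acceptance = {
    "men":   {"Law": 0.90, "Mathematics": 0.60, "Engineering": 0.60},
    "women": {"Law": 0.80, "Mathematics": 0.40, "Engineering": 0.40},
}

def weightages(group):
    """Share of the group's applicants that went to each course."""
    total = sum(applicants[group].values())
    return {course: n / total for course, n in applicants[group].items()}

def weighted_rate(group):
    """Overall rate = sum over courses of (weightage x acceptance rate)."""
    w = weightages(group)
    return sum(w[course] * acceptance[group][course] for course in w)

for group in ("men", "women"):
    shares = ", ".join(f"{c} {s:.0%}" for c, s in weightages(group).items())
    print(f"{group}: weightages [{shares}] -> overall {weighted_rate(group):.0%}")
```

For women, Law's weightage of about 87% drags the overall figure towards 80%; for men, no single course dominates.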
Summing up, the higher overall acceptance rate for women was caused by the fact that a large proportion of women applied to the course with a higher acceptance rate.
For those of you who want a neat mathematical way to sum this up, here it is:
$\dfrac{a}{b} > \dfrac{c}{d} \text{ and } \dfrac{e}{f} > \dfrac{g}{h} \text{ does not imply } \dfrac{a+e}{b+f} > \dfrac{c + g}{d + h}$
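As a concrete instance, take just the Law and Mathematics rows from the table, with men on the left of each inequality and women on the right (which cell goes with which letter is my own choice for illustration):
$\dfrac{18}{20} > \dfrac{80}{100} \text{ and } \dfrac{12}{20} > \dfrac{2}{5} \text{, but } \dfrac{18+12}{20+20} = \dfrac{30}{40} = 75\% < \dfrac{80+2}{100+5} = \dfrac{82}{105} \approx 78\%$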

Still can’t digest it? Feeling slightly uncomfortable? Intellectually short-circuited?

“Yeah yeah, what he’s saying makes mathematical sense, but I still don’t get how the university can accept a higher proportion of women than men, if every department accepts a higher proportion of men!”, you might be thinking.

I getcha. The first time I was introduced to this paradox (shoutout to GWJC), my mind just refused to comprehend the data I was looking at.
Therefore, I thought it might be helpful if I presented the data in a way that gave you a more ‘acceptable’ conclusion.

The acceptance rates for men are 90%, 60%, and 60% respectively.
The average of these three numbers is 70%.
The acceptance rates for women are 80%, 40%, and 40% respectively.
The average of these three numbers is 53%.

This is what you might have expected to see!
Men: overall acceptance rate of 70%.
Women: overall acceptance rate of 53%.
Conclusion: Men higher in each department, and men higher overall.
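Here is a small Python sketch contrasting this 'comfortable' simple average with the pooled figure the university actually reports. The rates and counts are from the tables above; the variable names are mine.

```python
from statistics import mean

# Departmental acceptance rates (Law, Mathematics, Engineering).
men_rates = [0.90, 0.60, 0.60]
women_rates = [0.80, 0.40, 0.40]

# Simple average: every department counts equally, however few applied.
simple_men = mean(men_rates)
simple_women = mean(women_rates)

# Pooled rate: total accepted over total applied, as in the 'Overall' row.
pooled_men = 45 / 65
pooled_women = 86 / 115

print(f"simple average: men {simple_men:.0%}, women {simple_women:.0%}")
print(f"pooled rate:    men {pooled_men:.0%}, women {pooled_women:.0%}")
```

The simple average treats the 5 women who applied to Mathematics as if they mattered as much as the 100 who applied to Law, which is exactly the step our intuition quietly takes.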

That leaves us feeling nice and comfortable, with no paradox.

The reason for our inclination to analyse the data in this manner is quite simple.
It isn’t intuitive to factor in things like weightage and relative importance into one’s perception of data. The average person isn’t naturally inclined in that direction.
Of course, if you’ve been exposed to this kind of problem or thinking before, you might have done so; I’m simply pointing out that it’s not the immediately obvious option.

The key takeaway, as the paradox shows us, is that numbers are nasty little things that, in the hands of a skilled manipulator, can even tell two completely contradictory stories.

Part 3: Application

As with all statistics, the importance of this paradox lies in how we use it to better interpret and analyse the world around us.
At the beginning of this article, I asked you if it was acceptable to say that the university in question here was NOT biased against women.

Now that we have a deeper set of data to work with and an understanding of the paradox, the question becomes: Which numbers should we use?
Should we use the overall figure, or should we break it down?

Here’s my opinion:
We should use the detailed data, and ignore the overall conclusion.
Why?
When you apply to the university, you are not just a man or a woman (~~you are a warrior~~).
You are a man or woman applying to a specific course.
Therefore, the overall picture is completely irrelevant to you.
All that should matter to you in predicting your chances of acceptance is the acceptance rate for men/women in your course.

Feel free to disagree with me on this one! If you think the overall rate is more important, and can justify your position, let me know why! I’ll be waiting eagerly.

However, there’s a far more important point to be made where the application of Simpson’s paradox is concerned.

Sometimes, it’s not so clear that we should use the specific figure rather than the overall one. To illustrate this ambiguity, I’m going to present you with another set of data.

Consider the effectiveness of two treatments for a rare disease, Economitis, which causes economists to falsely believe that they are correct:

Patient group	Treatment A	Treatment B
People with brown eyes	45/50 = 90%	72/90 = 80%
People with blue eyes	20/50 = 40%	3/10 = 30%
Overall	65/100 = 65%	75/100 = 75%

As before, the paradox presents itself.
Both treatments for Economitis apparently work far better on people with brown eyes than on people with blue eyes.
This allowed Simpson’s paradox to manifest itself, due to the relatively high weightage of Brown-Eyes among those who were given Treatment B.
By comparison, the weightages for Brown-Eyes and Blue-Eyes are equal in the sample for Treatment A (half each).
If I used the same reasoning as before, the conclusion might be that we should ignore the overall conclusion, and instead focus on the data for the specific categories.
This would lead us to believe that regardless of whether you have brown eyes or blue eyes, you should opt for Treatment A.
Wait a second! Are we sure?
The table above partitions the data based on the eye colour of patients.
But what if eye colour actually has nothing to do with the effectiveness of treatment? If that were the case, then the data would have been intentionally presented in this manner by an evil statistician to trick you into believing that Treatment A is more effective, when in fact, breaking down the data based on eye colour is meaningless.
In statistics, we’d call this correlation but not causation.
Having brown eyes correlates with being successfully treated for Economitis, but does not cause it!
If this were the case, then given this set of data, it might be wise to ignore the success rates for the specific categories, and instead focus on the overall rate of success. This would lead us to choose Treatment B, rather than Treatment A.
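The two competing recommendations can be reproduced with a short Python sketch. The counts are copied from the treatment table; `trials`, `better` and `pool` are hypothetical names of my own. Note that the code can only report both answers: it has no way of knowing whether eye colour is a meaningful way to split the patients, which is the judgment call this section is about.

```python
# (recovered, treated) per eye colour, copied from the table above.
trials = {
    "brown": {"A": (45, 50), "B": (72, 90)},
    "blue":  {"A": (20, 50), "B": (3, 10)},
}

def better(a, b):
    """Return 'A' if treatment A has the strictly higher success rate, else 'B'."""
    (a_ok, a_n), (b_ok, b_n) = a, b
    # Cross-multiply to compare a_ok/a_n with b_ok/b_n without rounding.
    return "A" if a_ok * b_n > b_ok * a_n else "B"

def pool(treatment):
    """Add up recoveries and patients for one treatment across eye colours."""
    recovered = sum(arms[treatment][0] for arms in trials.values())
    treated = sum(arms[treatment][1] for arms in trials.values())
    return recovered, treated

winner_by_eye_colour = {
    colour: better(arms["A"], arms["B"]) for colour, arms in trials.items()
}
winner_overall = better(pool("A"), pool("B"))

print("Winner within each eye colour:", winner_by_eye_colour)
print("Winner when pooled:", winner_overall)
```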
The point I’m trying to hammer home here is that the manner in which we break down our data matters, and one should pay attention to it.
In our earlier example, it was reasonable and logical to break down university admissions based on the course people were applying to.
In this example, it makes no sense to break down treatment success based on eye colour.
Therefore, if one isn’t cautious and rigorous about the manner in which data is handled, the data can end up being broken down and presented in all sorts of arbitrary ways, which can very easily mislead someone who doesn’t have all the knowledge.
In the case of treatment for Economitis, someone devious with a vested interest could have fooled unwitting patients into choosing Treatment A based on their eye colour, when they should have chosen Treatment B instead. Judea Pearl, the famous computer scientist and philosopher, wrote a book in which he postulated this to be the real conundrum underlying Simpson’s paradox: Knowing when to accept the paradox, and when to reject it.

Wrapping Up

There is that famous quote: “There are three kinds of lies: lies, damned lies, and statistics.”

For better or worse (I say for better), statistics are a reality of our world. Economists have to be comfortable sinking their teeth into data and teasing out the truth it hides.

The more you know, the less likely you are to be fooled by statistical quagmires like Simpson’s Paradox.

Nevertheless, caution and prudence are always a good idea. Otherwise, you might end up suffering from Economitis…