Survivorship Bias


A story from WWII

During World War II, the statistician Abraham Wald was tasked with finding a way to reduce US aircraft losses. Based on his findings, the military (the US Air Force did not yet exist as a separate branch) would improve the structure of its aircraft and possibly change tack against the Axis powers.

The US lost 95,000 airplanes in WWII, so you can see why this was a problem that needed fixing.

The obvious solution is to add more armour to the airplanes, which reduces the damage they take during missions. The catch is that armour adds weight, and weight reduces how far an aircraft can fly, shrinking its operational range.

Airplanes returning from missions were consistently damaged in the same areas. The above image represents what that data may have looked like, with the red dots marking where the planes had taken the most damage.

Based on this data, the US military concluded that it needed to add armour where the planes had taken the most damage: where the red dots were.

Wald strongly opposed this idea, saying: “Gentlemen, you need to put more armour-plate where the holes aren’t because that’s where the holes were on the airplanes that didn’t return.”

This is what is called survivorship bias. The US military had assumed that it needed to put more armour where there were the most bullet holes. What it overlooked was that the holes in the returning aircraft were probably not fatal, since those planes had made it home. The aircraft that did take fatal hits most likely never returned, and therefore were never part of the data sample.
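To see how this happens, here is a small Python simulation. It is only a sketch under made-up assumptions: the section names and the per-hit chances of downing a plane in `FATALITY` are invented for illustration, and `fly_missions` is a hypothetical helper. Hits land uniformly across every part of the aircraft, but only the planes that survive are "inspected".

```python
import random
from collections import Counter

# Illustrative assumptions only: these sections and the per-hit probability
# that a hit there brings the plane down are invented, not historical data.
FATALITY = {
    "engine": 0.50,
    "cockpit": 0.50,
    "fuselage": 0.05,
    "wings": 0.05,
    "tail": 0.05,
}
SECTIONS = list(FATALITY)


def fly_missions(num_planes, hits_per_plane, seed=0):
    """Spread hits uniformly over all sections, then keep only survivors.

    Returns two Counters: hits on every plane, and hits visible on the
    planes that made it home (the only data the inspectors would see).
    """
    rng = random.Random(seed)
    all_hits, returned_hits = Counter(), Counter()
    for _ in range(num_planes):
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        # The plane survives only if none of its hits brings it down.
        survived = all(rng.random() >= FATALITY[h] for h in hits)
        if survived:
            returned_hits.update(hits)
    return all_hits, returned_hits


if __name__ == "__main__":
    all_hits, returned_hits = fly_missions(num_planes=10_000, hits_per_plane=3)
    for section in SECTIONS:
        print(f"{section:>8}: all planes {all_hits[section]:>6}  "
              f"returning planes {returned_hits[section]:>6}")
```

Every section is hit about equally often, yet the returning planes show far fewer engine and cockpit hits than fuselage, wing, or tail hits. Someone looking only at the returning planes would conclude those sections need the least armour, which is exactly the mistake Wald warned against.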

So What?

So what do airplanes in WWII have to do with our lives today?

We live in a world where you can track almost any type of data. There is a famous saying: “Hindsight is 20/20.” Data gives us 20/20 vision in retrospect, but that is exactly the limitation: it is retrospective. Just because something has happened doesn’t mean it will happen again, or that the picture the data paints is accurate.

Often, the data that is missing is the most important data.

Google Analytics lets you track just about any statistic on a website. Here are some of the statistics you can track with it:

  • Users (The total number of people who visited a website)
  • Bounce Rate (The share of visits in which someone left without engaging further, often read as a rough signal of how relevant a page was)
  • Page Views (The total number of pages viewed)
  • Page View Time (How long someone viewed a page, on average)
  • Views per Page (How many views each page received)
  • Acquisition Source (Where people came from before they landed on a page)

The list goes on, but what Google Analytics can’t tell you is how many people never visited your website.

During COVID-19, thousands of people who had traditionally worked or studied in an office or classroom had to move online. Companies made statements along the lines of: “Our online survey showed that all of the students/employees had the required hardware and internet bandwidth to work from home.” The problem, again, is that an online survey can only reach people who were already able to get online, so it cannot account for those who lacked the necessary hardware or bandwidth. This is survivorship bias.

So don’t trust data just because it exists. Ask yourself: “Is this a fair sample of the whole?”
