It’s time to continue our discussion of Statistical Bias Types. This is part 2; if you missed part 1, read it here: Statistical Bias Types part 1. In the previous article I introduced 5 ways (not) to get biased during the data collection/sampling phase of your research. Now I’ll focus on what can (but shouldn’t) go wrong during the analysis and presentation phases.
Statistical Bias #6: Omitted Variable Bias
Omitted Variable Bias occurs when you leave one or more important variables out of your model. This issue comes up especially often in Predictive Analytics.
Everyday example of Omitted Variable Bias:
Imagine a grocery store. You have finished shopping and you want to pay. There are 3 lines and you want to pick the one where you will spend the least time. So you check which one is the shortest and queue up. Murphy’s Law: the other line moves much faster. Your prediction failed, maybe because you omitted an important variable, namely how packed the carts were in the different lines. This mistake cost you 5 more minutes in line…
Online Analytics Related example of Omitted Variable Bias:
In real-life data projects you can lose much more than 5 minutes with wrong predictions. Here’s an example:
It’s quite common for online businesses to try to predict which of their users will churn, so they can act beforehand. Let’s say you are monitoring all user activity in your product and, based on your own data, you have built a model that predicts with 75% accuracy whether a user will cancel her subscription within one week. Nice job! But the next day you see that a big chunk of your users are cancelling their subscriptions without any warning from your model. What just happened? In this hypothetical scenario a strong competitor entered your market and offered the same solution you have, but at half the price. Of course, this is something your model wasn’t ready for. The presence of the competitor is an omitted variable in this case. In fact, it’s a variable that’s almost impossible to prepare any predictive model for.
Note: Predictive Analytics nowadays works pretty much on the principle of “what happened in the past will happen in the future”. This makes these models very vulnerable. If something new happens on the market, it’s often not accounted for in the predictions, and that causes major inaccuracy. The bottom line, as a rule of thumb: don’t expect a predictive model to stay accurate for more than 1 or 2 years.
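To put some numbers on Omitted Variable Bias, here is a minimal Python sketch on simulated data. Everything in it is an assumption made up for illustration (the variable names x and z, the true effects 2.0 and 3.0, the 0.8 correlation factor); it is not taken from a real project. The outcome depends on two correlated variables, and a model that leaves one of them out gives the remaining one credit for both.

```python
import random


def simulate(n=5000, seed=42):
    """Generate (x, z, y) rows where y truly depends on both x and z."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)                      # the variable we will omit
        x = 0.8 * z + rng.gauss(0, 1)            # included variable, correlated with z
        y = 2.0 * x + 3.0 * z + rng.gauss(0, 1)  # assumed true effects: 2.0 and 3.0
        rows.append((x, z, y))
    return rows


def _cov(a, b):
    """Population covariance of two equally long lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


def slope_without_z(rows):
    """Least squares slope of y on x alone (z is omitted)."""
    x = [r[0] for r in rows]
    y = [r[2] for r in rows]
    return _cov(x, y) / _cov(x, x)


def slopes_with_z(rows):
    """Least squares slopes of y on x and z, via the 2x2 normal equations."""
    x = [r[0] for r in rows]
    z = [r[1] for r in rows]
    y = [r[2] for r in rows]
    sxx, szz, sxz = _cov(x, x), _cov(z, z), _cov(x, z)
    sxy, szy = _cov(x, y), _cov(z, y)
    det = sxx * szz - sxz ** 2
    slope_x = (szz * sxy - sxz * szy) / det
    slope_z = (sxx * szy - sxz * sxy) / det
    return slope_x, slope_z


if __name__ == "__main__":
    rows = simulate()
    print("slope of x with z omitted:", round(slope_without_z(rows), 2))
    print("slopes of x and z together:", [round(s, 2) for s in slopes_with_z(rows)])
```

With these made-up coefficients, the model without z should report a slope for x of roughly 3.5 instead of 2.0, because x silently picks up part of z’s effect. The model that includes both variables should land close to 2.0 and 3.0.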
Statistical Bias #7: Cause-effect Bias
Our brains are wired to see causation wherever correlation shows up.
Cause-effect bias is usually not listed as a classic statistical bias, but I wanted to include it here because many decision makers (business/marketing managers) are not aware of it. Even those who are aware of it (me too) have to remind themselves from time to time: correlation does not imply causation.
Everyday example of Cause-Effect Bias:
Here’s my favorite example: the kids who had tutors in high school ended up with worse grades than the kids who didn’t. I intentionally phrased this in a misleading way. The point is that even though you see a correlation between bad grades and tutoring, the tutoring wasn’t the cause of the bad grades. The bad grades were the reason the tutors were needed.
Online analytics related example of Cause-Effect Bias:
You have a new loyalty program! You see that the customers who signed up for the loyalty program are spending 5 times more money in your e-commerce store than those who didn’t. Is the loyalty program successful? Maybe, but we don’t know that for sure. It’s also possible that only the more committed (in other words: loyal) customers were interested in the loyalty program in the first place, and they were going to spend 5 times more anyway. (See more here: self-selection bias.)
Unfortunately, the most reliable way to crack the correlation vs. causation issue is to run experiments. While it’s easy to A/B test your loyalty program online, it’s a bit more difficult to tell half of the kids who perform badly at school that they won’t get tutors for the sake of a research study. But let that be the social scientists’ problem.
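Here is a small simulated version of the loyalty program story, as a rough Python sketch. All of it is hypothetical: the hidden "loyalty" score, the spending formula and the sign-up rule are assumptions chosen only to illustrate the point. The program is given a true effect of zero, and the function compares what you would measure if customers opt in by themselves with what you would measure in an A/B test where a coin flip decides who gets the program.

```python
import random


def spend_gap(n=20000, randomized=False, true_effect=0.0, seed=7):
    """Mean spend of program members minus mean spend of non-members."""
    rng = random.Random(seed)
    members, others = [], []
    for _ in range(n):
        loyalty = rng.random()                 # hidden trait between 0 and 1
        if randomized:
            joined = rng.random() < 0.5        # A/B test: a coin flip decides
        else:
            joined = rng.random() < loyalty    # self-selection: loyal users opt in
        # Assumed spending model: loyal customers spend more regardless of the program.
        spend = 20 + 200 * loyalty + rng.gauss(0, 10)
        if joined:
            spend += true_effect               # what the program really adds
        (members if joined else others).append(spend)
    return sum(members) / len(members) - sum(others) / len(others)


if __name__ == "__main__":
    print("self-selected gap:", round(spend_gap(randomized=False), 1))
    print("randomized gap:   ", round(spend_gap(randomized=True), 1))
```

Even though the program does nothing in this setup, the self-selected comparison should show members spending far more than non-members (a gap somewhere around 65 to 70 units), while the randomized comparison should sit close to zero. Only the second number tells you what the program itself is worth.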
Statistical Bias #8: Funding Bias
I already briefly mentioned Funding Bias (sometimes called sponsorship bias) in Statistical Bias Types part 1. We talk about it when the results of a scientific study are biased in a way that supports the financial sponsor of the research.
Everyday example of Funding Bias:
I won’t name any particular industry here, but I think we all know what I’m talking about. Anytime you are watching “documentaries”, reading the “news” or checking “research results”, first try to make sure that you are consuming content from independent creators who are not biased by their sponsors’ expectations.
Online analytics related example of Funding Bias:
If you are working for a company as a Data Scientist or Analyst, you are getting your money from that company, so in a sense it’s your sponsor. Now, of course you want to deliver good news to make your “sponsors” happy. Let’s imagine a game developer company. A data analyst might feel really bad about reporting that the new game everybody has been working on for the last 3 months looks like a huge failure. But keep this in mind, and teach it to your colleagues too: as a data scientist/analyst you are not getting paid to deliver good news. You are getting paid to deliver accurate, useful and actionable information. Was the new product a failure? That’s OK, but make sure that everyone can learn from the data you collected during the test phase, so the next version can be better!
Statistical Bias #9: Cognitive Bias
Cognitive biases are related to human perception, so originally they are a much broader category. But they are related to statistical biases too! They can also have a huge effect on how you should present and interpret data.
Everyday and online analytics related examples of Cognitive Bias:
For cognitive biases I’m gonna lump together the everyday and the online examples. Here are the most important ones:
- Hindsight bias. Even the greatest findings seem trivial when you look back at them a few days later. You feel that it was all so logical and that you should have known it the whole time. When you are presenting the results of your 1-month data analysis project, there will always be someone in the room who says: “I was gonna say the very same thing at the last meeting…” My suggestion: smile inside and try to keep the comment “of course, but then why didn’t you?” to yourself.
- Confirmation bias. A variation of the previous one, but a bit more dangerous. Confirmation bias happens when a decision maker has strong preconceptions and listens only to the part of your presentation that confirms his/her beliefs, missing the rest. Suggestion: always have a one-sentence takeaway for your presentations that’s impossible to miss, even if someone’s eyes are covered by preconceptions. (Also feel free to point out possible confirmation biases and send over this article. ;-))
- Belief bias. When someone is so sure about his/her own gut feelings that he/she ignores the results of a data research project. Suggestion: ehh… hustle. More details here: Data-resistance – how to evangelize the data driven mindset?
- Curse of knowledge. When you assume that someone has the same background knowledge that you do. It’s especially important to be aware of this bias when you are presenting your data projects to non-data-minded people. Mind that business managers don’t necessarily have phrases like “statistically significant”, “multiple regression” or “least squares estimates” in their dictionary, so try to communicate using their words. (E.g. “statistically significant” = “pretty damn sure”)
There are many, many more cognitive bias types, but I’ll limit my article to these four most important ones. If you want to learn more, look up this Wikipedia article: List of Cognitive Biases.
How not to be biased?
Now that we have learned about all the important statistical bias types, the only question left is how we can overcome them. How can we ultimately avoid being biased? In next week’s article I’ll write about that and give you some practical advice!