In the last 2 weeks I’ve introduced 9 common statistical bias types. If you went through them, you have already taken the first very important step towards overcoming these issues and not letting yourself be biased: you are aware of these bias types.

In this article I’ll share a bit more practical advice on how to prevent biased statistics in your data science and analytics projects – or just in everyday life.

#1: Do not underestimate the amount of stupidity around you!

This is not a scientific study, but it is worth a read because it seems to hold up: The Basic Laws of Human Stupidity.

But here’s the one-sentence summary: just because someone says something doesn’t necessarily make it true.

We tend to trust what people tell us. Especially when they have a higher social status (e.g. celebrities) or they are in higher company positions (e.g. the boss of the boss of your boss’ boss). But people in higher positions are human beings as well. I’m not suggesting that they are stupid, but:

  1. They are usually just as biased as everyone else. (If not more.)
  2. They are not data analysts, so they don’t think twice about possibly fake statistics. (In most cases, at least.)
  3. They usually have second- (or third- (or fourth-))hand information.
  4. That’s not even talking about the different personal ambitions and internal politics…

So dare to question your boss: are his/her statements based on data and hard facts — or maybe just gossip, assumptions or opinions?

#2: Always ask about the research method!

Anytime you see statistics: learn about the research method first.

Remember the different survey errors I mentioned in the previous articles? It seems like 90% of all studies available online are skewed by one or more of those statistical bias types.

And they are online.

Nobody criticizes them. Nobody asks questions. People are reading them, liking them, sharing them. Writing more articles about them. That’s how fake stuff spreads around the world.

So, you, dear fellow data-driven Friend: be critical and always check whether the research was done properly or not when you read a study.

Note: never ever trust studies that don’t mention their research methods at all.

#3: Do your own analyses and research!

“If you want something done right, do it yourself!” – Charles-Guillaume Étienne

Nothing is more trustworthy than your own analyses… Of course, you have to be critical of yourself as well, because there is always the potential to make mistakes. But! At least you know this about yourself:

  1. you care about doing your research right (I know you do, otherwise, you would not be reading this article ;-))
  2. you have the statistical background to do your research right. (If you don’t, please learn it first! Book recommendation: Practical Statistics for Data Scientists.)

If you are still not 100% sure about your results, that’s normal. In fact, that’s fantastic in a way, because being skeptical of yourself means that you really want to do first-class research.

When in doubt, don’t worry, just…

#4: Ask smarter people!

I’ve been lucky to always have mentors in my analytics projects. And I know my mentors are:

  1. smarter than me.
  2. very critical about what I do.

And this is just perfect for me because when they say that my analysis/research looks correct, that means that it met even their higher standards, so my findings are most probably good.

Where to find these smarter people? Tough question. But – for instance – it can be a senior data professional at the company you are working for, or your favorite data professor at the university. If you are a younger data scientist who doesn’t have these kinds of connections, try to reach out to people on Twitter/LinkedIn and ask them to give a second opinion on your recent publication (or something) – if you are (just a little bit) lucky, somebody will help you out!

#5: Think!

Remember that I started my first article about Statistical Bias Types with the sentence: “Humans are stupid.” I said something similar in this article, too.

I was just kidding.

Humans are not stupid. Only too lazy – or more often too busy – to think.

And that’s the bottom line:

When you are doing research, when you are trying to interpret statistical results, when you are working with data: spend time thinking!

If you are aware of the different possible statistical bias types and if you are spending enough time to think about things, there is a very high chance that you will avoid every common pitfall that can possibly bias your results.

Spending 1 month on research and delivering 100% accurate and actionable information is infinitely better than spending only 2 weeks on it (2 weeks of research + 0 days of thinking) and delivering misleading results.

If your manager doesn’t support this idea, show them this article series (ep1, ep2, ep3) to give them an idea of the possible disasters that statistical bias can cause! 😉

Conclusion

Everybody makes mistakes. I hope this article series will help you avoid the potential mistakes that come from statistical bias! Remember the takeaway: be aware of the different statistical bias types, be critical, and think!

Cheers,
Tomi Mester