Statistics is difficult. Of course it is: it makes up most of the actual science in data science. But that doesn’t mean you can’t learn it by yourself if you are smart and determined enough.
In this article, I am going to list 6 books that I recommend starting with if you want to learn statistics. The first three are lighter reads. These books are really good for setting your mind to think more numerically, mathematically and statistically. They also do a good job of presenting why statistics is exciting (it is!).
The last three books are more scientific, with formulas and Python or R code.
Don’t get intimidated though! Mathematics is like LEGO: if you build the small pieces up right, you won’t have trouble with the more complex parts either!
Let’s see the list!
1. You Are Not So Smart — by David McRaney
When I first saw the title, I loved it already! This is a very well written book, containing many stories — and everything in it is based on real experiments and real scientific research.
David McRaney introduces one sad but true fact of life: our brain constantly tricks us, and we are not even smart enough to realize it. For an aspiring data scientist, this book is essential, because it covers many of the common biases that distort how we interpret data. It points out classic mistakes like the self-serving bias, the availability heuristic, and confirmation bias. It also shows why people tend to be tricked by fake news or scams, and why people don’t always help when they see someone having a heart attack on a busy street. Being aware of these biases should be basic, but I see even practicing data professionals fall for them from time to time…
(I wrote a detailed article about Statistical Bias Types.)
2. Think Like a Freak — by Dubner & Levitt
The previous book was about why we are not so smart. But this one is about how to be smarter! Think Like a Freak shows us how critical and unconventional thinking can lead to huge success… and, hey, that’s something that as a data scientist, you should practice every day.
The book walks through a bunch of case studies from everyday life, goes into detail on each, and analyzes why a given solution to a problem is good or bad. Reading it will definitely boost your analytical thinking.
3. Innumeracy — by John Allen Paulos
If you hated mathematics in middle or high school, it was for one reason: you had a bad teacher. A good teacher turns mathematical equations into mystical puzzles, probability theory into detective stories, and linear algebra into the ultimate solution for all the big questions in life. Luckily, I had really good math teachers, so I was always genuinely excited by mathematics and statistics. Looking back, this really affected my life.
If you didn’t have a good math teacher, John Allen Paulos is here to make up for that: he’s the awesome teacher you wish you’d had. Innumeracy focuses mostly on one specific segment of statistics: probability theory and calculations. It explains the math behind it, shows the formulas and puts everything into a very logical context. And it does so by showing the real-life applications of these calculations, so you can immediately understand the advantage of being more math-minded.
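To give you a taste of what this kind of probability calculation looks like in practice, here is a minimal Python sketch of the classic birthday problem. The example is my own pick for illustration (I am not claiming it is presented this way in the book), and the function name `shared_birthday_probability` is hypothetical. It assumes 365 equally likely birthdays and ignores leap years.

```python
def shared_birthday_probability(n_people: int, days: int = 365) -> float:
    """Chance that at least two of n_people share a birthday.

    Assumption: every day of the year is equally likely (no leap years).
    """
    # It is easier to compute the opposite event first:
    # the probability that everybody has a different birthday.
    p_all_different = 1.0
    for i in range(n_people):
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different


if __name__ == "__main__":
    for group_size in (10, 23, 50):
        p = shared_birthday_probability(group_size)
        print(f"{group_size} people: {p:.3f}")
```

With only 23 people in the room, the chance of a shared birthday is already just over 50%. That is far higher than most people would guess, and it is exactly the type of intuition gap that a more math-minded approach helps to close.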
4. Naked Statistics — by Charles Wheelan
This book is the perfect transition between the previous light-read statistics books and the next two more scientific ones. Reading it, you can easily understand basic concepts like mean, median, mode, standard deviation, variance, and standard error, as well as more advanced topics like the central limit theorem, normal distribution, correlation analysis and regression analysis.
Needless to say, all of these are wrapped in metaphors to make them easier to understand.
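If you would like to see the basic concepts as numbers rather than metaphors, here is a minimal Python sketch that uses only the standard library. The sample values are made up for illustration and do not come from the book. Note one assumption: I use the sample versions of variance and standard deviation (dividing by n - 1), and compute the standard error of the mean as the standard deviation divided by the square root of the sample size.

```python
import math
import statistics

# Made-up sample, for illustration only
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)          # arithmetic average
median = statistics.median(sample)      # middle value of the sorted data
mode = statistics.mode(sample)          # most frequent value
variance = statistics.variance(sample)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(sample)      # square root of the sample variance

# Standard error of the mean: how much the sample mean itself
# would vary from one sample to the next.
std_error = std_dev / math.sqrt(len(sample))

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.3f}, std_dev={std_dev:.3f}, std_error={std_error:.3f}")
```

If your data is the whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` are the matching functions.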
5. Practical Statistics for Data Scientists — by Peter Bruce, Andrew Bruce & Peter Gedeck (2nd edition)
This book contains everything that a Junior Data Scientist has to know about the practical part of statistics. In my opinion, the biggest advantage of the book is its structure: it makes it really clear how things are built on top of each other. It also goes into detail on the most common prediction and classification models, and it talks a bit about Machine Learning and Unsupervised Learning too.
The second edition of the book comes with Python code examples, too. (If you don’t know Python, that’s not a problem; you can simply skip those parts.)
6. Think Stats — by Allen B. Downey
Topic-wise, Think Stats is really similar to Practical Statistics for Data Scientists. I wanted to have it on the list, though, because even if the topic is the same, different writers usually approach things differently. On a topic as complex as data science, I think it’s worth looking at different angles and having things explained by two different data professionals.
Plus, this is a book from 2011. It’s good to see how much the interpretation of (even these standard) things has changed in as little as six years.
Oh, and I almost forgot to mention that Think Stats is available for free in PDF format, here: http://greenteapress.com/thinkstats/
And that’s it!
By reading these 6 books you can get a solid understanding of Statistics for Data Science!
- If you want to learn more about how to become a data scientist, take my 50-minute video course: How to Become a Data Scientist. (It’s free!)
- Also check out my 6-week online course: The Junior Data Scientist’s First Month video course.
Cheers,
Tomi Mester