Python is one of the most essential Data Science languages, if not the most essential. It’s fairly easy to learn, it’s free, many companies use it, and it has tons of powerful statistical and data visualization libraries. In one sentence: if you are looking for a Data Science career, sooner or later you have to learn Python.
So I put together a Python for Data Science tutorial series starting from the very basics. As far as I know, this is one of the few Python tutorials online that’s:
- in Python 3 and not in Python 2 (see why this is important below)
- written for those who are just starting with coding
- starting from the basics, then guiding you all the way through to advanced topics like using pandas and other analytical data science libraries
- 100% dedicated to being practical
- and free…
Here are the articles!
Note: I’m continuously writing new articles and adding them to the list.
Need a free Python Cheat Sheet first?
The very first step will be to set up your own Python environment. This article will show you how to do that. Plus, as an extra, if you go through the whole process, you will have bash, SQL and R too. The setup comes with the famous IPython and Jupyter Notebook Python extensions that will make your data-coding-life much easier! READ>>
I introduce the Jupyter Notebook, your soon-to-be-favorite interactive Python workspace. After that, we dig into the basics of Python: variables and data types (integers, strings, booleans, etc.). At the end of the episode you will find a quick exercise too! READ>>
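To give a taste of what variables and data types look like, here is a minimal sketch. The variable names and values are made up for illustration and are not taken from the article.

```python
# Hypothetical variables, one for each basic data type.
dog_name = "Freddie"      # str: text
age = 9                   # int: whole number
height = 19.5             # float: decimal number
is_vaccinated = True      # bool: True or False

# type() tells you which data type a variable holds.
print(type(dog_name), type(age), type(height), type(is_vaccinated))

# Values can be converted between types explicitly.
age_as_text = str(age)         # the string "9"
height_rounded = int(height)   # int() drops the decimal part
```

Typing these lines into a Jupyter Notebook cell and running them one by one is a good way to see what each variable contains.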
The next article is about the most important data structures in Python: lists, dictionaries and tuples. You will learn how to create and modify these – and also how to access or update their elements. READ>>
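As a quick illustration of the three data structures, here is a short sketch. The example data is invented and only serves to show creating, accessing and updating elements.

```python
# A list: ordered and mutable.
scores = [3, 8, 5]
scores.append(10)        # add an element to the end
scores[0] = 4            # update an element by its index

# A dictionary: key-value pairs.
user = {"name": "Anna", "country": "Hungary"}
user["country"] = "Sweden"   # update the value of an existing key
user["age"] = 31             # add a new key-value pair

# A tuple: ordered but immutable (it cannot be changed after creation).
point = (2, 7)
x, y = point             # unpack the tuple into two variables

print(scores, user, x, y)
```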
Functions and methods are one of the greatest advantages of Python. Using them, you can carry out simple but important data processes (like counting the number of elements, calculating the sum of integers, making strings upper- or lowercase, and so on…). In this article, I introduce the whole concept and give you a list of the most essential built-in functions and methods of Python. READ>>
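The data processes mentioned above can be sketched with a few built-in functions and string methods. The sample values are made up for illustration.

```python
numbers = [4, 8, 15, 16, 23, 42]

# Built-in functions take the object as an argument.
count = len(numbers)     # number of elements
total = sum(numbers)     # sum of the integers
largest = max(numbers)   # biggest element

# Methods are called on the object itself with dot notation.
greeting = "Hello, World"
shouted = greeting.upper()       # all uppercase
whispered = greeting.lower()     # all lowercase
words = greeting.split(", ")     # split the string into a list

print(count, total, largest, shouted, whispered, words)
```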
At this point, you understand the basics of Python for Data Science. It’s time to clarify why we are using Python 3 and not Python 2. READ>>
Let’s get back to coding! The next chapter presents if statements. You can learn about the logic of the Python if statement, as well as its syntax and advanced applications. READ>>
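For reference, the basic shape of an if statement looks something like this. The thresholds and messages are hypothetical.

```python
temperature = 23

# Conditions are checked from top to bottom; only the first
# branch whose condition is True gets executed.
if temperature > 30:
    advice = "stay inside"
elif temperature > 15:
    advice = "go for a walk"
else:
    advice = "wear a coat"

print(advice)
```

Note the colon at the end of each condition and the indentation of the lines under it: both are required by Python.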
For loops in Python are perfect for processing repetitive programming tasks. In this article, I’ll show you everything you need to know about them: the syntax, the logic, advanced applications and best practices too! READ>>
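A minimal sketch of the for loop syntax, using invented numbers:

```python
numbers = [1, 2, 3, 4]

# Iterate through the elements of a list, one by one.
squares = []
for n in numbers:
    squares.append(n * n)

# range(5) yields 0, 1, 2, 3, 4, so this loop runs five times.
total = 0
for i in range(5):
    total += i

print(squares, total)
```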
Now that you know how if statements and for loops work, it’s time to combine them. I’ll show you how to build nested for loops, put if statements within for loops, and at the end of the article I’ll give you an intermediate Python task to test the skills you’ve gathered so far. READ>>
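Combining the two could look like the sketch below: a nested for loop with an if statement inside. The nested list is made-up example data.

```python
# A list of lists: each inner list is one row.
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

even_numbers = []
for row in matrix:              # outer loop: go through the rows
    for value in row:           # inner loop: go through the values in a row
        if value % 2 == 0:      # keep only the even values
            even_numbers.append(value)

print(even_numbers)
```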
In my Python workshops and online courses I see that one of the trickiest things for newcomers is the syntax itself. It’s very strict and many things might seem inconsistent at first. In this article I’ve collected the Python syntax essentials you should keep in mind as a data professional — and I added some formatting best practices as well, to help you keep your code nice and clean. READ>>
So far we have worked with the most essential concepts of Python: variables, data structures, built-in functions and methods, for loops, and if statements. These are all parts of the core semantics of the language. But this is far from everything that Python knows. Actually this is just the very beginning and the exciting stuff is yet to come. Because Python also has tons of modules and packages that we can import into our projects… READ>>
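Here is a short sketch of the common import styles, using modules from Python's standard library only. These modules are picked for illustration and are not necessarily the ones covered in the article.

```python
# Import a whole module and use it with dot notation.
import math

# Import a module under a shorter alias.
import statistics as stats

# Import a single name from a module.
from collections import Counter

root = math.sqrt(16)
average = stats.mean([2, 4, 6])
most_common_letter = Counter("banana").most_common(1)[0][0]

print(root, average, most_common_letter)
```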
In this article, I’ll introduce the five most important data science libraries and packages that do not come with Python by default. These are: NumPy, pandas, Matplotlib, scikit-learn and SciPy. At the end of the article, I’ll also show you how to get (download, install and import) them. READ>>
Pandas is one of the most popular Python libraries for Data Science and Analytics. I like to say it’s the “SQL of Python.” Why? Because pandas helps you manage two-dimensional data tables in Python. Of course, it has many more features. In this episode we will start with the pandas basics! READ>>
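To show what a two-dimensional data table looks like in pandas, here is a minimal sketch. It assumes pandas is installed, and the data is invented for illustration.

```python
import pandas as pd

# Hypothetical data: a small table with two columns.
animals = pd.DataFrame({
    "animal": ["zebra", "lion", "zebra", "elephant"],
    "water_need": [100, 350, 80, 550],
})

first_rows = animals.head(2)                    # the first two rows
names = animals["animal"]                       # select one column
thirsty = animals[animals["water_need"] > 90]   # filter rows by a condition

print(thirsty)
```

The filter in the last step plays a similar role to a WHERE clause in SQL.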
I’ll introduce aggregation (such as min, max, sum, count, etc.) and grouping in pandas. Both are very commonly used methods in analytics and data science projects. READ>>
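Aggregation and grouping can be sketched as follows, assuming pandas is installed. The dataset is made up for illustration.

```python
import pandas as pd

# Hypothetical data.
animals = pd.DataFrame({
    "animal": ["zebra", "lion", "zebra", "elephant"],
    "water_need": [100, 350, 80, 550],
})

# Aggregation: reduce a whole column to a single value.
total_water = animals["water_need"].sum()
row_count = animals["water_need"].count()

# Grouping: calculate the aggregate separately for each animal type.
mean_by_animal = animals.groupby("animal")["water_need"].mean()

print(total_water, row_count)
print(mean_by_animal)
```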
In the 3rd episode of the pandas tutorial, I’ll show you four data formatting methods that you might use a lot in data science projects. These are: merge, sort, reset_index and fillna! READ>>
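The four methods can be chained together as in the sketch below. It assumes pandas is installed, and both tables are hypothetical.

```python
import pandas as pd

# Two hypothetical tables that share the "animal" column.
animals = pd.DataFrame({
    "animal": ["zebra", "lion", "elephant"],
    "water_need": [100, 350, 550],
})
food = pd.DataFrame({
    "animal": ["zebra", "lion"],
    "food": ["grass", "meat"],
})

# merge: join the two tables; how="left" keeps every row of animals.
merged = animals.merge(food, on="animal", how="left")

# fillna: replace the missing value (the elephant has no food row).
merged["food"] = merged["food"].fillna("unknown")

# sort_values: order the rows; reset_index: renumber them from 0.
ordered = merged.sort_values(by="water_need", ascending=False)
ordered = ordered.reset_index(drop=True)

print(ordered)
```

Without `drop=True`, `reset_index` would keep the old index as an extra column.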
The 4th episode of the pandas tutorial series is about data visualization. I show you how you can create a histogram using pandas and matplotlib. And as an intro, I’ll also show you how you can draw a line chart and a bar chart, too. READ>>
And I couldn’t leave out one of data scientists’ favorite charts: the scatter plot. It’s a pretty cool way to discover and show correlations between two or more variables in a dataset. In this episode, I’ll demonstrate how you can create your first scatter plot using pandas and matplotlib. READ>>
Pandas and Python are very popular for machine learning. You know my thoughts about machine learning (it’s only 5% of the job for junior data scientists). Regardless, I wanted to give you a brief intro to the most basic machine learning model: linear regression. Here, you’ll learn the statistics behind it, how it works, and of course, how to get it done in Python and pandas… or well, in this case using numpy. READ>>
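As a rough idea of what fitting a linear regression with numpy can look like, here is a minimal sketch. It uses `np.polyfit`, which is one possible approach and not necessarily the one used in the article, and the data points are invented.

```python
import numpy as np

# Hypothetical data: hours studied vs. test score (made up, perfectly linear).
hours = np.array([1, 2, 3, 4, 5])
scores = np.array([52, 54, 56, 58, 60])

# Fit a straight line (a polynomial of degree 1) with least squares.
# polyfit returns the coefficients from the highest degree down.
slope, intercept = np.polyfit(hours, scores, 1)

# Wrap the coefficients into a callable model and predict a new value.
model = np.poly1d([slope, intercept])
predicted = model(6)

print(slope, intercept, predicted)
```

Real datasets are noisy, so the fitted line will not pass through every point the way it does here.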
This is a continuously expanding article. So check back from time to time!
- If you want to learn more about how to become a data scientist, take my 50-minute video course: How to Become a Data Scientist. (It’s free!)
- Also check out my 6-week online course: The Junior Data Scientist’s First Month video course.