Python is one of the most essential Data Science languages, if not the most essential. It’s fairly easy to learn, it’s free, many companies use it, and it has tons of powerful statistical and data visualization libraries. In one sentence: if you are looking for a Data Science career, sooner or later you have to learn Python.

So I put together a Python for Data Science tutorial series starting from the very basics. As far as I know, this is one of the few Python tutorials online that’s:

  • in Python 3 and not in Python 2 (see why this is important below)
  • written for those who are just starting with coding
  • starts from the basics, then guides you all the way through to advanced things like using pandas and other analytical data science libraries
  • 100% dedicated to being practical
  • and free…

Here are the articles!

Note: I’m continuously writing new articles and adding them to the list.

1) Install Python, SQL, R and Bash (for non-devs)

The very first step will be to set up your own Python environment. This article will show you how to do that. Plus, as an extra, if you go through the whole process, you will have bash, SQL and R too. The setup comes with the famous IPython and Jupyter Notebook Python extensions that will make your data-coding-life much easier! READ>>

2) Python Basics: the environment, Python variables and data types

I introduce the Jupyter Notebook, your soon-to-be-favorite interactive Python workspace. After that, we dig into the basics of Python: variables and data types (integers, strings, booleans, etc.). At the end of the episode you will find a quick exercise too! READ>>
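
To give a rough idea of what variables and data types look like, here is a minimal sketch. The variable names and values are made up for this overview and are not taken from the article itself.

```python
# Four variables, each holding a different basic data type
# (names and values are illustrative only)
age = 4              # integer
weight = 3.5         # float
name = "Freddie"     # string
is_dog = True        # boolean

# type() tells you which data type a variable holds
print(type(age))
print(type(weight))
print(type(name))
print(type(is_dog))
```

You can run this in a single Jupyter Notebook cell and change the values to see how the types change.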

3) Python Data Structures

The next article is about the most important data structures in Python: lists, dictionaries and tuples. You will learn how to create and modify these – and also how to access or update their elements. READ>>
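
As a quick taste of the three data structures mentioned above, here is a small sketch of creating them and accessing or updating their elements. The example data is invented for this overview.

```python
# A list: ordered and changeable
animals = ["cat", "dog", "parrot"]
animals.append("hamster")   # add an element to the end
animals[0] = "tiger"        # update the first element

# A dictionary: key-value pairs
ages = {"tiger": 4, "dog": 7}
ages["parrot"] = 2          # add a new key-value pair
dog_age = ages["dog"]       # access a value by its key

# A tuple: ordered, but it cannot be modified after creation
point = (3, 5)
x, y = point                # unpack the two elements

print(animals)
print(ages)
print(x, y)
```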

4) Python Built-in Functions and Methods

Functions and methods are one of the greatest advantages of Python. Using them, you can carry out simple but important data processes (like counting the number of elements, calculating the sum of integers, making strings upper- or lowercase, and so on…). In this article, I introduce the whole concept and give you a list of the most essential built-in functions and methods of Python. READ>>
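
The data processes listed above could look something like this. It is only a sketch with made-up sample data: `len()` and `sum()` are built-in functions, while `.upper()` and `.lower()` are string methods.

```python
numbers = [4, 8, 15, 16, 23, 42]   # sample data, invented for this example
title = "Data Science"

count = len(numbers)       # counting the number of elements
total = sum(numbers)       # calculating the sum of integers
shouted = title.upper()    # making a string uppercase
whispered = title.lower()  # making a string lowercase

print(count, total)        # 6 108
print(shouted, whispered)  # DATA SCIENCE data science
```

Note the difference in how they are called: a function takes the value as an argument, while a method is called on the value with a dot.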

5) Python 2 vs Python 3

At this point, you understand the basics of Python for Data Science. It’s time to clarify why we are using Python 3 and not Python 2. READ>>

6) Python if statements

Let’s get back to coding! The next chapter presents if statements. You can learn about the logic of the Python if statement, as well as its syntax and advanced applications. READ>>

7) Python for loops

For loops in Python are perfect for processing repetitive programming tasks. In this article, I’ll show you everything you need to know about them: the syntax, the logic, advanced applications and best practices too! READ>>

8) Python For Loops and If Statements Combined

Now that you know how if statements and for loops work, it’s time to combine them. I’ll show you how to build nested for loops, put if statements within for loops, and at the end of the article I’ll give you an intermediate Python task to test the skills you’ve gathered so far. READ>>
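
Here is a minimal sketch of what a nested for loop with an if statement inside can look like. The data and variable names are assumptions made for this overview, and this is not the task from the article.

```python
# A list of lists (think of it as a small table of numbers)
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

even_numbers = []
for row in rows:              # outer loop: go through each inner list
    for value in row:         # inner loop: go through each number
        if value % 2 == 0:    # keep only the even numbers
            even_numbers.append(value)

print(even_numbers)           # [2, 4, 6, 8]
```

Pay attention to the indentation: each nested level is indented one step further than the line that opens it.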

9) Python Syntax Essentials and Best Practices

In my Python workshops and online courses I see that one of the trickiest things for newcomers is the syntax itself. It’s very strict and many things might seem inconsistent at first. In this article I’ve collected the Python syntax essentials you should keep in mind as a data professional, and I added some formatting best practices as well, to help you keep your code nice and clean. READ>>

10) Python Import Statement and the Most Important Built-in Modules for Data Scientists

So far we have worked with the most essential concepts of Python: variables, data structures, built-in functions and methods, for loops, and if statements. These are all parts of the core semantics of the language. But this is far from everything that Python knows. Actually this is just the very beginning and the exciting stuff is yet to come. Because Python also has tons of modules and packages that we can import into our projects… READ>>
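
As a rough illustration of the import statement, the sketch below uses two modules from Python's standard library, `math` and `statistics`. Picking these two is my assumption for this overview, so the article's own list of modules may differ.

```python
import math          # mathematical functions
import statistics    # basic descriptive statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]   # sample data, invented for this example

mean_value = statistics.mean(values)   # arithmetic mean
spread = statistics.pstdev(values)     # population standard deviation
root = math.sqrt(16)                   # square root

print(mean_value, spread, root)
```

After the import, you reach everything inside a module through its name and a dot, as in `math.sqrt()`.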

11) The 5 most important Python libraries and packages for Data Scientists

In this article, I’ll introduce the five most important data science libraries and packages that do not come with Python by default. These are: NumPy, pandas, Matplotlib, scikit-learn and SciPy. At the end of the article, I’ll also show you how to get (download, install and import) them. READ>>

12) Pandas Tutorial 1: Pandas Basics (Reading Data Files, DataFrames, Data Selection)

Pandas is one of the most popular Python libraries for Data Science and Analytics. I like to say it’s the “SQL of Python.” Why? Because pandas helps you manage two-dimensional data tables in Python. Of course, it has many more features. In this episode we will start with the pandas basics! READ>>
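
To show the flavor of reading data and selecting from a DataFrame, here is a small sketch. It assumes pandas is already installed, and it reads a made-up CSV from a string instead of a real file so that it runs anywhere. The column names are hypothetical.

```python
import io
import pandas as pd

# A tiny CSV held in a string; with a real file you would pass its path instead
csv_text = "animal,legs\ncat,4\nparrot,2\ndog,4\n"
df = pd.read_csv(io.StringIO(csv_text))

legs = df["legs"]                   # select one column
four_legged = df[df["legs"] == 4]   # select the rows that match a condition

print(df)
print(four_legged)
```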

13) Pandas Tutorial 2: Aggregation and Grouping

I’ll introduce aggregation (such as min, max, sum, count, etc.) and grouping in pandas. Both are very commonly used methods in analytics and data science projects. READ>>
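
Aggregation and grouping might look roughly like this in pandas. This is a sketch on an invented sales table (the column names are hypothetical), and it assumes pandas is installed.

```python
import pandas as pd

# A made-up table for illustration
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south", "north"],
    "amount": [100, 250, 150, 50, 200],
})

# Aggregation: one number for the whole column
total_amount = sales["amount"].sum()
max_amount = sales["amount"].max()

# Grouping: the same aggregation, calculated separately for each region
by_region = sales.groupby("region")["amount"].sum()

print(total_amount, max_amount)
print(by_region)
```

Grouping answers questions like "how much did we sell per region?", much like GROUP BY does in SQL.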

14) Pandas Tutorial 3: Important Data Formatting Methods (merge, sort, reset_index, fillna)

In the 3rd episode of the pandas tutorial, I’ll show you four data formatting methods that you might use a lot in data science projects. These are: merge, sort, reset_index and fillna! READ>>
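
The four methods can be chained together as in the sketch below. The two tables are invented for this overview and pandas is assumed to be installed. Note that in current pandas versions sorting a DataFrame is done with `sort_values()`, so that is what the example uses.

```python
import pandas as pd

# Two made-up tables that share the user_id column
users = pd.DataFrame({"user_id": [1, 2, 3], "name": ["Ann", "Bob", "Cid"]})
orders = pd.DataFrame({"user_id": [1, 3], "total": [30.0, 12.5]})

# merge: join the tables, keeping every user even without an order
merged = users.merge(orders, on="user_id", how="left")

# fillna: replace the missing order total with 0
filled = merged.fillna({"total": 0.0})

# sort_values: order the rows by total, largest first
ordered = filled.sort_values("total", ascending=False)

# reset_index: renumber the rows from 0 after sorting
result = ordered.reset_index(drop=True)

print(result)
```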

15) Coming soon…

This is a continuously expanding article. So check back from time to time!

Check out the SQL and the bash tutorials, too!

Tomi Mester