Python for Data Science – Tutorial for Beginners #1 – Python Basics

If you are learning Data Science, pretty soon you will meet Python. Why is that? Because it’s one of the most commonly used data languages.
It’s popular for 3 main reasons:

  • Python is fairly easy to interpret and learn.
  • Python handles different data structures very well.
  • Python has very powerful statistical and data visualization libraries.

In my Python for Data Science articles I’ll show you everything you have to know. I’ll start from the very basics – so if you have never touched code, don’t worry, you are at the right place. I’ll focus only on the data science related part of Python – and I will skip all the unnecessary and impractical trifles. We will go step by step and by the end of this tutorial series we will even do some fancy data things – like predictive analytics!

Here we go!

This is a Hands-On Tutorial!

I always prefer learning by doing over learning by reading… If you do the coding part with me on your computer, you will understand and recall everything at least 10 times better. Besides, at the end of every article I’ll attach one or two little exercises, so you can test yourself!
This means, though, that you will need a data server to practice. Follow this tutorial to set one up:

How to install Python, R, SQL and bash to practice data science

Note: In the above tutorial we set up Jupyter (with IPython) only. Later on we will install other Python libraries (e.g. pandas, numpy, scikit-learn, matplotlib) right when they are needed!

Why should you learn Python for Data Science?

When it comes to learning data coding, you should focus on these four languages:

  • SQL
  • Python
  • R
  • Bash

Of course, it’s very nice if you have time to learn all four. But if you are new to this field, you have to pick one or two first. I always suggest starting with Python and SQL. Using these two languages, you will cover 99% of the data science and analytics problems you’ll have to deal with in the future.

Note: I’ve already written an SQL for Data Analysis tutorial series. Go and check it out here: SQL for Data Analysis, episode #1!

Now why is it worth learning Python for Data Science?

  • It’s easy and fun.
  • It has many packages that are suitable both for simpler analytics projects (e.g. segmentation, cohort analysis, explorative analytics, etc.) and for advanced data science projects (e.g. building machine learning models).
  • The job market begs for more data professionals with solid Python knowledge. It means knowing Python will be an extremely competitive element in your CV.

What is Python? Is Python for Data Science only?

I’ll keep the theoretical part short. But there are two things that you have to know about Python before you start using it.

Firstly, Python is a general purpose programming language and it’s not only for Data Science. This means that you don’t have to learn every part of it to be a great data scientist. At the same time, if you learn the basics well, you will find other programming languages easier to understand too, which is always very handy if you work in IT.

Secondly, Python is a high-level language. This means that in terms of CPU time it’s not the most efficient language on the planet. But on the other hand it was made to be simple, “user-friendly” and easy to interpret. Thus what you might lose on CPU time, you might win back on engineering time.

Python 2 vs Python 3 – which one to learn for Data Science?

Maybe you have heard about this Python 2.x vs Python 3.x battle. I won’t go into details here, because I’ve written another article about this topic already (here: Python 2 vs Python 3), but the point is:
Python 3 has been around since 2008, and 95% of the data science related features and libraries have been migrated from Python 2 already. On the other hand, Python 2 reached its end of life on January 1, 2020 and is no longer supported. So learning Python 2 at this point is like learning Latin: it’s useful in some cases, but the future is for Python 3.

Because of this, all my Python for Data Science tutorials will be written in Python 3.

Note: However, I’ll try to use code that works in both versions whenever possible.
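Once your notebook is running, a quick way to check which version you are on could look like the sketch below. It assumes nothing beyond Python's built-in sys module; the variable names major_version and ratio are only illustrative.

```python
import sys

# Major version of the interpreter running this cell (3 for any Python 3.x).
major_version = sys.version_info.major
print(major_version)

# One visible difference between the two versions: in Python 3, dividing two
# integers with / gives a float. In Python 2 the same expression gave 0.
ratio = 3 / 4
print(ratio)
```

If the first print shows 2, the examples in this series may behave differently (the division above is one such case).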

Enough theory! Let’s get to coding!

How to open your Jupyter Notebook

Again: if you haven’t done it yet, go through this article first:
How to install Python, R, SQL and bash to practice data science

Once you have this data infrastructure in place, follow these four steps any time you want to use Python + Jupyter:

1. Log in to your server! Open iTerm2 and type this on the command line:
ssh [your_username]@[your_ipaddress]
(In my case: ssh dataguy@178.62.1.214)


2. Start Jupyter Notebook on your server with this command:
jupyter notebook --browser any


3. Access Jupyter from your browser! Open Google Chrome (or whichever) and type this into the browser bar:
[IP Address of your remote server]:8888
(eg. in my case: 178.62.1.214:8888)

You will land on this screen:

[Screenshot: Jupyter login token screen]

You will be asked for a “password” or a “token”. As we haven’t generated a password, you need to use the token that you can easily find if you go back to your terminal window. Here:

[Screenshot: the Jupyter token in the terminal window]

4. You are in! Create a new Jupyter Notebook! (Or if you already have, open an existing one.)


Important! While you are working in the browser, the iTerm window with the Jupyter command should run in the background. If you shut it down, your notebook in your browser will shut down too.

That’s it! Remember this workflow – you will use it quite often during my Python for Data Science tutorials.

How to use Jupyter Notebook

Let’s recap how to use Jupyter Notebook!


  1. Type your Python command! It can be a multi-line command too – if you hit return/enter, it won’t run, it will just start a new line in the same cell!
  2. Hit SHIFT + ENTER to run your Python command!
  3. Start typing and hit TAB! If it’s possible, Jupyter will auto-complete your expression (eg. for Python commands or for variables that you have already defined). If there is more than one possibility, you can choose from a drop-down menu.

Python Basics

Great! You have everything from the technical side to start coding in Python! Now this tutorial will start off with the base concepts that you must learn before we go into how to use Python for Data Science. The six base concepts will be:

  1. Variables and data types
  2. Data Structures in Python
  3. Functions and methods
  4. If statements
  5. Loops
  6. Python syntax essentials

To make it easier to read, learn and practice, I’ll break down these six topics into six articles! The first one is here:

Python Basics 1: Variables and Data types

In Python we like to assign values to variables. Why? Because it makes our code better — more flexible, reusable and understandable. At the same time one of the trickiest things in coding is exactly this “assignment concept.” When we refer to something, that refers to something, that refers to something… well, understanding that needs some brain capacity. But don’t you worry, you will get used to it – and you will love it!

Let’s see how it works!
Say we have a dog (‘Freddie’), and we would like to store some of his attributes (name, age, is_vaccinated, birth_year, etc.) in Python variables! We will type this into a Jupyter notebook cell:

dog_name = 'Freddie'
age = 9
is_vaccinated = True
height = 1.1
birth_year = 2001


Note: we could have done this one per cell. But this all-in-one solution was easier and more elegant.

From now on, if we type these variables, the assigned values will be returned:

[Screenshot: variables demo in Jupyter]
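As a small sketch of the same idea: in a notebook, typing a variable name alone in a cell displays its value. The version below uses print() instead, so it also works outside Jupyter.

```python
dog_name = 'Freddie'
age = 9
is_vaccinated = True
height = 1.1
birth_year = 2001

# print() shows the value currently assigned to each variable.
print(dog_name)        # Freddie
print(age)             # 9
print(is_vaccinated)   # True
print(height)          # 1.1
print(birth_year)      # 2001
```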

Just like in SQL, in Python we have different data types.

For instance the dog_name variable holds a string: 'Freddie'. In Python 3 a string is a sequence of Unicode characters (eg. numbers, letters, punctuation, etc.), so it can have numbers or exclamation marks or almost anything (eg. ‘R2-D2’ is a valid string). In Python it’s super easy to identify a string as it’s usually between quotation marks.
The age and the birth_year variables store integers (9 and 2001), which is a numeric Python data type. Another numeric data type is float, in our example: height, which is 1.1.
The True value of is_vaccinated is a so-called Boolean value. Booleans can be only True or False.

Summarized in a table:

Variable Name | Value     | Data Type
--------------|-----------|----------------------------------------
dog_name      | 'Freddie' | str (short for string)
age           | 9         | int (short for integer)
is_vaccinated | True      | bool (short for Boolean)
height        | 1.1       | float (short for floating-point number)
birth_year    | 2001      | int (short for integer)
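If you want to double-check the data types in the table yourself, Python's built-in type() function reports the type of any value. Here is a minimal sketch using the dog variables from above:

```python
dog_name = 'Freddie'
age = 9
is_vaccinated = True
height = 1.1
birth_year = 2001

# type() returns the data type of the value stored in a variable.
print(type(dog_name))       # <class 'str'>
print(type(age))            # <class 'int'>
print(type(is_vaccinated))  # <class 'bool'>
print(type(height))         # <class 'float'>
print(type(birth_year))     # <class 'int'>
```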

There are many more data types, but as a start, knowing these four will be good enough and the rest will come along the way.

It’s important to know that in Python every variable is overwritable. Eg. if we now run:

dog_name = 'Eddie'

in our Jupyter Notebook, our dog won’t be Freddie any more…
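Here is the overwrite as a short runnable sketch. The second assignment simply replaces the first value:

```python
dog_name = 'Freddie'
print(dog_name)   # Freddie

# Assigning again to the same name replaces the old value.
dog_name = 'Eddie'
print(dog_name)   # Eddie
```

The old value ('Freddie') is not kept anywhere. If you need it later, store it in a second variable before overwriting.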


Python Variables – Basic Operators

You have just learned about variables. It’s time to play around with them!
Let’s define two new variables a and b:

a = 3
b = 4


What can we do with a and b? Well, first of all, a bunch of basic arithmetic operations! It’s nothing special, you could have figured these out by common sense, but just in case, here’s the list:

Operator | What does it do?                         | Result in our example
---------|------------------------------------------|----------------------
a + b    | Adds a to b                              | 7
a - b    | Subtracts b from a                       | -1
a * b    | Multiplies a by b                        | 12
a / b    | Divides a by b                           | 0.75
b % a    | Divides b by a and returns the remainder | 1
a ** b   | a raised to the power of b               | 81

And how it looks in Jupyter:

[Screenshot: Python arithmetic operators in Jupyter]

Note: try it for yourself with your values in your Jupyter Notebook! It’s fun!
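For reference, the whole arithmetic table can be reproduced in one cell. The result names (total, difference, and so on) are only illustrative, and the division result assumes Python 3:

```python
a = 3
b = 4

total = a + b        # addition
difference = a - b   # subtraction
product = a * b      # multiplication
quotient = a / b     # division (always a float in Python 3)
remainder = b % a    # remainder of b divided by a
power = a ** b       # a raised to the power of b

print(total, difference, product, quotient, remainder, power)
# 7 -1 12 0.75 1 81
```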

We can use some variables with comparison operators. The results will always be Boolean values! (Remember? Booleans can be only True or False.) a and b are still 3 and 4.


Operator | What does it do?                 | Result in our example
---------|----------------------------------|----------------------
a > b    | Evaluates if a is greater than b | False
a < b    | Evaluates if a is less than b    | True
a == b   | Evaluates if a equals b          | False

In the notebook:

[Screenshot: Python comparison operators in Jupyter]
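The comparison table as a runnable sketch (the result names are only illustrative). Note that each result is a Boolean:

```python
a = 3
b = 4

is_greater = a > b   # is a greater than b?
is_less = a < b      # is a less than b?
is_equal = a == b    # is a equal to b? (two equals signs, not one)

print(is_greater, is_less, is_equal)   # False True False
print(type(is_less))                   # <class 'bool'>
```

A single = assigns a value, while the double == compares two values. Mixing them up is a very common beginner mistake.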

And finally we can use logical operators on our variables!
Let’s define c and d first:

c = True
d = False

Operator | What does it do?               | Result in our example
---------|--------------------------------|----------------------
c and d  | True if both c and d are True  | False
c or d   | True if either c or d is True  | True
not c    | The opposite of c              | False

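And the logical operators in a cell of their own, as a small sketch with illustrative result names:

```python
c = True
d = False

both = c and d     # True only if c and d are both True
either = c or d    # True if at least one of them is True
opposite = not c   # flips True to False (and False to True)

print(both, either, opposite)   # False True False
```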

This is easy and maybe less exciting, but again: just start to type this into your notebook, run your commands and start to combine things – and it’s gonna be much more fun!

Speaking of which! Spice things up with some exercises!

Test yourself #1

Here are some new variables:

a = 1
b = 2
c = 3
d = True
e = 'cool'

What will be the returned data type and the exact result of this operation?
a == e or d and c > b

Note: First try to find it out without typing it into Python – then check if you have guessed right!
.
.
.
The answer is: it’s gonna be a Boolean and it will be True.
Why? Because:

  • a == e is False – as 1 is not equal to ‘cool’
  • d is True by definition
  • c > b is True, because 3 is greater than 2

So a == e or d and c>b translated is: False or True and True, which is True.
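If you would like to verify the reasoning piece by piece, a sketch like this evaluates each part separately before evaluating the whole expression (the names first, second, third and result are only illustrative):

```python
a = 1
b = 2
c = 3
d = True
e = 'cool'

first = a == e    # False: 1 is not equal to 'cool'
second = d        # True by definition
third = c > b     # True: 3 is greater than 2

result = a == e or d and c > b
print(first, second, third)   # False True True
print(result)                 # True
print(type(result))           # <class 'bool'>
```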


Test yourself #2

Use the variables from the previous assignment:

a = 1
b = 2
c = 3
d = True
e = 'cool'

But this time try to figure out the result of this slightly modified expression:
not a == e or d and not c > b
Uh-oh, wait a minute! There is a trick here! To give a proper answer you have to know one more rule! The comparison operators (like == and >) are evaluated first, and after them the evaluation order of the logical operators is: 1. not 2. and 3. or
.
.
.
Here’s the solution: True.
Why?
Let’s see! Using the previous exercise’s logic, this is what we have:
not False or True and not True

As we have discussed, the first logical operator evaluated is the not. After firing all the nots, this is what we have:
True or True and False

The second step is to evaluate the and operator. Translated it’s:
True or (True and False), which leads to True or False.

And the last step is the or:
True or False → True
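To see the evaluation order in action, the sketch below writes the same expression with explicit parentheses, and then with a different grouping that changes the result. The variable names are only illustrative.

```python
a = 1
b = 2
c = 3
d = True
e = 'cool'

# The original expression, relying on Python's default evaluation order.
result = not a == e or d and not c > b

# The same expression with the default order spelled out in parentheses.
explicit = (not (a == e)) or (d and (not (c > b)))

# A different grouping: here the "or" is forced to run before the "and".
regrouped = (not a == e or d) and not c > b

print(result)      # True
print(explicit)    # True
print(regrouped)   # False
```

When in doubt, adding parentheses costs nothing and makes the intended order obvious to anyone reading your code.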


Conclusion

Done with episode 1!
Did you realize that you have just started to code in Python 3? Wasn’t it easy and fun?
Well, good news: the rest of Python is just as easy as this was. The difficulty will come from the combination of these simple things… But that’s why learning the basics very well is so important!
So stay with me – in the next chapter of “Python for Data Science” I’ll introduce the most important Data Structures in Python!

Cheers,
Tomi Mester
