If you are learning Data Science, pretty soon you will meet Python. Why is that? Because it’s one of the most commonly used data languages.
It’s popular for 3 main reasons:
- Python is fairly easy to interpret and learn.
- Python handles different data structures very well.
- Python has very powerful statistical and data visualization libraries.
In my Python for Data Science articles I’ll show you everything you have to know. I’ll start from the very basics – so if you have never touched code, don’t worry, you are at the right place. I’ll focus only on the data science related part of Python – and I will skip all the unnecessary and impractical trifles. We will go step by step and by the end of this tutorial series we will even do some fancy data things – like predictive analytics!
Here we go!
This is a Hands-On Tutorial!
I always prefer learning by doing over learning by reading… If you do the coding part with me on your computer, you will understand and recall everything at least 10 times better. Besides, at the end of every article I’ll attach one or two little exercises, so you can test yourself!
This means, though, that you will need a data server to practice. Follow this tutorial to set one up:
How to install Python, R, SQL and bash to practice data science
Note: In the above tutorial we set up Jupyter (with IPython) only. Later on we will install other Python libraries – eg. pandas, numpy, scikit-learn, matplotlib – right when they are needed!
Why should you learn Python for Data Science?
When it comes to learning data coding, you should focus on these four languages:
- SQL
- Python
- R
- Bash
Of course, it’s very nice if you have time to learn all four. But if you are new to this field, you have to pick one or two first. I always suggest starting with Python and SQL. Using these two languages, you will cover 99% of the data science and analytics problems you’ll have to deal with in the future.
Note: I’ve already written an SQL for Data Analysis tutorial series. Go and check it out here: SQL for Data Analysis, episode #1!
Now why is it worth learning Python for Data Science?
- It’s easy and fun.
- It has many packages that are suitable both for simpler analytics projects (eg. segmentation, cohort analysis, explorative analytics, etc.) and for advanced data science projects (eg. building machine learning models).
- The job market begs for more data professionals with solid Python knowledge. It means knowing Python will be an extremely competitive element in your CV.
What is Python? Is Python for Data Science only?
I’ll keep the theoretical part short. But there are two things that you have to know about Python before you start using it.
Firstly, Python is a general purpose programming language and it’s not only for Data Science. This means that you don’t have to learn every part of it to be a great data scientist. At the same time, if you learn the basics well, you will find it easier to understand other programming languages too – which is always very handy if you work in IT.
Secondly, Python is a high-level language. It means that in terms of CPU time it’s not the most efficient language on the planet. But on the other hand it was made to be simple, “user-friendly” and easy to read. Thus what you might lose on CPU time, you might win back on engineering time.
Python 2 vs Python 3 – which one to learn for Data Science?
Maybe you have heard about this Python 2.x vs Python 3.x battle. I won’t go into details here, because I’ve written another article about this topic already (here: Python 2 vs Python 3), but the point is:
Python 3 has been around since 2008 – and 95% of the data science related features and libraries have been migrated from Python 2 already. On the other hand, Python 2 reached its end of life on January 1, 2020 and is no longer supported. So learning Python 2 at this point is like learning Latin – it’s useful in some cases, but the future belongs to Python 3.
Because of this, all my Python for Data Science tutorials will be written in Python 3.
Note: However, I’ll try to use code that works in both versions whenever possible.
Enough theory! Let’s get to coding!
How to open your Jupyter Notebook
Again: if you haven’t done it yet, go through this article first:
How to install Python, R, SQL and bash to practice data science
Once you have this data infrastructure in place, follow these four steps any time you want to use Python + Jupyter:
1. Log in to your server! Open iTerm2 and type this on the command line:
ssh [your_username]@[your_ipaddress]
(In my case: ssh dataguy@178.62.1.214)
2. Start Jupyter Notebook on your server with this command:
jupyter notebook --browser any
3. Access Jupyter from your browser! Open Google Chrome (or whichever browser you prefer) and type this into the browser bar:
[IP Address of your remote server]:8888
(eg. in my case: 178.62.1.214:8888)
You will land on a login screen, where you will be asked for a “password” or a “token”. As we haven’t generated a password, you need to use the token. You can easily find it in the output of the Jupyter command if you go back to your terminal window.
4. You are in! Create a new Jupyter Notebook! (Or if you already have, open an existing one.)
Important! While you are working in the browser, the iTerm window with the Jupyter command should run in the background. If you shut it down, your notebook in your browser will shut down too.
That’s it! Remember this workflow – you will use it quite often during my Python for Data Science tutorials.
How to use Jupyter Notebook
Let’s recap how to use Jupyter Notebook!
- Type your Python command! It can be a multi-line command too – if you hit return/enter, it won’t run, it will just start a new line in the same cell!
- Hit SHIFT + ENTER to run your Python command!
- Start typing and hit TAB! If it’s possible, Jupyter will auto-complete your expression (eg. for Python commands or for variables that you have already defined). If there is more than one possibility, you can choose from a drop-down menu.
Python Basics
Great! You have everything from the technical side to start coding in Python! Now this tutorial will start off with the base concepts that you must learn before we go into how to use Python for Data Science. The six base concepts will be:
- Variables and data types
- Data Structures in Python
- Functions and methods
- If statements
- Loops
- Python syntax essentials
To make it easier to read, learn and practice, I’ll break down these six topics into six articles! The first one is here:
Python Basics 1: Variables and Data types
In Python we like to assign values to variables. Why? Because it makes our code better — more flexible, reusable and understandable. At the same time one of the trickiest things in coding is exactly this “assignment concept.” When we refer to something, that refers to something, that refers to something… well, understanding that needs some brain capacity. But don’t you worry, you will get used to it – and you will love it!
Let’s see how it works!
Say we have a dog (‘Freddie’), and we would like to store some of his attributes (name, age, is_vaccinated, height, birth_year) in Python variables! We will type this into a Jupyter notebook cell:
dog_name = 'Freddie'
age = 9
is_vaccinated = True
height = 1.1
birth_year = 2001
Note: we could have done this one per cell. But this all-in-one solution was easier and more elegant.
From now on, if we type these variables, the assigned values will be returned:
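As a minimal sketch of what that looks like (assuming the five variables defined above): typing a variable name alone on the last line of a Jupyter cell displays its value. A cell only echoes its last expression, so print() is used here to show all five from one cell.

```python
# The variables from the example above
dog_name = 'Freddie'
age = 9
is_vaccinated = True
height = 1.1
birth_year = 2001

# print() displays each assigned value, one per line
print(dog_name)        # Freddie
print(age)             # 9
print(is_vaccinated)   # True
print(height)          # 1.1
print(birth_year)      # 2001
```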
Just like in SQL, in Python we have different data types.
For instance the dog_name variable holds a string: 'Freddie'. In Python 3 a string is a sequence of Unicode characters (eg. numbers, letters, punctuation, etc.), so it can have numbers or exclamation marks or almost anything (eg. ‘R2-D2’ is a valid string). In Python it’s super easy to identify a string as it’s usually between quotation marks.
The age and the birth_year variables store integers (9 and 2001), which is a numeric Python data type. Another numeric data type is float, in our example: height, which is 1.1.
The True value of is_vaccinated is a so-called Boolean value. Booleans can be only True or False.
Summarized in a table:
Variable Name | Value | Data Type |
dog_name | 'Freddie' | str (short for string) |
age | 9 | int (short for integer) |
is_vaccinated | True | bool (short for Boolean) |
height | 1.1 | float (short for floating point number) |
birth_year | 2001 | int (short for integer) |
There are many more data types, but as a start, knowing these four will be good enough and the rest will come along the way.
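If you are ever unsure which data type a variable holds, Python's built-in type() function will tell you. A short sketch, using the same dog variables as above:

```python
dog_name = 'Freddie'
age = 9
is_vaccinated = True
height = 1.1
birth_year = 2001

# type() returns the data type of the value a variable holds
print(type(dog_name))        # <class 'str'>
print(type(age))             # <class 'int'>
print(type(is_vaccinated))   # <class 'bool'>
print(type(height))          # <class 'float'>
print(type(birth_year))      # <class 'int'>
```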
It’s important to know that in Python every variable is overwritable. Eg. if we now run:
dog_name = 'Eddie'
in our Jupyter Notebook, our dog won’t be Freddie any more…
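Here is a small sketch of that overwriting behavior. The second half (re-assigning age to a string) is an extra illustration that is not part of the original example: it shows that a variable can be overwritten with a value of a different data type as well.

```python
dog_name = 'Freddie'
print(dog_name)    # Freddie

# Assigning again replaces the old value
dog_name = 'Eddie'
print(dog_name)    # Eddie

# Extra illustration: the new value can even have a different data type
age = 9
age = 'nine'
print(type(age))   # <class 'str'>
```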
Python Variables – Basic Operators
You have just learned about variables. It’s time to play around with them!
Let’s define two new variables a and b:
a = 3
b = 4
What can we do with a and b? Well, first of all, a bunch of basic arithmetic operations! It’s nothing special, you could have found these out by common sense, but just in case, here’s the list:
Operator | What does it do? | Result in our example |
a + b | Adds a to b | 7 |
a - b | Subtract b from a | -1 |
a * b | Multiply a by b | 12 |
a / b | Divide a by b | 0.75 |
b % a | Divides b by a and returns remainder | 1 |
a ** b | a raised to the power of b | 81 |
And how it looks in Jupyter:
Note: try it for yourself with your values in your Jupyter Notebook! It’s fun!
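A runnable sketch of the arithmetic operators from the table, with a = 3 and b = 4. One thing to keep in mind: the 0.75 result for a / b is Python 3 behavior. In Python 2, dividing two integers with / drops the fractional part and returns 0.

```python
a = 3
b = 4

print(a + b)    # 7
print(a - b)    # -1
print(a * b)    # 12
print(a / b)    # 0.75 (in Python 3)
print(b % a)    # 1
print(a ** b)   # 81
```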
We can use some variables with comparison operators. The results will always be Boolean values! (Remember? Booleans can be only True or False.) a and b are still 3 and 4.
Operator | What does it do? | Result in our example |
a > b | Evaluate if a is greater than b | False |
a < b | Evaluate if a is less than b | True |
a == b | Evaluate if a equals b | False |
In the notebook:
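The comparisons can be sketched like this, again with a = 3 and b = 4. The last two lines use != and <=, which are not in the table above and are added here only as an illustration. Note that a single = assigns a value, while the double == compares two values.

```python
a = 3
b = 4

print(a > b)    # False
print(a < b)    # True
print(a == b)   # False

# Extra operators, not listed in the table above
print(a != b)   # True  (a is not equal to b)
print(a <= b)   # True  (a is less than or equal to b)
```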
And finally, we can use logical operators on our variables!
Let’s define c and d first:
c = True
d = False
Operator | What does it do? | Result in our example |
c and d | True if both c and d are True | False |
c or d | True if either c or d is True | True |
not c | The opposite of c | False |
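The three logical operators from the table, as a short sketch you can paste into a cell:

```python
c = True
d = False

print(c and d)   # False
print(c or d)    # True
print(not c)     # False
```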
This is easy and maybe less exciting, but again: just start to type this into your notebook, run your commands and start to combine things – and it’s gonna be much more fun!
Speaking of which! Spice things up with some exercises!
Test yourself #1
Here are some new variables:
a = 1
b = 2
c = 3
d = True
e = 'cool'
What will be the returned data type and the exact result of this operation?
a == e or d and c > b
Note: First try to find it out without typing it into Python – then check if you have guessed right!
.
.
.
The answer is: it’s gonna be a Boolean and it will be True.
Why? Because:
- a == e is False – as 1 is not equal to ‘cool’
- d is True by definition
- c > b is True, because 3 is greater than 2
So a == e or d and c > b translated is: False or True and True, which is True.
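You can check this answer in your notebook too. A minimal sketch: the variable name result is only used here for illustration, and the three building blocks are printed separately so you can follow the reasoning.

```python
a = 1
b = 2
c = 3
d = True
e = 'cool'

# The full expression from the exercise
result = a == e or d and c > b
print(result)         # True
print(type(result))   # <class 'bool'>

# The three building blocks, one by one
print(a == e)   # False
print(d)        # True
print(c > b)    # True
```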
Test yourself #2
Use the variables from the previous assignment:
a = 1
b = 2
c = 3
d = True
e = 'cool'
But this time try to figure out the result of this slightly modified expression:
not a == e or d and not c > b
Uh-oh, wait a minute! There is a trick here! To give a proper answer you have to know one more rule! The evaluation order of the logical operators is:
1. not
2. and
3. or
.
.
.
Here’s the solution: True.
Why?
Let’s see! Using the previous exercise’s logic, this is what we have:
not False or True and not True
As we have discussed, the first logical operator evaluated is the not. After firing all the nots, this is what we have:
True or True and False
The second step is to evaluate the and operator. Translated it’s: True or (True and False), which leads to True or False.
And the last step is the or:
True or False → True
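To double-check the evaluation order, here is a sketch that compares the original expression with a fully parenthesized version. The names result and explicit are made up for this illustration. The parentheses also show one more detail: the comparisons (== and >) are evaluated first, before any not, and, or.

```python
a = 1
b = 2
c = 3
d = True
e = 'cool'

# The expression as written in the exercise
result = not a == e or d and not c > b

# The same expression with the evaluation order spelled out:
# comparisons first, then not, then and, then or
explicit = (not (a == e)) or (d and (not (c > b)))

print(result)     # True
print(explicit)   # True
```

If you are ever unsure about the order, adding parentheses like this makes your code easier to read anyway.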
Conclusion
Done with episode 1!
Did you realize that you have just started to code in Python 3? Wasn’t it easy and fun?
Well, good news: the rest of Python is just as easy as this was. The difficulty will come from the combination of these simple things… But that’s why learning the basics very well is so important!
So stay with me – in the next chapter of “Python for Data Science” I’ll introduce the most important Data Structures in Python!
- If you want to learn more about how to become a data scientist, take my 50-minute video course: How to Become a Data Scientist. (It’s free!)
- Also check out my 6-week online course: The Junior Data Scientist’s First Month video course.
Cheers,
Tomi Mester