In the last few months I have worked really hard to put together an introductory course in data coding for those who are new to Data Science. I've chosen bash (a.k.a. the command line) as the first data language to show you, because I find it easy to pick up – even for first-timers. In my articles I've started "the story" from the very beginning, so if you have never touched coding or programming before, don't worry – you will understand everything. My main focus was to keep everything easy to follow, but also practical and hands-on.

If you go through these 7 articles, you will learn how to do basic data cleaning, data formatting and analytics at the command line. On top of that, you will have your own data server to practice on – we will use it not only here, but in my future SQL, Python and R tutorials too.

Note: if you are new to data science, read the data analytics basics first!

Here are the 7 articles:

1) Data Coding 101 – How to install Python, SQL, R and Bash (for non-devs)

Step 0 is creating your data environment. In this tutorial I'll show you, step by step, how to do that – and as a result you will have your own data infrastructure with bash, Python, R and SQL. Plus you will get access to popular tools like iPython, Jupyter, RStudio and pgadmin4. All of these are free. READ>>

2) Data Coding 101 – Introduction to Bash – ep1

The first episode of my bash-specific tutorials covers the basic "orientation" commands (how to create directories, how to change directories, how to move files, how to download files, etc.), some basic data sampling tools (such as head and tail) and the word count tool. READ>>
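To give you a taste of what this episode covers, here is a minimal sketch. The directory and file names (`bash_practice`, `sample.txt`) are made up for illustration – the article itself works with its own files:

```shell
# Create a practice directory and step into it
mkdir -p bash_practice
cd bash_practice

# Create a small 5-line sample file to play with
printf 'a\nb\nc\nd\ne\n' > sample.txt

head -n 2 sample.txt   # sample the first two lines
tail -n 1 sample.txt   # sample the last line
wc -l sample.txt       # count the lines in the file
```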

3) Data Coding 101 – Introduction to Bash – ep2

In the second episode I introduce 3 major concepts in the command line: options, pipes and printing to a file. Besides that, I show you the grep command, which is a widely used filter tool in bash. READ>>
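A quick sketch of these three concepts plus grep, using a made-up `fruits.txt` file (the article's own examples differ):

```shell
# Sample data: one word per line
printf 'apple\nbanana\napricot\ncherry\n' > fruits.txt

# An option changes a command's behavior: -c makes grep count matches
grep -c 'ap' fruits.txt            # 2 lines contain "ap"

# A pipe (|) sends one command's output into the next command
cat fruits.txt | grep 'ap'         # filter: only lines containing "ap"

# > redirects ("prints") output into a file instead of the screen
grep 'ap' fruits.txt > ap_fruits.txt
wc -l ap_fruits.txt
```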

4) Data Coding 101 – Introduction To Bash – ep3

This chapter gets closer to applied statistics, as here we do our own median, max and min calculations on a 7M+ row data file. The tools we'll learn for this are the sort and uniq commands! READ>>
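The idea scales down to a toy example. Assuming a file with one number per line (`numbers.txt` here is invented – the article uses a much bigger dataset), sort plus head/tail gives you min, max and median:

```shell
# A toy numeric column: 5 values, one per line
printf '7\n3\n9\n1\n5\n' > numbers.txt

# sort -n orders the values numerically
sort -n numbers.txt | head -n 1              # min: the first sorted value
sort -n numbers.txt | tail -n 1              # max: the last sorted value

# Median of 5 values = the 3rd value after sorting
sort -n numbers.txt | head -n 3 | tail -n 1

# uniq -c counts duplicates (the input must be sorted first)
printf '2\n2\n3\n' | sort -n | uniq -c
```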

5) Data Coding 101 – Introduction To Bash – ep4 (with video)

Here I show you some best practices to speed up your daily work at the command line: 9 tricks – and if you don't like reading, I'm glad to tell you that this was my first article that came with a full video tutorial as well. (Find it in the article.) READ>>

6) Data Coding 101 – Introduction To Bash – ep5

The next step in bash is to learn control flow, such as if-then-else statements and while loops. You will place these into scripts, and along the way you will learn how to use bash variables as well. Plus I'll show you a little script to prank your friends by pretending to hack a wifi password… READ>>
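All three ideas – variables, a while loop and an if-then-else branch – fit into one tiny sketch (the variable names `count` and `limit` are made up for this illustration):

```shell
#!/usr/bin/env bash
# A variable holds a value; $name reads it back
count=1
limit=3

# while loop: repeat the body as long as the condition holds
while [ "$count" -le "$limit" ]
do
  # if-then-else: branch on a condition
  if [ "$count" -eq 2 ]
  then
    echo "count is two"
  else
    echo "count is $count"
  fi
  count=$((count + 1))
done
```

Save it in a file (e.g. `loop.sh`), make it executable with `chmod +x loop.sh`, and run it with `./loop.sh` – it prints three lines, one per loop iteration.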

7) Data Coding 101 – Introduction to Bash – ep6 (last episode)

In this closing episode I'll give you a brief introduction to 4 more command line tools: sed, awk, join and date. All of these will help you format and clean your data, so you can be more flexible in your data science/analytics projects in the future! READ>>
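A one-liner taste of each of the four tools, on invented sample files (`users.csv`, `names.txt`, `roles.txt` – not the article's data):

```shell
# Sample data: comma-separated id,name pairs
printf '1,alice\n2,bob\n' > users.csv

# sed: a stream editor -- here, replace every comma with a semicolon
sed 's/,/;/g' users.csv

# awk: field-based processing -- print the 2nd comma-separated field
awk -F',' '{print $2}' users.csv

# join: merge two sorted files on their common first column
printf '1 alice\n2 bob\n'   > names.txt
printf '1 admin\n2 guest\n' > roles.txt
join names.txt roles.txt

# date: print the current date in a custom format (YYYY-MM-DD)
date +%Y-%m-%d
```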


By the end of this series you will be ready to start your own pet project and sharpen your skills via learning by doing!

If you have any questions, feel free to ask in the comment section below.

In the near future I’ll continue with Python and SQL tutorials! If you don’t want to miss them, subscribe to my Newsletter!

Tomi Mester