Data Coding 101 – How to install Python, SQL, R and Bash (for non-devs)

The ultimate step-by-step guide for non-developers and wannabe data scientists! This is how to install R, Python, SQL and Bash in 30 minutes and start learning data coding today.

If you want to be a data scientist, a data analyst or just simply want to analyze the data of your startup (or any online business), sooner or later you will have to learn data coding.

[Image: data coding languages for data mining: R, Python and SQL. Source: KDNuggets]

There are 4 important languages you should learn:

  • SQL
  • Python
  • R
  • Bash (sometimes referred to as “the command line”)

In the coming months I’ll publish more and more articles about data coding on data36.com. However, as the foundation of every further conversation on this topic, you need to have Python, R, SQL and bash on your computer. Once you have them, you will be able to practice by yourself (and later build your pet projects too), as well as follow, practice and learn via my upcoming data coding tutorial articles and videos.

In this article I’ll walk you through, step by step, how to get these data tools. At the end of the article you will have your own fully functioning data infrastructure with:

  • bash (and mcedit),
  • postgreSQL (and pgAdmin4),
  • Python 2.7 (and Jupyter) and
  • R (and RStudio)

Good news:

  1. I will show you the exact same tools that we use in real-life data science projects.
  2. All these tools are completely free!

Yes, the funny thing is that most of the super-scientific stuff you can read about in data science articles is made with open-source data tools. How cool is that?

Anyway: first things first – let’s go and get Python, SQL, R and bash on your computer!

Note: This article is most enjoyable if you do the steps and the coding part with me. If you are reading this on your mobile, I suggest saving the article for later (e.g. send it to yourself by email) and coming back when you have ~60 minutes and you are on your desktop computer or notebook.

Note 2: To set up your data infrastructure correctly, you simply need to follow my instructions below step by step. Most of the time you will just copy-paste my code. So don’t be afraid of working in Terminal and writing code. It’s easy, even if you are not a developer (yet). The article is long, however, because I tried to explain each step as much as I could.

The Operating System

I use all my data tools on Ubuntu, which is a Linux operating system, and I suggest you do the same. I use a Mac as a notebook, but you can have a PC too. In this case it doesn’t really matter, because we won’t install Ubuntu on our computer, but access it via the internet.

What we will do here is connect to a remote server, type commands, and make the remote server do the data analyses instead of our local computer.

[Figure: simplified view of working on a remote server]

A bit simplified, but this is how you can imagine it.

(Note that you can set up Ubuntu on your personal/work computer too, if you really want to. But this is something we usually don’t do in real life, because with that solution we would limit our data processes to our computer’s capacity. We would also lose some cool features.)

If you use a remote server for data analysis, you will be able to:

  • Access your data infrastructure from any computer with a login name and a password (even if you lose or break your personal notebook or something). Don’t worry, nobody else will be able to access your data on your remote server: this is a completely private thing.
  • Automate your data scripts (e.g. make them run every 3 hours, even if you turn off your notebook).
  • Scale your stuff. You won’t be limited to your computer’s capacity. Renting a few more processors or more memory is just one click away if you are using a remote server.
  • Use Ubuntu without installing it on your computer.

The only downside of going with a remote server is that it costs money. Fortunately the prices are very low (starting at $5 per month).

I will create all my data coding tutorials (videos and articles) using the exact same data stack that I describe in this article. So if you want to follow me, it will be much easier if you go through this article step by step and set up everything like I do. To make everything work properly, please make sure that you don’t miss any of the steps and that you do them in the same order as written here! The most important parts are of course the code parts. All the code is marked:  like this.

************
UPDATE: Actually, you don’t need to go through this whole article step by step if you don’t want to. 🙂 Based on reader requests I developed a solution where all the steps below (the exact same steps) have already been done by me.
I built a data infrastructure, called the Data36 Learn server, where you can simply log in and use Python (+ Jupyter), postgreSQL (+ pgAdmin), bash (+ mcedit) and R (+ RStudio). This also means that you can leave the hassle, skip the rest of this article and jump instantly to where you want to be: practicing data coding.
Note: as I pay for the server, I charge a monthly fee for this.
More info here: Start with Python, SQL, R and bash in 1 minute!

If you don’t need my pre-built solution, just continue the article and you can still build everything by yourself following this guide!
************

Step 1: Get your remote server!

The next step is to find a hosting service to create your first remote server. I have used many services and so far I’ve found DigitalOcean to be the best. Here you can rent a server for $5/month (this will be perfect for us for now).

First, go to their website and create an account: DigitalOcean.com
Disclaimer: the link above is a special invitation by me – if you use that, you’ll get $10 free credit (and I’ll get $25 free credit). If you don’t want to use my link, you can simply click here instead. Note that in this case you won’t get the $10 credit.

You will land here:

[Screenshot: DigitalOcean registration page]

Register with your email address and you will get a confirmation email in your inbox. Confirm and you will see a screen where you can add your credit/debit card details or use PayPal. (For security reasons I always use PayPal.)

Once you are done with that, you are just one step away from creating your first remote server. You will see the “Droplets” screen. Click “Create Droplet” (big green button, top right corner).

And you will end up here:

[Screenshot: DigitalOcean “Create Droplet” page]

Make sure you are using these settings:

  1. “Choose an image”: Distributions: Ubuntu 14.04.5 x64
    [Screenshot: choosing Ubuntu 14.04]
  2. “Choose a size”: Standard: $5/month
    This will be more than enough for now. If needed, you can scale it up in the future. As you can see, you pay on an hourly basis. This means that if you use the server for only 4 hours and then delete it, you will pay roughly $0.03. This is a very good deal.
    [Screenshot: choosing a size]
  3. “Add block storage”: You don’t need this.
  4. “Choose a datacenter region”: Choose the one that is closest to you. E.g. if you are in San Diego, choose San Francisco, and if you are in India, choose Bangalore. I’ll choose Frankfurt as I’m in Stockholm at the moment.
    [Screenshot: choosing a region]
  5. “Select additional options”: You don’t need this.
  6. “Add your SSH keys”: You don’t need this.
  7. “Finalize and create”:
    “How many Droplets”: 1
    “Choose a hostname”: You can add anything here. I chose “data36-learn-datascience”.
    [Screenshot: finalize and create]
  8. Click “Create”.
    Your server will be ready in ~60 secs.
    [Screenshot: DigitalOcean creating the server]

CONGRATS! You have your first remote server, where you can practice data coding.
(Note: you can destroy this server at any time by clicking “Destroy”.)
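
If you are curious how hourly billing adds up, here is a rough Python sketch. It assumes that a month is billed as 30 days × 24 hours = 720 hours; that number is only an assumption for illustration (the exact billing rules are on DigitalOcean’s pricing page), and droplet_cost is a made-up helper name.

```python
# Assumption for illustration: one month is billed as 30 * 24 = 720 hours.
MONTHLY_PRICE_USD = 5.0
HOURS_PER_MONTH = 30 * 24


def droplet_cost(hours_used):
    """Return the approximate cost (in USD) of running the server for hours_used hours."""
    hourly_rate = MONTHLY_PRICE_USD / HOURS_PER_MONTH
    return round(hourly_rate * hours_used, 2)


print(droplet_cost(4))  # prints 0.03
```

So a few hours of practice costs a few cents, as long as you destroy the server afterwards.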

Step 2: Access your remote server!

Now it’s time to log in to your remote server. Once you’ve created your server, you will receive an email from DigitalOcean. It will look something like this:

[Screenshot: email from DigitalOcean with the login details]

(Note: I removed my password. Yours won’t be **********, but numbers and characters. Also the IP Address you see here is fake, so don’t try to use it! ;-))

Make sure you save this email, because you will use this information in the future (especially the IP Address you’ve got).

Depending on which operating system you use on your computer, you can access your server in different ways.

For Mac/Linux Users:
Open “Terminal” (on Mac I suggest downloading iTerm2 and using that instead of Terminal).

Type:
ssh [Username]@[IP Address]

[Username] is the username from the email, in this case: root
[IP Address] is the IP Address you’ve got in the email.

In my case I will type:
ssh root@46.101.128.25

Hit enter and you are in…
(The next paragraph is important for Windows users only. You can skip it and scroll down to “Both Windows/Mac”!)

For Windows Users:
First download and install a program called PuTTY from here.

If you open PuTTY, you need to add the details (from the email you’ve got) in this window:

[Screenshot: PuTTY configuration window]

Host Name (or IP address): the IP Address from the email (eg. 46.101.128.25 in my case)
Port: 22
Connection type: SSH

Click “Open” and you are in. It will ask for your username (“login as:”). You can find this in the email as well. Type: root .

Both Windows/Mac (oh and Linux of course):
Nice, you SSH-d (logged in) into your remote Ubuntu server. From this point on, whenever you are in this terminal window (as long as it is connected to your remote server), you are using Ubuntu 14.04. It also means that any changes you make here won’t affect your personal computer!

Let’s finalize things before we start setting up your data infrastructure and start data coding!

If everything’s correct, you will be asked a question like:
Are you sure you want to continue connecting (yes/no)?
Type yes, hit enter.

Then it will ask for your password. Copy-paste it from the email and hit enter. (If this is your first time on the command line, you might find it weird that no stars appear on the screen when you type your password, but this is how it works on Ubuntu. Even if you don’t see any characters, don’t worry, they are typed in.)

Then it will give you some messages and ask you to change your password. First, type (copy-paste) the old password again, then type the new one (whatever you want).

[Screenshot: first login to the data server]

And done! You can start to install the data coding tools!

(Note: if so far you have used graphical user interfaces only (e.g. SPSS, Excel, Rapidminer, etc.), I know this command line stuff could be intimidating. But believe me, once you’ve played around for 30 mins on this interface, you will find it fun! ;-))

Step 3: Install Bash!

Or wait… Great news! Bash is already set up, as it’s the built-in language of Ubuntu 14.04. (Again: it’s sometimes referred to as “the command line”.)

I’ll get back to bash/command line in the next data coding tutorial articles and videos, but for now it’s enough to know that you really should care about learning this language, because:

  1. You will use bash for every basic server operation, like moving files between folders, creating/deleting files, automating data scripts, installing new programs (e.g. R or SQL!), giving permissions to users, etc.
  2. You will find it a powerful data tool as well. (Actually bash became my favorite data coding tool recently.)

For now, execute your first command. The “Hello, World!”:
echo 'Hello, World!'

You will have “Hello, World!” printed on your screen.

[Screenshot: “Hello, World!” in bash]

Don’t ask why we did that. This is a nice habit of developers, so we did it too, but let’s move forward quickly and execute our second, more important command. We are going to create a new user:

adduser [newusername]
You can add anything to the [newusername] part. I’ll add “dataguy”. Like this:

adduser dataguy
If you hit enter, you will see some text on your terminal screen, then you need to set a new password for this user, then comes some more text, then the name (your name preferably), and you can leave the rest empty.

[Screenshot: creating a new user in the terminal]

What happened here is that we have created a new user.
This was needed for the further steps. So far your username was “root”, and working as “root” all the time is not recommended; a few important things that we want to do are also blocked for the “root” user by default.

Let’s execute one more command to give the right privileges to your new user:

usermod -aG sudo dataguy
(Obviously, don’t forget to replace “dataguy” with your new username.)

From now on we won’t use root, we will use the new user you’ve created. So let’s log out from the root user:

exit
This command will close the connection between your computer and your remote server. Log back in with your new username! Do everything the same way as described in “Step 2” above, but change root to your new username (in my case “dataguy”) and use your new password. As I’m on Mac, I’ll type this, for instance:

ssh dataguy@46.101.128.25

Now you are logged in as a normal user. And you can continue with setting up Python.

Step 4: Install Python and Jupyter!

More great news! Python is already installed on Ubuntu 14.04 too! You can try the Python-way of “Hello, World!” very easily. Type into the command line:

python
This will start Python. (While you are in Python, you can’t use Bash commands.)
Now type:

print 'Hello, World!'

[Screenshot: “Hello, World!” in Python]

Notice that you get the same effect as it was with the “echo” command on bash. “Print” and “echo” are pretty much the same, but “Print” will work on Python and “echo” will work on Bash.
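
One thing to be aware of: print 'Hello, World!' without parentheses only works on Python 2. If you ever end up on Python 3, print needs parentheses. Here is a minimal sketch of a form that runs the same way on Python 2.7 and Python 3 (the function name greeting is just for illustration):

```python
# This import makes print() behave the same on Python 2.7 and Python 3.
from __future__ import print_function


def greeting():
    """Return the text we want to show on the screen."""
    return 'Hello, World!'


print(greeting())  # prints Hello, World!
```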

Anyway. Type:
exit()
This will stop Python and you will be back to Bash.

To use Python more efficiently in the future, you’ll need to install some add-ons.
The easiest way to install things on the command line is to use apt-get install, followed by the name of the add-on that you want to install. If the add-on exists, apt-get will find and install it. Unfortunately the list of available packages on your server is not up to date, so as a first step refresh it with this command:

sudo apt-get update
(Note: sudo runs the command with administrator privileges, which are needed for installations. Your user is allowed to use it because you added it to the sudo group.)
It will ask for your password! Remember: it is not the one from the email anymore, but the one you set when you created the new user! Any time it asks for a password, just type that one.

Now that your package list is up to date, give it a try and type:
sudo apt-get install mc
(If it asks whether to continue, just say yes.)
Mc (Midnight Commander), which you have just installed, is a file manager that comes with a text editor called mcedit. We will use it soon.

Next 2 steps (one by one):
sudo apt-get -y install python-pip
sudo apt-get -y install python-dev
Again, if it asks for your password, type it, and if it asks whether to continue, say yes.
These commands installed pip and python-dev on your server, which will help you download Python-specific packages.

Then type:
sudo pip install --upgrade pip
This command upgraded pip to its latest version.

Let’s install Jupyter:
sudo -H pip install jupyter
You have installed the coolest Python package: Jupyter. This is a tool that helps you create easy-to-use notebooks from your Python code. Why is it so awesome? I promise I’ll show you in my upcoming data coding tutorial articles and videos, but for now, let’s just configure and try it:

jupyter notebook --generate-config
This will create a config file for jupyter on your server.

echo "c.NotebookApp.ip = '*'" >> /home/[your_username]/.jupyter/jupyter_notebook_config.py
(Note: this is one line of code! Only your browser breaks it into 2 lines!)
This will add one line to the newly created config file, that will make you able to use your jupyter notebook from a browser window (like Chrome or Firefox).
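
Note that echo with >> appends the line every time you run it, so running the command twice leaves the setting in the file twice. If you prefer, the same step can be sketched in Python so that the line is added only once. This is only an illustration: append_line_once is a made-up helper, and the demo writes to a throwaway temporary file, not to your real Jupyter config.

```python
import os
import tempfile


def append_line_once(path, line):
    """Append `line` to the file at `path` unless the file already contains it."""
    content = ''
    if os.path.exists(path):
        with open(path) as config_file:
            content = config_file.read()
    if line in content.splitlines():
        return False  # already there, nothing to do
    with open(path, 'a') as config_file:
        # Make sure the new setting starts on its own line.
        if content and not content.endswith('\n'):
            config_file.write('\n')
        config_file.write(line + '\n')
    return True


# Demo on a throwaway file (a stand-in for the real jupyter_notebook_config.py)
demo_path = os.path.join(tempfile.mkdtemp(), 'jupyter_notebook_config.py')
setting = "c.NotebookApp.ip = '*'"
print(append_line_once(demo_path, setting))  # True: the line was added
print(append_line_once(demo_path, setting))  # False: it was already there
```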

Now you can go ahead and use Jupyter by typing:
jupyter notebook --browser any
This command will start the Jupyter application on your remote server. While it’s running in Terminal, open a browser and type this into the address bar: [IP Address of your remote server from the email]:8888

So in my case I open the 46.101.128.25:8888 “website” in Google Chrome. Well, it’s not a real website. It connects me to the interface of my Jupyter notebook.

[Screenshot: Jupyter asking for a password or token]

On this screen you need to type a “password” or a “token” first. As we haven’t generated any password, you need to use the token, which you can easily find if you go back to your terminal window. Here:

[Screenshot: the Jupyter token on the command line]

If you manage to copy-paste your token, you will be logged into your Jupyter Notebook. And you can create your first Python Notebook on top right corner: “New” –» “Python 2”
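
By the way, Jupyter also accepts the token as part of the address, which is the form of the link it prints in the terminal. Here is a small sketch of how such an address is put together. The helper name notebook_url and the placeholder token are made up for illustration, so use your own IP address and token:

```python
def notebook_url(ip_address, token, port=8888):
    """Build the address of a Jupyter notebook server, with the login token included."""
    return 'http://{0}:{1}/?token={2}'.format(ip_address, port, token)


# The token below is a placeholder, not a real one.
print(notebook_url('46.101.128.25', 'PASTE_YOUR_TOKEN_HERE'))
# prints http://46.101.128.25:8888/?token=PASTE_YOUR_TOKEN_HERE
```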

[Screenshot: a new Jupyter notebook]

On this surface you can try the “Hello, World!” command again. Once you have typed it, you can execute it by hitting SHIFT+ENTER.

[Screenshot: “Hello, World!” in a Jupyter notebook]

And done! Now you can use Python + Jupyter any time.

Note1: when you are done, don’t forget to shut down Jupyter in your terminal by hitting CTRL+C. If you want to use Jupyter again in the future, do the same as we did above: type jupyter notebook --browser any and open a browser…
Note2: this setup is not the most data-secure version of using Jupyter, so I’d suggest not to use any confidential data for now. (Later I’ll cover the security settings.)

Step 5: Install SQL and pgadmin4!

To continue you should be in Bash. You can tell by checking the beginning of the line in your Terminal window. If you are really in bash, it will look something like this (not necessarily green, it can be white or gray as well):

[Screenshot: the bash command line prompt]

If you are not, double-check that you haven’t missed anything above… Or simply hit CTRL+C several times (that’s the hotkey to interrupt the process currently running on your terminal screen).

(If somehow you are accidentally still in Python, you will see “>>>” at the beginning of the line. If so, hit CTRL + D.)

When you are back in Bash, you can set up postgreSQL fairly quickly with an apt-get command similar to the ones we’ve used before:
sudo apt-get install postgresql postgresql-contrib
(If it asks for your password, type it, and if it asks whether to continue, say yes!)
Done! You have postgreSQL just like that. Let’s try to access it!

When you installed postgreSQL, it generated an SQL superuser called “postgres”. Right now this is the only user who can access your freshly created SQL database. The good thing is that you can sign in to this superuser’s account with this command:

sudo -i -u postgres

Notice the small change on the command line:

[Screenshot: the command line prompt of the postgres user]

The superuser will be able to access SQL with this command (type it):

psql

You are in! You can type SQL commands!
This first one will generate a new user. With that you will be able to access your database in the future with your normal user too (which is the preferred way).
CREATE USER [your_user_name] WITH PASSWORD '[your_preferred_password]';

In my case:
CREATE USER dataguy WITH PASSWORD '[the_same_password_i_used_so_far]';

Exit from postgreSQL and go back to bash! Type:
\q
(this is the exit command in postgres.)
Then you have to log out from the superuser as well and go back to your normal user! Type:
exit

[Screenshot: exiting psql and the postgres user]

Now you can login with your normal user to your SQL database with this command:
psql -U dataguy -d postgres

Great! You are back to SQL again! Let’s do some data coding and test SQL queries:
CREATE TABLE test(column1 TEXT, column2 INT);
INSERT INTO test VALUES ('Hello', 111);
INSERT INTO test VALUES ('World', 222);
SELECT * FROM test;

The first line generates a new table called “test”. The 2nd and the 3rd insert some values into it. The 4th prints all the values from the “test” table to the screen!
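
If you want to see what these four statements do without touching your database, here is a sketch that runs them from Python. Note the big assumption: it uses SQLite (the small database that ships with Python) as a stand-in, not your postgreSQL server, and run_test_queries is just an illustrative name. The four SQL statements themselves are the same.

```python
import sqlite3


def run_test_queries():
    """Run the four test statements on a temporary in-memory SQLite database."""
    connection = sqlite3.connect(':memory:')  # lives only in memory, nothing is saved
    cursor = connection.cursor()
    cursor.execute('CREATE TABLE test(column1 TEXT, column2 INT)')
    cursor.execute("INSERT INTO test VALUES ('Hello', 111)")
    cursor.execute("INSERT INTO test VALUES ('World', 222)")
    cursor.execute('SELECT * FROM test')
    rows = cursor.fetchall()  # a list with one tuple per row of the table
    connection.close()
    return rows


for row in run_test_queries():
    print(row)
```

Each printed row is one line of the “test” table: the text value first, the number second.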

I’ll also get back to the usage of SQL later!

Exit from postgreSQL again:
\q

It’s time to set up pgadmin! This is a desktop application for postgreSQL that you can use to access your SQL database from your personal computer (without connecting to your remote server in the terminal) and write queries much more easily. You will find this program very useful when you start writing complex queries.

As a first step, make your remote server ready to connect by typing these 5 lines of code (copy-paste them one by one):

sudo -i -u root

echo "listen_addresses = '*'" >> /etc/postgresql/*/main/postgresql.conf
(Note: this is one line of code! Only your browser breaks it into 2 lines!)

echo 'host all all 0.0.0.0/0 md5' >> /etc/postgresql/*/main/pg_hba.conf
(Note: this is one line of code! Only your browser breaks it into 2 lines!)

sudo /etc/init.d/postgresql restart

exit

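What you are doing here is logging in as the root user and making some modifications to the config files of postgreSQL: the first change lets postgreSQL listen for connections from other computers, and the second lets users log in from any IP address with their password. (Remember: as you are on your remote server, the changes you make won’t affect your personal computer!)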
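
The line added to pg_hba.conf looks cryptic, but it is just five fields separated by spaces: connection type, database, user, address and authentication method. A small sketch that takes it apart; parse_hba_line and HbaRule are names I made up for illustration, and the sketch only handles this simple five-field form (real pg_hba.conf files can contain other kinds of lines too):

```python
from collections import namedtuple

# Field names are chosen for this sketch; they follow the column order of pg_hba.conf.
HbaRule = namedtuple('HbaRule', ['conn_type', 'database', 'user', 'address', 'method'])


def parse_hba_line(line):
    """Split a simple five-field pg_hba.conf line into named parts."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError('expected 5 fields, got %d' % len(fields))
    return HbaRule(*fields)


rule = parse_hba_line('host all all 0.0.0.0/0 md5')
print(rule.conn_type)  # host: a connection over the network
print(rule.database)   # all: any database
print(rule.user)       # all: any user
print(rule.address)    # 0.0.0.0/0: from any IPv4 address
print(rule.method)     # md5: log in with a password
```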

Then download pgadmin4 from here: pgadmin4.
Select your OS, then download, install and run it!

Once you are done, you will see this screen:

[Screenshot: pgAdmin “Add New Server”]

Click the Add New Server icon!

And fill in the popup:

[Screenshot: pgAdmin connection settings]

“GENERAL”:

  • “Name”: anything you want (eg. “Data36 Test Server“)

“CONNECTION”:

  • “Host name/address”: your remote server’s IP Address (in my case: 46.101.128.25)
  • “Port”: 5432
  • “Maintenance database”: postgres
  • “User name”: [your_user_name]
  • “Password”: your recently generated SQL password for your user

Click save and BOOM! You are connected to your database!
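
The same five details (host, port, database, user name, password) are what any other program needs to connect to your database. Many postgreSQL clients, including psql, accept them as a single connection URI. A hedged sketch of how such a URI is built: postgres_uri is a hypothetical helper, and the password below is made up for the example.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2.7


def postgres_uri(host, user, password, database='postgres', port=5432):
    """Build a postgresql:// connection URI from the pgAdmin connection details."""
    # quote() escapes characters (like spaces or @) that would break the URI.
    return 'postgresql://{0}:{1}@{2}:{3}/{4}'.format(
        quote(user, safe=''), quote(password, safe=''), host, port, database)


print(postgres_uri('46.101.128.25', 'dataguy', 'my secret'))
# prints postgresql://dataguy:my%20secret@46.101.128.25:5432/postgres
```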

[Screenshot: browsing the server in pgAdmin]

At first sight it’s not really straightforward, but you can explore the different folders on the left side. If you right click on the name of the server (on my screenshot: “Data36 Test Server”), you can disconnect from your server. Or connect the same way when you want to get back.
Also, if you left click on one of your databases (on my screenshot: “postgres”), then select “Tools” –» “Query tool” in the top menu, you will be able to run SQL queries (execute them with the little lightning bolt icon):

[Screenshot: running SQL queries in pgAdmin]

Notice that on my screenshot you can see the very same result that we got in the Terminal SQL! 🙂

Yay, you have SQL!
Only one small step left…

Step 6: Install R and RStudio!

R is the easiest tool to set up! That’s why I left it to the end.
First use apt-get again to install R:
sudo apt-get install r-base-core
(If it asks for your password, type it – if it asks if continue, say yes!)

Now you have R. You can test the “Hello, World!” here as well! Start R first:
R

Then type:
print ("Hello, World!");

[Screenshot: “Hello, World!” in R]

The syntax is a bit different from what it was in Python, and much more different from what it was in Bash.
You can exit from R:
quit()
Save workspace image? [y/n/c] —» Say: n

We have an application for R as well to make your data coding easier. It’s called RStudio and you can set it up with these 4 commands (copy-paste them one by one)!

sudo apt-get install gdebi-core

wget https://download2.rstudio.org/rstudio-server-1.0.136-amd64.deb

sudo gdebi rstudio-server-1.0.136-amd64.deb

sudo restart rstudio-server

Then just go to your browser and type [your IP Address] and port 8787. In my case:

http://46.101.128.25:8787

You can login with your username (eg. dataguy) and password. (The same, you were using to access your remote server so far.) And try “Hello, World!” here as well.

[Screenshot: “Hello, World!” in RStudio]

You have R and RStudio too! Congrats!

CONCLUSION

Nice job there!
You have created your own remote data server and you have Bash, Python, SQL and R on it! This is a fantastic first step to become a Data Scientist!
As I’ve mentioned several times during this article, I’ll help you to learn and use these languages in the upcoming data coding tutorial videos and articles on data36.com. We will start from the very basics, I promise!

If you want to be notified first about new content on Data36 (like articles, videos, handbooks, etc.), sign up for my Newsletter!

Cheers,
Tomi Mester

Sources and further reads

Bash/Command Line:
Data Science At The Command Line: http://datascienceatthecommandline.com

R:
https://support.rstudio.com/hc/en-us/articles/200552306-Getting-Started

SQL:
https://www.cyberciti.biz/faq/howto-add-postgresql-user-account/
https://help.ubuntu.com/community/PostgreSQL
http://stackoverflow.com/questions/1287067/unable-to-connect-postgresql-to-remote-database-using-pgadmin

Python:
http://jupyter.readthedocs.io/en/latest/install.html
https://www.digitalocean.com/community/tutorials/how-to-set-up-a-jupyter-notebook-to-run-ipython-on-ubuntu-16-04

10 Comments

  1. Nice summary, though I think you should really put more emphasis on the security aspects (eg. disabling root access, setting up ufw, etc. but I’m not an expert at all). Nevertheless it gave me ideas how to move forward with these tools (been playing around with Python for only 6 months).

    Few questions/ideas:

    – Is there any particular reason why you use Python 2.7 and not 3.x?
    – What do you think of using MongoDB in a similar setup instead of SQL? (maybe noSQL with schemaless design fits better to ‘exploratory’ coding – ie. tinkering in a trial-and-error manner. Or is it too abstract from tabular data formats?)
    – Does this way of using Jupyter work from behind nginx? (I guess you have to point to the right port in the config, but I’m not an expert here either.)
    – Maybe Miniconda could make managing Python stuff even easier. I think it comes with Jupyter by default, and also the conda package manager, replacing pip. What do you think?
    – Also you might want to add some kind of a process manager to restart your dbase in case the server reboots for some reason (if it is an issue that your db stops)

    • Hey Andras,

      thanks, nice questions and suggestions!
      I’ll go ahead and answer them one by one!

      (Before that, note that in this article I tried to cover how to set up a data server for practice in the fewest steps, so obviously this is a solution that will need further optimization, both performance- and security-wise… Most of these questions wouldn’t have fit into this 101 article! 🙂)

      1. Security:
      It’s definitely a thing. I’ll cover this topic in another article as this one became a little bit too long already! 🙂

      2. Is there any particular reason why you use Python 2.7 and not 3.x?
      No… I use Python 2.7, because most of the data scientists are using it. It means that all the commonly used data science packages (numpy, scikit, pandas) are supported on that version “better” (not just development, but learning materials). But I’ve friends who are using 3.x without any trouble.
      UPDATE: Also if you go to a workplace, there is a very high chance, that you have to use 2.7, not 3.x.

      3. What do you think of using MongoDB in a similar setup instead of SQL?
      I’m not using MongoDB, so I don’t have any educated opinion on that. However, I’ve heard some very bad things about security around that. Just the most recent one: https://www.bleepingcomputer.com/news/security/mongodb-apocalypse-is-here-as-ransom-attacks-hit-10-000-servers/

      4. Does this way of using Jupyter work from behind nginx?
      No.

      5. Maybe Miniconda could make managing Python stuff even easier.
      Unfortunately Miniconda doesn’t come with Jupyter (afaik at least), so I left it out for now, because it would have been an extra unnecessary step. But I’ll cover it later too!

      6. Also you might want to add some kind of a process manager to restart your dbase in case the server reboots for some reason.
      Good point! I forgot about that one, but I’ll definitely expand the article with this section! 🙂

      Thanks for the suggestions again! I’ll use them!
      Cheers,
      Tomi

      • Most people starting new projects are not in fact using 2.7. Metrics vary from here and there, but any of the (informal) polling done in /r/datascience have shown that hands down 3 is the standard. You’ll see a lot of metrics passed around saying downloads from pip show 2.7 is still more popular, but I suspect that these are skewed with automated dependencies, and also not including new distributions with new methods of install – like Anaconda.

        Which brings me to my next point – if you’re going to give people info on how to get started with Python, I’d suggest something like Anaconda. Installing binary dependencies can quickly become a nightmare without it, and since you’re using your sys python you’re also looking for trouble there. Anaconda makes using virtual (conda) environments a breeze – leaving the sys python to do what it’s meant to do.

        Also – for Andras. The goal with learning SQL for data science is because the majority of your data is likely to come from legacy systems, these are for the most part RDBMS, and not NoSQL. No doubt in the future the mix will tilt, but I’d focus on traditional SQL/csv data manipulations first, and then mix in some Mongo later. Also note – Mongo is better at OLTP (transactional) than OLAP (analytical), and often times with analysis you’re going to be working with OLAP (aggregates, crosstabs, etc).

        Just my $.02.

        • @Kevin D!
          Thank you for your 2 cents! Very useful opinions and suggestions.
          On Python 2.7 vs 3.x – I guess it’s still an opinion, there are many discussions on this out there. I prefer 2.7, so I’ll use 2.7 on my tutorials, so I added that in this article! 😉
          But 2 other reasons here:
          1. The majority of the workplaces started to use 2.7 and they are still doing it, so if you start to work somewhere, you can expect 2.7 more than 3.x.
          2. You can access learning materials easier in 2.7. (as far as I’ve experienced.)

          Anaconda-wise: yes, you might be right… now I’m considering expanding this article with Anaconda. (At first I didn’t want to, in order to keep it “simple”.)

          And thank you for the comments on Mongo!

          Cheers,
          Tomi

      • Thanks! The mongo hacks are really nasty, though only instances with lousy security setup are affected I think (a trap a beginner could definitely fall into…). But I’m not a promoter of mongo. You need to know SQL anyway if you’re to work with data, your setup is just perfect for this.

        Jupyter is not included in Miniconda indeed, my bad. Anaconda does include it, but it felt like a huge overshoot for me when I tried it.

        Looking forward to the upcoming posts!

  2. Hey Tomi,

    Very important question: Am I still able to follow your guide if I use Ubuntu 16.04.1 x64 instead of 14.04.5 x64? I’ve noticed that the command ‘python’ didn’t install it for me, rather I had to sudo apt install python to have it install python2.7 for me, also saying ‘After this operation, 16.6 MB of additional disk space will be used.’ -> I thought I’m working in the cloud?

    Sorry for the noob questions. Really want to get started with Data Science, but not screw up anything with my personal device.

    Thank you for any help!

    • hey Lukas,

      (I assume in my answer that you set up the remote server as it was written in the article above – except the 16.04 vs 14.04 question…)

      If you use 16.04, it comes with python3.x (python 3.5.2 recently anyway)
      If you want to use python2.7 (the one that I suggested), you can set it up with the command you used:
      sudo apt install python

      And don’t worry about the:
      ‘After this operation, 16.6 MB of additional disk space will be used.’

      It will need disk space from your remote server (“cloud computer”).
      So just say “yes” and python will be installed on your remote server.

      Alternatively you can use Python3 – without installing anything – with this command:
      python3
      But note that there will be minor differences in the usage compared to Python2.7… Not very big things, but if you want to follow my tutorials in the future (or most of the online tutorials around Python, that’s available online), then it will be easier, if you go with python2.7.

      Also just a small comment, I’m using Ubuntu 14.04 in my tutorial for a reason. I experienced that Ubuntu releases are not always super stable in their first years (bugs, compatibility issues)… So Ubuntu14.04 is something, that I trust, when I set up my data coding tools on that. But this is kind of a question of taste! 🙂

      Hope, I answered all your questions and good luck on your path to become a Data Scientist!!

      Tomi

  3. Tomi

    Thanks for getting me started. I have followed the steps you described and read some of the references, and now I’m ready for the follow up lesson. : – )
