How to Install Python, SQL, R and Bash (for non-devs)

Last updated on January 12, 2021

This is the ultimate step-by-step guide to installing Python, SQL, R and Bash so you can start learning to code for data science. It takes no more than 60 minutes and it’s 100% understandable for non-developers, too!

If you want to be a data scientist, a data analyst or just simply want to analyze the data of your business, sooner or later you will have to learn coding.

[Figure: the languages used for data science coding: Python, R, SQL and Bash. Source: KDNuggets]

There are 4 important languages that you should know:

  • SQL
  • Python
  • R
  • Bash (sometimes referred to as “the command line”)

Here, on the Data36 blog, I provide many articles about coding for data science. However, as a foundation of every further conversation on this topic, you need to have Python, R, SQL and bash on your computer. Once you have them set up, you will be able to practice by yourself (and later build your own data projects too), as well as follow, practice and learn via my tutorial articles and videos.

In this article I’ll take you through getting these data tools step by step. At the end of the article you will have your own fully functioning data infrastructure with:

  • bash (and mcedit),
  • postgreSQL (and pgAdmin4),
  • Python 3 (and Jupyter) and
  • R (and RStudio)

Good news:

  1. I will show you the exact same tools that are used in real life data science projects.
  2. All these tools are completely free!

Yes, the funny thing is that most of the super-scientific stuff that you can read about in different data science articles is made using open-source data tools. How cool is that?

Anyway: first things first – let’s go and install Python, SQL, R and bash on your computer!

Note 1: The article was last updated on 30 October 2020.

Note 2: This article is more enjoyable if you do the steps and the coding part with me. If you are reading this on your mobile, I suggest saving the article for later (e.g. send it to yourself in e-mail) and coming back here when you have ~60 minutes and are on your Desktop computer or notebook.

Note 3: To have your data infrastructure correctly set up, you simply need to follow my instructions down here step by step (most of the time you can just copy-paste my code). So don’t be afraid to work in Terminal and write code. It’s easy, even if you are not a programmer or data scientist (yet). The article is long, though, because I tried to explain each step as much as I could.

The Operating System

I use all my data tools (Python, R, SQL) on Ubuntu – which is a Linux operating system – and I suggest you do the same. My personal computer is a Mac, but you can have a PC too. In this case it doesn’t really matter, because we won’t install Ubuntu on our computer, but access it via the internet.

What we will do here is to connect to a remote server – type commands and make the remote server do the data analyses instead of our local computer.

[Figure: a simplified view of working on a remote server]

(Note that you can set up Ubuntu on your personal/work computer too, if you really want to. But this is something we usually don’t do in real life, because with that solution we would limit our data processes to our computer’s capacity. Also we would lose some cool features.)

If you use a remote server for data analysis, you will be able to:

  • Access your data infrastructure from any computer with a login-name and a password (even if you lose or break your personal notebook). Don’t worry, nobody else will be able to access your data on your remote server – it is completely private.
  • Automate your data scripts (e.g. make them run every 3 hours, even if you turn off your notebook).
  • Scale your computing power. You won’t be limited to your own computer’s capacity. Renting a few more processors or more memory is just one click away if you are using a remote server.
  • Use Ubuntu without installing it on your computer.

The only downside of going with a remote server is that it costs money. Fortunately these prices are very low (starting at $5 per month).

I’ve been creating all my coding for data science tutorials (videos and articles) using the exact same data stack that I’ll describe in this article. So if you want to follow me, it will be much easier for you if you go through this article step by step and set up everything like I do. To make everything work properly, please make sure that you don’t miss any of the steps and that you do them in the same order as they are written here! The most important parts of course are the code snippets. All the snippets are marked:

 like this

or

like this

Step 1: Get your remote server!

The next step is to find a hosting service to create your first remote server. I have used many services and so far I’ve found DigitalOcean to be the best. You can rent a server here for $5/month (this will be perfect for us for now).

First, go to their website and create an account: DigitalOcean.com
Disclaimer: the link above is a special invitation link – if you use that, you’ll get $100 free credit for 60 days (and I’ll get $25 free credit). If you don’t want to use my link, you can simply click here instead. Note that in this case you won’t get the $100 credit.

You will land here:

[Screenshot: the DigitalOcean registration page]

Register with your email address and you will get a confirmation email in your inbox. Confirm and you will see a screen where you can add your credit/debit card details or use PayPal. (For security reasons I always use PayPal.)

If you are done with that, you are just one step away from creating your first remote server. You will see the “Droplets” screen. Click “Create Droplet” (big green button, top right corner).

And you will end up here:

Make sure you are using these settings:

  1. “Choose an image:” Distributions: Ubuntu 18.04 x64
    [Screenshot: choosing Ubuntu 18.04 as the image]
  2. “Choose a size:” Standard: $5/month
    This will be more than enough for now. If needed, you will be able to scale it up in the future. As you can see, you’ll pay on an hourly basis. This means that if you use the server for only 4 hours and then delete it, you will pay about $0.03 (4 hours at roughly $0.007 per hour). This is a very good deal.
    [Screenshot: choosing a plan and a droplet size]
  3. “Add block storage:” You don’t need this.
  4. “Choose a datacenter region:” Choose the one that is the closest to you. E.g. if you are in San Diego, choose San Francisco and if you are in India, choose Bangalore. I’ll choose Frankfurt as I’m in Stockholm at the moment.
    [Screenshot: choosing a datacenter region]
  5. “Select additional options:” You don’t need this.
  6. “VPC Network”: Leave it as is.
  7. “Authentication:” One-time password.
    Here, you’ll have to create your own root password. You’ll see something like this:
    [Screenshot: the root password field]
    The password requirements are pretty strict. Make sure you choose something that is at least 8 characters long, contains 1 uppercase letter (the first and last characters don’t count), contains 1 number, and doesn’t end in a number or special character. As I said, pretty strict.
    [Screenshot: a new root password typed in]
    Also, make sure you remember this password because you’ll need it soon!
  8. “Finalize and create:”
    “How many Droplets:” 1
    “Choose a hostname:” You can use anything. I chose “data36-learn-data-science.”
    [Screenshot: finalizing the droplet]
  9. “Add tags:” You don’t need this.
  10. “Select project:” You don’t need this, either. (Go with the default.)
  11. “Add backups:” This is up to you. For my tutorials, you won’t need backups, but in the long term, for real projects, I recommend using them. For now, you can just skip it.
  12. Click “Create.”
    Your server will be ready in ~60 secs.
    [Screenshot: the server being created]
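
If you want to test a password idea before typing it into the form, the rules from point 7 can be written down as a small check. This is only a sketch of the rules as they are described above, not DigitalOcean’s actual validation. The function name looks_like_valid_root_password is made up for this example, and it assumes that “first and last characters don’t count” means the uppercase letter has to be somewhere in the middle, and that “doesn’t end in a number or special character” means the last character has to be a letter.

```python
def looks_like_valid_root_password(password: str) -> bool:
    """Check a password against the rules listed in point 7 (hypothetical helper)."""
    # Rule 1: at least 8 characters long.
    if len(password) < 8:
        return False
    # Rule 2: an uppercase letter that is neither the first nor the last character.
    middle = password[1:-1]
    if not any(ch.isupper() for ch in middle):
        return False
    # Rule 3: at least one number anywhere in the password.
    if not any(ch.isdigit() for ch in password):
        return False
    # Rule 4: must not end in a number or special character,
    # interpreted here as "must end in a letter" (an assumption).
    if not password[-1].isalpha():
        return False
    return True


if __name__ == "__main__":
    # Made-up example passwords; don't use any of these for real.
    for candidate in ["short1A", "dataScience36x", "datascience36x", "dataScience36"]:
        print(candidate, looks_like_valid_root_password(candidate))
```

The form on the website has the final say, of course. If it rejects a password that this check accepts, trust the form.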

CONGRATS! You have your first remote server, where you can install Python, R, SQL and bash and then practice coding for data science.
(Note: you can destroy this server at any time by clicking “Destroy.”)
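
Since the server is billed by the hour, it can be reassuring to estimate what a short practice session costs. Here is a rough sketch of that arithmetic. The 672 billable hours per month (28 days) is an assumption about how the monthly price is turned into an hourly rate, and hourly_cost is just a name picked for this example, so check the provider’s pricing page for the real numbers.

```python
def hourly_cost(monthly_price: float, hours_used: float,
                billable_hours_per_month: int = 672) -> float:
    """Estimate the cost of running a server for a number of hours.

    billable_hours_per_month=672 is an assumption, not an official figure.
    """
    hourly_rate = monthly_price / billable_hours_per_month
    # The bill is capped at the monthly price, however long the server runs.
    return min(hourly_rate * hours_used, monthly_price)


if __name__ == "__main__":
    print(f"4 hours on a $5/month server: ${hourly_cost(5, 4):.2f}")
    print(f"A full 31-day month: ${hourly_cost(5, 24 * 31):.2f}")
```

Depending on how the provider rounds the hourly rate, the real bill can differ from this estimate by a cent or so.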

Step 2: Access your remote server!

It’s time to log in to your freshly created remote data server.

You’ll need 2 pieces of important information for the login process.

  • The IP address of your droplet and
  • your root password.

The IP address can be found on the DigitalOcean website, where you created your droplet. It’s right here:

[Screenshot: where to find the IP address of your DigitalOcean droplet]

Find yours on your DigitalOcean Droplets tab.

You’ll also need your root password… This is the one that you set when you created your droplet.

[Screenshot: the root password field from the droplet setup]

Depending on which operating system you use on your computer, you can access your server in different ways.

For Mac/Linux Users:

Open Terminal (on Mac I suggest downloading iTerm2 and using that instead of Terminal).

Type:

ssh [Username]@[IP Address]

In my case I will type:

ssh root@46.101.128.25

Hit enter and you are in…
(The next paragraph is important for Windows users only. You can skip it and scroll down to “Both Windows/Mac”!)
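
The ssh command above always follows the same pattern: ssh, then the username, an @ sign and the IP address. If you mistype the IP address, ssh will just hang or complain, so here is a small sketch that puts the command together and checks the address first. build_ssh_command is a made-up helper for illustration, and it only builds the command; it doesn’t connect anywhere.

```python
import ipaddress


def build_ssh_command(username: str, ip_address: str) -> list:
    """Return the ssh command as a list of arguments (hypothetical helper)."""
    # ipaddress.ip_address raises ValueError if the text is not a valid IP address.
    ipaddress.ip_address(ip_address)
    if not username or " " in username:
        raise ValueError("username must be non-empty and contain no spaces")
    return ["ssh", f"{username}@{ip_address}"]


if __name__ == "__main__":
    # The IP address is the example one used in this article; use your own.
    print(" ".join(build_ssh_command("root", "46.101.128.25")))
```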

For Windows Users:

First download and install a program called PuTTY from here.

If you open PuTTY, you need to add these details in this window:

[Screenshot: the PuTTY configuration window]
  • Host Name (or IP address): the IP Address from the DigitalOcean droplet tab I showed you above (e.g. 46.101.128.25 in my case)
  • Port: 22
  • Connection type: SSH

Click “Open” and you are in. It will ask for your:

  • username (“login as:”). Type: root
    (putting it simply, root is some sort of master user for your data server)
    Hit Enter…

Both Windows/Mac (oh and Linux of course):

Nice, you have ssh-d (logged in) into your remote Ubuntu server. From this point on, whenever you are in the terminal window, and until you disconnect from your remote server, you are going to be using Ubuntu 18.04. It also means that any changes you make here won’t affect your personal computer!

Let’s finalize things before we start setting up your data infrastructure!

If everything’s correct, the server asks some questions like:
Are you sure you want to continue connecting (yes/no)?
Type yes, hit enter.

Then it will ask for your password. Type the one that you provided when you created your droplet, like 2 minutes ago… (Remember that I said that you’d need it soon?)

[Screenshot: the root password field from the droplet setup]

IMPORTANT! If this is your first time on the command line, you might find it weird that the stars (*) don’t appear on the screen when you type your password, but this is how it is on Ubuntu. Even if you don’t see any characters typed in – don’t worry – it’s typed in!

[Screenshot: logging in to the data server]

And done! You can start to install the data tools!

(Note: if you have so far used only graphical user interfaces (e.g. SPSS, Excel, Rapidminer, etc.), I know, this command line thingy could be intimidating. But believe me, once you’ve played around for 30 mins on this interface, you will find it fun! ;-))

Step 3: Install Bash!

Or wait… Great news! Bash is already set up, since it’s the built-in language of Ubuntu 18.04. (Again: sometimes it’s referred to as “the command line.”)

I’ll get back to bash/command line in the next data coding tutorial articles and videos, but for now it’s enough if you know that you really need to care about learning this language because:

  1. You will use bash for every basic server operation – like moving files between folders, creating/deleting files, installing new programs (e.g. Python, R or SQL, too), giving permissions to users, etc.
  2. It’s great for creating automations.
  3. It can be used as the “glue” between other data languages (e.g. moving something from SQL to Python, then from Python to R).

For now, execute your first command: the “Hello, World!”

echo 'Hello, World!'

You will have Hello, World! printed on your screen.

[Screenshot: Hello, World! in Bash]

Don’t ask why we did that. This is a nice habit of programmers, so we did it too, but let’s move forward quickly and execute our second, more important command. We are going to create a new user:

adduser [newusername]

You can type anything for the [newusername] part. Whatever you put there will be your username. I’ll type dataguy. Like this:

adduser dataguy

If you hit enter, you will have some text on your terminal screen, then you need to add a new password for this user, some more text, then the name (your name preferably) and you can leave the rest empty.

[Screenshot: creating a new user in the terminal]

You have just created a new user! Great!

This was needed for further steps: so far your username was root, and some of the tools that we want to set up don’t like to be run as root by default (Jupyter, for instance, refuses to start as root unless you explicitly allow it).

Let’s execute one more command to give the right privileges to your new user:

usermod -aG sudo dataguy

(Obviously: don’t forget to replace dataguy with your new username)
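
What did that usermod line actually do? The -a and -G options append your user to an additional group, in this case sudo, and Ubuntu keeps track of these memberships in a text file called /etc/group. The sketch below shows how such a file can be read. The sample content is made up, and is_member_of is a hypothetical helper name, so treat it as an illustration of the file format rather than an official tool.

```python
def is_member_of(group_file_text: str, group: str, user: str) -> bool:
    """Check group membership in text formatted like /etc/group.

    Each line looks like this: group_name:password:GID:user1,user2
    """
    for line in group_file_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) != 4 or fields[0] != group:
            continue
        # The 4th field is a comma-separated list of additional members.
        members = [member for member in fields[3].split(",") if member]
        return user in members
    return False


if __name__ == "__main__":
    # Made-up sample content; on the server you would read /etc/group instead.
    sample = "root:x:0:\nsudo:x:27:dataguy\ndataguy:x:1000:\n"
    print(is_member_of(sample, "sudo", "dataguy"))
```

On the server itself you could pass in open("/etc/group").read() instead of the sample text, with your own username in place of dataguy.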

From now on, we won’t use the root user; we will use the new user you’ve created. So let’s log out from the root user:

exit

This command will close the connection between your computer and your remote server. Log back in with your new username! Do everything as described in Step 2 above, but change root to your new username (in my case dataguy) and to your new password. As I’m on Mac I’ll type this – for instance:

ssh dataguy@46.101.128.25

Now you are logged in as a normal user.

And you can continue by setting up Python3.

Step 4: Install Python 3 and Jupyter!

Note: previously this article was written for Python 2, but I have decided to upgrade it to Python 3. Python 2 has not been supported since January 1, 2020, and Python 3 has been around since 2008. So if you are new to Python, it is definitely worth learning the new Python 3 and not the old Python 2.

More great news! Python is already installed on Ubuntu 18.04 too! You can try the Python-way of “Hello, World!” very easily. Type into the command line:

python3

This will start Python. (While you are in Python, you can’t use Bash commands.)
Now type:

print('Hello, World!')
[Screenshot: Hello, World! in Python 3]

Notice that you get the same effect as it was with the echo command on bash. print and echo are pretty much the same, but print will work on Python and echo will work on Bash.

Anyway. Type:

exit()

This will stop Python and you will be back to Bash.

To use Python more efficiently in the future, you’ll need to install some add-ons.

The easiest way to install things in the command line is using the apt-get application’s install feature. You only have to type apt-get install, then the name of the add-on that you want to install. If the add-on exists, apt-get will find and install it. The list of available packages that apt-get keeps on your server may be out of date, though, so as a first step, refresh it with this command:

sudo apt-get update

(Note: sudo is an additional keyword before your apt-get command that runs the command with administrator privileges. Your user is allowed to do this thanks to the usermod command above.)

The command line will ask for your password! Remember: it is not necessarily the same as it was for your root user, but the one you set when you created the new user! Anytime it asks for your password, just type that one.

Now that apt-get’s package list is up to date, give it a try and type:

sudo apt-get install mc

(If it asks whether to continue, just say yes.)

mc (Midnight Commander), which you have just installed, is a file manager that comes with a text editor for coding called mcedit. We will use it soon.

Next 2 steps (one by one):

sudo apt-get -y install python3-pip
sudo apt-get -y install python3-dev

Again, if it asks for your password, type it – if it asks whether to continue, say yes.
These commands installed pip and python3-dev on your server, which will help you to download Python-specific packages.

Then type:

sudo -H pip3 install --upgrade pip

This command upgraded pip to the latest version.
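
If you are ever unsure whether a Python package made it onto your server, you can ask Python itself. The sketch below uses importlib from the standard library. is_installed is a made-up helper name, and it only checks whether a module with that name can be found, not which version you have.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if Python can find a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "notebook" is the module behind Jupyter Notebook; it will show up
    # as missing until you have installed Jupyter.
    for name in ["pip", "json", "notebook"]:
        status = "installed" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

You can save this into a file and run it with python3, or paste it into the Python prompt line by line.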

Install Jupyter!

Let’s install Jupyter:

sudo -H pip3 install jupyter

You have installed one of the coolest Python packages: Jupyter. This is a tool that helps you to create easy-to-use notebooks from your Python code. Why is it so awesome? In my Python for Data Science tutorial articles and videos I tell you more about it, but for now, let’s just configure and try it:

jupyter notebook --generate-config

This will create a config file for Jupyter on your server.

echo "c.NotebookApp.ip = '*'" >> /home/[your_username]/.jupyter/jupyter_notebook_config.py

(Note: this is one line of code! Only your browser breaks it into more lines! And of course, the [your_username] part should be replaced by the actual bash username that you’ve created earlier, for me it was dataguy.)

echo "c.NotebookApp.allow_remote_access = True" >> /home/[your_username]/.jupyter/jupyter_notebook_config.py

(Note: this is one line of code! Only your browser breaks it into more lines! And of course, the [your_username] part should be replaced by the actual bash username that you’ve created earlier, for me it was dataguy.)

These commands add two lines to the freshly created config file, which will let you use your Jupyter notebook from a browser window (like Chrome or Firefox).
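
One thing to know about the >> in the echo commands: it appends to the file every time you run it, so running the same command twice leaves the same line in the config file twice. That is harmless here, but if you prefer a tidy file, the idea can be sketched in Python like this. ensure_line is a hypothetical helper, and the example writes to a throwaway temporary file instead of your real Jupyter config.

```python
import tempfile
from pathlib import Path


def ensure_line(config_path: Path, line: str) -> bool:
    """Append `line` to the file unless it is already there.

    Returns True if the line was added, False if it was already present.
    """
    text = config_path.read_text() if config_path.exists() else ""
    if line in text.splitlines():
        return False
    # Start on a fresh line if the file doesn't end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with config_path.open("a") as config_file:
        config_file.write(prefix + line + "\n")
    return True


if __name__ == "__main__":
    # A temporary file stands in for ~/.jupyter/jupyter_notebook_config.py here.
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "jupyter_notebook_config.py"
        settings = ["c.NotebookApp.ip = '*'", "c.NotebookApp.allow_remote_access = True"]
        for setting in settings + settings:  # second pass should add nothing
            print(setting, "->", "added" if ensure_line(config, setting) else "already set")
```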

Now you can go ahead and try Jupyter out by typing:

jupyter notebook --browser any

This command will start to run the Jupyter application on your remote server. While it’s running in Terminal, you should just open a browser and type this in the address bar:

[IP Address of your remote server]:8888

So in my case I’ll type this to my Google Chrome’s address bar:
46.101.128.25:8888.

Well, it’s not a website but it looks like one. In reality, it just connects me to the interface of my Jupyter notebook that runs on my server.

[Screenshot: Jupyter asking for a password or token]

On this screen you need to type a password or a token first. As we haven’t generated any password (yet), you need to use the token, which you can easily find if you go back to your terminal window. Here:

[Screenshot: the Jupyter token in the command line]

Note: Don’t copy the /?token= part. You only need the part that comes after it (the one I annotated with green on the screenshot above).
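
If picking the token out of the long URL by hand feels fiddly, this is what “the part after /?token=” means in code. It’s a small sketch using Python’s standard urllib. The URL in the example is made up (your real token is much longer), and extract_token is just a name chosen for this illustration.

```python
from urllib.parse import urlparse, parse_qs


def extract_token(jupyter_url: str) -> str:
    """Return the value of the `token` query parameter from a Jupyter URL."""
    query = parse_qs(urlparse(jupyter_url).query)
    tokens = query.get("token", [])
    if not tokens:
        raise ValueError("no token found in this URL")
    return tokens[0]


if __name__ == "__main__":
    # Made-up example URL; copy the real one from your own terminal window.
    print(extract_token("http://localhost:8888/?token=abc123def456"))
```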

If you manage to copy-paste your token, you will be logged into your Jupyter Notebook. And you can create your first Python Notebook in the top right corner: “New” –» “Python 3”

[Screenshot: creating a new Python 3 notebook in Jupyter]

On this surface you can try again printing the “Hello, World!” string. Once you have typed it, you can execute this command by hitting SHIFT+ENTER.

[Screenshot: Hello, World! in a Jupyter notebook]

And done! You have installed Python 3 and Jupyter Notebook — and you can come back and use them any time. Nice job!

Note 1: when you are done, don’t forget to shut down Jupyter in your Terminal window by hitting CTRL+C. If you want to use Jupyter again in the future do the same as above: type jupyter notebook --browser any and open a browser…

Note 2: this setup is not the most secure version of using Jupyter, so I’d suggest not using any confidential data for now. (In another article, I’ll cover the security settings.)

Step 5: Install SQL and pgadmin4!

To continue with installing SQL, you should be back to Bash and the command line. You will know for sure, if you check the beginning of the line in your Terminal window (which is called the prompt, by the way). If you are on bash, it will look something like this (not necessarily green, it can be white or gray as well):

[Screenshot: the Bash command line prompt]

If you are not, just double-check that you haven’t missed anything above… Or just hit CTRL+C several times (that’s the hotkey to interrupt whatever is running on your terminal screen).

(If somehow accidentally you are still in Python, you will see >>> at the beginning of the line. If so, hit CTRL + D.)

When you are back to Bash, you can set up postgreSQL fairly quickly using a similar apt-get command as before:

sudo apt-get install postgresql postgresql-contrib

(If it asks for your password, type it – if it asks whether to continue, say yes!)
Done! You have postgreSQL just like that. Let’s try to access it!

When you installed SQL, it automatically generated one more user on your server called postgres. This new user is an SQL superuser. And right now this is the only user who can access your freshly created SQL database. The good thing is that you can sign in to this superuser’s account with this command:

sudo -i -u postgres

Notice the small change on the prompt in the command line:

[Screenshot: the command line prompt of the postgres user]

The superuser will be able to access SQL with this simple command (type it):

psql

You are in! You can run SQL commands and queries!

First things first, let’s generate a new user, so you can access your database in the future with your normal user too (which is the preferred way).

CREATE USER [your_user_name] WITH PASSWORD '[your_preferred_password]';

It’s really important that you replace [your_user_name] with the very same new user name that you’ve used in bash. So in my case it will be:

CREATE USER dataguy WITH PASSWORD 'a_password_that_I_wont_put_into_this_article';

Since you are here, I recommend running one more command (you won’t need it now but it’ll come in handy when you start with my SQL tutorial articles):

ALTER USER [your_user_name] WITH SUPERUSER;

Obviously, replace [your_user_name] again with your user name.

For me it was:

ALTER USER dataguy WITH SUPERUSER;

This line will give your user SQL superuser privileges – which will come in handy when creating new SQL tables, etc. Don’t worry about it too much yet. It’s good that we have done it though.

Anyways, done! You can exit from postgreSQL and go back to bash! Type:

\q

(this is the exit command in postgres.)

Then you have to log out from the superuser as well and go back to your normal user! Type:

exit
[Screenshot: exiting psql and the postgres user]

Now, let’s log in to your SQL database with your normal user (replace dataguy with your own username):

psql -U dataguy -d postgres

Great! You are back to SQL again! Let’s try it out and run these few SQL commands and queries:

CREATE TABLE test(column1 TEXT, column2 INT);
INSERT INTO test VALUES ('Hello', 111);
INSERT INTO test VALUES ('World', 222);
SELECT * FROM test;

The first line creates a new table called test. The 2nd and the 3rd load some test data in it. The 4th prints all the values to the screen from the test table!
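
If you are curious how the same four statements look when they are run from Python, here is a minimal sketch. Note that it does not talk to your postgreSQL database: to keep the example self-contained it uses SQLite, a different and much simpler database that ships with Python, and everything happens in memory. These particular statements happen to work the same way in both.

```python
import sqlite3

# In-memory SQLite database: nothing is written to disk,
# and your postgreSQL database is not touched at all.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

# The same statements as in the psql example above.
cursor.execute("CREATE TABLE test(column1 TEXT, column2 INT)")
cursor.execute("INSERT INTO test VALUES ('Hello', 111)")
cursor.execute("INSERT INTO test VALUES ('World', 222)")

# Fetch every row of the test table and print them one by one.
rows = cursor.execute("SELECT * FROM test").fetchall()
for row in rows:
    print(row)

connection.close()
```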

You can learn about SQL from my SQL for Data Analysis tutorial series!

Exit from postgreSQL again:

\q

Install pgadmin4!

It’s time to install pgadmin! This is a desktop application for postgreSQL that you can use to access your SQL database from your personal computer (without connecting to your remote server in terminal) and write queries much more easily and efficiently. You will find this little SQL manager tool very useful when you start writing complex queries.

As a first step, make your remote server ready to connect by typing these 5 lines of code (copy-paste them one by one):

sudo -i -u root
echo "listen_addresses = '*'" >> /etc/postgresql/*/main/postgresql.conf

(Note: this is one line of code! Only your browser breaks it into 2 lines!)

echo 'host all all 0.0.0.0/0 md5' >> /etc/postgresql/*/main/pg_hba.conf

(Note: this is one line of code! Only your browser breaks it into 2 lines!)

sudo /etc/init.d/postgresql restart
exit

What you are doing here is logging in to the root user and making some modifications in the config files of postgreSQL. (Remember: as you are on your remote server, the changes you make won’t affect your personal computer!)
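
The line that goes into pg_hba.conf looks cryptic, but it is just five fields separated by spaces: connection type, database, user, client address and authentication method. The sketch below splits the line into these fields and checks which client IP addresses a rule would match. parse_hba_line and rule_matches are made-up helper names, and the parser only handles simple host lines like the one above, not every format that postgreSQL accepts.

```python
import ipaddress


def parse_hba_line(line: str) -> dict:
    """Split a simple `host` line from pg_hba.conf into named fields."""
    connection_type, database, user, address, method = line.split()
    return {
        "type": connection_type,  # "host" = connections coming in over the network
        "database": database,     # "all" = every database
        "user": user,             # "all" = every database user
        "address": address,       # which client IP addresses the rule applies to
        "method": method,         # "md5" = the client has to give a password
    }


def rule_matches(rule: dict, client_ip: str) -> bool:
    """Check whether a client IP address falls inside the rule's address range."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(rule["address"])


if __name__ == "__main__":
    rule = parse_hba_line("host all all 0.0.0.0/0 md5")
    print(rule)
    print(rule_matches(rule, "46.101.128.25"))
```

The address 0.0.0.0/0 matches every IPv4 address, so anyone can try to connect, but they still need a valid user name and password. For practice that is fine. A stricter rule would list only your own IP address.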

Then download pgadmin4 from here: pgadmin4.

Select your OS, then download, install and run it!

Once you are done, you will see something like this screen:

[Screenshot: the Add New Server screen in pgadmin4]

Click the Add New Server Icon!

And fill in the popup:

[Screenshot: the connection settings popup in pgadmin4]

“GENERAL:”

  • “Name:” anything you want (e.g. “Data36 Test Server”)

“CONNECTION:”

  • “Host name/address:” your remote server’s IP Address (in my case: 46.101.128.25)
  • “Port:” 5432
  • “Maintenance database:” postgres
  • “User name:” [your_user_name]
  • “Password:” your recently generated SQL password for your user

Click save and BOOM! You are connected to your database!
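
The fields you have just filled in (host, port, database, user name and password) are the same pieces of information that any other program needs to reach your database. Many tools, psql included, accept them packed into a single connection URI. Here is a sketch of how that URI is put together. build_connection_uri is a hypothetical helper and the password in the example is obviously made up.

```python
from urllib.parse import quote


def build_connection_uri(host: str, user: str, password: str,
                         port: int = 5432, database: str = "postgres") -> str:
    """Build a connection URI from the same fields as the connection form."""
    # quote() escapes characters such as "@" or "/" that would break the URI.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgresql://{safe_user}:{safe_password}@{host}:{port}/{database}"


if __name__ == "__main__":
    # Example values from this article plus a made-up password; use your own.
    print(build_connection_uri("46.101.128.25", "dataguy", "not_my_real_password"))
```

Keep in mind that such a URI contains your password in plain text, so don’t paste it into places where others can read it.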

[Screenshot: browsing the server in pgadmin4]

At first sight, it’s not really straightforward, but you can discover the different folders on the left side. If you right click on the name of the server (on my screenshot: “Data36 Test Server”), you can disconnect from your server. And you can connect the same way, when you want to get back.
Also if you left click on one of your databases (on my screenshot: “postgres”), then you select on the top menu “Tools” –» “Query tool,” you will be able to run SQL queries (execute with the little Flash Icon):

[Screenshot: running SQL queries in pgadmin4]

Notice that on my screenshot you can see the very same result that we got in the Terminal SQL!

Yay, you have SQL!

Now, at this point let me mention that pgadmin4 is nice and free and built by the awesome postgreSQL team… but… to be honest, I don’t really use it in real life, because it’s a bit unstable, and generally speaking, there are better SQL manager tools. Here’s my favorite free SQL manager (and how to install it): SQL Workbench. (Note: Save this link for later, and now just finish the article with R.)

Only one small step left…

Step 6: Install R and RStudio!

UPDATE in 2020: When I first published this article, I figured that I’d write some R tutorials for this blog, too. But in 2020, I realized that if you learn Python for data science, you won’t really need R. Regardless, I’m leaving this R installation tutorial here, but by all means, feel free to skip it!

R is the easiest tool to set up! That’s why I left it to the end.

First use apt-get again to install R:

sudo apt-get install r-base-core

(If it asks for your password, type it – if it asks whether to continue, say yes!)

Now you have R. You can test the “Hello, World!” here as well! Start R first:

R

Then type:

print ("Hello, World!");
[Screenshot: Hello, World! in R]

The syntax is a bit different than it was on Python and much different than it was in Bash. You can exit from R:

quit()

Save workspace image? [y/n/c] —» Say: n

There is an application for R as well to make your coding life easier. It’s called RStudio and you can set it up using these 3 lines of code (copy-paste them one by one)!

sudo apt-get install gdebi-core
wget https://download2.rstudio.org/rstudio-server-1.1.463-amd64.deb

sudo gdebi rstudio-server-1.1.463-amd64.deb

Then just go to your browser and type [your IP Address] and port 8787. In my case:

46.101.128.25:8787

You can log in with your username (e.g. dataguy) and password. (The same ones you have been using to access your remote server so far.) And try “Hello, World!” here as well.

[Screenshot: Hello, World! in RStudio]

You have installed R and RStudio, too! Congrats!

CONCLUSION

Nice job there!

You have created your own remote data server and you have installed Python, SQL, R and bash on it! This is a fantastic first step for you towards becoming a Data Scientist!

As I’ve mentioned several times during this article, I’ve been creating quite a few articles to show you how to use Python, SQL and bash. All of these start from the very basics. Feel free to start with any of these you prefer:

  1. SQL for Data Analysis tutorial series
  2. Python for Data Science tutorial series
  3. Bash for Analytics tutorial series


Cheers,
Tomi Mester

*shout-out to Johann for finding, reporting and solving the issue!

Sources and further reading

Bash/Command Line:
Data Science At The Command Line: http://datascienceatthecommandline.com

R:
https://support.rstudio.com/hc/en-us/articles/200552306-Getting-Started

SQL:
https://www.cyberciti.biz/faq/howto-add-postgresql-user-account/
https://help.ubuntu.com/community/PostgreSQL
http://stackoverflow.com/questions/1287067/unable-to-connect-postgresql-to-remote-database-using-pgadmin

Python:
http://jupyter.readthedocs.io/en/latest/install.html
https://www.digitalocean.com/community/tutorials/how-to-set-up-a-jupyter-notebook-to-run-ipython-on-ubuntu-16-04


82 Comments

  1. Nice summary, though I think you should really put more emphasis on the security aspects (eg. disabling root access, setting up ufw, etc. but I’m not an expert at all). Nevertheless it gave me ideas how to move forward with these tools (been playing around with Python for only 6 months).

    Few questions/ideas:

    – Is there any particular reason why you use Python 2.7 and not 3.x?
    – What do you think of using MongoDB in a similar setup instead of SQL? (maybe noSQL with schemaless design fits better to ‘exploratory’ coding – ie. tinkering in a trial-and-error manner. Or is it too abstract from tabular data formats?)
    – Does this way of using Jupyter work from behind nginx? (I guess you have to point to the right port in the config, but I’m not an expert here either.)
    – Maybe Miniconda could make managing Python stuff even easier. I think it comes with Jupyter by default, and also the conda package manager, replacing pip. What do you think?
    – Also you might want to add some kind of a process manager to restart your dbase in case the server reboots for some reason (if it is an issue that your db stops)

    • Hey Andras,

      thanks, nice questions and suggestions!
      I’ll go ahead and answer them one by one!

      (Before that, note that in this article I tried to cover how to set up a data server to practice with the fewest steps – so obviously this is a solution that will need further optimizations performance- and security-wise too… Most of these questions wouldn’t have fit into this 101 article! 🙂)

      1. Security:
      It’s definitely a thing. I’ll cover this topic in another article as this one became a little bit too long already! 🙂

      2. Is there any particular reason why you use Python 2.7 and not 3.x?
      UPDATE: I’ve eventually updated the whole article to Python 3.x.
      No… I use Python 2.7, because most of the data scientists are using it. It means that all the commonly used data science packages (numpy, scikit, pandas) are supported on that version “better” (not just development, but learning materials). But I’ve friends who are using 3.x without any trouble.
      UPDATE: Also if you go to a workplace, there is a very high chance, that you have to use 2.7, not 3.x.

      3. What do you think of using MongoDB in a similar setup instead of SQL?
      I’m not using MongoDB, so I don’t have any educated opinion on that. However, I’ve heard some very bad things about security around that. Just the most recent one: https://www.bleepingcomputer.com/news/security/mongodb-apocalypse-is-here-as-ransom-attacks-hit-10-000-servers/

      4. Does this way of using Jupyter work from behind nginx?
      No.

      5. Maybe Miniconda could make managing Python stuff even easier.
      Unfortunately Miniconda doesn’t come with Jupyter (afaik at least), so I left it out for now, because it would have been an extra unnecessary step. But I’ll cover it later too!

      6. Also you might want to add some kind of a process manager to restart your dbase in case the server reboots for some reason.
      Good point! I forgot about that one, but I’ll definitely expand the article with this section! 🙂

      Thanks for the suggestions again! I’ll use them!
      Cheers,
      Tomi

      • Most people starting new projects are not in fact using 2.7. Metrics vary from here and there, but any of the (informal) polling done in /r/datascience have shown that hands down 3 is the standard. You’ll see a lot of metrics passed around saying downloads from pip show 2.7 is still more popular, but I suspect that these are skewed with automated dependencies, and also not including new distributions with new methods of install – like Anaconda.

        Which brings me to my next point – if you’re going to give people info on how to get started with Python, I’d suggest something like Anaconda. Installing binary dependencies can quickly become a nightmare without it, and since you’re using your sys python you’re also looking for trouble there. Anaconda makes using virtual (conda) environments a breeze – leaving the sys python to do what it’s meant to do.

        Also – for Andras. The reason to learn SQL for data science is that the majority of your data is likely to come from legacy systems, and these are for the most part RDBMS, not NoSQL. No doubt in the future the mix will tilt, but I’d focus on traditional SQL/csv data manipulations first, and then mix in some Mongo later. Also note – Mongo is better at OLTP (transactional) than OLAP (analytical), and oftentimes with analysis you’re going to be working with OLAP (aggregates, crosstabs, etc).

        Just my $.02.
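
To make the OLTP vs. OLAP distinction above a bit more concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than the PostgreSQL setup from the article, purely so that it runs without a server. The `orders` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
)

# OLTP-style work: many small writes, each touching a single row.
rows = [("HU", 10.0), ("HU", 15.0), ("US", 20.0), ("US", 5.0), ("DE", 7.5)]
with conn:  # commits the inserts as one transaction
    conn.executemany("INSERT INTO orders (country, amount) VALUES (?, ?)", rows)

# OLAP-style work: one read that scans many rows and aggregates them.
summary = conn.execute(
    "SELECT country, COUNT(*), SUM(amount) "
    "FROM orders GROUP BY country ORDER BY country"
).fetchall()

print(summary)  # [('DE', 1, 7.5), ('HU', 2, 25.0), ('US', 2, 25.0)]
```

The GROUP BY query is the kind of aggregate that analysis work tends to rely on, which is part of why plain SQL is a sensible first thing to practice.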

        • @Kevin D!
          Thank you for your 2 cents! Very useful opinions and suggestions.
          UPDATE: I’ve eventually updated the whole article to Python 3.x.
          On Python 2.7 vs 3.x – I guess it’s still a matter of opinion; there are many discussions on this out there. I prefer 2.7, so I’ll use 2.7 in my tutorials, and I added that to this article! 😉
          But 2 other reasons here:
          1. The majority of the workplaces started to use 2.7 and they are still doing it, so if you start to work somewhere, you can expect 2.7 more than 3.x.
          2. You can access learning materials easier in 2.7. (as far as I’ve experienced.)

          Anaconda-wise: yes, you might be right… now I’m considering expanding this article with Anaconda. (At first I didn’t want to, in order to keep it “simple”.)

          And thank you for the comments on Mongo!

          Cheers,
          Tomi

      • Thanks! The mongo hacks are really nasty, though only instances with lousy security setup are affected I think (a trap a beginner could definitely fall into…). But I’m not a promoter of mongo. You need to know SQL anyway if you’re to work with data, your setup is just perfect for this.

        Jupyter is not included in Miniconda indeed, my bad. Anaconda does include it, but it felt like a huge overshoot for me when I tried it.

        Looking forward to the upcoming posts!

  2. Hey Tomi,

    Very important question: Am I still able to follow your guide if I use Ubuntu 16.04.1 x64 instead of 14.04.5 x64? I’ve noticed that the command ‘python’ didn’t install it for me, rather I had to sudo apt install python to have it install python2.7 for me, also saying ‘After this operation, 16.6 MB of additional disk space will be used.’ -> I thought I’m working in the cloud?

    Sorry for the noob questions. Really want to get started with Data Science, but not screw up anything with my personal device.

    Thank you for any help!

    • hey Lukas,

      (I assume in my answer that you set up the remote server as it was written in the article above – except the 16.04 vs 14.04 question…)

      If you use 16.04, it comes with python3.x (python 3.5.2 recently anyway)
      If you want to use python2.7 (the one that I suggested), you can set it up with the command you used:
      sudo apt install python

      And don’t worry about the:
      ‘After this operation, 16.6 MB of additional disk space will be used.’

      It will need disk space from your remote server (“cloud computer”).
      So just say “yes” and python will be installed on your remote server.

      Alternatively you can use Python3 – without installing anything – with this command:
      python3
      But note that there will be minor differences in the usage compared to Python2.7… Not very big things, but if you want to follow my tutorials in the future (or most of the Python tutorials available online), then it will be easier if you go with python2.7.

      Also, just a small comment: I’m using Ubuntu 14.04 in my tutorial for a reason. In my experience, Ubuntu releases are not always super stable in their first years (bugs, compatibility issues)… So Ubuntu 14.04 is something that I trust when I set up my data coding tools. But this is kind of a question of taste! 🙂

      I hope I answered all your questions, and good luck on your path to becoming a Data Scientist!!

      Tomi

  3. Tomi

    Thanks for getting me started. I have followed the steps you described and read some of the references, and now I’m ready for the follow up lesson. : – )

  4. Hello Tomi,

    Congratulations! I’ve started a nanodegree on Udacity and I think that your article will help me.

    * Maybe you can make available the articles in pdf.

    See ya!

    • hey Rafael,

      thank you, glad you liked it! And good luck with the nanodegree!
      I guess I won’t put this stuff into PDF format – in the near future at least – but thanks for the idea anyway! : )

      Cheers,
      Tomi

  5. Great article !

    One quick question: Can I follow your bash tutorial using bash on my laptop (Mac)? Or are there any benefits to using bash on a DigitalOcean server instead of on my laptop?

    Cheers,
    Woratana

    • hey Woratana,

      yes, absolutely – you have 2 major ways to do that.
      A) You can install Linux (Ubuntu 14.04) on your Mac as a second operating system (I’d suggest this one)… that’s more or less the same infrastructure that the DigitalOcean servers have.
      B) You can use VirtualBox (LINK: https://www.virtualbox.org/wiki/VirtualBox) and set up a virtual computer on top of your OSX system. There you can set up an Ubuntu 14.04 OS… But this B) solution is a bit more fragile (it will freeze and slow down from time to time for no apparent reason).

      Cheers,
      Tomi

  6. Hi,

    I am trying to set up the sql and pgadmin4 on the server. I can switch to the postgres superuser but when I type psql I get

    psql: could not connect to server: No such file or directory
    Is the server running locally and accepting connections on Unix domain socket
    "/var/run/postgresql/.s.PGSQL.5432"?

    Sorry, it is probably something obvious but I wanted to run through the SQL tutorial.

    • hey RobH,

      thanks for the question.
      I guess in this article you might find a detailed answer:
      http://stackoverflow.com/questions/31645550/why-psql-cant-connect-to-server

      Based on the info you’ve provided, it’s a bit hard to tell the exact issue.
      But my best guess is that something’s wrong with this part:
      echo 'host all all 0.0.0.0/0 md5' >> /etc/postgresql/*/main/pg_hba.conf
      If it’s so, you should have got an error message or something.
      But if not, then you can also try to add your username instead of the *

      Hopefully this will fix the issue!
      Let me know!
      Cheers,
      Tomi

  7. I found the information that I needed.

    Thank you!.

    Rob

  8. I really enjoyed setting up my own server on AWS and following your tutorials. I am currently on Ep3. Do you have a way of accepting donations?

    • hey Ryan!

      Thanks a lot and glad to hear!
      Plus, I appreciate your intention regarding donations but it’s really not necessary.
      The best way to support me is sharing this article with your network! 😉

      Thanks and cheers,
      Tomi

    • Kamal

      Is it fairly easy to set this up with aws for someone new to cloud server technology?

  9. Aimilianos Manolatos

    Hi , thanks for all the effort you’ve put in.

    I’m on a Mac and have the following error showing when trying to create a Server in pgAdmin5

    Unable to connect to server:

    FATAL: password authentication failed for user “Aimilian0s”
    FATAL: password authentication failed for user “Aimilian0s”

    Any clues ?

    Kind regards,
    Aimilianos

  10. Very good post and good reading. Thanks for your efforts.
    What I’d suggest is using Docker images to start using Python, SQL, R and Bash directly (for non-devs). It’s free, and you’ll have full control over it (destroy a container when it’s corrupted and launch a new one).
    Also, you’ll be able to get the latest updates on the software while persisting the data and code you’ve been working on.

    And this one is for postgres:
    https://hub.docker.com/_/postgres/

    This is for pgadmin:
    https://hub.docker.com/r/fenglc/pgadmin4/

    This one is for Jupyter, R, Julia and Python:
    https://hub.docker.com/r/jupyter/datascience-notebook/

    Basically I run the commands below after having docker set up on my windows PC.

    ## run Postgres :
    docker run --restart always --name data36-postgres -e POSTGRES_USER=data36 -e POSTGRES_PASSWORD=data36 -d postgres

    ## run PGAdmin :
    docker run --restart always --name data36-pgadmin4 -e DEFAULT_USER=data36 -e DEFAULT_PASSWORD=data36 -p 5050:5050 -d --link data36-postgres:data36-postgres fenglc/pgadmin4

    Then I hit http://localhost:5050 in my browser.
    Use default administrator account to log in:
    user: data36
    password: data36

    ## run JupyterLab :
    docker run --restart always --name data36-jupyter -it --rm -p 8888:8888 jupyter/datascience-notebook start.sh jupyter lab

    Then I hit http://localhost:8888 in my browser.
    Use the token printed on your console to login.

    You can use the commands below to get the logs as well as tokens when necessary:
    docker logs -f

  11. Hi Tomi, thanks for putting this together. Great job! I’m an aspiring Data Scientist with an Electrical Engineering background. I’m a bit stuck trying to install R Studio. After entering:
    [sudo restart rstudio-server]
    I get an error: [sudo: restart: command not found] and I’m unable to access R Studio from my browser. Have you seen this before? Any suggestion? Thanks.

    • hi Kay,

      there could be multiple reasons for this issue…
      The easiest way to overcome this issue, though, is to:
      1. Make sure you are logged in with your own user (not the root user!)
      2. Re-do the whole R chapter in the article (re-install everything, etc…)

      If that doesn’t work, please try to send me a screenshot about the issue and I’ll try to figure it out! : )

      Cheers,
      Tomi

    • UPDATE!
      I think I found the answer to your problem.

      Sometimes the R install commands abort after the “Do you want to install the software package? [y/N]” or after the “Do you want to continue? [y/N]” questions – even if you type “y”.

      This is a hiccup in their install package and the best workaround is to simply re-run the command if you see that it has aborted after typing “y”.

      Hope this helps!

      Tomi

  12. gülsemin

    Hi Tomi,
    I have a problem and can’t solve it. The problem is that after executing all of these commands:
    CREATE TABLE test(column1 TEXT, column2 INT);
    INSERT INTO test VALUES ('Hello', 111);
    INSERT INTO test VALUES ('World', 222);
    It says: -bash: syntax error near unexpected token `('

    What should I do?

    Cheers

    • hi Gülsemin,

      it seems that you are still in bash – make sure you are not missing any of the steps in the article.
      This time it’s the step where you login to your SQL database with psql.

      Hope this helps!
      Cheers,
      Tomi

  13. john schlafly

    Tomi – This is amazing, and a way more fun way to learn and visualize these languages than the online courses out there. Excited to get after this

  14. Marisa Reis

    Nice post.
    Just a comment: I recommend PyCharm as a Python editor.

  15. Trong Nguyen

    In the step below of installing Jupyter, I’m getting this weird message: No such file or directory.

    echo "c.NotebookApp.ip = '*'" >> /home/[your_username]/.jupyter/jupyter_notebook_config.py
    (Note: this is one line of code! Only your browser breaks it into 2 lines!)
    This will add one line to the newly created config file that will make you able to use your Jupyter notebook from a browser window (like Chrome or Firefox).

    Maybe I’m doing something wrong but I’m typing in this message:
    echo "c.NotebookApp.ip = '*'" >> /home/[your_username]/.jupyter/jupyter_notebook_config.py

    1) There is a space between “>>” and “/home/….”

    Any thoughts? Thanks.

    • hey Trong,

      I think you forgot to change the [your_username] part to your actual username! : )
      Try it with the username you created at the beginning of the tutorial. If it’s still not working, let me know!

      Tomi
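
The same step can be sketched in Python, which avoids typing the username by hand. This is only an illustration, not the article's method: the helper name `append_jupyter_setting` is made up here, and the sketch assumes the `.jupyter` folder was already created by the earlier config step. If the folder is missing (for example because of a wrong home directory), it raises the same kind of "No such file or directory" error described above.

```python
from pathlib import Path

# The line the article appends to the Jupyter config file.
CONFIG_LINE = "c.NotebookApp.ip = '*'"


def append_jupyter_setting(home: Path, line: str = CONFIG_LINE) -> Path:
    """Append `line` to <home>/.jupyter/jupyter_notebook_config.py.

    Hypothetical helper for illustration. Returns the path of the config file.
    """
    config = home / ".jupyter" / "jupyter_notebook_config.py"
    if not config.parent.is_dir():
        # Same situation as bash's "No such file or directory".
        raise FileNotFoundError(f"No such directory: {config.parent}")
    with config.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return config


if __name__ == "__main__":
    # Only print the path that would be used; nothing is modified here.
    print(Path.home() / ".jupyter" / "jupyter_notebook_config.py")
```

Calling `append_jupyter_setting(Path.home())` would then do the actual append for the user you are logged in as.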

      • Trong Nguyen

        I’m an idiot, Tomi lol. It’s a little embarrassing, but if you can’t tell already, I’m very new at this.

        However, I just purchased your program on Data 36 on how to install SQL and the other programs and your video on this section did the trick.

        It works now!

        Thanks, you’re amazing.

  16. Hi Tomi,

    Just got started via the $10 starter. Absolutely new to data science or coding. So I’d need some help with the project goals bit.

    I need to choose from the following sections;

    – CONFIGURATION MANAGEMENT
    – DEPLOYMENT
    – DEV TOOLS
    – MONITORING
    – OTHER CLOUD PROVIDERS
    – PUBLISHING

    My guess is to go to the MySQL in DEV TOOLS and forget every other item.

    Kindly help clarify.

    Cheers,

  17. Hi Tomi,

    While trying to run the command “sudo apt-get update” to update Python, I get the following message:

    “fufu is not in the sudoers file. This incident will be reported.”

    What can I do?

    Thank you

  18. Hi Tomi, thank you for this tutorial! For some reason I’m not able to create a SQL user with your prompt:

    postgres=# CREATE USER nicknemethy WITH PASSWORD ’test’;
    ERROR: syntax error at or near “’test’”
    LINE 1: CREATE USER nicknemethy WITH PASSWORD ’test’;

    The error points to the ‘ before test

    Any idea why that’s happening?

    • hey Nick,

      I think the issue is with the type of the apostrophe you use.
      They seem really similar but in fact there are 3 different types:
      1) "
      2) '
      3) and .

      Now, I don’t know which key produces which one on your computer, but there is a slight difference in how they look, so you can experiment with them yourself. You typed in the 3rd one, but you will need the 2nd one instead.
      (I know these nuances are annoying in the beginning but the concept will grow on you when you get more and more into coding.)

      Hope this helps!

      Cheers,
      Tomi
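
If you are not sure which kind of quote ended up in a command you copied, a few lines of Python can show it. This is a minimal sketch: the function names are made up for illustration, and it only covers the four most common typographic ("curly") quote characters.

```python
# Typographic quotes mapped to the plain ones that SQL and bash expect.
CURLY_TO_STRAIGHT = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def find_curly_quotes(text):
    """Return (position, character) pairs for every curly quote in text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in CURLY_TO_STRAIGHT]


def straighten_quotes(text):
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(str.maketrans(CURLY_TO_STRAIGHT))


# The statement from the question above, with curly quotes around the password.
broken = "CREATE USER nicknemethy WITH PASSWORD \u2019test\u2019;"

print(find_curly_quotes(broken))   # where the problematic characters are
print(straighten_quotes(broken))   # the statement with plain quotes
```

Word processors, e-mail clients and some web pages swap quotes automatically, so this kind of check is mostly useful for commands that were copy-pasted rather than typed.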

  19. Avijeet Singh

    can i do this with google cloud platform?

    • In theory, you can. In practice, I recommend that you follow my guide and go with DO, to avoid the possible inconveniences that come from using a different platform than I do. Tomi

  20. Hi, can I simply install Python, R and bash on my Windows laptop? Would I be able to follow your articles/instructions if I do this?

    • In theory, yes, you can follow the articles using your laptop as your “data server”…
      In practice, there might be minor differences, so I can’t guarantee anything.
      But it’s worth giving it a go, and you can switch to the server setup anytime by coming back to this article!
      Cheers,
      Tomi

  21. Hi Tomi!

    I am trying to open the Jupyter notebook in Google Chrome or in Firefox, but the browser is unable to connect. I use this: (the IP from the email):8888
    The Jupyter notebook is running but I can’t access it from the browser.
    What can be the problem? I am at my workplace but so far I could do all the instructions written in the article. I don’t think that it is a firewall issue.

    Thank you

  22. Hi Tomi,

    all the problems I wrote about are solved.

  23. James Adler

    This was so great! Helpful and easy to follow!!! I am new to this whole data science thing and usually end up running into many issues when trying to follow instructions, even when I follow them exactly. Thank you!

    FYI – I did run into one issue with not being in the sudoers file when trying to run the first sudo command from my own user (not root). Not sure if I missed something that put me in that spot, but a quick Google search fixed the problem.

    • Awesome! Thanks a lot, James!
      Yes, you probably missed a step, or even something as small as a character in one of the steps. But I’m glad that you fixed it! : )

      Cheers!

  24. Hazim Mir

    Hello,

    I created the droplet and installed PuTTY. When I open my server, it asks for “Login:”, for which I gave “root”, and then it asks for the password, which I copy-pasted. To be sure, I also typed it in from the email, since nothing is shown while you type the password. I always get the “access denied” error.
    What should I do?
    Thanks

    • That’s tricky, password from the email should just work.
      Probably you mistyped something (special character?)
      Also, recently, DO changed its registration interface, and now you can define your own password!
      Tomi

  25. Prashil

    Hi Tomi,

    When I am trying to do below:
    “This command will start to run the Jupyter application on your remote server. While it’s running in Terminal, you should just open a browser and type in the address bar [IP Address of your remote server from the email]:8888”

    I get:

    This site can’t be reached
    refused to connect.
    Try:

    Checking the connection
    Checking the proxy and the firewall
    ERR_CONNECTION_REFUSED

    Any ideas to resolve this?

    Thank you.

  26. Prashil

    Please ignore my previous message.
    Seems it is working fine now. Thank you.

  27. Hi, thank you for your tutorial.
    For some reason, I can’t get a remote server from abroad and I have to get it from a domestic provider. Can I just get a virtual server from a hosting provider and continue the tutorial?

    Thank you in advance.

    • Yeah, you can go with any remote server solution if you can’t use DigitalOcean. Please note that there might be minor differences!

  28. George Jordan

    Hi Tomi,

    Thanks for the article. I have been struggling with SQL and coding for the past 30 days and just chanced upon your website. I have followed the steps, but for whatever reason I can’t log in using the new user after step 3. How can I change the new user details so I can proceed?
    I tried to use root as the user to go through step 4 but cannot connect to Jupyter notebook as root. Any help please.

  29. Really great easy-to-understand tutorial!
    I had just one question. At the end of it I just exited from all apps and exited from the server. I wanted to know if it’s a good practice to turn off the droplet after every session?

    I believe you’re still charged by DigitalOcean but that’s fine. I just think of it like shutting down my laptop/PC.

    • hey K,

      Usually, I don’t turn off my servers ever… but you can do that.
      It doesn’t make any difference in what DigitalOcean will charge you but it’s maybe more secure that way.

      Although I don’t think you should worry about the security of your remote server in general at DO.
      So yeah, I leave it up and running. Make sure you shut down your Jupyter Notebooks all the time, though.

      Cheers,
      Tomi

  30. Hi Tomi,

    I’m super glad I found this resource online. It’s truly incredible what you’ve put together. With that said, I’m completely new to all of this and I have no idea where I’m going wrong. I’m not able to view my jupyter notebook online and I can’t seem to connect my pgAdmin to my server.

    I’m using ssh to access my droplet with putty with the root user. But I was never able to get it to access with the user I created after. My droplet currently has a couple wordpress sites that I practiced setting up. Could that be an issue?

    I’m feeling a bit lost.

    • hey Daniel, yeah, I think a firewall is blocking everything you are trying to do (probably one that was set up automatically while you were practicing with WP). I recommend going with a brand new droplet that’s not affected by other projects. : ) Tomi
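
A quick way to tell "blocked or not running" apart from a browser problem is to test the port directly. Below is a minimal Python sketch, assuming the ports mentioned on this page (8888 for Jupyter, 8787 for RStudio, 5432 for PostgreSQL). The function name `port_is_open` is made up for illustration, and `127.0.0.1` is a placeholder that you would replace with the IP address of your remote server.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    server_ip = "127.0.0.1"  # placeholder: use your remote server's IP here
    services = [("Jupyter", 8888), ("RStudio", 8787), ("PostgreSQL", 5432)]
    for name, port in services:
        state = "reachable" if port_is_open(server_ip, port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

If a port is not reachable from your own computer while the application is running on the server, a firewall (on the server, at the provider, or at your workplace) is a likely suspect. If it is reachable, the problem is more likely on the browser side.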

  31. Wow your guide is so detailed
    Though I only understand like 10% of it (accountant background interested in excel & data science)
    Looks very interesting, will try to look through & understand more aspects of it

    Just started on
    Python for Finance: Investment Fundamentals & Data Analytics
    by 365 careers on udemy.com

    got Jupyter running & doing some practices with Python now
    came to your site through search: diff between functions & methods

    please keep up the good work! Cheers~~

  32. Mateus Maciel

    Dear Tomi,

    You really did a great job with this tutorial. However, I am facing a problem at the end of the R installation part. When I type http://xxx:8787 in Google Chrome, it says that it was not able to connect.

    Could you give me a hand?

    Best regards,

    Mateus.

  33. Mateus Maciel

    Never mind. I managed to solve it.

    I just had to install RStudio again.

    Cheers!
