My Computer Setup for Data Science (Apps, Software, Programs and Tools I Use)

Recently, I had the “pleasure” of setting up my work computer (the computer that I use exclusively for data science projects) from scratch.

I have a 2016 MacBook Pro, and its keyboard just gave up. It happens. (Well, it shouldn’t, dear Apple!) Anyway, they’ll fix it in 2-3 weeks and in the meantime I got a replacement laptop.

Perfect, I thought. I’ll use this inconvenience as an opportunity. I’ll make a list of the tools, apps and programs that I’ll install during the process, so I can share my exact data science computer setup with you.

Note: Here, I’ll show you my MacBook setup, but most of the tools in my list are available on Windows, too.

Note 2: What’s the best computer for data science? Check it out here.

This will be somewhat different from other articles out there…

As you’ll see, I don’t have too many data science programs/applications on my computer. And the reason is simple: I prefer to do data science in the cloud, using remote servers.

[Image: local computer + remote data server connection]

Before I wrote this post, I searched other articles on the topic online… and it surprised me that most articles recommend setting up all the data tools (like SQL databases, RStudio, Python, Jupyter Notebook and all other things) exclusively on your personal computer.

No way!

I mean, that’s okay for learning. In the beginning, at least. But that’s far from real-life data science computer setups.

I’ve been working on data science projects for many years… And believe me when I say that I (like other practicing data scientists) do the actual work on remote servers, and keep only a few programs on my local computer, which I’ll show you in this article.

My computer setup for data science. The list of the tools.

Of course, this is only my list, and of course it’s subjective. Feel free to use it as a starting point and tweak it to your needs!

Anyway, here it is — my computer setup for data science:

  • Google Chrome and Firefox (browsers)
  • iTerm2 (for accessing remote servers via the command line)
  • SQL Workbench (SQL manager for accessing SQL databases)
  • Anaconda-Navigator (local Python environment for prototyping)
  • Sublime Text 3 (the best text editor for coding and scripting)
  • Backup and Sync (file management and cloud storage)
  • Keynote, Numbers, Pages (for presentations, spreadsheets and documents)
  • Evernote (note taking)
  • Slack (teamwork and chat)
  • f.lux (the app that my eyes need)
  • Spotify (music!)

[Image: my data science computer setup]

Let’s see them one by one!

#1 Browsers: Firefox and Google Chrome

You have the factory setup of an operating system on your computer. This is a new beginning. Awesome! The very first thing to do is to get a worthy browser. Neither Safari nor Internet Explorer is in this category.

So my first step was to download and install Firefox, my favorite browser.

And the second step was to download and install Google Chrome. It’s a great browser, too. And I can’t ignore it, either, since I use many Google services (e.g. Gmail, Google Drive, Docs, Sheets, Data Studio, Optimize) and a few Chrome extensions that are not available for Firefox.

Both Google Chrome and Firefox are free.

Tip: if you have important bookmarks in your browsers on another computer, you can export them and import them into your new computer’s browsers. Better yet, if you sign in to your browser, you can sync your bookmarks across multiple computers.

#2 iTerm2 (or PuTTY)

As I mentioned above, I prefer to do data science in the cloud. An essential and fundamental way to interact with a remote server is to type commands into the command line. For that, you’ll need a terminal app.

And one of the best terminal apps is iTerm2.

If you don’t know what Terminal or iTerm2 is, this is the program from the movies — with a black background and white and green letters:

[Image: iTerm2 in action]

Once you have it on your computer, you can start typing commands, which opens up many possibilities that graphical interfaces don’t offer. As I mentioned, this is crucial for working with remote servers (which usually don’t even have graphical user interfaces). But it can also come in handy if you want to set up something very specific on your local computer (e.g. file or directory permissions).
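To make the permissions example a bit more concrete, here is a minimal Python sketch of the kind of thing you would otherwise do with `chmod 600` in the terminal. It assumes macOS or Linux; the function name `make_private` and the temporary demo file are made up for illustration and stand in for something like a private key file.

```python
import os
import stat
import tempfile


def make_private(path):
    """Restrict a file to owner read/write only (the equivalent of `chmod 600`)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    # Return the new mode in the familiar `ls -l` notation
    return stat.filemode(os.stat(path).st_mode)


# Demo on a throwaway file (hypothetical stand-in for a real file you want to protect)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

mode_string = make_private(demo_path)
print(mode_string)  # on macOS/Linux: -rw-------

os.remove(demo_path)  # clean up the demo file
```

The same idea applies to directories; only the permission bits you pass to `os.chmod` change.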

I’ve written more about how to use Terminal and iTerm2 in my data science server setup article — and about the command line itself in my bash articles. So I won’t go into detail here.

Note: iTerm2 is not available for Windows — there, you’ll have to use PuTTY instead, which is a less user-friendly alternative. But at least there is something. I’ve written more about it, too, in my data science server setup article.

#3 SQL Workbench

To work with SQL databases efficiently, you’ll need an SQL manager tool. There are many: better and worse, free and paid ones…

My favorite free SQL manager tool is SQL Workbench.

It’s simple, works smoothly and it does everything that you will need when working with your SQL queries and commands.

[Image: SQL Workbench]

It’s also compatible with all the major SQL databases: MySQL, PostgreSQL, MS SQL Server, Amazon Redshift, Oracle, etc.

I wrote more about how to install SQL Workbench here.

#4 Anaconda

Although I prefer working in the cloud, I have to admit that sometimes I use Python locally on my computer. And to have a Python environment with all the data science packages, I use Anaconda.

More specifically, I use the Anaconda Individual Edition. (It’s free.)

Once you download and install it, it’ll immediately give you access to all the popular data-science-related Python applications and frameworks — on your local computer:

[Image: data science computer stack]

Note: By the way, it features not just Python but R, too!

And it also gives you access to ~320 data science packages – including the popular ones like numpy, pandas, sklearn and more:

[Image: Python packages in Anaconda]
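If you want to double-check that a local Python environment really has the packages you expect, a small sketch like the one below can help. It uses only the standard library; the helper name `check_packages` is made up for this example, and the list of names is just the three packages mentioned above (note that scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec


def check_packages(names):
    """Map each top-level import name to True/False, depending on whether it is importable."""
    return {name: find_spec(name) is not None for name in names}


# Import names of the packages to look for (an illustrative list, adjust to your needs)
wanted = ["numpy", "pandas", "sklearn"]
status = check_packages(wanted)

for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```

What it prints depends on your environment: in a fresh Anaconda install all three should show up as installed, while a bare Python install will likely report them as missing.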

#5 Sublime Text

To write complex data science scripts efficiently, you’ll need a text editor designed for coding.

You can use some basic command line tools like vim or (my favorite) mcedit.

[Image: mcedit, the old-school editor, on a remote server]

But these are not practical enough if you want to be coding every day.

The best and most coder-friendly text editor I’ve ever used is Sublime Text 3.

Really, I haven’t seen anything like it… It looks similar to other editors:

[Image: Sublime Text 3 is awesome!]

But, boy, the features. There are so, so many. Here are just a few:

  • multi-select
  • multi-edit
  • auto-completion
  • selecting columns
  • and many, many more…

I absolutely love it. These things make coding and scripting so much easier, and honestly, much more fun.

Here’s the very eye-catching multi-edit feature, for example:

[Image: the multi-edit feature in Sublime Text]

You can download Sublime Text 3 here.

It’s virtually free… Well, it says that you can try it out for free and you’ll have to pay for it only when you decide to stick with it in the long term. But it doesn’t really give any time limit for the trial period (so it can be 5 years, up to you).

Regardless, I encourage you to pay for it! I mean, this trial period thingy is a nice gesture from the creators of Sublime Text, and it’s a gesture worth returning. Buying Sublime Text 3 for $80 (one-time payment) is a very good investment in a great tool that you’ll probably use for the next 10-20 years.

#6 Backup and Sync by Google (a.k.a. Google Drive)

I store all my files in the cloud. It’s safer and better than having them on my computer. 

There are quite a few vendors and solutions for that (e.g. Dropbox, Box, Tresorit, Microsoft OneDrive) but I use Google Drive.

And for Google Drive, there’s a desktop app called Backup and Sync. By setting up this program, you will be able to automatically synchronize your files between your cloud storage and your local computer.

And that’s quite awesome because:

  • After setting up Backup and Sync on my new computer, I just had to log in and all my files got downloaded in a few minutes without any effort. (#win)
  • With Backup and Sync, I can decide to upload my files to Google Drive and then remove them from my computer. They’ll still be available in my cloud storage, so if I need them again, I just have to tell Backup and Sync that I want them on my computer and in a few minutes, they are there. Using this method, I’ve never needed more than 256 GB of flash storage in my MacBook.
  • I can easily share files and docs with my clients and contractors.
  • And if my computer gets lost, broken or stolen, I’ll still have my files.

Okay, I guess I don’t have to talk more about cloud storage. Just use it. It’s 2020 after all!

#7 Presentations, documents and spreadsheets

As a data scientist, you’ll have to…

  • deliver presentations,
  • open spreadsheets (like .xls or .csv files),
  • sign documents (like contracts – especially NDAs ;-))

…just like everyone else.

For these, there are three great and well-known tools:

  • PowerPoint for presentations,
  • Excel for spreadsheets and
  • Word for documents.

I don’t have to introduce any of these to you.

But let me share an interesting practical observation of mine:

Of the above programs, PowerPoint is the only one that I use frequently.

As for Excel, I use it only for reading spreadsheets — but very rarely for data analysis. When I analyze data, the actual work happens in SQL and pandas rather than in Excel… And on the rare occasions when I really need to edit a spreadsheet, I go with the cloud-based (and free) Google Sheets instead.
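To illustrate what “the work happens in SQL rather than in Excel” can look like, here is a minimal sketch that loads spreadsheet-style CSV data into an in-memory SQLite database and aggregates it with a query. It uses only Python’s standard library; the table name, the column names and the numbers are all made up for the demo.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a spreadsheet export
csv_text = """country,revenue
HU,120
US,300
HU,80
US,200
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Load the rows into a throwaway in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO sales (country, revenue) VALUES (:country, :revenue)", rows
)

# The aggregation that would be a pivot table in a spreadsheet
totals = conn.execute(
    "SELECT country, SUM(revenue) FROM sales GROUP BY country ORDER BY country"
).fetchall()
print(totals)  # [('HU', 200), ('US', 500)]

conn.close()
```

With a real database you would run the same kind of query from an SQL manager instead; the point is only that the aggregation lives in a query, not in spreadsheet cells.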

And the same goes for Word and editing documents. I don’t really do that – and when I do, I use Google Docs.

So a few years ago, I decided to stop paying for PowerPoint, Excel and Word. And since I have a Mac, I replaced them with their free Mac alternatives:

  • Keynote for presentations,
  • Numbers for spreadsheets and
  • Pages for regular documents.

They are just as good for everything I need in my day-to-day job as the Microsoft programs were. (Actually, I find Keynote better than PowerPoint.)

And again: if you are a Mac user, you’ll get Keynote, Numbers and Pages for free.

#8 Note-taking: Evernote

For note-taking I use Evernote.

  • When I’m in a meeting,
  • when I have a random idea,
  • when I learn something new in an online course,

I take notes.

Working as a data scientist, you can’t afford not to take notes. Your brain will be so full of complex concepts and information all the time that you’ll have to unload it somewhere.

That’s pretty similar to how the memory (RAM) and the hard drive work in a computer. The memory is fast, but you can’t store everything there because it gets full quickly. So you need a hard drive to store things that you won’t use immediately.

So in this metaphor your brain is the memory, and note-taking is the hard drive.

Anyway, for note-taking, I use one single app: Evernote.

Why Evernote? I don’t know, I started to use it 8 years ago — and it never failed me, so I just stuck with it.

It does what it promises: you can take notes and you can synchronize them across your devices. So I have Evernote on my computer and on my smartphone, too, and when a random idea hits me while I’m walking down the street, I can write it down quickly.

[Image: Making notes of important steps of a data science project in Evernote]

Using Evernote on two devices is free, by the way.

#9 f.lux

Let me explain f.lux very quickly with two pictures:

[Image: left side, computer screen without f.lux; right side, computer screen using f.lux]

See, f.lux does only one thing: when the sun goes down, it changes your computer’s display so it adapts better to your light conditions. During the day, you’ll get your normal display, nothing changes. But in the evening (and during the night if you work late) f.lux makes your display’s colors warmer… And you won’t get blinded by your screen. 🙂

Their motto is: “Flux – software to make your life better.”

And indeed. It’s a small thing, but the moment you try it out, you’ll get how satisfying warmer colors are for your eyes. Especially if you are like me and work a lot during the night.

#10 Spotify

And finally: music.

I like to listen to music when coding. But I have to admit that when I need to focus (e.g. when building a very sophisticated machine learning algorithm, or thinking about the business aspect of a data project), it’s better to turn it off and have some silence.

Anyway, when I do want music, I use Spotify.


This is my current computer setup for data science

So this is my list of apps, programs and software that I have on my local computer for data science right now.

Most probably, it’ll change and expand over time — if so, I’ll update this article.

In the meantime, I’d be curious to see yours. Feel free to add your list to the comment section below!

Cheers,
Tomi Mester


2 Comments

  1. Kamal

    I’m very new to using the remote server and I recently set one up using your tutorial. Kudos for that! However, I learned that I can’t open a graphical text editor (e.g. Sublime, Atom) in the cloud. I can’t be writing Python scripts using vim/nano lolz. Just want to know what you do. Write locally and then FTP to your server? Doesn’t that defeat the purpose of using the server then? Thanks for all your guides here!

    • hey Kamal,

      that’s a great question.
      On the server, you can use mcedit — which is a bit more user-friendly than vim or nano, in my opinion, at least.

      But I prefer to use Sublime Text 3 remotely — via this nice little addition:
      https://github.com/randy3k/RemoteSubl

      I’ll write an article about it, too, if needed!

      Tomi
