Recently, I had the “pleasure” of setting up my data science computer (the one that I use exclusively for client projects) from scratch.
I have a Macbook Pro, and its keyboard just gave up. It happens. (Well, it shouldn’t, dear Apple!) Anyway, they’ll fix it in 2-3 weeks and in the meantime I got a replacement laptop.
Perfect, I thought. I’ll use this inconvenience as an opportunity. I’ll make a list of the tools, apps and programs that I’ll install during the process, so I can share my exact data science computer setup with you.
Note: Here, I’ll show you my Macbook setup — but most of the tools in my list are available on Windows, too.
Note 2: What’s the best computer for data science? Check it out here.
My data science computer setup is different…
As you’ll see, I don’t have too many data science programs/applications on my computer. And the reason is simple: I prefer to do data science in the cloud, using remote servers.
Before I wrote this post, I searched other articles on the topic online… and it surprised me that most articles recommend setting up all the data tools (like SQL databases, RStudio, Python, Jupyter Notebook and all other things) exclusively on your personal computer.
No way!
I mean that’s okay for learning. In the beginning, at least. But that’s far from real-life data science computer setups.
I’ve been working on data science projects for many years… And believe me when I say that I (similarly to other practicing data scientists) don’t use anything other than remote servers — and only a few programs on my local computer that I’ll show you in this article.
Note: a question you might have. “What’s the best cloud platform for an aspiring data scientist?” This will be surprising: it’s not AWS, IBM or Google Cloud! These platforms are awesome but not for aspiring data scientists. You want to start easy and move further once you’ve got the basics. That’s why I recommend DigitalOcean as a start! Find more info here.
My computer setup for data science. The list of the tools.
Of course, this is only my list, and of course it’s subjective. Feel free to use it as a starting point and tweak it to your needs!
Anyway, here it is — my computer setup for data science:
- Google Chrome and Firefox (browsers)
- iTerm2 (for accessing remote servers via the command line)
- SQL Workbench (SQL manager for accessing SQL databases)
- Anaconda-Navigator (local Python environment for prototyping)
- Sublime Text 3 (the best text editor for coding and scripting)
- Backup and Sync (file management and cloud storage)
- Keynote, Numbers, Pages (for presentations, spreadsheets and documents)
- Evernote (note taking)
- Slack (teamwork and chat)
- f.lux (the app that my eyes need)
- Spotify (music!)
Let’s see them one by one!
#1 Browsers: Firefox and Google Chrome
You have the factory setup of an operating system on your computer. This is a new beginning. Awesome! The very first thing to do is to get a worthy browser. Neither Safari nor Internet Explorer is in this category.
So my first step was to download and install Firefox, my favorite browser.
And the second step was to download and install Google Chrome. It’s a great browser, too. And I can’t ignore it, either, since I use many Google services (e.g. Gmail, Google Drive, Docs, Spreadsheets, Data Studio, Optimize) — and a few Chrome extensions that are not available for Firefox.
Both Google Chrome and Firefox are free.
Tip: if you have important bookmarks in your browsers on another computer, you can export them and import them into your new computer’s browsers. Better yet, if you log in to your browser, you can sync your bookmarks across multiple computers.
#2 iTerm2 (or PuTTY)
As I mentioned above, I prefer to do data science in the cloud. An essential and fundamental way to interact with a remote server is to type commands into the command line. For that, you’ll need a terminal app.
And one of the best terminal apps is iTerm2.
If you don’t know what Terminal or iTerm2 is, this is that cool program from the movies — with a black background and white and green letters:
Once you have it on your computer, you can start to type in commands, which opens up many opportunities that you don’t have using graphical user interfaces.
Note: if you have ever wondered:
CLI = Command Line Interface
GUI = Graphical User Interface
In data science, we often prefer CLI over GUI.
As I mentioned, working in the command line is crucial for working with remote servers. Remote servers usually don’t even have graphical user interfaces! But CLI can also come in handy if you want to set up something very specific on your local computer. (E.g. file or directory permissions.)
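To make the file permissions example a bit more concrete, here is a minimal Python sketch of what a command like `chmod 600` does. The `make_private` helper and the throwaway demo file are my own illustrative assumptions, not part of any specific setup. It also assumes macOS or Linux; on Windows, `os.chmod` only toggles the read-only flag.

```python
import os
import stat
import tempfile

def make_private(path):
    """Restrict a file so only its owner can read and write it (like `chmod 600`)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    # Return the resulting permission bits as an octal string.
    return oct(stat.S_IMODE(os.stat(path).st_mode))

# Demo on a throwaway file; in practice this might be a private key file (hypothetical use case).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

print(make_private(demo_path))
os.remove(demo_path)
```

In the terminal, the same thing is a one-liner, which is a good part of why the command line is so handy for this kind of task.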
I’ve written more about how to use Terminal and iTerm2 in my data science server setup article — and about the command line itself in my bash articles. So I won’t go into detail here.
Note: iTerm2 is not available for Windows — there, you’ll have to use PuTTY instead, which is a less user-friendly alternative. But at least there is something. I’ve written more about it, too, in my data science server setup article.
#3 SQL Workbench
To work with SQL databases efficiently, you’ll need an SQL manager tool. There are many of them: some better, some worse, some free and some paid.
My favorite free SQL manager tool is SQL Workbench.
It’s simple, works smoothly and it does everything that you will need when working with your SQL queries and commands.
It’s also compatible with all the major SQL databases: MySQL, PostgreSQL, MS SQL Server, Amazon Redshift, Oracle, etc.
I wrote more about how to install SQL Workbench here.
#4 Anaconda
Although I prefer working in the cloud, I have to admit that sometimes I use Python on my local data science computer, too. And to have a Python environment with all the data science packages, I use Anaconda.
More specifically, I use the Anaconda Individual Edition. (It’s free.)
Once you download and install it, it’ll immediately give you access to all the popular data-science-related Python applications and frameworks — on your computer:
Note: By the way, it features not just Python but R, too!
And it also gives you access to ~320 data science packages – including the popular ones like numpy, pandas, sklearn and more:
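If you want to double-check that these packages are really available in your local Python environment, a small sketch like the one below can help. The `check_packages` helper is a hypothetical name of mine, and it only relies on the standard library, so it runs whether or not the packages are installed.

```python
import importlib.util

def check_packages(names):
    """Map each importable module name to True/False depending on whether it is installed."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# These are import names, which can differ from the package name
# (scikit-learn, for example, is imported as `sklearn`).
for name, found in check_packages(["numpy", "pandas", "sklearn"]).items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

In a fresh Anaconda environment, I’d expect all three to show up as installed, but the exact package list depends on the version you download.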
#5 Sublime Text
To write complex data science scripts efficiently, you’ll need a text editor designed for coding.
You can use some basic command line tools like vim or (my favorite) mcedit.
But these are not practical enough if you want to be coding every day.
The best and most coder-friendly text editor I have ever used is Sublime Text 3.
Really, I haven’t seen anything like it… The look is similar:
But, boy, the features. There are so, so many – just a few of these:
- multi-select
- multi-edit
- auto-completion
- selecting columns
- and many, many more…
I absolutely love it. These things make coding and scripting so much easier, and honestly, much more fun. So you definitely want to have it in your computer setup.
Here’s, for example, the very eye-catching multi-edit feature:
You can download Sublime Text 3 here.
It’s virtually free… Well, it says that you can try it out for free and you’ll have to pay for it only when you decide to stick with it in the long term. But it doesn’t really give any time limit for the trial period (so it can be 5 years, up to you).
Regardless, I encourage you to pay for it! This trial period thingy is a nice gesture from the creators of Sublime Text, and it’s a gesture worth returning. Buying Sublime Text 3 for $80 (one-time payment) is a very good investment in a great tool that you’ll probably use for the next 10-20 years.
#6 Backup and Sync by Google (aka. Google Drive)
I store all my files in the cloud. It’s safer and better than having them on my computer.
There are quite a few vendors and solutions for that (e.g. Dropbox, Box, Tresorit, Microsoft OneDrive) but I use Google Drive.
And for Google Drive, there’s a desktop app, too, called… wait for it: Google Drive Desktop. By setting up this program, you will be able to automatically synchronize your files between your cloud storage and your local computer. If you have multiple computers, it’ll synchronize across them, too.
And that’s quite awesome because:
- After setting up Google Drive Desktop on my new computer, I just had to log in and all my files got downloaded in a few minutes without any effort. (#win)
- With Google Drive Desktop, I can decide to upload my files to Google Drive and then remove them from my computer. They’ll still be available in my cloud storage, so if I need them again, I just have to tell it that I want them on my computer and in a few minutes, they are there. Using this method, I’ve never needed more than 256 GB of flash storage for my Macbook.
- I can easily share files and docs with my clients and contractors.
- And if my computer gets lost, broken or stolen, I’ll still have my files.
Okay, I guess I don’t have to talk more about cloud storage. Just use it. It’s 2022 after all!
#7 Presentations, documents and spreadsheets
As a data scientist, you’ll have to…
- deliver presentations,
- open spreadsheets (like .xls or .csv files),
- sign documents (like contracts – especially NDAs ;-))
…just like everyone else.
For these, there are three great and well-known tools:
- PowerPoint for presentations,
- Excel for spreadsheets and
- Word for documents.
I don’t have to introduce any of these to you. They’re practically part of every computer setup.
But let me share an interesting practical observation of mine:
Of the above programs, PowerPoint is the only one that I use frequently.
As for Excel, I use it only for reading spreadsheets — but very rarely for data analysis. When I analyze data, the actual work happens in SQL and pandas rather than in Excel… And on the rare occasions when I really need to edit a spreadsheet, I go with the cloud-based (and free) Google Sheets instead.
And the same goes for Word and editing documents. I don’t really do that – and when I do, I use Google Docs.
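To give you a feel for what “the actual work happens in SQL” can look like, here is a minimal sketch that loads the contents of a .csv file into a SQL table and aggregates it with a query. Everything in it is an assumption for illustration: the sample data is made up, the `revenue_by_country` helper is hypothetical, and SQLite (built into Python) stands in for whatever SQL database you actually use.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for a .csv export (hypothetical columns and values).
CSV_TEXT = """country,revenue
US,120
DE,80
US,30
"""

def revenue_by_country(csv_text):
    """Load CSV rows into an in-memory SQLite table and aggregate them with SQL."""
    rows = [(r["country"], float(r["revenue"])) for r in csv.DictReader(io.StringIO(csv_text))]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (country TEXT, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT country, SUM(revenue) FROM sales GROUP BY country ORDER BY country"
    ).fetchall()
    conn.close()
    return result

print(revenue_by_country(CSV_TEXT))  # [('DE', 80.0), ('US', 150.0)]
```

The point is not the tooling but the habit: the spreadsheet is only the file format, and the analysis itself lives in a query that you can rerun and share.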
So a few years ago, I decided to stop paying for PowerPoint, Excel and Word. And since I have a Mac, I replaced them with their free Mac alternatives:
- Keynote for presentations,
- Numbers for spreadsheets and
- Pages for regular documents.
They are just as perfect for everything that I need for my day-to-day job as the Microsoft programs were. (Actually, I find Keynote better than PowerPoint.)
And again: if you are a Mac user, you’ll get Keynote, Numbers and Pages for free.
#8 Note-taking: Evernote
For note-taking I use Evernote.
- When I’m in a meeting,
- when I have a random idea,
- when I learn something new in an online course,
I take notes.
Working as a data scientist, you can’t afford not to take notes. Your brain will be so full of complex concepts and information all the time that you’ll have to unload it somewhere.
That’s pretty similar to how memory (RAM) and the hard drive work in a computer. Memory is fast, but you can’t store everything there because it fills up quickly. So you need a hard drive to store things that you won’t use immediately.
So in this metaphor your brain is the memory, and note-taking is the hard drive.
Anyway, for note-taking, I use one single app: Evernote.
Why Evernote? I don’t know, I started to use it 8 years ago — and it never failed me, so I just stuck with it.
It does what it promises: you can take notes and you can synchronize them across your devices. So I have Evernote on my computer and on my smartphone, too — and when I have a random idea when I’m walking on the street, I can write it down quickly.
Using Evernote on two devices is free, by the way.
#9 f.lux
Let me explain f.lux very quickly with two pictures:
See, f.lux does only one thing: when the sun goes down, it changes your computer’s display so it adapts better to your light conditions. During the day, you’ll get your normal display, nothing changes. But in the evening (and during the night if you work late) f.lux makes your display’s colors warmer… And you won’t get blinded by your screen. 🙂
Their motto is: “Flux – software to make your life better.”
And indeed. It’s a small thing, but the moment you try it out, you’ll get how satisfying warmer colors are for your eyes. Especially if you are like me and work a lot during the night.
#10 Spotify
And finally: music.
I like to listen to music when coding. But I have to admit that when I need to focus (e.g. when building a very sophisticated machine learning algorithm, or thinking about the business aspect of a data project), it’s better to turn it off and have some silence.
Anyway, when I do want music, I use Spotify.
This is my current computer setup for data science
So this is my list of apps, programs and software that I have as the most important parts of my data science computer setup right now.
Most probably, it’ll change and expand over time — if so, I’ll update this article.
In the meantime, I’d be curious to see yours. If you have any recommendations, drop me an email and I’ll feature the best additions in the article!
- If you want to learn more about how to become a data scientist, take my 50-minute video course: How to Become a Data Scientist. (It’s free!)
- Also check out my 6-week online course: The Junior Data Scientist’s First Month video course.
Cheers,
Tomi Mester