We all know the old catch-22 — you need a job to get job experience and job experience to get a job. Luckily, that’s not entirely true in data science. You can use personal data science projects to demonstrate your skills to prospective employers — especially for landing your first data science job.
But where do you start? It’s important to pick a project you can showcase effectively. And it’s just as important to know how to include it in your resume or CV.
When you’re just starting to put together your own data science project, you might feel a bit overwhelmed. In this post, I’ll guide you through the personal project process, from picking a good project topic to actually using your data science projects in your application.
What to think about before picking a data science project topic
Before you start brainstorming topics, it’s important to think about the point of these projects: to show prospective employers you have strong technical skills and a knack for presenting data science results.
During a standard application process, you really have two opportunities to present your projects to the hiring team: a non-conversational one (on your resume/CV or on your personal website; more on this later) and an actual interview.
You need your project topic to work well in both capacities. Is it easy to digest and skimmable, so a recruiter or hiring manager can quickly read and understand it? Can you elaborate on it and discuss it at length with an interviewer?
So you might be thinking — wait, skimmable? I’m doing a bunch of work so a recruiter or a hiring manager might skim my data science project?
It’s true. The reality is that (at least during the early stages of the job application process) your application will be skimmed. And this includes your personal projects. Now, if a project catches their eye, a recruiter or hiring manager will spend more time reviewing your work. Which brings me to my next point: pick a project topic that will make potential recruiters and hiring managers say, “Huh. That’s actually pretty cool.”
Lastly: how many projects do you really need? I personally believe 2-3 good, interesting side projects are more than enough. Hiring companies just won’t spend the time reading through 4, 5, or 6+ projects.
How to pick your dataset
The process of brainstorming your project topic starts off fairly straightforward. I recommend you begin by Googling “free public data” to get a general idea of what data is out there (or visit Google’s dataset search feature) — and what you might be interested in working with.
(Spoiler: there are TONS and TONS of free public datasets out there).
Before getting into data science, I came from an economics research background, so I knew a ton about where to find and how to analyze U.S. economic data. For one of my projects, I experimented with R’s ggplot2 and created aesthetically pleasing charts to show economic trends using data from the Federal Reserve’s Economic Database. I was able to explain this project during one of my interviews because the panel was impressed by the visualizations I constructed…
Moral of the story: companies are impressed when you have a portfolio of projects. And personal projects give you the chance to discuss work that you know a lot about and are passionate about.
If you’re still struggling for inspiration, a great strategy is to weave together data and pop culture. I’m a huge T.V. comedy fan; one of my favorite shows of all time is Parks and Recreation. It’s fairly easy to take one of your favorite shows or movies, find the script online, scrape the dialogue, and do some basic text analysis. If you’re intrigued by blending data science and pop culture but need more inspiration, I highly recommend the website Pudding.cool. (It’s also just a fantastic website to browse.)
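To make the "basic text analysis" step concrete, here is a minimal sketch in Python. It assumes the dialogue has already been scraped and saved as plain text where each spoken line looks like `SPEAKER: words` (a hypothetical format; real transcripts vary and usually need extra cleanup). The sample text, the `parse_dialogue` and `word_counts` helpers, and the regular expression are all illustrative, and the dialogue is invented placeholder text rather than actual lines from the show.

```python
import re
from collections import Counter

# Invented placeholder transcript; a real one would come from a scraped script.
SAMPLE_TRANSCRIPT = """\
LESLIE: Good morning, everyone.
JERRY: Good morning, Leslie.
[Stage direction: the team sits down]
LESLIE: Let's get started on the park proposal.
"""

# Assumed format: an upper-case speaker name, a colon, then the spoken line.
LINE_PATTERN = re.compile(r"^([A-Z][A-Z .'-]*):\s*(.+)$")


def parse_dialogue(text):
    """Return (speaker, line) pairs, skipping anything that is not dialogue."""
    pairs = []
    for raw in text.splitlines():
        match = LINE_PATTERN.match(raw.strip())
        if match:
            pairs.append((match.group(1).title(), match.group(2)))
    return pairs


def word_counts(pairs):
    """Count the words spoken by each speaker."""
    totals = Counter()
    for speaker, line in pairs:
        totals[speaker] += len(re.findall(r"[A-Za-z']+", line))
    return totals


if __name__ == "__main__":
    pairs = parse_dialogue(SAMPLE_TRANSCRIPT)
    for speaker, total in word_counts(pairs).most_common():
        print(f"{speaker}: {total} words")
```

Even a table this simple gives you something to segment and chart, which is usually enough to start asking more interesting questions of the data.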
Okay, so to summarize: start by thinking about a topic that you’re interested in. Google “free public data” if you need some inspiration, and don’t be afraid to get creative!
How to decide what to analyze
Once you’ve decided on a dataset you’d like to explore, the next step is figuring out what questions to answer and what to analyze. Recall what I said earlier: the best data science personal projects are eye-catching and skimmable. And the easiest way to make them that way is to create an awesome visualization.
No matter what you analyze, what question you try to answer, or what methodology you use, you need to think about how you will visualize your results. When you’re exploring your dataset, start thinking about possible trends or different ways you can segment the data.
Let’s revisit my Parks and Recreation example from before. Using the show dialogue, you can create a visualization to see which characters had the most lines. Or, if you’re familiar with the show, you could investigate whether Leslie Knope and the rest of the Parks Department really were that mean to Jerry.
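The "which characters had the most lines" question can be sketched in a few lines of Python. This is only an illustration: the `dialogue` list is invented placeholder data standing in for scraped (speaker, line) pairs, and the text bar chart is a stand-in for the polished plot you would build with ggplot2 or another plotting library.

```python
from collections import Counter

# Invented placeholder data; real pairs would come from the scraped script.
dialogue = [
    ("Leslie", "placeholder line"),
    ("Ann", "placeholder line"),
    ("Leslie", "placeholder line"),
    ("Jerry", "placeholder line"),
    ("Ann", "placeholder line"),
    ("Leslie", "placeholder line"),
]


def lines_per_character(pairs):
    """Count how many lines each speaker has."""
    return Counter(speaker for speaker, _ in pairs)


def text_bar_chart(counts, width=20):
    """Render counts as a text bar chart, largest count first."""
    if not counts:
        return ""
    biggest = max(counts.values())
    rows = []
    for speaker, n in counts.most_common():
        # Scale each bar relative to the largest count.
        bar = "#" * max(1, round(n / biggest * width))
        rows.append(f"{speaker:<10} {bar} {n}")
    return "\n".join(rows)


if __name__ == "__main__":
    print(text_bar_chart(lines_per_character(dialogue)))
```

The counting step is the analysis; the chart is what a recruiter will actually look at, so that is where the polish should go in a real project.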
You might feel like you need to shoot for the moon and put together some technically astounding machine learning project in order to impress a hiring team. If you have a strong background in statistics and programming and a lot of time, more power to you. However, a project like this is in no way necessary for getting hired as a data scientist. This may be a subject for another blog post, but in my experience, aspiring data scientists tend to jump straight to fancy machine learning or deep learning tutorials and forget about learning the basics and honing their problem-solving, critical thinking, and presentation skills.
If you’d like to go for an in-depth machine learning project — that’s great. But if you don’t, rest assured that simply answering an interesting and insightful question with your dataset is more than enough.
How to start building your projects
Once you have settled on how you will analyze your dataset, the next step is to start coding. What’s most important here is writing clean, easy-to-read, and well-commented code. (This is good practice in general, but it is especially important for your data science projects.)
Once your code is written, the best way to display your code (and demonstrate to prospective employers that you can code) is to set up a GitHub account.
Already have a GitHub? Awesome. Just pin the repos you want people to see and add clear and concise READMEs that explain what your project is about.
Don’t have a GitHub? Confused about what “pin the repo” means? Then I recommend you create a GitHub account and read this introduction.
GitHub is a fantastic place to demonstrate your programming ability to hiring managers. Just make sure that in addition to having clean and well-commented code, you also include a README file explaining your motivation and what your project is about.
How to present your projects in your CV/resume
Let me just mention this one more time: the point of these projects is to show prospective employers you have strong technical skills and a knack for presenting data science results.
With that in mind, let’s revisit my Parks and Recreation example and I’ll show you how I’d present this project on my resume/CV:
Parks and Recreation Dialogue–Visualized
This project is an analysis of my favorite T.V. show, “Parks and Recreation.” I used R’s ggplot2 to construct the visualizations and Python’s BeautifulSoup to scrape the show dialogue.
Okay, so a couple of things to notice: one, yes, this is short. However, space on your resume is scarce. You have your job experience, skills, education, and contact information taking up space. If you’re discussing 2-3 projects (with 1-2 bullet points each), that can easily take up over a third of your resume (and your resume needs to stay one page, of course!).
Also a topic for another blog post — but you don’t want your resume to become cluttered. More is not always better — short and skimmable is the name of the game.
It’s also important to notice that I mention the packages I used in my project. This signals your programming proficiency and gives recruiters keywords to see. (Oftentimes, recruiters are looking for certain keywords while reviewing resumes.)
Yes, this description is short, and yes it’s disappointing to do a bunch of work and not be able to fully explain and outline your project on your resume. But you have two more opportunities to go more in depth about your projects: on your website and during an actual interview.
In an ideal world, recruiters and whoever else is reviewing your resume would spend 5-10 minutes looking over your resume, carefully reading each bullet point, and fully grasping your skills and experience. However, that’s just not the case. Your resume/CV will be skimmed. Oftentimes, the people who are able to succinctly demonstrate their skills and experience end up getting the interviews. So, write short descriptions. Include keywords. Avoid clutter.
How to present your projects on your website
Your website gives you the opportunity to showcase your personal projects in depth.
As I mentioned before, the best projects to display are ones that can be succinctly presented — meaning, you have a well-constructed plot or table and a clear description of the project that is a few sentences to a paragraph or so in length. Also — don’t forget to include a link to your code!
Below is how I’d present my Parks and Recreation example on my website (note: this is just an example, not an actual analysis of the show):
At this point you’re probably tired of listening to me say that your analysis needs to be clear and concise. But this point is incredibly important! The biggest struggle for data science departments is effectively communicating their findings to the rest of the company so that it can make data-driven business decisions. If you’re able to show the hiring manager that you can clearly present your analysis (whether it is a simple visualization or a fancy machine learning model), you will stand out in the interview process.
I’ve always been one to preach simplicity and clarity over anything else — especially for your first data science job. Unless you’re coming from a technical PhD program, companies just aren’t expecting first-time data science applicants to be able to take on difficult machine learning tasks (if a company does expect that from a first-time data science applicant, that company’s data team is a mess).
Your personal data science projects are a fantastic way to showcase your technical skills, presentation skills, and creativity. If you focus on writing clean code, building clear visualizations, and delivering an insightful analysis, you’ll be well on your way to landing your first data science job.