I’ve been part of many job interviews – on both sides of the table. The most fun, but also the most feared, part of the process is the technical screening. In this article, I’ll show you three SQL test exercises that, in my experience, are quite typical in data analyst job interviews — as of 2022. (And hey, these are “sample” SQL interview questions but they are heavily based on reality!)
Before the tasks – What can you expect in an SQL technical screening?
There are two common ways an SQL tech screening can be done.
The simpler but less common way is that you get a computer, a data set, and a task. While you are solving the task, the interviewers are listening and asking questions. A little trial-and-error is totally fine, as long as you can come up with the correct solution in a reasonable amount of time.
The other, more difficult (and by the way much more common) way is the whiteboard interview. In this case, you don’t get a computer. You have to solve the task and sketch up the code on a whiteboard. This means that you won’t get feedback (at least not from a computer) on whether you made a logical or a syntax error in your code. Of course, you can still solve the tasks by thinking iteratively. You can crack the different SQL problems one by one… But you have to be very confident with your SQL skills.
Additionally, usually, you have to solve the tasks on the fly. Maybe you will get 3-5 minutes of thinking time but that’s the maximum you can expect.
I know, this sounds stressful. And it is. But don’t worry, there is some good news, as well. Because companies know that this is a high-stress interview type, compared to the real-life challenges, you will get relatively simpler tasks. (See the difficulty level below!)
SQL tech assessments in 2022
There are several types of SQL tech assessments. The one that I described above (and for that, I’ll provide a few exercises below) is the most common one. When people say “SQL tech screening,” they usually refer to that. To be more precise, I like to call it “in-person SQL screening.”
But, in fact, there are four different types of SQL assessments:
- In-person SQL screening. The one that we discussed so far (and will discuss in the rest of the article).
- SQL quiz questions. For example: “What is a primary key?” Or “List the different types of
JOIN
s!” That’s a stupid type of SQL tech assessment — as it focuses on theory and not on practice. Still, some companies… you know. - Take-home SQL assignment. You get a more complex task and you’ll have to write multiple SQL queries to solve it. The upside is that can work from home, as you get the task and the dataset by email. You get these on a workday you choose, and you’ll have ~12 hours to solve it and send the solution back (SQL queries and a short presentation). I like this assessment type, as it creates a less stressful environment for the applicant.
- Automated SQL screening. With the rise of remote work, automated SQL screening becomes more common. It’s usually a one-hour process with a few simpler SQL tasks – that you can solve from home via a browser. This interview type is not very personal, but I like it as it’s less stressful and more flexible (e.g. you can skip tasks and go back later).
When someone asks you to do an “SQL tech screening,” either of the above can come up. Still, the most common is the in-person SQL screening. So let’s see a few examples of that!
Test yourself!
Here are three SQL interview questions that are really close to what I actually got or gave on data analyst/scientist job interviews!
Try to solve all of them as if they were whiteboard interviews!
In the second half of the article, I’ll show you the solutions, too!
How to Become a Data Scientist
(free 50-minute video course by Tomi Mester)
Just subscribe to the Data36 Newsletter here (it’s free)!
Thank you!
You have successfully joined our subscriber list.
SQL Interview Question #1
Let’s say you have two SQL tables: authors
and books
.
The authors
dataset has 1M+ rows. Here’s a small sample, the first six rows:
author_name | book_name |
author_1 | book_1 |
author_1 | book_2 |
author_2 | book_3 |
author_2 | book_4 |
author_2 | book_5 |
author_3 | book_6 |
… | … |
The books
dataset also has 1M+ rows and here’s the first six:
book_name | sold_copies |
book_1 | 1000 |
book_2 | 1500 |
book_3 | 34000 |
book_4 | 29000 |
book_5 | 40000 |
book_6 | 4400 |
… | … |
Create an SQL query that shows the TOP 3 authors who sold the most books in total!
(Note: Back in the days, I got almost this exact SQL interview question for a data scientist position at a very well-known Swedish IT company.)
SQL Interview Question #2
You work for a startup that makes an online presentation software. You have an event log that records every time a user inserted an image into a presentation. (One user can insert multiple images.) The event_log
SQL table looks like this:
user_id | event_date_time |
7494212 | 1535308430 |
7494212 | 1535308433 |
1475185 | 1535308444 |
6946725 | 1535308475 |
6946725 | 1535308476 |
6946725 | 1535308477 |
… | … |
…and it has over one billion rows.
Note: If the event_date_time
column’s format doesn’t look familiar, google “epoch timestamp”!
Write an SQL query to find out how many users inserted more than 1000 but less than 2000 images in their presentations!
(Note: I personally created and used this interview question to test data analysts when I was freelancing and my clients needed help in their hiring process.)
SQL Interview Question #3
You have two SQL tables!
The first table is called employees
and it contains the employee names, the unique employee ids, and the department names of a company. Sample:
department_name | employee_id | employee_name |
Sales | 123 | John Doe |
Sales | 211 | Jane Smith |
HR | 556 | Billy Bob |
Sales | 711 | Robert Hayek |
Marketing | 235 | Edward Jorgson |
Marketing | 236 | Christine Packard |
… | … | … |
The second SQL table is called salaries
. It holds the same employee names and the same employee ids – and the salaries for each employee. Sample:
salary | employee_id | employee_name |
500 | 123 | John Doe |
600 | 211 | Jane Smith |
1000 | 556 | Billy Bob |
400 | 711 | Robert Hayek |
1200 | 235 | Edward Jorgson |
200 | 236 | Christine Packard |
… | … | … |
The company has 546 employees, so both tables have 546 rows.
Print every department where the average salary per employee is lower than $500!
(Note: I created this test question based on a real SQL interview question that I heard from a friend, who applied at one of the biggest social media companies (name starts with ‘F.’ ;))
Solution of SQL Interview Question #1
The solution code is:
SELECT authors.author_name, SUM(books.sold_copies) AS sold_sum FROM authors JOIN books ON books.book_name = authors.book_name GROUP BY authors.author_name ORDER BY sold_sum DESC LIMIT 3;
And here is a short explanation:
1. First you have to initiate the JOIN
. I joined the two tables by using:
SELECT * FROM authors JOIN books ON books.book_name = authors.book_name;
2. After that, I used a SUM()
function with a GROUP BY
clause. This means that in the SELECT
statement I had to replace the *
with the author_name
and sold_copies
columns. (It’s not mandatory to indicate from which table you are selecting the columns, but it’s worth it. That’s why I used authors.author_name
and books.sold_copies
.)
3. Eventually, I ORDER
ed the results in DESC
ending order. (Just for my convenience, I also renamed the sum
column to sold_sum
using the AS sold_sum
method in the SELECT
statement.)
Solution of SQL Interview Question #2
The solution SQL query is:
SELECT COUNT(*) FROM (SELECT user_id, COUNT(event_date_time) AS image_count FROM event_log GROUP BY user_id) AS image_per_user WHERE image_count < 2000 AND image_count > 1000;
The trick in this task is that you had to use the COUNT()
function two times: first, you had to count the number of images per user, then the number of users (who fulfill the given condition). The easiest way to do that is to use a subquery.
- Write the inner query first! Run a simple
COUNT()
function with aGROUP BY
clause on theevent_log
table. - Make sure that you create an alias for the subquery (
AS image_per_user
). It’s a syntax requirement in SQL. - Eventually, in an outer query, apply a
WHERE
filter and aCOUNT()
function on the result of the subquery.
Solution of SQL Interview Question #3
Solution:
SELECT department_name, AVG(salaries.salary) AS avg_salaries FROM employees JOIN salaries ON employees.employee_id = salaries.employee_id GROUP BY department_name HAVING AVG(salaries.salary) < 500;
Note: You can solve this task using a subquery, too – but in an interview situation the interviewer will like the above solution better.
Brief explanation:
1. First JOIN the two tables:
SELECT * FROM employees JOIN salaries ON employees.employee_id = salaries.employee_id
Watch out! Use the employee_id
column – not the employee_name
. (You can always have two John Does at a company, but the employee id is unique!)
2. Then use the AVG()
function with a GROUP BY
clause — and replace the *
with the appropriate columns. (Just like in the first task.)
3. And the last step is to use a HAVING
clause to filter by the result of the AVG()
function. (Remember: WHERE
is not good here because it would be initiated before the AVG()
function.)
Watch out: in the HAVING
line, you can’t refer to the alias – you have to use the whole function itself again!
Prepare for SQL tech screenings by practicing!
If you solved all these questions properly, you are probably ready for a junior or even a mid-level Data Analyst SQL technical screening.
If not, let me recommend my new online course: SQL for Aspiring Data Scientists (7-day online course) – where you can level up (or brush up) your SQL skills in only 7 days. When you finish the course, just come back to this article and I guarantee that you will be able to solve these questions!
And if you are just about to start with SQL, start with my SQL For Data Analysis series on the blog!
And ultimately, if you feel that you are ready for a junior data scientist position but you want to try out how it works before you apply for a job, take my 6-week data science course:
The Junior Data Scientist's First Month
A 100% practical online course. A 6-week simulation of being a junior data scientist at a true-to-life startup.
“Solving real problems, getting real experience – just like in a real data science job.”
Conclusion
The hard part of these SQL interview questions is that they are abstract. The tasks say to “imagine the data sets” and show only a few lines of them. When you get an exercise like that, it helps a lot if you have seen similar datasets and solved similar problems before. I hope solving the tasks in this article will boost your confidence!
If you have questions or alternative solutions, don’t hesitate to send them in via email and I’ll review them for you!
- If you want to learn more about how to become a data scientist, take my 50-minute video course: How to Become a Data Scientist. (It’s free!)
- Also check out my 6-week online course: The Junior Data Scientist’s First Month video course.
Cheers,
Tomi Mester
Cheers,
Tomi Mester