The structure of your Data Team. The flow of the data in your organization.

Smaller companies have smaller data teams (maybe just one person), bigger companies have bigger ones. The tricky thing is that the different aspects of a data project need very different kinds of skills.

How should you build up your data team to make these skills work best together? I’ll show you through the example of a fictional startup.

Your fictional startup flies! Yay.

Note: For the sake of the example I tried to come up with a startup that surely doesn’t exist. But you can apply the scheme I am going to show you here to any online business.

Okay. Let’s say you are one of the founders of Adtrefa.io (Adapt trees from Australia)! You came up with the wonderful idea of helping people adapt trees from Australia. Cool. You’ve already built the web application, and it flies! In the first month you got your first 10,000 users. Then another 10,000 users in the next one. You are growing steadily; the dream has come true.

As a wise startup founder you know that:

  1. yes, your instincts worked, you’ve built a great thing that people care about, and now
  2. it’s time to understand your users and get your first data professional onboard.

But before you start hiring, it’s worth understanding how data analysis and data science work in online businesses like yours. Here you go!

The flow of the data between different data teams.

First, let’s see how established Data Teams do it at bigger companies.

[Figure 1: Data team: teams]

Usually the whole process starts with the Tracking Team, which is responsible for data collection. They pass the data to the Data Infrastructure Team, which takes care of data storage. From the stored (and sometimes already cleaned, restructured and/or aggregated) data, the Analytics/Data Science Team picks what it needs for its analyses and turns the data into meaningful insights. Eventually these insights land with the Managers and Decision Makers, who take action on the learnings!

The different tasks – in your data projects

In parallel with the chart above, this is the flow of the data between the different tasks:

[Figure 2: Data team: tasks]

It has four major steps too:

  1. Collecting the raw data. How to do it right? I’ve written a detailed article about that before. Here: How data collection works?
  2. Storing the data. To business people this step might look trivial, but it’s not. In online businesses, building a good data infrastructure is not easy or self-evident at all. It’s not necessarily complicated either, but it’s good to know that there can be many technical questions, especially about scaling (which is maybe the most important factor for a startup).
  3. Analyzing the data. This is the step where your data turns into meaningful information through different research methods (e.g. cohort analysis, funnel analysis, segmentation, predictive analytics, etc.).
  4. Making a business decision. Never forget that the original goal of your data project was to understand your users and translate data into actions (e.g. optimization efforts).

You can break down this data flow into smaller tasks, but I won’t go into that in this article.
Also, some people would argue that data cleaning should be mentioned here. I left it out because data cleaning can happen in the Data Infrastructure part and in the Data Analytics part as well, so I didn’t want to specify who in your data team should do it.
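To make the analysis step a bit more tangible, here is a minimal sketch of a funnel analysis in Python. Everything in it is an assumption made up for the example: the event names (`visit`, `signup`, `adapt_tree`), the sample data and the `funnel_counts` helper. It is only meant to show the idea, not how a real analytics team would necessarily do it.

```python
from collections import defaultdict

# Hypothetical tracking events for the fictional startup: (user_id, event_name).
events = [
    (1, "visit"), (1, "signup"), (1, "adapt_tree"),
    (2, "visit"), (2, "signup"),
    (3, "visit"),
    (4, "visit"), (4, "signup"), (4, "adapt_tree"),
]

# Hypothetical funnel: the steps a user is expected to go through.
FUNNEL_STEPS = ["visit", "signup", "adapt_tree"]


def funnel_counts(events, steps):
    """Count users who did each step and every earlier step.

    Simplification: the order in which a user's events happened is ignored.
    """
    seen = defaultdict(set)
    for user_id, name in events:
        seen[user_id].add(name)

    counts = []
    for i in range(len(steps)):
        required = set(steps[: i + 1])
        counts.append(sum(1 for done in seen.values() if required <= done))
    return counts


counts = funnel_counts(events, FUNNEL_STEPS)
for step, n in zip(FUNNEL_STEPS, counts):
    # Share of users relative to the first funnel step.
    print(f"{step}: {n} users ({n / counts[0]:.0%} of visitors)")
```

With this sample data, 4 users visited, 3 of them signed up and 2 of them adapted a tree. The drop between two steps is the kind of insight that can later be turned into a business decision.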

Different skills for the different steps

As I have described in the intro: the tricky thing is that different parts of a data project need different skills.
The data collection and storage part needs more engineering skills. The analytics part needs a mix of coding, statistics and business knowledge. And the decision part needs a business mindset of course.

[Figure 3: Data team: skills]

But this is not black and white! E.g. no Manager can make decisions without at least a high-level understanding of the engineering side. And no engineer can build a meaningful, easy-to-use infrastructure without knowing the business and analytical needs.

So who does what in your data team?

  1. As I see it, the data collection (aka. tracking) should be done by the same people who are building the codebase of your product/website. Why? The tracking scripts will go into production. As your developers know your codebase best, they will be able to write those scripts without breaking anything.
  2. The Data Storage should be built by a data infrastructure expert. The most common name for this position is Data Engineer.
  3. The Analytics and the Data Science part is done by data research experts. The most common names for these positions are Data Analyst and/or Data Scientist. (There is a slight difference between the two.)
  4. And the business decisions are made by the management or other decision makers.

[Figure 4: Data team: summary]

The good thing is that you will already have the first bubble (developers) and the last bubble (decision makers) in your company. So you only need to fill the Data Engineering and the Analytics positions…
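To illustrate the handoff between developers, data infrastructure and analytics, here is a small sketch in Python. It is built on assumptions: the function names (`build_event`, `create_store`, `store_event`), the event fields and the use of an in-memory SQLite database are all made up for the example. A real setup would most likely use a different storage solution.

```python
import json
import sqlite3
from datetime import datetime, timezone


def build_event(user_id, name, properties=None):
    """Developer side (hypothetical): build one tracking event as a plain dict."""
    return {
        "user_id": user_id,
        "name": name,
        "properties": properties or {},
        "ts": datetime.now(timezone.utc).isoformat(),
    }


def create_store(conn):
    """Infrastructure side (hypothetical): a single table for raw events."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "user_id INTEGER, name TEXT, properties TEXT, ts TEXT)"
    )


def store_event(conn, event):
    """Infrastructure side: persist one event; properties are kept as JSON text."""
    conn.execute(
        "INSERT INTO events (user_id, name, properties, ts) VALUES (?, ?, ?, ?)",
        (event["user_id"], event["name"], json.dumps(event["properties"]), event["ts"]),
    )
    conn.commit()


# SQLite in memory is only a stand-in for a real data store.
conn = sqlite3.connect(":memory:")
create_store(conn)
store_event(conn, build_event(1, "signup", {"plan": "free"}))
store_event(conn, build_event(1, "adapt_tree", {"species": "eucalyptus"}))

# Analytics side: query what was collected and stored.
rows = conn.execute(
    "SELECT name, COUNT(*) FROM events GROUP BY name ORDER BY name"
).fetchall()
print(rows)
```

The point of the sketch is the division of labor: the developers only need to agree on the shape of the event, the data engineer decides how and where it is stored, and the analyst works from the stored table.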

Who to hire first? A Data Analyst or a Data Engineer?

Good question! Let’s get back to the fictional Adtrefa.io startup project!
Mid-size startups (~500 employees) usually have at least 3-4 Data Engineers on the Data Infrastructure Team and around 6-10 Data Scientists and Analysts on the Analytics Team (sometimes Data Scientists and Analysts are split into 2 teams). Of course, the exact numbers and structures differ from company to company.

Your problem – as a startup – is that you can’t and won’t hire 10 people immediately to do data projects.
In a very ideal case you could hire 2 people though:

  1. A data engineer, who is a great fit with your developers in terms of engineering skills. (Fluent in your platform’s programming language and in the common data languages.)
  2. And a data analyst, who knows SQL and Python or R – and who is very talented in business thinking.

If you can hire only one person, I’d go with the Data Analyst first, but in this case make sure that she/he has above-average technical skills as well (preferably fluent in the languages mentioned above: SQL + Python or R). If she/he has a good understanding of technical things, then together with your developers they will be able to build a “bridge” until you can hire your first Data Engineer. And with that, you can start your first data project without a well-established Data Infrastructure (Team).
If you go this way, your second hire on the Data Team definitely has to be a Data Engineer, who can focus on building a Data Infrastructure that will scale with your company when your growth reaches 1M or 10M users.

Then, if you have made these first two data hires right, these people will be able to advise you on how to extend your data team in the future, based on their needs.

Working together. Communication across the data team.

Communication is key. And this is true for data projects as well. Across your four data teams – everyone should talk to everyone.

[Figure 5: Data team: teams and individuals]

When developers pass the data to the Infrastructure Team, they talk and create things together. No question about that.
Obviously, the same happens when the Infrastructure Team passes the data to the Analytics Team and when the Analytics Team provides the insights to Managers.

But here I’d like to emphasize that subteams that are not in daily contact with each other should communicate as well! E.g. managers should know what Data Engineers are doing. For instance, it helps them understand why data servers cost so much and what that means budget-wise for the company (so they can calculate the ROI of the data projects). Or another example: developers should understand what Analysts/Data Scientists are doing, because it helps them learn what kind of data they should collect. And so on and so forth.

Conclusion

Your data flow goes through these four steps:

  1. Collection
  2. Storage
  3. Analysis
  4. Decision.

And in parallel with that, these four sub-teams will be working on it:

  1. Developers
  2. Data Infrastructure Team
  3. Data Analytics/Data Science Team
  4. Managers.

Everyone in this process should have at least a high-level understanding of the whole process. That’s the only way to build an efficient data team and turn your raw data into meaningful, useful and successful decisions!

If you want to be notified first about new content on Data36 (like articles, videos, handbooks, etc.), sign up for my Newsletter!

Cheers,
Tomi Mester
