Learning Data Science in 2023 (4 Untold Truths)

Have you flirted with the idea of learning data science? You are not alone. This has been a really hot topic in the last few years and it will be one in the upcoming few, for sure. Many aspiring data scientists start the journey. Yet, only very few of them actually become data scientists.

Why?

Part of the problem is that newcomers to the data world don’t know what to expect from this field. Or even worse, based on the many misleading (sometimes scammy) “how to become a data scientist” tutorials, they have false expectations. And when they hit the wall, they get demotivated and quit.

In this article, I want to show you 4 untold truths that you should know about learning data science – and I have never seen them written down anywhere else before.

Untold truth #1: Learning Data Science is Hard!

Learning data science is not easy.

It will take a lot of work, a lot of energy and a lot of time from you.

I have seen an ad recently in my Instagram feed that said:
“Take this course and master data science in 1 month!”

And I was like: what the fudge!?

I’ve been working as a data professional since 2012. I’ve held senior data scientist positions (in addition to teaching). But I wouldn’t say that I have mastered data science or analytics. After all these years, I still have a lot to learn. But one thing is for sure: no one, I repeat, no one can master data science in 1 month!!! In fact, my personal estimate (based on students I worked with) is that from zero to the junior level the learning process will take ~6-9 months.
(More about that in this free course: How to become a data scientist.)

Learning data science is hard!
A few online education platforms imply the opposite.

  • “Just change one word in this query. Run it! And boom, you’ve learned SQL…”
  • “Just watch this video course of the instructor running Python code, and you will know Python, too…”
  • “Just play around with this interactive chart and you will understand regression analysis immediately…”

Two years ago, I interviewed a young dude for a junior DS position. He didn’t have any hands-on experience. It turned out that he learned SQL on a popular “just-type-your-code-into-the-browser” kind of online learning platform. (I won’t name the exact platform here. :-))

I gave him a computer with an SQL manager open – and a simple real-life task. He had to JOIN two SQL tables, then do a simple segmentation task. He couldn’t solve it! He ran into syntax errors, he couldn’t debug his code, he didn’t get the context, he couldn’t discover the data…
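
To give a feel for the kind of task described here, below is a minimal sketch using Python's built-in sqlite3 module. It is not the actual interview task: the users and purchases tables, their columns, the sample rows and the 100-unit spending threshold are all made up for illustration.

```python
import sqlite3

# Hypothetical tables and rows, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE purchases (user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'US'), (2, 'US'), (3, 'DE'), (4, 'DE');
    INSERT INTO purchases VALUES (1, 120.0), (1, 30.0), (2, 15.0), (3, 80.0);
""")

# Step 1 (inner query): LEFT JOIN keeps users without any purchase;
# COALESCE turns their missing total into 0 before the CASE assigns a segment.
# Step 2 (outer query): count how many users fall into each segment.
query = """
    SELECT segment, COUNT(*) AS user_count
    FROM (
        SELECT u.user_id,
               CASE
                   WHEN COALESCE(SUM(p.amount), 0) >= 100 THEN 'high spender'
                   WHEN COALESCE(SUM(p.amount), 0) > 0 THEN 'low spender'
                   ELSE 'no purchase'
               END AS segment
        FROM users AS u
        LEFT JOIN purchases AS p ON p.user_id = u.user_id
        GROUP BY u.user_id
    ) AS per_user
    GROUP BY segment
    ORDER BY segment;
"""

segments = conn.execute(query).fetchall()
for segment, user_count in segments:
    print(segment, user_count)
```

Swapping the LEFT JOIN for a plain JOIN would silently drop the user with no purchases. That is exactly the kind of detail you only notice when you run queries against data yourself.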

And that’s when I realized that many of these online schools give people only the illusion of data science knowledge.

You want to have real data science knowledge

You want to have real data science knowledge.
But what does it take?

Well, first and foremost: (1) a lot of practicing (2) in true-to-life data environments.
Don’t try to skip forward: take the time and the energy and set up your own data server!

Yes, sometimes (well, quite often in the beginning) you will mistype a code snippet, your computer will throw an error and it will be very annoying. But this is how it works! We make mistakes, we learn from them and next time we will do much better.

And also take the time to practice a lot!

When you practice, it’s okay to make stupid mistakes. For instance, it’s okay to accidentally mess up your previously built data pipelines and lose hours of work. This happens from time to time with my students. But again: we all do stupid things in real life data projects, too. At least, I did in my junior years and it cost me a lot of extra work-hours. But that was an excellent way to learn my data science lessons.

We make mistakes, we learn from them and we don’t make them again.

Note: How to practice? I shared a few ideas (and even more) in the above-mentioned free online course: How to become a data scientist?

Learning data science is not easy and it will take time. If you can’t accept this fact, then maybe this profession is not the best choice for you. But if you are okay with learning data science the hard way, this learning period of a few months will be one of your best long-term investments. (I’ll get back to this below.)

The Junior Data Scientist's First Month

A 100% practical online course. A 6-week simulation of being a junior data scientist at a true-to-life startup.

“Solving real problems, getting real experience – just like in a real data science job.”

Untold truth #2: It’s not “learning data science”, it’s “improving your data science skills”

The world changes really fast and it won’t get any slower.

And I seriously believe that if one wants to keep up with the pace, the only way to do it is by focusing on improving skills.

Why?
You might already have heard that according to researchers’ predictions, ~65% of today’s grade schoolers will hold jobs that don’t exist yet.

You might also have heard that the current estimated “half-life” of engineering-related information is ~4 years. So 50% of the things you learn today regarding IT will be outdated in ~4 years.
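
If you take the “half-life” wording literally, the arithmetic can be sketched as below. This assumes a simple exponential decay model, which is an assumption of this sketch and not something the quoted estimate spells out; the function name is made up for illustration.

```python
def fraction_still_current(years: float, half_life_years: float = 4.0) -> float:
    """Share of today's knowledge still current after `years`.

    Assumes exponential decay: the share halves every `half_life_years`.
    """
    return 0.5 ** (years / half_life_years)


for years in (4, 8, 12):
    share = fraction_still_current(years)
    print(f"after {years} years: {share:.1%} still current")
```

Under that assumption, half is left after 4 years, a quarter after 8 and an eighth after 12.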

(Image source: Shift Happens presentation)

What does it mean for you?
That the skills you acquire and improve are way more important than the actual information you learn.

It also means that “learning data science” is not about learning data science.

It’s about:

  • improving your coding skills.
  • improving your business skills.
  • improving your mathematical/statistical skills.
  • improving your data visualization, presentation, communication and other soft skills.

Learning data science is not about:

  • Learning a certain package of Python.
  • Learning the different industry benchmarks for this or that KPI.
  • Learning certain statistical models.
  • Learning how to use Google Data Studio, Tableau, Power BI, etc.

What seems important today might be irrelevant in 5 years!

Mastering, for instance, the scikit-learn library or Google Data Studio might seem important today… but I bet that there will be a better machine learning package and a better data visualization tool in ~5-10 years.

Don’t get me wrong, I still think that today, you should learn these things because they are part of the current data science and analytics ecosystem. And they are also part of the learning curve itself.

I’m saying that you should keep in mind that when you learn these (or any other) tools, the important thing is not to cram in every little syntax detail or which button is where in the specific software – but to understand the big picture. Why does this tool work the way it works? What’s the underlying logic? How does this function work in other similar tools? Once you get these, changing between tools (even between programming languages) will be easy as pie.
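
As a small illustration of “the underlying logic” carrying over between tools, here is a sketch of the same group-and-count operation written twice: once in SQL (run through Python's built-in sqlite3 module) and once in plain Python. The users table and its rows are made up for this example.

```python
import sqlite3
from collections import Counter

# Made-up rows for illustration: (user_id, plan)
rows = [(1, "free"), (2, "paid"), (3, "free"), (4, "free"), (5, "paid")]

# Version 1: SQL. Group the rows by plan, then count each group.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, plan TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
sql_counts = dict(conn.execute("SELECT plan, COUNT(*) FROM users GROUP BY plan"))

# Version 2: plain Python. Same idea: bucket by plan, count each bucket.
py_counts = dict(Counter(plan for _, plan in rows))

print(sql_counts)
print(sql_counts == py_counts)
```

The syntax differs, but the idea (split the rows into groups, then aggregate each group) is identical. That idea is the part worth remembering.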

And you will be much more prepared for the ever-changing future.

So to future-proof your data science career: focus on your skills and not on the information you learn!

Untold truth #3: Because it’s hard, Learning Data Science is a great investment

Let’s talk about career prospects, too!

Learning data science is a great short- and long-term investment.

I guess I don’t have to explain the short-term investment part.

Check out the LinkedIn Workforce Report for the US (August 2018)! It says:
“Demand for data scientists is off the charts … data science skills shortages are present in almost every large U.S. city. Nationally, we have a shortage of 151,717 people with data science skills, with particularly acute shortages in New York City, the San Francisco Bay Area, and Los Angeles.”

This hasn’t changed much since.

Here’s another data point: according to Glassdoor, Data Scientist was ranked as the best job in the USA four years in a row! And it has consistently been in the top 3 since 2016.

(Image source: glassdoor.com)

Note: the above numbers apply to the US only – I don’t have hard data for the EU or any other parts of the world. But in my experience, in the EU we have the same trends.

High demand and a persistent shortage put data scientists into a really good position. It means:

  • Higher salary and better benefits
  • Better job security
  • Better work conditions (e.g. flexible hours, working from home, etc.)

Besides, data scientist is a well-respected job within the company (and in the outside world, too). You will be someone who your managers and colleagues want to listen to.

The point is: learning data science is a good short-term investment, for sure.

But is learning data science a good long-term investment, too?

My answer is a big YES! And I have two reasons.

REASON #1:
Again: just look at the data! In 2018 the shortage of data professionals in the US was ~150,000 people. This number was ~140,000 in 2011. So in 7 years, the market couldn’t produce enough new data scientists to fill the gap. The gap even grew a bit. And this trend doesn’t seem to have changed much since.

REASON #2:
This is something that I’ve already mentioned in the intro:
Many people want to learn data science… yet, not too many of them become data scientists after all.
Why? Because learning data science is hard. It’s a combination of hard skills (Python, SQL, statistics, data visualization tools, etc.) and soft skills (like business skills or communication skills) and more.
This is an entry barrier that not many students can pass. They get fed up with some part (statistics, or coding, or too many business decisions) and quit.

So the question is: are you okay with learning data science the hard way?

If yes, it will be one of the best career investments of your life!

Untold truth #4: Learning Data Science is not about learning Machine Learning, Deep Learning (or any other data buzzwords)

If you had to guess, what would you say is the most time-consuming part of the data scientist job?

Or in other words, what do you think you’ll need to work on the most when practicing data science and analytics for real?

Hint: it’s not Machine Learning!

The answer is…
.
.
.
…data cleaning.

Data scientists often say: “80 percent of data science is data cleaning. And 20 percent is complaining about data cleaning.”
Okay, obviously, that’s a joke.

But when you get into your first data science role, you will see for yourself that it’s not about doing machine learning and predictive analytics 24/7!

Because to be able to run a proper machine learning algorithm, you have to complete many other steps first:

  • data collection
  • data formatting
  • data cleaning
  • transforming your data to the right format
  • discovering and understanding the data
  • running other data analytics projects
  • data visualization
  • automating the above steps
  • and so on…
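
To show what a few of these steps can look like in practice, here is a minimal sketch that formats and cleans a tiny, made-up CSV export using only Python's standard library. The column names, the sample rows and the two date formats are assumptions for illustration; real data sets are messier and usually need more rules than this.

```python
import csv
import io
from datetime import datetime

# Made-up raw export, invented for this illustration.
RAW = """user_id,signup_date,country,revenue
1,2023-01-05, us ,19.99
2,05/01/2023,US,
2,05/01/2023,US,
3,2023-02-11,de,n/a
4,not a date,DE,5
"""


def parse_date(text):
    """Try the two date formats assumed for this sample; None if neither fits."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean(raw_text):
    """Return (cleaned_rows, rejected_rows) for the raw CSV text."""
    seen = set()
    cleaned, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        key = tuple(row.values())
        if key in seen:  # data cleaning: drop exact duplicate rows
            continue
        seen.add(key)

        signup_date = parse_date(row["signup_date"])
        if signup_date is None:  # keep unparseable rows aside for inspection
            rejected.append(row)
            continue

        try:
            revenue = float(row["revenue"])
        except ValueError:  # empty or non-numeric value: mark as missing
            revenue = None

        cleaned.append({
            "user_id": int(row["user_id"]),
            "signup_date": signup_date,                # data formatting: one date type
            "country": row["country"].strip().upper(), # data formatting: one spelling
            "revenue": revenue,
        })
    return cleaned, rejected


cleaned, rejected = clean(RAW)
print(len(cleaned), "clean rows,", len(rejected), "rejected")
```

Note that rejected rows are kept rather than thrown away. Looking at what failed to parse is often how you discover the next cleaning rule.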

And believe me when I say: when you are working with real data, these things are just as exciting as the machine learning and predictive analytics parts.

What’s important for you now?

When you are learning data science, you should not focus on polishing your machine learning skills. Instead you should focus on:

  • being fluent with Python and SQL
  • being familiar with the basics of statistics
  • understanding the business logic behind simpler analytical methods
  • practicing and experiencing the pain of working with a raw and uncleaned data set
  • learning how to automate things
  • and so on…

These things will help you to become a better data scientist and eventually get your first job. Another deep learning or artificial intelligence course won’t.

So to summarize:

  • Learning Python and SQL –» important
  • Learning about Deep Learning –» not important
  • Learning the basics of statistics –» important
  • Learning about AI –» not important
  • Practicing data cleaning, data formatting and automation –» important
  • Understanding “artificial neural networks” –» not important

At least, at the junior level…
Later on (in 1 or 2 years), when your career moves forward, you will have to learn these above-mentioned, fancy machine learning methods on the job, anyway.

But for now: focus on the things that are important for your next step!

Conclusion

I know: being a data scientist, a machine learning guru, a master of deep learning… These all sound exciting. And you will get there eventually.

(I mean, if you want to. For instance, I get much, much more enjoyment out of working on simpler analytics projects that have a bigger impact on the business. E.g. a sophisticated segmentation project rather than a deep learning project.)

But think about everything that I’ve written above: accept that learning data science is hard, focus on your skills, consider it an investment and learn the basics first!

Cheers,
Tomi Mester
