There are three popular languages for data scientists: SQL, Python and R. I create a lot of tutorials about SQL and Python on my blog… But I never talk about R. So the question presents itself:
Is R required for data science?
I know I’m opening Pandora’s box with my answer — and many R fans will hate me for it — but I’ll say it anyway… No, R is not required for data science.
And in this article, I’ll explain why I think so.
Note: this article is also available in YouTube video format here, and in podcast format here (and on Spotify, iTunes, Google Podcasts, etc.)
Python vs. R
From the perspective of what you can achieve — Python and R are pretty similar to each other. Especially if you use Python with pandas. You can run similar methods and algorithms, you can wrangle the data in similar formats and you can get similar results.
One could say that R and Python are more or less interchangeable.
And that’s the only reason I say R is not required for data science: if you already know Python, you can stick with it and do everything in it that you’d do in R.
So if you know Python already, you don’t have to learn R, for now. (I’ve been using Python for 6+ years now — and I haven’t had any projects that I couldn’t have done with it during this time.)
And, obviously, it works the other way around, too: if you know R already, you probably don’t have to learn Python, for now.
But there is one more important question here:
If you don’t know either R or Python, which one should you learn?
I recommend learning Python.
Why?
#1 Python is more popular than R
There is a clear trend showing that Python is getting more and more popular within data science compared to R. Just take a look at this Google Trends chart.
I know… we should not draw conclusions from one chart only. But many other analyses from the last few years have reinforced the same take-away: more and more data professionals prefer Python over R.
What does this mean for you if you are an aspiring data scientist?
It means there’s a growing chance that the workplace where you get started as a data scientist will expect you to use Python and not R.
By the way, research has also shown that R was more often the preferred language in the academic world, while Python was used more often in the business world. This also helped Python get stronger in the last few years: as the popularity of data science grew, Python became more widely used everywhere.
Note: yes, it can happen that you only know Python and the company you start working for requires R. And it can happen the other way around, too: you know R and the company requires Python. Either way, because the two languages are pretty similar to each other, if you know one you can easily learn the other.
The point is: learning Python is just a better investment of your time compared to R, right now.
#2 Python is easier to learn than R
R’s learning curve is much steeper in the beginning. For most aspiring data scientists, this is the number one reason to get started with Python instead.
The syntax of Python is just simpler and more intuitive. And while learning coding for data science is hard in general, it’s still easier to do it in Python than in R.
Just compare these two code snippets!
Both solve the same problem: printing the numbers up to 20 that are divisible by 3, using a for loop and an if statement:
For loop + if statement in R:
for (i in 1:20) {
  if (i %% 3 == 0) {
    print(i)
  }
}
For loop + if statement in Python:
# range(1, 21) covers 1 to 20, like 1:20 in R
for i in range(1, 21):
    if i % 3 == 0:
        print(i)
In my opinion, the Python solution is just more elegant.
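If you want to see how compact this can get in Python, the same filter also fits into a single list comprehension. Treat this as a small sketch of an alternative style, not as a replacement for the loop above; the variable name `divisible_by_3` is just my own choice for the example.

```python
# Collect the numbers from 1 to 20 that are divisible by 3.
# range(1, 21) stops before 21, so 20 is the last number checked.
divisible_by_3 = [i for i in range(1, 21) if i % 3 == 0]

# Print them one per line, like the for loop version does.
for number in divisible_by_3:
    print(number)
```

The loop version is usually easier to read when you are just starting out, so pick whichever one feels clearer to you.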
But don’t just take my cherry-picked example for it. Go ahead and google a few Python and pandas cheat sheets and compare them to R cheat sheets. You’ll see the overall difference between the syntaxes of the two languages.
#3 Python is a general-purpose language — R is mainly for statistics
Another big advantage of Python is that it’s not just for data science. There’s a good chance that the company that you’ll work for will use it for other things, too. Like web development, building apps or creating API connections.
R is rarely used for anything else but statistics and data science.
That means that if you learn Python, you’ll have more opportunities and flexibility. You’ll understand other people’s code better. It’ll be easier to write code that is compatible with other projects at the company you work for. And you’ll also be able to build other projects for yourself in addition to data science scripts. (E.g. I built this small web-based application using a Python-based web framework called Flask.)
That’s another big plus for Python.
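To give you a feel for the "general-purpose" part, here is a minimal sketch of a web endpoint written with nothing but Python's standard library. Note that this is not Flask and not the application I mentioned above: it uses the built-in `wsgiref` module, and the names `app` and `call_app` as well as the JSON fields are made up for this illustration.

```python
import json
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A tiny WSGI app: answers every request with a small JSON document."""
    payload = {"path": environ.get("PATH_INFO", "/"), "status": "ok"}
    body = json.dumps(payload).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path):
    """Call the app in-process, without opening a network socket."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        # Remember what the app reported so we can return it to the caller.
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


print(call_app("/hello"))
```

To serve it for real, you could hand `app` to `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)` and call `serve_forever()` on the result (the port is an arbitrary choice). Frameworks like Flask build on the same WSGI idea and add routing, templates and a lot more on top.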
A reader’s comment
Alejandro (a reader of the Data36 blog) brought an interesting additional perspective to this R vs. Python question. I found it very valuable, so let me highlight it here:
“I had been using R for years for social statistics and it’s just during the pandemic that I gave Python a try. I love it so much that now I’m considering not coming back to R, except when needing that one very specific statistical library.
The reasons are many and similar to what you say regarding the syntax and the fact that Python is a general language. I also have the following impression, it would be interesting to know what you think about it:
Since R is mainly developed for and by statisticians, instead of programmers, a lot of how it is used just feels less polished and professional. For example, in my experience with R, everything seems less careful when managing requirements. I’m not sure if this is due to the language itself or due to the culture of the people that use R (as I said, most are not programmers but statisticians), but I’ve had many headaches with scripts (from online courses, colleagues, etc) that don’t run because one of the packages used changed its syntax 3 days ago.
In the Python world everyone seems to take more care by providing you with requirements files or simply telling you which versions to use, and setting up virtual environments to manage this is very easy.
Another example: the R package for Ubuntu had an issue some years ago (and I was affected by it) because the maintainer used a short apt-secure key. Some computer science friends of mine laughed, telling me this happened because R is not really developed by programmers. I didn’t really understand what they meant at the time, but after learning Python and seeing how unpolished R feels sometimes it is clearer what they were pointing to.
That’s my two cents on the topic… ;)”
Thanks, Alejandro!
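To make Alejandro's point about pinned versions a bit more concrete, here is a minimal sketch that lists every package installed in the current Python environment in the `name==version` format that requirements files use. It relies only on the standard library's `importlib.metadata`; the function name `pinned_requirements` is hypothetical, and in practice most people would simply run `pip freeze > requirements.txt` inside a virtual environment created with `python -m venv`.

```python
from importlib.metadata import distributions


def pinned_requirements():
    """Return sorted 'name==version' lines for the packages installed here."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs with missing metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


# Print the pinned list; the output depends on what is installed on your machine.
for line in pinned_requirements():
    print(line)
```

Sharing a list like this next to a script tells the next person exactly which package versions the code was written against.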
Disclaimer: R is a great language
So as you can see, I clearly prefer and recommend Python over R. I’ve covered my most important reasons, and there are certainly more arguments on both sides.
But before I wrap this up, let me add one crucial thing here, to make sure you won’t misunderstand me:
R is a great language for data science and statistics!
It’s just that I think that Python is even better — and that’s especially true when you are an aspiring data scientist.
But with that being said, you should feel free to check out R, as well — and give it a try if you want to. Maybe for you, specifically, it will click better than Python. And if so, feel free to go with that instead.
Conclusion
Okay, I hope I didn’t hurt anyone’s feelings here. I just wanted to help you decide whether you should learn R or not. The answer is simple: R is not required for data science. If you know it already, that’s great. But if you can choose, I recommend going with Python instead.
- If you want to learn more about how to become a data scientist, take my 50-minute video course: How to Become a Data Scientist. (It’s free!)
- Also check out my 6-week online course: The Junior Data Scientist’s First Month video course.
Cheers,
Tomi Mester