“Should I learn Python 2 or Python 3?” For everyone who has just started to learn Python for Data Science, this is an important first question to answer. There are many ongoing discussions on the topic, and you might have found it hard to get a straightforward answer. I had the same question for quite a while, since I want to teach the most relevant Python version here on the blog. So I decided to reach out to practicing senior Data Scientists and ask for their opinions. After several hours of discussion and research, I have a definite answer for you. In this article I will summarize my top takeaways.
tl;dr: you should learn Python 3.
Python 2 vs Python 3 – what’s the difference?
To be honest, given that you are not an engineer but a practicing/aspiring data scientist, you won’t notice major differences between Python 2 and Python 3.
For a long time, Python 3 was said to be slower than Python 2, which might sound odd, I know. Either way, in 2017 that’s not the case anymore. Python 3.7 is already in development (the first alpha versions came out this autumn), and if the promises are kept, it’s going to be the fastest Python version so far. But don’t put too much emphasis on performance anyway. As Wes McKinney says in Python for Data Analysis:
“As Python is an interpreted programming language, in general most Python code will run substantially slower than code written in a compiled language like Java or C++. As programmer time is often more valuable than CPU time, many are happy to make this trade-off.”
And I fully agree with this idea.
There are small but initially annoying differences. I used Python 2 for a long time, and learning the small changes that were made in Python 3 was a bit unpleasant. But once I got used to the new version, all these new things felt so much more logical. I’ll give you two examples. The first one is how printing works. In Python 2, print is a statement, written without parentheses: print "Hello, World!"
And in Python 3, print is a function: print("Hello, World!")
The extra parentheses might seem unnecessary in Python 3, but in fact they are very logical: in Python, every function is called with parentheses. So why wouldn’t print be a function? (It wasn’t in Python 2.)
Note: the syntax print("Hello, World!") actually works in Python 2 as well (at least with a single value, where the parentheses are simply treated as grouping), but it’s not true the other way around: print "Hello, World!" doesn’t work in Python 3.
Note 2: the cherry on top of the Python 3 version is that print("Hello, World!") is the same syntax as in R.
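Because print is an ordinary function in Python 3, it also accepts keyword arguments such as sep, end and file. Here is a minimal sketch of what that allows; the render helper is a made-up name for illustration, not part of any library:

```python
import io


def render(*values, sep=" ", end="\n"):
    """Return the exact text that print() would write for these arguments."""
    buffer = io.StringIO()
    # file= redirects the output into an in-memory buffer instead of the screen.
    print(*values, sep=sep, end=end, file=buffer)
    return buffer.getvalue()


# Default behaviour: values joined by a space, followed by a newline.
greeting = render("Hello,", "World!")

# Custom separator and no trailing newline.
csv_row = render("a", "b", "c", sep=",", end="")

print(greeting, end="")
print(csv_row)
```

None of this was possible with the Python 2 print statement without special syntax, which is part of why the function version feels more consistent.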
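Another example is how the two Python versions handle integer division and the fractional part of the result. In Python 2, dividing two integers drops the fractional part (5 / 2 gives 2), while in Python 3 the same expression keeps it (5 / 2 gives 2.5).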
The Python-3-way is much more intuitive. (For me at least.)
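The division behaviour mentioned above can be checked directly in a Python 3 session. This is a small sketch with arbitrary example numbers; the comments note what Python 2 would have returned:

```python
# Python 3 semantics: / always returns a float, // floors the result.
true_quotient = 5 / 2       # 2.5 (Python 2 returned 2 for two integers)
floor_quotient = 5 // 2     # 2, the explicit way to ask for the old behaviour
negative_floor = -5 // 2    # -3, because floor division rounds toward negative infinity
exact_division = 4 / 2      # 2.0, still a float even when there is no remainder

print(true_quotient, floor_quotient, negative_floor, exact_division)
```

The // operator also exists in Python 2, so using it whenever you really want a whole-number result keeps the code unambiguous in both versions.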
If you want to learn more about the specific differences, read this article.
Which one to learn? And why?
It all comes down to this question, right? Python 2 vs Python 3! Who’s the winner?
Previously, I suggested learning Python 2, because most companies were still using it for legacy reasons. But this is not a strong enough argument anymore!
First off, Python 3 has been around since 2008, and more than 95% of the data science related features and libraries have been migrated already. So it’s already a fully featured language for data science.
Secondly (and more importantly), Python 2 will reach its end of life in 2020 and won’t be supported after that. This means that even the companies that have been using Python 2 so far will have to migrate to Python 3 soon. Thus learning Python 3 will make you more compatible and more valuable for your next job.
And third, Python 3 is a bit more logical and practical in the little details. And since it’s continuously developed, it should also pull ahead of Python 2 in terms of performance over time.
Note: and reason #4 is that on Data36 all of my Python for Data Science tutorials will be in Python 3. 🙂
At this point, if you are new to Python for Data Science, I think there is no reason to learn Python 2. You should learn Python 3! Invest in the future, not in the past.
If you are still on Python 2…
… consider learning Python 3. I did it, so I can tell you from personal experience: it’s not a big deal. But if you can’t do it, because you rely on very special Python 2 libraries that haven’t been migrated to Python 3 yet, or you are constrained by your company’s code base, I still recommend preparing your code for Python 3.
There is a very nice project called Python Future that offers a set of libraries to do that!
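Even before reaching for extra libraries, you can write code that behaves the same on both versions. The sketch below does not use the Python Future libraries themselves; it only relies on the standard library, and the function names average and interpreter_label are made up for illustration:

```python
import sys


def average(values):
    """Return the mean as a float on both Python 2 and Python 3."""
    values = list(values)
    # float() forces true division on Python 2, where int / int drops the fraction.
    return sum(values) / float(len(values))


def interpreter_label(version_info=sys.version_info):
    """Describe an interpreter version, for example 'Python 3.6'."""
    return "Python {0}.{1}".format(version_info[0], version_info[1])


# print with parentheses and a single argument runs on both versions.
print(interpreter_label())
print(average([1, 2]))
```

Small habits like these (explicit float conversion, parenthesised print) mean there is less to change when the migration finally happens.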
I didn’t mean to be too dramatic here! As I said above, the difference between Python 2 and Python 3 is not that big at all! Whichever you choose, you can learn the other one in a matter of hours. But in 2017, the winner of the Python 2 vs Python 3 battle is clearly Python 3. So if you can choose which one to learn, choose that!
With that said, come and continue learning Python for Data Science.
Otherwise subscribe to my Newsletter, so you won’t miss my new articles and video courses!