Get data for your data science side projects!
Are you working on a data science side project -- but you don't know how to get the actual raw data for it?
You are not alone. This is a very typical problem, but don't worry: I have a solution for it. (Well, actually, five different solutions.)
And I'll show them to you in my online course called the Data Source course.
I see many aspiring and junior data scientists starting to work on hobby projects. That's awesome!
We all know why these projects are awesome:
But as I just learned, beginner data scientists hit a huge roadblock at the very first step of these projects: they simply don't know how to get access to datasets they could work with.
If you landed here (and you are still reading), probably, this is an issue for you, as well… Well, not anymore because I've created the Data Source course to solve this very problem.
Introducing the Data Source course!
In the Data Source course (well, as the name suggests) I'll give you access to a lot of different data sources.
I put them into five modules:
MODULE #1 -- REAL LIFE DATASETS
The simplest solution. I'll just go ahead and give you immediate access to datasets from a few of my real life projects. These datasets are unique and can't be found anywhere else on the internet. And more importantly, they are from real life -- so you'll see all kinds of exciting things in them that you usually can't see in pre-prepared datasets from other online courses.
In this module, I published three datasets:
MODULE #2 -- APIs
In the second module, I'll show you how APIs work. With an API, you can query real-life data from different online applications. In this course specifically, I'll show you:
MODULE #3 -- ARTIFICIALLY GENERATED DATASETS
I'll also give you access to a few randomly generated datasets. I created these with Python-based random generator scripts that I built specifically for this course. These include a simple dataset called "dogs vs. cats" and a dataset from a simulated e-commerce shop.
As an extra, I'll also give you access to the random-generator Python scripts themselves. You will be able to modify and re-use them to create as much raw data as you need, and you can study my Python code to figure out how to build similar things for yourself.
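To give a rough idea of what a random generator like this can look like, here is a minimal sketch. It is not one of the course scripts: the function names, the columns (`pet_id`, `species`, `weight_kg`) and the weight rule are all assumptions made up for illustration.

```python
import csv
import io
import random


def generate_pets(n_rows, seed=None):
    """Return n_rows of made-up pet records as a list of dicts."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    rows = []
    for pet_id in range(1, n_rows + 1):
        species = rng.choice(["dog", "cat"])
        # Hypothetical rule: dogs tend to be heavier than cats.
        mean_kg = 20.0 if species == "dog" else 4.5
        # Draw a weight around the mean and never go below 0.5 kg.
        weight_kg = round(max(0.5, rng.gauss(mean_kg, mean_kg * 0.25)), 1)
        rows.append({"pet_id": pet_id, "species": species, "weight_kg": weight_kg})
    return rows


def to_csv(rows):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["pet_id", "species", "weight_kg"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `to_csv(generate_pets(1000, seed=42))` would give you 1,000 rows of raw data to practice on. Changing the rule inside the loop is the quickest way to create your own variation.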
MODULE #4 -- WEB SCRAPING
I'll show you a web scraping example. You can already find pretty detailed step-by-step web scraping tutorials on my blog, data36.com. But in the course, I'll show you one more example, in fact the most popular web scraping example: scraping Wikipedia. If you learn how to scrape Wikipedia pages with Python, you can get access to the raw data of over 6 million articles.
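As a taste of what scraping a Wikipedia page involves, here is a minimal sketch that uses only the Python standard library. It is an assumption-heavy illustration, not the method taught in the course: the helper names (`WikiLinkParser`, `extract_article_links`, `fetch_html`), the User-Agent string and the rule for what counts as an article link are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class WikiLinkParser(HTMLParser):
    """Collect links that look like internal Wikipedia article links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumption: article links start with /wiki/ and contain no colon,
        # which skips namespaced pages such as "File:" or "Help:".
        if href.startswith("/wiki/") and ":" not in href:
            self.links.append(href)


def extract_article_links(html):
    """Return the article links found in an HTML string, in page order."""
    parser = WikiLinkParser()
    parser.feed(html)
    return parser.links


def fetch_html(url):
    """Download a page as text; the User-Agent value is a placeholder."""
    request = Request(url, headers={"User-Agent": "data-side-project-example/0.1"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

For example, `extract_article_links(fetch_html("https://en.wikipedia.org/wiki/Data_science"))` should return the article links on that page. Keeping the download and the parsing in separate functions lets you test the parsing on a saved HTML file without hitting the website every time.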
MODULE #5 -- OPEN DATASETS
And if these four are not enough, in the last module, I'll also give you an exhaustive list of open datasets, so you can go ahead and browse for more raw data from all around the internet.
If getting raw data was a problem before...
So, I guess you get the point. If getting raw data was a problem before, after this course your only issue will be having access to too much data!
Let me just highlight an important thing that you might suspect already.
By taking this course, you won't just get the fish, you'll learn fishing, too!
These five modules (these five different ways of getting access to datasets) are great starting points... But I really hope that after finishing this course:
...because you'll understand how to apply this knowledge to other data sources -- and build, query and get various types of raw data by yourself!
Tomi Mester has been a practicing data analyst and researcher since 2012.
He has worked for Prezi, iZettle and several smaller companies as an analyst/consultant.
He’s the author of the Data36 blog, where he writes weekly posts and tutorials about data science, A/B testing, online research and data coding.
He's an O'Reilly author and a presenter at TEDxYouth, Barcelona E-commerce Summit and Stockholm Analytics Day. For more info about Tomi, check out the intro video.
Data Source is a completely self-paced online course: you decide when you start and when you finish. If you enroll now, you'll get immediate access to all course materials. You'll also have lifetime access, and you will get all future updates, too.
Price: $97 (plus your country’s VAT if you live in the EU)
ps. If you are from Hungary, please email me before you register: firstname.lastname@example.org.
I worked (and will keep working) really hard to make this course the best available, and I stand behind it 100%.
I understand that enrolling in an online course is not always an easy decision, so I made this decision totally risk-free for you: if you request one, I’ll give you a full refund within the first 30 days.
Enroll for $97 (+ VAT in EU)
If you’re taking this course, you are probably not an absolute beginner in data science. I'll assume that you are confident with the basics of Python (for loops, if statements, functions, imports, etc.), that you have some basic command line (bash) skills, and that you have a data science environment to work with. (If you don't have these, please go to my tutorials on data36.com -- or enroll in the Junior Data Scientist's First Month course first.)
Clicking this button will take you to the check-out page where you can pay safely using your credit card or your PayPal account! (If you are registering from the EU as an individual - in accordance with EU law - you have to pay the applicable VAT of your country, too.)
Frequently Asked Questions
Are there any prerequisites?
Yes! If you’re taking this course, you are probably not an absolute beginner in data science. I'll assume that you are confident with the basics of Python (for loops, if statements, functions, imports, etc.), that you have some basic command line (bash) skills, and that you have a data science environment to work with.
If you don't have these, please go to my tutorials on data36.com -- or finish the Junior Data Scientist's First Month course first.
When does the course start and finish?
The course starts now and never ends! Data Source is a completely self-paced online course - you decide when you start and when you finish.
What if I don't have an SQL environment in place?
Please set one up following my tutorials on data36.com.
How much time does the course take?
It really, really depends on you. Going through all materials can take ~10-15 hours. But if you build actual data projects on these datasets... You know it can take hours, days, weeks, months, years... #lifelonglearning ;-)
How long do I have access to the course?
How does lifetime access sound? After enrolling, you have unlimited access to this course for as long as you like - across any and all devices you own.
What if I am unhappy with the course?
I would never want you to be unhappy! If you are unsatisfied with your purchase, contact me in the first 30 days and I will give you a full refund.
Will I get an invoice?
Yes. Individuals get a receipt by email. Companies get an invoice. EU-based companies get VAT invoices. If you need something even more special, just reach out to me (email@example.com) and we will solve your administrative issues!
I want to have this for my whole company!
Happy to hear that! Send an email to firstname.lastname@example.org and we will sort it out! (I hold 1-day live workshops, too.)