A year and a half ago, I dropped out of one of the best computer science programs in Canada. I started creating my own data science master’s program using online resources. I realized that I could learn everything I needed through edX, Coursera, and Udacity instead. And I could learn it faster, more efficiently, and for a fraction of the cost.
I’m almost finished now. I’ve taken many data science-related courses and audited portions of many more. I know the options out there, and what skills are needed for learners preparing for a data analyst or data scientist role. So I started creating a review-driven guide that recommends the best courses for each subject within data science.
For the first guide in the series, I recommended a few coding classes for the beginner data scientist. Then it was statistics and probability classes. Then introductions to data science. Then data visualization. Machine learning was the fifth and latest guide. And now I’m back to conclude this series with even more resources.
Here’s a summary of all my previous guides, plus recommendations for 13 other data science topics.
For each of the five major guides in this series, I spent several hours trying to identify every online course for the subject in question, extracting key bits of information from their syllabi and reviews, and compiling their ratings. My goal was to identify the three best courses available for each subject and present them to you.
The 13 supplemental topics — like databases, big data, and general software engineering — didn’t have enough courses to justify full guides. But over the past eight months, I kept track of them as I came across them. I also scoured the internet for courses I may have missed.
For these tasks, I turned to none other than the open source Class Central community, and its database of thousands of course ratings and reviews.
Since 2011, Class Central founder Dhawal Shah has kept a closer eye on online courses than arguably anyone else in the world. Dhawal personally helped me assemble this list of resources.
How we picked courses to consider
Each course within each guide must fit certain criteria. There were subject-specific criteria, then two common ones that each guide shared:
- It must be on-demand or offered every few months.
- It must be an interactive online course, so no books or read-only tutorials. Though these are viable ways to learn, this guide focuses on courses. Courses that are strictly videos (i.e. with no quizzes, assignments, etc.) are also excluded.
We believe we covered every notable course that fit the criteria in each guide. There is always a chance that we missed something, though. Please let us know in each guide’s comments section if we left a good course out.
How we evaluated courses
We compiled average ratings and number of reviews from Class Central and other review sites to calculate a weighted average rating for each course. We read text reviews and used this feedback to supplement the numerical ratings.
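As a rough illustration of that calculation, here is a minimal Python sketch. It assumes each review site's average rating is weighted by its number of reviews, which is one common approach; the exact weighting used for these guides is not spelled out here. The function name and the sample numbers are hypothetical.

```python
def weighted_average_rating(site_ratings):
    """Combine per-site average ratings into one rating, weighted by review count.

    site_ratings: list of (average_rating, number_of_reviews) tuples, one per site.
    Returns (weighted_average, total_reviews); the average is None if there are no reviews.
    Assumption: weights are review counts. The guides may have weighted differently.
    """
    total_reviews = sum(count for _, count in site_ratings)
    if total_reviews == 0:
        return None, 0
    weighted_sum = sum(rating * count for rating, count in site_ratings)
    return weighted_sum / total_reviews, total_reviews


# Hypothetical numbers for illustration only, not taken from any real course.
example_sites = [(4.8, 200), (4.5, 50), (4.0, 10)]
rating, reviews = weighted_average_rating(example_sites)
print(f"{rating:.2f}-star weighted average rating over {reviews} reviews")
```

With this weighting, a site with 200 reviews pulls the combined rating much harder than a site with 10, which is why a course with only a handful of reviews can show an extreme rating.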
We made subjective syllabus judgment calls based on a variety of factors specific to each subject. The criteria in our intro to programming guide, for example:
- Coverage of the fundamentals of programming.
- Coverage of more advanced, but useful, topics in programming.
- Relevance of the syllabus to data science.
Here are the best courses overall for each of these topics. Together these form a comprehensive data science curriculum.
Subject #1: Intro to Programming
The University of Toronto’s Learn to Program series has an excellent mix of content difficulty and scope for the beginner data scientist. Taught in Python, the series has a 4.71-star weighted average rating over 284 reviews.
An Introduction to Interactive Programming in Python (Part 1) and (Part 2) by Rice University via Coursera
Rice University’s Interactive Programming in Python series contains two of the best online courses ever. They skew towards games and interactive applications, which are less applicable topics in data science. The series has a 4.93-star weighted average rating over 6,069 reviews.
R Programming Track by DataCamp
If you are set on learning R, DataCamp’s R Programming Track effectively combines programming fundamentals and R syntax instruction. It has a 4.29-star weighted average rating over 14 reviews.
Subject #2: Statistics & Probability
Foundations of Data Analysis — Part 1: Statistics Using R and Part 2: Inferential Statistics by the University of Texas at Austin via edX
The courses in UT Austin’s Foundations of Data Analysis series are two of the few with great reviews that also teach statistics and probability with a focus on coding up examples. The series has a 4.61-star weighted average rating over 28 reviews.
Statistics with R Specialization by Duke University via Coursera
Duke’s Statistics with R Specialization, which is split into five courses, has a comprehensive syllabus with full sections dedicated to probability. It has a 3.6-star weighted average rating over 5 reviews, but the course it was based upon has a 4.77-star weighted average rating over 60 reviews.
Introduction to Probability — The Science of Uncertainty by the Massachusetts Institute of Technology (MIT) via edX
MIT’s Intro to Probability course has by far the highest ratings of the courses considered in the statistics and probability guide. It covers probability exclusively and in great detail, and it is longer (15 weeks) and more challenging than most MOOCs. It has a 4.82-star weighted average rating over 38 reviews.
Subject #3: Intro to Data Science
Data Science A-Z™: Real-Life Data Science Exercises Included by Kirill Eremenko and the SuperDataScience Team via Udemy
Kirill Eremenko’s Data Science A-Z excels in breadth and depth of coverage of the data science process. The instructor’s natural teaching ability is frequently praised by reviewers. It has a 4.5-star weighted average rating over 5,078 reviews.
Intro to Data Analysis by Udacity
Udacity’s Intro to Data Analysis covers the data science process cohesively using Python. It has a 5-star weighted average rating over 2 reviews.
Data Science Fundamentals by Big Data University
Big Data University’s Data Science Fundamentals covers the full data science process and introduces Python, R, and several other open-source tools. There are no reviews for this course on the review sites used for this analysis.
Subject #4: Data Visualization
Data Visualization with Tableau Specialization by the University of California, Davis via Coursera
A five-course series, UC Davis’ Data Visualization with Tableau Specialization dives deep into visualization theory. Opportunities to practice Tableau are provided through walkthroughs and a final project. It has a 4-star weighted average rating over 2 reviews.
Data Visualization with ggplot2 Series by DataCamp
Endorsed by ggplot2 creator Hadley Wickham, DataCamp’s Data Visualization with ggplot2 series covers a substantial amount of theory. You will know R and its quirky syntax quite well after finishing these courses. There are no reviews for these courses on the review sites used for this analysis.
An effective practical introduction, Kirill Eremenko’s Tableau 10 series focuses mostly on tool coverage (Tableau) rather than data visualization theory. Together, the two courses have a 4.6-star weighted average rating over 3,724 reviews.
Subject #5: Machine Learning
Machine Learning by Stanford University via Coursera
Taught by the famous Andrew Ng, Google Brain founder and former chief scientist at Baidu, Stanford University’s Machine Learning covers all aspects of the machine learning workflow and several algorithms. Taught in MATLAB or Octave, it has a 4.7-star weighted average rating over 422 reviews.
Machine Learning by Columbia University via edX
A more advanced introduction than Stanford’s, Columbia University’s Machine Learning is a newer course with exceptional reviews and a revered instructor. The course’s assignments can be completed using Python, MATLAB, or Octave. It has a 4.8-star weighted average rating over 10 reviews.
Machine Learning A-Z™: Hands-On Python & R In Data Science by Kirill Eremenko and Hadelin de Ponteves via Udemy
Kirill Eremenko and Hadelin de Ponteves’ Machine Learning A-Z is an impressively detailed offering that provides instruction in both Python and R, which is rare and can’t be said for any of the other top courses. It has a 4.5-star weighted average rating over 8,119 reviews.
Subject #6: Deep Learning
Parag Mital’s Creative Applications of Deep Learning with TensorFlow adds a unique twist to a technical subject. The “creative applications” are inspiring, the course is professionally produced, and the instructor knows his stuff. Taught in Python, it has a 4.75-star weighted average rating over 16 reviews.
Neural Networks for Machine Learning by the University of Toronto via Coursera
Learn from a legend. Geoffrey Hinton, known as the “godfather of deep learning,” is internationally distinguished for his work on artificial neural nets. His Neural Networks for Machine Learning is an advanced class. Taught in Octave with exercises also in Python, it has a 4.11-star weighted average rating over 35 reviews.
Deep Learning A-Z™: Hands-On Artificial Neural Networks by Kirill Eremenko and Hadelin de Ponteves via Udemy
Deep Learning A-Z is an accessible introduction to deep learning, with intuitive explanations from Kirill Eremenko and helpful code demos from Hadelin de Ponteves. Taught in Python, it has a 4.6-star weighted average rating over 1,314 reviews.
And here’s our top course pick for each of the supplementary subjects within data science.
Python & its tools
Python Programming Track by DataCamp, plus their individual pandas courses:
DataCamp’s code-heavy instruction style and in-browser programming environment are great for learning syntax. Their Python courses have a 4.64-star weighted average rating over 14 reviews. Udacity’s Intro to Data Analysis, one of our recommendations for intro to data science courses, covers NumPy and pandas as well.
R & its tools
R Programming Track by DataCamp, plus their individual dplyr and data.table courses:
Again, DataCamp’s code-heavy instruction style and in-browser programming environment are great for learning syntax. Their R Programming Track, which is also one of our recommendations for programming courses in general, effectively combines programming fundamentals and R syntax instruction. The series has a 4.29-star weighted average rating over 14 reviews.
Databases & SQL
Stanford University’s Introduction to Databases covers database theory comprehensively while introducing several open source tools. Programming exercises are challenging. Jennifer Widom, now the Dean of Stanford’s School of Engineering, is clear and precise. It has a 4.61-star weighted average rating over 59 reviews.
Importing & Cleaning Data Tracks by DataCamp:
DataCamp’s Importing & Cleaning Data Tracks (one in Python and one in R) excel at teaching the mechanics of preparing your data for analysis and/or visualization. There are no reviews for these courses on the review sites used for this analysis.
Exploratory Data Analysis
Data Analysis with R by Udacity and Facebook
Udacity’s Data Analysis with R is an enjoyable introduction to exploratory data analysis. The expert interviews with Facebook’s data scientists are insightful and inspiring. The course has a 4.58-star weighted average rating over 19 reviews. It also serves as a light introduction to R.
The Ultimate Hands-On Hadoop — Tame your Big Data! by Frank Kane via Udemy, then if you want more on specific tools (all by Frank Kane via Udemy):
- Taming Big Data with Apache Spark and Python — Hands On!
- Taming Big Data with MapReduce and Hadoop — Hands On!
- Apache Spark 2.0 with Scala — Hands On with Big Data!
- Taming Big Data with Spark Streaming and Scala — Hands On!
Frank Kane’s Big Data series teaches all of the most popular big data technologies, including over 25 in the “Ultimate” course alone. Kane shares his knowledge from a decade of industry experience working with distributed systems at Amazon and IMDb. Together, the courses have a 4.52-star weighted average rating over 6,932 reviews.
Software Testing by Udacity
Software Debugging by Udacity
Software skills are an oft-overlooked part of a data science education. Udacity’s testing, debugging, and version control courses introduce three core topics relevant to anyone who deals with code, especially those in team-based environments. Together, the courses have a 4.34-star weighted average rating over 68 reviews. Georgia Tech and Udacity have a new course that covers software testing and debugging together, though it is more advanced and not all of it is relevant for data scientists.
Building a Data Science Team by Johns Hopkins University via Coursera
Learning How to Learn: Powerful mental tools to help you master tough subjects by Dr. Barbara Oakley and the University of California, San Diego via Coursera
Mindshift: Break Through Obstacles to Learning and Discover Your Hidden Potential by Dr. Barbara Oakley and McMaster University via Coursera
Johns Hopkins University’s Building a Data Science Team provides a useful peek into data science in practice. It is an extremely short course that can be completed in a handful of hours and audited for free. Ignore its 3.41-star weighted average rating over 12 reviews, some of which were likely from paying customers.
Dr. Barbara Oakley’s Learning How to Learn and Mindshift aren’t data science courses per se. Learning How to Learn, the most popular online course ever, covers best practices shown by research to be most effective for mastering tough subjects, including memory techniques and dealing with procrastination. In Mindshift, she demonstrates how to get the most out of online learning and MOOCs, how to seek out and work with mentors, and the secrets to avoiding career ruts and general ruts in life. These are two courses that everyone should take. They have a 4.74-star and a 4.87-star weighted average rating over 959 and 407 reviews, respectively. Both courses are four weeks in duration.