Get smart on Data Science in 9 books or less (Updated April 2021)

Matthieu Couturier
Apr 18, 2021


Recommended Data Science Books

The question I most often get from people looking to get into data science is: what are the best books to read?

  1. Predictive Analytics, Revised and Updated
  2. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking
  3. Naked Statistics: Stripping the Dread from the Data
  4. An Introduction to Statistical Learning
  5. The Elements of Statistical Learning
  6. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
  7. Trustworthy Online Controlled Experiments
  8. Field Experiments: Design, Analysis, and Interpretation
  9. Mostly Harmless Econometrics

Coming from an engineering and consulting background, I’ve had to learn a lot of technical topics on my own, and while there are many amazing online classes out there, I’ve always learned most efficiently by reading books. Below is my recommended data science reading list. I’ve sorted each category from intro to expert so you can find the right place to start based on your current knowledge level.

Intro to Data Science

The books recommended here are a great introduction to the world of data science and machine learning. They will help you understand the fundamentals of data science and why they are critical in today’s world.

1. Predictive Analytics, Revised and Updated

This book provides great examples of how data science helped companies solve some of their toughest problems, from how Netflix makes movie recommendations to how IBM’s Watson beat human champions at Jeopardy. It’s a great starting point to understand why you should care about modeling and data science (spoiler alert: business impact!).

2. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking

This one will help you start understanding what is happening under the hood. It goes over the basic concepts of modeling (e.g. supervised vs. unsupervised models) and provides a good overview of different statistical approaches in an intuitive way (without going too deep into the math)!

3. Naked Statistics: Stripping the Dread from the Data

While you do not need a PhD in math to become a Data Scientist, you do need to develop an understanding of the basics of statistics. This book is funny and a great way to intuitively grasp some of the fundamental concepts (e.g. the central limit theorem) that underpin the discipline.
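To make the central limit theorem a bit more concrete, here is a minimal simulation sketch. It is not taken from the book; the exponential distribution, the sample sizes, and the function name `sample_means` are all assumptions chosen for illustration. The idea is that averages of many draws from a skewed distribution cluster around the true mean, with a spread close to the population standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics


def sample_means(n_samples, sample_size, seed=0):
    """Draw n_samples samples of sample_size values each from an
    exponential distribution (mean 1, standard deviation 1) and
    return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


# Illustrative settings (assumed, not from the book).
means = sample_means(n_samples=2000, sample_size=50)

mean_of_means = statistics.fmean(means)   # should land close to 1.0
sd_of_means = statistics.stdev(means)     # should land close to 1 / sqrt(50)
expected_sd = 1.0 / math.sqrt(50)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {expected_sd:.3f})")
```

Plotting a histogram of `means` should show a roughly bell-shaped curve even though the underlying exponential distribution is heavily skewed.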

Statistical Learning and Machine Learning

At this point you are convinced you want to become a Data Scientist. You understand the high-level concepts and want to dive deeper into the world of statistical learning. Note that the books in this section are math-heavy (but you can always gloss over those parts)!

4. An Introduction to Statistical Learning

This book is required reading for anyone who wants to become a serious data scientist. It is a crash course in modeling, from linear regression to support vector machines and PCA. The examples in R will help you see how to start using this in practice.

5. The Elements of Statistical Learning

If An Introduction to Statistical Learning is DS 101, this is DS 201. About twice the size of the previous book, it expands on some of the previous topics and introduces neural nets, ensemble learning and other advanced topics. However, if you mostly want to learn about deep learning, skip straight to the next one.

6. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow

Now we are ready for graduate-level classes. While this book is focused on TensorFlow (i.e. deep learning), it also provides an overview of all the basic approaches, from linear regression to boosting and random forests. I wouldn’t be surprised if this one quickly became a classic.

Experimentation

The above section will help you master machine learning, but that’s only a small part of what a Data Scientist is expected to know. The other big area is experimentation (also known as A/B testing). While I would argue most Data Scientists spend much of their time designing and interpreting experiments, there are relatively few books on the topic.

7. Trustworthy Online Controlled Experiments

This is the best book I’ve read on the topic of A/B testing, but it assumes you already have a basic knowledge of experimentation. It provides a quick overview of the subject but also some in-depth analysis of thornier problems that data scientists will have to tackle (e.g. instrumentation, ramping up exposure, triggering…).

8. Field Experiments: Design, Analysis, and Interpretation

While this book is rooted in the social sciences side of experimentation, it will help you build an understanding of the concepts needed to run successful online experiments (attrition, interference, heterogeneous treatment effects…).

9. Mostly Harmless Econometrics

Finally, this book approaches experimentation from the side of econometrics. It will introduce you to advanced concepts like instrumental variables, fixed effects or difference-in-differences (diff-in-diff). Understanding and mastering those approaches will help you solve your thornier causal inference problems where a standard A/B test might not suffice.
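As a rough illustration of the diff-in-diff idea, here is a minimal sketch. It is not from the book: the function name `diff_in_diff` and all the numbers are made up for the example. The estimate is the change in the treated group minus the change in the control group, which only has a causal reading if both groups would have followed parallel trends without the treatment.

```python
from statistics import fmean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Basic two-group, two-period diff-in-diff estimate:
    (treated change over time) minus (control change over time)."""
    treated_change = fmean(treated_post) - fmean(treated_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    return treated_change - control_change


# Made-up metric values (e.g. weekly bookings per market), for illustration only.
treated_pre = [10.0, 12.0, 11.0]
treated_post = [15.0, 17.0, 16.0]
control_pre = [9.0, 10.0, 11.0]
control_post = [11.0, 12.0, 13.0]

effect = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print(f"diff-in-diff estimate: {effect:.1f}")
```

Here the treated group moves by 5 and the control group by 2, so the estimate attributes the remaining 3 to the treatment. A real analysis would typically use a regression with standard errors rather than raw group means.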

Final thoughts: There are a few important areas of data science for which I have yet to find good books. In particular, I think there is a big gap in topics like how to define metrics, build data pipelines, design dashboards, or run back-of-the-envelope analyses to answer business questions. Those are all critical skills to learn but (for now) seem mostly ignored in the press.

Written by Matthieu Couturier

Data Science Manager at Airbnb. Previously at Uber and BCG.
