The resources here are meant for further exploration of topics already covered in the book. Some of them were excluded from the book to avoid distracting readers from the key points, as the book already includes a substantial number of links and references.
- Chapter 1. Overview of Machine Learning Systems
- Chapter 2. Introduction to Machine Learning Systems Design
- Chapter 3. Data Engineering Fundamentals
- Chapter 4. Training Data
- Chapter 5. Feature Engineering
- Chapter 6. Model Development and Offline Evaluation
- Chapter 7. Model Deployment and Prediction Service
- Chapter 8. Data Distribution Shifts and Monitoring
- Chapter 9. Continual Learning and Test in Production
- Chapter 10. Infrastructure and Tooling for MLOps
- Chapter 11. The Human Side of Machine Learning
To learn to design ML systems, it’s helpful to read case studies to see how actual teams deal with different deployment requirements and constraints. Many companies — Airbnb, Lyft, Uber, and Netflix, to name a few — run excellent tech blogs where they share their experience using ML to improve their products and/or processes.
- Using Machine Learning to Predict Value of Homes On Airbnb (Robert Chang, Airbnb Engineering & Data Science, 2017)
In this detailed and well-written blog post, Chang describes how Airbnb used machine learning to predict an important business metric: the value of homes on Airbnb. It walks you through the entire workflow: feature engineering, model selection, prototyping, and moving prototypes to production. It concludes with lessons learned, tools used, and code snippets.
- Using Machine Learning to Improve Streaming Quality at Netflix (Chaitanya Ekanadham, Netflix Technology Blog, 2018)
As of 2018, Netflix streamed to over 117M members worldwide, half of them living outside the US. This blog post describes some of their technical challenges and how they use machine learning to overcome them, including predicting network quality, detecting device anomalies, and allocating resources for predictive caching.
- 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Bernardi et al., KDD, 2019)
As of 2019, Booking.com had around 150 machine learning models in production. These models solve a wide range of prediction problems (e.g. predicting users’ travel preferences and how many people they travel with) and optimization problems (e.g. optimizing the background images and reviews to show for each user). Adrian Colyer gave a good summary of the six lessons learned here:
- Machine learned models deliver strong business value.
- Model performance is not the same as business performance.
- Be clear about the problem you’re trying to solve.
- Prediction serving latency matters.
- Get early feedback on model quality.
- Test the business impact of your models using randomized controlled trials.
- How we grew from 0 to 4 million women on our fashion app, with a vertical machine learning approach (Gabriel Aldamiz, HackerNoon, 2018)
To offer automated outfit advice, Chicisimo tried to quantify people's fashion taste using machine learning. Due to the ambiguous nature of the task, the biggest challenges were framing the problem and collecting the data for it; the article addresses both. It also covers a problem that every consumer app struggles with: user retention.
- Machine Learning-Powered Search Ranking of Airbnb Experiences (Mihajlo Grbovic, Airbnb Engineering & Data Science, 2019)
This article walks you step by step through a canonical example of the ranking and recommendation problem. The four main steps are system design, personalization, online scoring, and business aspects. The article explains which features to use, how to collect and label data, why they chose gradient-boosted decision trees, which testing metrics to use, what heuristics to take into account while ranking results, and how to do A/B testing during deployment. It also covers personalization, ranking results differently for different users.
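The online scoring step the article describes, training a gradient-boosted model on listing features and ranking candidates by predicted booking probability, can be sketched roughly as follows. The features, data, and model settings here are invented for illustration and are not Airbnb's actual setup:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
n = 500
# Hypothetical features per experience: average rating, price, review count.
X = np.column_stack([
    rng.uniform(3.0, 5.0, n),
    rng.uniform(10.0, 200.0, n),
    rng.poisson(20, n).astype(float),
])
# Synthetic "was booked" labels: higher-rated, cheaper experiences book more.
p = 1 / (1 + np.exp(-(2.0 * (X[:, 0] - 4.0) - 0.01 * X[:, 1])))
y = (rng.random(n) < p).astype(int)

model = GradientBoostingClassifier(n_estimators=100).fit(X, y)

# Rank a small candidate set by predicted booking probability, descending.
candidates = np.array([
    [4.9, 30.0, 50.0],   # highly rated, cheap
    [3.2, 150.0, 5.0],   # poorly rated, expensive
    [4.5, 80.0, 20.0],
])
scores = model.predict_proba(candidates)[:, 1]
ranking = np.argsort(-scores)
print("ranked candidates:", ranking.tolist())
```

In production, such scoring has to run online within a latency budget, which is part of why the article's discussion of system design and serving matters as much as the model itself.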
- From shallow to deep learning in fraud (Hao Yi Ong, Lyft Engineering, 2018)
Fraud detection is one of the earliest industry use cases of machine learning. This article explores the evolution of fraud detection algorithms used at Lyft. At first, an algorithm as simple as logistic regression with engineered features was enough to catch most fraud cases, and its simplicity allowed the team to understand the importance of different features. Later, as fraud techniques became more sophisticated, more complex models were required. The article explores the tradeoffs between complexity and interpretability, and between performance and ease of deployment.
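As a rough illustration of the first-generation approach the article describes, here is a minimal logistic regression fraud model over a few engineered features. The feature names and data below are invented for demonstration, not Lyft's actual features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
# Hypothetical engineered features: account age (days), ride cost ($),
# and number of failed payment attempts.
account_age = rng.exponential(200, n)
ride_cost = rng.gamma(2.0, 15.0, n)
failed_payments = rng.poisson(0.3, n)
X = np.column_stack([account_age, ride_cost, failed_payments])

# Synthetic labels: fraud is more likely for new accounts with failed payments.
logit = -2.0 - 0.01 * account_age + 0.02 * ride_cost + 1.5 * failed_payments
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression(max_iter=5000).fit(X, y)

# The sign and magnitude of each coefficient show how that feature pushes
# the fraud probability up or down -- the interpretability the article
# highlights as a strength of the simple model.
for name, coef in zip(["account_age", "ride_cost", "failed_payments"],
                      model.coef_[0]):
    print(f"{name}: {coef:+.4f}")
```

The readable coefficients are exactly what gets lost when moving to the deeper models discussed later in the article, which is the complexity-versus-interpretability tradeoff in miniature.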
- Space, Time and Groceries (Jeremy Stanley, Tech at Instacart, 2017)
Instacart uses machine learning to solve the task of path optimization: how to most efficiently assign tasks to multiple shoppers and find the optimal paths for them. The article explains the entire system design process, from framing the problem and collecting data to selecting algorithms and metrics, topped with a tutorial on beautiful visualizations.
- Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning (Brad Neuberg, Dropbox Engineering, 2017)
An application as simple as a document scanner has two distinct components: a word detector and optical character recognition (OCR). Each requires its own production pipeline, and the end-to-end system requires additional steps for training and tuning. This article also details the team’s effort to collect data, which included building their own data annotation platform.
- Spotify’s Discover Weekly: How machine learning finds your new music (Sophia Ciocca, 2017)
To create Discover Weekly, Spotify employs three main types of recommendation models:
- **Collaborative Filtering** models (i.e. the ones that Last.fm originally used), which work by analyzing your behavior and others’ behavior.
- Natural Language Processing (NLP) models, which work by analyzing text.
- Audio models, which work by analyzing the raw audio tracks themselves.
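The collaborative-filtering idea behind the first model type, recommending tracks based on overlapping listening behavior across users, can be sketched with a toy item-item similarity computation. The play-count matrix below is invented for illustration:

```python
import numpy as np

# Rows = users, columns = tracks (play counts).
plays = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
    [0, 1, 5, 4],
], dtype=float)

# Item-item cosine similarity: tracks played by the same users score high.
norms = np.linalg.norm(plays, axis=0)
sim = (plays.T @ plays) / np.outer(norms, norms)

# For a user who has only played track 0, score the other tracks by their
# similarity to track 0 and recommend the best unplayed one.
user = np.array([3.0, 0.0, 0.0, 0.0])
scores = sim[0].copy()
scores[user > 0] = -np.inf  # don't re-recommend tracks already played
print("recommended track:", int(np.argmax(scores)))  # recommended track: 1
```

Spotify's production system works on billions of plays and combines this signal with the NLP and audio models, but the core behavioral-similarity computation is the same idea.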
- Smart Compose: Using Neural Networks to Help Write Emails (Yonghui Wu, Google AI Blog, 2018)
“Since Smart Compose provides predictions on a per-keystroke basis, it must respond ideally within 100ms for the user not to notice any delays. Balancing model complexity and inference speed was a critical issue.”
- Rules of Machine Learning (Martin Zinkevich)
- Things I wish we had known before we started our first Machine Learning project (Aseem Bansal, towards-infinity, 2018)
- Data Science Project Quick-Start (Eugene Yan, 2022)
- https://github.com/chiphuyen/machine-learning-systems-design: A much earlier, much less organized version of this book.
- Deploying Machine Learning Models: A Checklist (a short checklist for ML systems design)
- A Beginner’s Guide to Data Engineering (Robert Chang, 2018)
- Designing Data-Intensive Applications (Martin Kleppmann, O’Reilly, 2017)
- Emerging Architectures for Modern Data Infrastructure (Bornstein et al., a16z, 2022)
- Reverse ETL — A Primer (Astasia Myers, 2021)
- Uber’s Big Data Platform: 100+ Petabytes with Minute Latency (Reza Shiftehfar, Uber Engineering blog, 2018)
- How DoorDash is Scaling its Data Platform to Delight Customers and Meet our Growing Demand (Sudhir Tonse, 2020)
- The Log: What every software engineer should know about real-time data's unifying abstraction (Jay Kreps, LinkedIn / Confluent, 2013): Jay mentioned in a tweet that he wrote the blog post to see if there was enough interest in streaming for his team to start a company around it. The blog must have been popular, because his team spun out of LinkedIn to become Confluent.
- The Many Meanings of Event-Driven Architecture (Martin Fowler, GOTO 2017): Martin Fowler is a great speaker. His talk made clear many of the complexities of event-driven architecture.
- Stream Processing Hard Problems – Part 1: Killing Lambda (Kartik Paramasivam, LinkedIn Engineering, 2016)
- Open Problems in Stream Processing: A Call To Action (Tyler Akidau, DEBS 2019): Tyler led Dataflow at Google until he joined Snowflake in Jan 2020 to start Snowflake’s streaming team. His talk laid out the key challenges of stream processing.
- The Four Innovation Phases of Netflix's Trillions Scale Real-time Data Infrastructure (Zhenzhong Xu, 2022): How Netflix transitioned from a batch system to a streaming system.
- Rejection sampling
- The MIDAS Touch: Mixed Data Sampling Regression Models (Ghysels et al., 2004)
- An Overview of Weak Supervision (Ratner et al., 2018)
- Interpretable Machine Learning (Christoph Molnar, 2022): An amazingly detailed introduction to interpretability
- How to unit test machine learning code (Chase Roberts, 2017)
- A Recipe for Training Neural Networks (Andrej Karpathy, 2019)
- Top 6 errors novice machine learning engineers make (Christopher Dossman, AI³ | Theory, Practice, Business, 2017)
- Testing and Debugging in Machine Learning course (Google)
- What did you wish you knew before deploying your first ML model? (I asked this question on Twitter and got some interesting responses)
- Techniques for Training Large Neural Networks (OpenAI 2022)
- A survey of model compression and acceleration for deep neural networks (Cheng et al., IEEE Signal Processing Magazine 2017)
- Towards Federated Learning at Scale: System Design (Bonawitz et al, 2019)
- Effective testing for machine learning systems (Jeremy Jordan, 2020)
- On Calibration of Modern Neural Networks (Guo et al., 2017)
- Calibration for Netflix recommendation systems (Harald Steck, 2018)
- Beyond Accuracy: Behavioral Testing of NLP Models with CheckList (Ribeiro et al., ACL 2020)
- TextBugger: Generating Adversarial Text Against Real-world Applications (Li et al., 2018)
- Uncertainty Sets for Image Classifiers using Conformal Prediction (Angelopoulos et al., 2020)
- Beyond Incremental Processing: Tracking Concept Drift (Jeffrey C. Schlimmer and Richard H. Granger, Jr., 1986). Concept drift isn’t something new!
- Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift (Rabanser et al., 2019)
- Out-of-Distribution Generalization via Risk Extrapolation (REx) (Krueger et al., 2020)
- Domain Adaptation under Target and Conditional Shift (Zhang et al., 2013)
- A Review of Domain Adaptation without Target Labels (Kouw et al., 2019)
- On Learning Invariant Representations for Domain Adaptation (Zhao et al., 2019)
- How to deal with the seasonality of a market? (Marguerite Graveleau, Lyft Engineering 2019)
- Invariant Risk Minimization (Arjovsky et al., 2019)
- Causality for Machine Learning (Bernhard Schölkopf, 2019)
- Application deployment and testing strategies (Google)
- MLOps: Continuous delivery and automation pipelines in machine learning (Google)
- Automated Canary Analysis at Netflix with Kayenta (Michael Graff and Chris Sanden, Netflix Technology Blog 2018)
- A/B testing — Is there a better way? An exploration of multi-armed bandits (Greg Rafferty, Towards Data Science 2020)
- Deep Bayesian Bandits: Exploring in Online Personalized Recommendations (Guo et al., 2020)
- Active Learning and Contextual Bandits (Paul Mineiro, 2012)
- Introduction to Microservices, Docker, and Kubernetes: a good one-hour introductory video on Docker and Kubernetes (k8s).
- How Microsoft plans efficient workloads with DevOps
- Airbnb’s BigHead
- Uber’s Michelangelo
- Weapons of Math Destruction (Cathy O’Neil, Crown Books 2016)
- NIST Special Publication 1270: Towards a Standard for Identifying and Managing Bias in Artificial Intelligence
- ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT) publications
- Trustworthy ML’s recommended list of resources and fundamental papers for researchers and practitioners who want to learn more about trustworthy ML
- Sara Hooker’s awesome slide deck on ML Beyond Accuracy: Fairness, Security, Governance (2022)
- Timnit Gebru and Emily Denton’s tutorials on Fairness, Accountability, Transparency, and Ethics (2020)