Notes: Learning Machine Learning
February 1, 2023
I've decided to spend some time digging into Machine Learning and Deep Learning. I took an ML course in grad school but have forgotten most of it, and I worked on a system for Google Maps back in 2015-2016 that predicted hundreds of categorical labels for tens of millions of POIs, but I understood the model only at a superficial level, and those were the days before TensorFlow, so we had to build a lot of custom infrastructure. A lot has happened since then!
I'll link notes and code below as I go along.
Edit: This project is complete. See the retrospective below.
Table of Contents
- Google's Machine Learning Crash Course
- Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow (abandoned)
- Independent practice
- Coursera Machine Learning Specialization
- Retrospective
Google's Machine Learning Crash Course
This ~15-hour, video-heavy course provides a broad overview of the field. The emphasis is on linear and logistic regression, neural networks, and applied/engineering concerns. There are also some "programming exercises" that provide roughly zero learning value, since you do little more than read and execute existing TensorFlow code. Still, this is a very good, information-dense course that I would highly recommend as an introduction.
- Descending into ML
- Reducing Loss
- First Steps with TensorFlow
- Real Datasets
- Training, Validation, Test Sets
- Feature Crosses
- Regularization: Simplicity
- Logistic Regression
- Regularization: Sparsity
- Neural Networks
- Training Neural Networks
- Multi-Class Neural Networks
- ML Engineering
- ML Fairness
- Real-World Examples
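To make a couple of the topics above concrete, here's a minimal sketch of logistic regression with L2 regularization trained by batch gradient descent. This is my own illustration, not course material; the toy data and hyperparameters are arbitrary choices.

```python
import numpy as np

# Toy data: label is 1 when the sum of the two features is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
lr, lam = 0.1, 0.01  # learning rate and L2 regularization strength

for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))       # sigmoid of the logits
    grad_w = X.T @ (p - y) / len(y) + lam * w    # L2 term penalizes large weights
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean((p > 0.5) == (y == 1))
print(f"training accuracy: {acc:.2f}")
```

The L2 term (`lam * w` in the gradient) is the "Regularization: Simplicity" idea from the course: it shrinks weights toward zero so the model doesn't overfit noisy features.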
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
Status: Abandoned. It's a fantastic book, but it goes far deeper into library/tool usage and a huge range of applications than I was looking for. Still, I think the first few chapters and their exercises helped me develop a reasonable understanding of the ML workflow.
Notes and Colabs:
- Annotated Machine Learning Project Checklist taken from Appendix A plus details and tips from other chapters
- Chapter 2: End-to-End ML Project - Colab
- Chapter 3: Classification - Colab
- Chapter 4: Training Models - Colab
Independent practice
- Kaggle: MNIST without using neural networks
- Kaggle: Titanic
- Kaggle: Spaceship Titanic
- Kaggle: House Prices
- Linear regression with California housing data
- Predicting hotel reservation cancellation with logistic regression
- MNIST with a Keras softmax neural network
- Predicting wine quality with linear regression and cross-validation
- Forest cover type classification with XGBoost
- Ad engagement regression with XGBoost
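Most of the regression exercises above follow the same basic pattern: fit a model, then estimate its generalization error with k-fold cross-validation. Here's a condensed sketch of that pattern with my own synthetic data standing in for the real datasets (California housing, wine quality, etc.), using a closed-form least-squares fit.

```python
import numpy as np

# Synthetic regression data: known weights plus a little noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(150, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=150)

def fit_ols(X, y):
    # Append a bias column and solve least squares.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

# 5-fold cross-validation: train on 4 folds, measure MSE on the held-out fold.
k = 5
folds = np.array_split(np.arange(len(X)), k)
mses = []
for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    w = fit_ols(X[train_idx], y[train_idx])
    mses.append(np.mean((predict(w, X[test_idx]) - y[test_idx]) ** 2))

print(f"mean CV MSE: {np.mean(mses):.4f}")
```

In the actual notebooks scikit-learn's `cross_val_score` does this bookkeeping, but writing out the folds once makes it obvious what the reported score actually measures.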
Coursera Machine Learning Specialization
A great course for learning the algorithms and ideas in more detail than the Google crash course. The heavier math (e.g., working through backpropagation from scratch) gave me a better sense of how things work than just playing around with Kaggle problems did. Andrew Ng's comments and tips, drawn from his extensive experience, were also a nice feature. Unfortunately, the programming labs were almost entirely useless - you implement a tiny piece of an algorithm by literally translating a couple of equations into Python. It would have been much more interesting if the labs required you to build the entire pipeline and algorithm from scratch (or with hints). If I hadn't worked through the first couple of chapters of HOML, I would have had no idea how the labs worked.
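As an illustration of what "translating equations into Python" means here, this is a minimal backpropagation-from-scratch sketch in the spirit of the course material. It is my own code, not a course lab; the layer sizes, data, and learning rate are arbitrary choices.

```python
import numpy as np

# One hidden layer, sigmoid activations, squared loss, trained on XOR.
rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

loss0 = loss = None
for step in range(5000):
    # Forward pass.
    a1 = sig(X @ W1 + b1)
    a2 = sig(a1 @ W2 + b2)
    loss = np.mean((a2 - y) ** 2)
    if step == 0:
        loss0 = loss
    # Backward pass: chain rule applied layer by layer.
    d2 = (a2 - y) * a2 * (1 - a2)      # dLoss/dz2
    d1 = (d2 @ W2.T) * a1 * (1 - a1)   # dLoss/dz1
    W2 -= 0.5 * a1.T @ d2
    b2 -= 0.5 * d2.sum(axis=0)
    W1 -= 0.5 * X.T @ d1
    b1 -= 0.5 * d1.sum(axis=0)

print(f"loss: {loss0:.3f} -> {loss:.3f}")
```

Each line of the backward pass corresponds directly to one of the chain-rule equations; the labs have you write a line or two of this, while the surrounding loop and data handling are given to you.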
Notes and colabs:
- Course 1: Supervised Machine Learning: Regression and Classification
- Course 2: Advanced Learning Algorithms
- Course 3: Beyond supervised learning
Retrospective

I spent a year on this project (February 2023 to February 2024), with almost all of the work happening in the first six months (before I relocated to France over the summer and had new problems to deal with). The effort absolutely paid off professionally. Even though I didn't transition to AI work, my current company does a lot of ML, and the time I spent on this material lets me understand which approaches we use, why, and what their tradeoffs are; identify the largest ML engineering challenges we face; and have substantive conversations with team members about their work, its difficulty, and its impact. It also let me reframe my previous work on a production ML system at Google in 2015-2016 in modern language and concepts.
My next step is to dig further into Deep Learning and building stuff. I'll make a new post about that :)