Notes: Learning Machine Learning
February 1, 2023
Motivated by the ChatGPT release and the new gold rush, I've decided to spend some time refreshing my ML fundamentals. Back in 2015-2016 (before TensorFlow) I worked on what is now called MLOps for a large multi-label classification model for Google Maps: I implemented distributed data processing pipelines, model versioning, deployment, and data and inference observability, all in C++. But I've since lost touch with the modern ecosystem. In grad school I took several ML and classical AI courses, but I haven't had to revisit those fundamentals in a long time.
I'll link notes and code below as I go along.
Edit: This project is complete. See the retrospective below.
Table of Contents
- Google's Machine Learning Crash Course
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (abandoned)
- Independent practice
- Coursera Machine Learning Specialization
- Retrospective
Google's Machine Learning Crash Course
Course: https://developers.google.com/machine-learning/crash-course
Status: Done
Review:
This ~15-hour, video-heavy course provides a broad overview of the field, with an emphasis on linear and logistic regression, neural networks, and applied engineering concerns. There are also some "programming exercises" that provide near-zero learning value, since you do little more than read and execute existing TensorFlow code. Still, it is a very good, information-dense course that I would highly recommend as an introduction.
Notes:
- Framing
- Descending into ML
- Reducing Loss
- First Steps with TensorFlow
- Real Datasets
- Generalization
- Training, Validation, Test Sets
- Representation
- Feature Crosses
- Regularization: Simplicity
- Logistic Regression
- Classification
- Regularization: Sparsity
- Neural Networks
- Training Neural Networks
- Multi-Class Neural Networks
- Embeddings
- ML Engineering
- ML Fairness
- Real-World Examples
- Guidelines
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
Book: https://www.oreilly.com/library/view/hands-on-machine-learning/9781098125967/
Status: Abandoned. It's a fantastic book, but it goes far deeper into library/tool usage and a wide range of applications than I was looking for. Still, I think the first few chapters and their exercises helped me develop a reasonable understanding of the ML workflow.
Notes and Colabs:
- Annotated Machine Learning Project Checklist taken from Appendix A plus details and tips from other chapters
- Chapter 2: End-to-End ML Project - Colab
- Chapter 3: Classification - Colab
- Chapter 4: Training Models - Colab
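The basic workflow those early chapters drill (split, preprocess, fit, evaluate) can be sketched in plain NumPy. The dataset here is synthetic, standing in for a real one like the book's housing data, and the model is ordinary least squares rather than anything from the book's later chapters:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic regression data standing in for a real dataset.
n, d = 500, 3
X = rng.normal(size=(n, d))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n)

# 1. Split into train and test sets.
split = int(0.8 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# 2. Standardize features using training statistics only (no test-set leakage).
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train_s = (X_train - mu) / sigma
X_test_s = (X_test - mu) / sigma

# 3. Fit linear regression with a bias term via least squares.
A_train = np.hstack([np.ones((len(X_train_s), 1)), X_train_s])
w, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# 4. Evaluate on the held-out test set.
A_test = np.hstack([np.ones((len(X_test_s), 1)), X_test_s])
rmse = np.sqrt(np.mean((A_test @ w - y_test) ** 2))
print(f"test RMSE: {rmse:.3f}")
```

The detail worth internalizing is step 2: the scaler is fit on the training split only, which is the kind of leakage pitfall the book's exercises make concrete.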
Independent practice
- Kaggle: MNIST without using neural networks
- Kaggle: Titanic
- Kaggle: Spaceship Titanic
- Kaggle: House Prices
- Linear regression with California housing data
- Predicting hotel reservation cancellation with logistic regression
- MNIST with a Keras softmax neural network
- Predicting wine quality with linear regression and cross-validation
- Forest cover type classification with XGBoost
- Ad engagement regression with XGBoost
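Several of these exercises (wine quality in particular) hinge on cross-validation. As a reminder to myself of what the libraries do under the hood, here is a minimal k-fold sketch in plain NumPy on synthetic data, using least-squares linear regression as the model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data standing in for e.g. wine-quality features and scores.
n, d = 300, 4
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.2 * rng.normal(size=n)

def kfold_rmse(X, y, k=5):
    """Average held-out RMSE of least-squares linear regression over k folds."""
    idx = np.random.default_rng(1).permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the k-1 training folds (with a bias column).
        A = np.hstack([np.ones((len(train), 1)), X[train]])
        w, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        # Score on the held-out fold.
        Av = np.hstack([np.ones((len(val), 1)), X[val]])
        scores.append(np.sqrt(np.mean((Av @ w - y[val]) ** 2)))
    return float(np.mean(scores))

cv_rmse = kfold_rmse(X, y)
print(f"5-fold CV RMSE: {cv_rmse:.3f}")
```

In practice `sklearn.model_selection.cross_val_score` does the same bookkeeping; this is just the loop spelled out.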
Coursera Machine Learning Specialization
Course: https://www.deeplearning.ai/courses/machine-learning-specialization/
Status: Done
Review:
Great course for learning the algorithms and ideas in more detail than the Google crash course. The substantial math (e.g. backpropagation from scratch) gave me a better sense of how things work than just playing around with Kaggle problems did. Andrew Ng's comments and tips, drawn from his extensive experience, were also a nice feature. Unfortunately, the programming labs were almost entirely useless: you implement a tiny piece of an algorithm by literally translating a couple of equations into Python. The labs would have been much more interesting if they required you to build the entire pipeline and algorithm from scratch (or with hints). If I hadn't worked through the first couple of chapters of HOML, I wouldn't have had any idea how the labs worked.
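To be fair, "translating a couple of equations into Python" is a useful skill in itself. For logistic regression, the course's two equations are the sigmoid σ(z) = 1/(1 + e^(-z)) and the batch gradient ∇J(w) = (1/m) Xᵀ(σ(Xw) − y), and they translate almost line for line (synthetic data here, not the course's lab data):

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic binary classification data: two Gaussian blobs in 2D.
m = 400
X = np.vstack([rng.normal(-1.0, 1.0, size=(m // 2, 2)),
               rng.normal(+1.0, 1.0, size=(m // 2, 2))])
y = np.concatenate([np.zeros(m // 2), np.ones(m // 2)])
A = np.hstack([np.ones((m, 1)), X])  # prepend a bias column

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Batch gradient descent: w := w - alpha * (1/m) * A^T (sigmoid(Aw) - y)
w = np.zeros(A.shape[1])
alpha = 0.5
for _ in range(2000):
    grad = A.T @ (sigmoid(A @ w) - y) / m
    w -= alpha * grad

accuracy = np.mean((sigmoid(A @ w) >= 0.5) == y)
print(f"training accuracy: {accuracy:.3f}")
```

The labs stop at roughly the `grad` line; everything around it (data handling, the training loop, evaluation) is what I wish they had made you build.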
Notes and colabs:
- Course 1: Supervised Machine Learning: Regression and Classification
- Course 2: Advanced Learning Algorithms
- Course 3: Beyond supervised learning
Retrospective
I spent a year on this project (Feb 2023 to Feb 2024), with almost all of the work happening in the first six months (before I relocated to France over the summer and had new problems to deal with). The effort absolutely paid off professionally. Even though I didn't transition to AI work, my current company does a lot of ML, and the time I spent on this material made it possible for me to understand which approaches we use, why, and their tradeoffs; identify the largest ML engineering challenges we're facing; and have reasonable conversations with team members about their work, its difficulty, and its impact. It also helped me map my (existing but very outdated) ML knowledge and experience onto the modern ecosystem.
My next step is to dig further into Deep Learning and building stuff. I'll make a new post about that :)