Maths foundations for statistics and machine learning

While (and after) being a COVID hermit, I took (or re-took) a journey starting from basic maths, trying to use the momentum to get better at the mathematics of statistics and machine learning. It was a winding path but, with hindsight, the list below shows the steps I would have taken in an ideal world (in order, for the first two sub-lists), with the resources I found most helpful along the way (or, in some cases - Devlin, Kline, Jones & Jones - ones I didn't use myself originally but found when looking for a reference to fill out a point on the list).

The first sub-list could be good preparation for statistics modules in e.g. an undergraduate psychology course. The idea of the second sub-list is to interweave general maths with the probability maths that's probably the motivating material for the intended reader; this sub-list is what I think should be covered by the end of a social science PhD at the latest, so that you have a good enough foundation to understand typical statistics and read up on specific methods relatively efficiently.

Note: While I've excluded those books that are such didactic garbage they're pretty much sabotaging their readers (I've made allowances for what happens in "starred" sections), only a few are really good about helping readers select necessary subsets of exercises. So there's some judgement needed; don't let ego get in the way of being efficient with time, e.g., by looking up information or solutions online at some points (it can be fun to heroically struggle and transcend, but not if it turns out you're being blocked by some "cute" exercise you need some unmentioned, non-obvious factoid for).

The list

  1. General prerequisites
    1. Back-to-basics to make sure you're not missing anything that'll trip you up later: The first four parts of the MathTrackX series on edX: Polynomials, Functions and Graphs; Special Functions; Differential Calculus; Integral Calculus. These are about having the "school maths" and numeracy that everything else will assume is known, much like literacy. If you find you need to go back further, to arithmetic or wherever you need to start, then that's where to start instead!
    2. Any introduction to very basic linear algebra and matrices. E.g., Savov's No Bullshit Guide to Linear Algebra, chapters 2 and 3. This is mostly to learn the simple but critical language of how data are typically represented and organized - in columns and rows - and later on there will also be necessary maths building on these basic concepts.
    3. An introduction to the language of proofs and sets: Section 1.16 of Savov.
    4. Introduction to Probability (STAT110x) on edX: Really well designed and accessible online introduction to foundational probability maths. While statistics isn't the same as probability, statistics is fundamentally about probabilities, so having this basis will make your life infinitely easier in stats modules.
  2. Fundamental probability and statistics
    1. Just for awareness at this point, to avoid confusion since they're briefly alluded to here and there: Section 1.14 of Savov covering complex numbers (depending on how secure you are in the "school maths" from the first sub-list, it could be worth running through all of Chapter 1 to avoid gaps).
    2. An introduction to logic, proofs, set theory, and Boolean algebra (including truth tables); e.g., chapters 1 - 3 of Devlin's Sets, Functions, and Logic: An Introduction to Abstract Mathematics. This is the more formal mathematical language used in sources in further steps below; statistics modules and textbooks might also implicitly assume at least some familiarity with the concepts.
    3. Introduction to Probability by Blitzstein & Hwang, chapters 1 - 4. This is the book the STAT110x online course is based on, but the next step after completing the course is to really work through the book, including the (standard) exercises. You can trust the authors that they're doable; more difficult exercises are clearly marked. It's a time investment but I found it very worth it, as someone who wanted to start properly understanding scientific methods. Subsequent topics in this list will assume a good grasp of probability concepts covered in the book (although which specific ones will vary per topic).
    4. Quick digression specifically for the arithmetic series, since that'll be assumed to be known below. The intro video on Khan Academy covers it.
    5. Revisit and consolidate calculus, since the next bits on probability will heavily involve that. I haven't found anything I've fully vetted and been satisfied with, but Kline's Calculus: An Intuitive and Physical Approach gives a great start with chapters 1 through 9 (the non-starred subsections), explaining a lot of points in a helpful conversational/lecture style. You need chapters 1 through 12, though, for the next part of Blitzstein & Hwang. (My main issue with Kline concerns the chapters on trigonometry, 10 and 11, which are a bit of a slog that doesn't quite feel optimal or necessary to me, with all the secants and cosecants etc. in addition to just sines and cosines; but the chapters aren't skippable since they introduce general points.)
    6. Introduction to Probability by Blitzstein & Hwang, chapter 5.
    7. Further calculus up to the basic idea of multiple integrals. Finishing Kline's Calculus: An Intuitive and Physical Approach is currently still the best single option I'm aware of. A good introductory engineering mathematics book could be an alternative option (they seem to cover all the relevant topics but can be non-rigorous to the point of being sloppy, which can end up being confusing and frustrating).
    8. Introduction to Probability by Blitzstein & Hwang, chapter 6 onwards. I'd suggest studying up to at least conditional expectations, since those will come up a lot in basic statistical methods.
    9. Further linear algebra, needed for subsequent techniques like regression or Principal Component Analysis. E.g., chapters 4 through 6.6 of Savov's No Bullshit Guide to Linear Algebra. I'd also recommend starting on the great Strang lectures here, up to projections and least squares; but be aware that the best edition of the associated book might not be the latest (6th) one (the online course materials refer to the 4th and 5th, most consistently the 4th which is what the assignment numbers refer to, although the assignments are repeated in the solution PDFs anyway).
    10. Regression by Bingham & Fry, which fully covers linear regression, including the contents of the usual mathematical black box where psychology statistics teaching ends.
    11. Finish the Strang lectures.
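
To give a flavour of where steps 9 and 10 of the second sub-list meet, here is a minimal sketch (not taken from any of the books above) of least squares seen as a projection: after fitting a straight line, the residuals are orthogonal to both the column of ones and the column of x values. The function name `fit_line` and the data are made up for illustration, and the code uses plain Python rather than any particular statistics library.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data, purely for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Projection view: the residual vector is orthogonal to each predictor column.
dot_with_ones = sum(residuals)
dot_with_x = sum(r * x for r, x in zip(residuals, xs))

print(f"intercept={intercept:.3f}, slope={slope:.3f}")
print(f"residuals . ones = {dot_with_ones:.2e}, residuals . x = {dot_with_x:.2e}")
```

Both dot products come out as zero up to floating-point rounding, which is the orthogonality condition the lectures on projections arrive at and the regression book builds on.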