# Optimization and Machine Learning

I wanted to know how much of machine learning requires optimization. From what I’ve heard, statistics is an important mathematical topic for people working with machine learning. Similarly, how important is it for someone working with machine learning to learn about convex or non-convex optimization?

The way I look at it is that statistics / machine learning tells you what you should be optimizing, and optimization is how you actually do so.

For example, consider linear regression with $Y = X\beta + \varepsilon$ where $E(\varepsilon) = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2 I$. Statistics tells us that this is (often) a good model, but we find our actual estimate $\hat \beta$ by solving an optimization problem:

$$\hat \beta = \operatorname{argmin}_{b \in \mathbb R^p} \|Y - Xb\|^2.$$

The properties of $\hat \beta$ are known to us through statistics, so we know that this is a good optimization problem to solve. In this case it is an easy optimization, but it still shows the general principle.
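As a rough illustration of that split, here is a minimal Python sketch (NumPy assumed) that computes the least-squares estimate two ways: with a standard solver, and with plain gradient descent on the squared error $\|Y - Xb\|^2$. The simulated data, the `gradient_descent` helper, and the step-size rule are illustrative assumptions, not anything specific to the discussion above.

```python
import numpy as np

# Simulated data from Y = X beta + eps (all values here are made up).
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.5])
Y = X @ beta_true + 0.1 * rng.normal(size=n)

# Route 1: hand the problem to a least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)


def gradient_descent(X, Y, steps=2000):
    """Minimize ||Y - Xb||^2 over b with fixed-step gradient descent."""
    b = np.zeros(X.shape[1])
    # The gradient is Lipschitz with constant 2 * sigma_max(X)^2,
    # so a step of 1 / lipschitz guarantees descent.
    lipschitz = 2.0 * np.linalg.norm(X, 2) ** 2
    for _ in range(steps):
        grad = -2.0 * X.T @ (Y - X @ b)
        b -= grad / lipschitz
    return b


# Route 2: solve the same optimization problem iteratively.
beta_gd = gradient_descent(X, Y)

print("solver:          ", beta_lstsq)
print("gradient descent:", beta_gd)
```

Both routes target the same minimizer; statistics says why that minimizer is worth having, and the optimizer is only the means of reaching it.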

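More generally, given data $(x_1, y_1), \dots, (x_n, y_n)$, a loss function $L$, and a class of candidate functions $\mathscr F$, much of machine learning can be viewed as solving

$$\hat f = \operatorname{argmin}_{f \in \mathscr F} \frac 1n \sum_{i=1}^n L(y_i, f(x_i))$$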

where I’m writing this without regularization, but that could easily be added.
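To make the "minimize an average loss over a class of functions" view concrete, here is a small Python sketch (NumPy assumed). It takes the function class to be linear scores $f(x) = w^\top x$ and the loss to be the logistic loss $\log(1 + e^{-y f(x)})$ with labels in $\{-1, +1\}$; those choices, the simulated data, and the helper names `empirical_risk`, `risk_gradient`, and `minimize_risk` are all assumptions made for illustration.

```python
import numpy as np


def empirical_risk(w, X, y):
    """Average logistic loss (1/n) * sum_i log(1 + exp(-y_i * w.x_i))."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))


def risk_gradient(w, X, y):
    """Gradient of the empirical risk with respect to w."""
    margins = y * (X @ w)
    # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)), computed stably.
    dloss = -np.exp(-np.logaddexp(0.0, margins))
    return X.T @ (dloss * y) / len(y)


def minimize_risk(X, y, step=0.5, steps=500):
    """Fixed-step gradient descent on the empirical risk."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= step * risk_gradient(w, X, y)
    return w


# Simulated, noisy, roughly linearly separable data (made up).
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 2))
w_true = np.array([2.0, -1.0])
y = np.where(X @ w_true + 0.5 * rng.normal(size=n) > 0, 1.0, -1.0)

w_hat = minimize_risk(X, y)
print("risk at w = 0:  ", empirical_risk(np.zeros(2), X, y))
print("risk at w_hat:  ", empirical_risk(w_hat, X, y))
print("train accuracy: ", np.mean(np.sign(X @ w_hat) == y))
```

Swapping in a different loss or function class changes the statistical question being asked; swapping gradient descent for another solver changes only how the answer is computed.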

A huge amount of research in statistical learning theory (SLT) has studied the properties of these minimizers: whether or not they are asymptotically optimal, how they relate to the complexity of $\mathscr F$, and many other such things. But when you actually want to get $\hat f$, you often end up with a difficult optimization problem, and it’s a whole separate set of people who study that. I think the history of SVM is a good example here. We have the SLT people like Vapnik and Cortes (and many others) who showed that SVM is a good optimization problem to solve. But then it was others like John Platt and the LIBSVM authors who made this feasible in practice.

To answer your exact question: knowing some optimization is certainly helpful, but generally no one is an expert in all these areas. You learn as much as you can, but some aspects will always be something of a black box to you. Maybe you haven’t properly studied the SLT results behind your favorite ML algorithm, or maybe you don’t know the inner workings of the optimizer you’re using. It’s a lifelong journey.