Recursively updating the MLE as new observations stream in

General Question

Say we have iid data x_1, x_2, … \sim f(x\,|\,\boldsymbol{\theta}) streaming in. We want to recursively compute the maximum likelihood estimate of \boldsymbol{\theta}. That is, having computed
\hat{\boldsymbol{\theta}}_{n-1}=\underset{\boldsymbol{\theta}\in\mathbb{R}^p}{\arg\max}\prod_{i=1}^{n-1}f(x_i\,|\,\boldsymbol{\theta}),
we observe a new x_n, and wish to somehow incrementally update our estimate
\hat{\boldsymbol{\theta}}_{n-1},\,x_n \to \hat{\boldsymbol{\theta}}_{n}
without having to start from scratch. Are there generic algorithms for this?

Toy Example

If x_1, x_2, … \sim N(x\,|\,\mu, 1), then
\hat{\mu}_{n-1} = \frac{1}{n-1}\sum\limits_{i=1}^{n-1}x_i\quad\text{and}\quad\hat{\mu}_n = \frac{1}{n}\sum\limits_{i=1}^nx_i,
so
\hat{\mu}_n=\frac{1}{n}\left[(n-1)\hat{\mu}_{n-1} + x_n\right].
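For concreteness, here is a minimal sketch of that update in code (the function and variable names are purely illustrative); the running estimate agrees with the batch sample mean:

```python
import random

def update_mean(mu_hat_prev, n, x_new):
    """MLE of mu after the n-th observation, given the MLE from the first n-1."""
    return ((n - 1) * mu_hat_prev + x_new) / n

random.seed(0)
mu_hat, data = 0.0, []
for n, x in enumerate((random.gauss(2.0, 1.0) for _ in range(1000)), start=1):
    data.append(x)
    mu_hat = update_mean(mu_hat, n, x)   # O(1) work per new observation

# The recursive estimate matches the batch sample mean (up to rounding).
assert abs(mu_hat - sum(data) / len(data)) < 1e-9
```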

Answer

See the concept of sufficiency and, in particular, minimal sufficient statistics. In many cases you need the whole sample to compute the estimate at a given sample size, with no trivial way to update from a sample one size smaller; that is, there's no convenient general result.

If the distribution is in the exponential family (and in some other cases besides; the uniform is a neat example), there's a low-dimensional sufficient statistic that can in many cases be updated in the manner you seek (i.e. a number of commonly used distributions admit a fast update).
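As a hedged illustration of the exponential-family case, take N(\mu, \sigma^2) with both parameters unknown: the sufficient statistic (n, \sum x_i, \sum x_i^2) can be folded up in O(1) per observation, and the MLE recomputed from it alone, with no need to revisit earlier observations. The sketch below is just one way to arrange this; the class name RunningNormalMLE is made up for the example.

```python
import math

class RunningNormalMLE:
    """Streaming MLE for N(mu, sigma^2) via its sufficient statistics."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def update(self, x):
        """Fold in one new observation; O(1) work, nothing else stored."""
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def mle(self):
        """MLE of (mu, sigma) from the current sufficient statistics."""
        mu_hat = self.sum_x / self.n
        var_hat = self.sum_x2 / self.n - mu_hat ** 2   # MLE uses 1/n, not 1/(n-1)
        return mu_hat, math.sqrt(max(var_hat, 0.0))
```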

One example for which I'm not aware of any direct way to either calculate or update the estimate is the location of the Cauchy distribution (e.g. with unit scale, to make it a simple one-parameter problem). There may be a faster update, however, that I simply haven't noticed; I can't say I've done more than glance at it with the updating case in mind.

On the other hand, with MLEs that are obtained via numerical optimization methods, the previous estimate is in many cases a great starting point, since it will typically be very close to the updated estimate; in that sense, at least, rapid updating should often be possible. Even this isn't the general case, though: with multimodal likelihood functions (again, see the Cauchy for an example), a new observation might put the highest mode some distance from the previous one (even if the locations of each of the biggest few modes barely shift, which one is highest can change).
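To illustrate the warm-starting idea, here is a rough sketch for the unit-scale Cauchy location problem using scipy.optimize.minimize, seeding each fit with the previous estimate (the variable names and the simulated data are assumptions for the example). As noted above, a local optimizer started this way can still settle on the wrong mode when the likelihood is multimodal.

```python
import numpy as np
from scipy.optimize import minimize

def cauchy_negloglik(theta, x):
    # Negative log-likelihood of a unit-scale Cauchy, up to an additive constant.
    return np.sum(np.log1p((x - theta) ** 2))

rng = np.random.default_rng(0)
theta_hat = 0.0                                  # initial guess before any data
data = []
for x_new in rng.standard_cauchy(200) + 3.0:     # simulated stream, true location 3
    data.append(x_new)
    x_arr = np.asarray(data)
    # Warm start: seed the optimizer with the previous estimate, which is
    # usually already close to the new optimum.
    res = minimize(cauchy_negloglik, x0=theta_hat, args=(x_arr,), method="BFGS")
    theta_hat = float(res.x[0])
    # Caveat: with a multimodal likelihood, a local method started at the old
    # estimate may converge to a mode that is no longer the global maximum.
```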

Attribution
Source: Link, Question Author: jcz, Answer Author: Glen_b