Estimating parameters using maximum likelihood estimation (MLE) involves evaluating the likelihood function, which maps each value of the parameter θ in the parameter space to the probability of observing the sample x under a given distribution family, P(X=x|θ) (am I right about this?). All examples I’ve seen calculate P(X=x|θ) by taking the product of f(xi|θ) over the sample, where f is the density (or mass function) of the distribution at the current value of θ and x=(x1,...,xn) is the sample (a vector).
Since we’re just multiplying over the data points, does it follow that the data must be independent? E.g., could we not use MLE to fit time-series data? Or is it only the parameters that have to be independent?
The likelihood function is defined as the probability of an event E (the data set x), viewed as a function of the model parameters θ:
L(θ;x)∝P(Event E;θ)=P(observing x;θ).
Therefore, there is no assumption of independence of the observations. In the classical approach there is no definition for independence of parameters since they are not random variables; some related concepts could be identifiability, parameter orthogonality, and independence of the Maximum Likelihood Estimators (which are random variables).
(1). Discrete case. If x=(x1,...,xn) is a sample of (independent) discrete observations with P(observing xj;θ)>0, then
L(θ;x) ∝ ∏_{j=1}^{n} P(observing xj;θ).
In particular, if xj∼Binomial(N,θ), with N known, we have that
L(θ;x) ∝ ∏_{j=1}^{n} θ^{xj} (1−θ)^{N−xj}.
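As a rough illustration of the binomial case, the sketch below evaluates the log of ∏ θ^{xj} (1−θ)^{N−xj} and compares the closed-form maximiser (total successes over total trials) with a brute-force grid search. The counts in `xs`, the value `N = 10`, and the function names are made up for the example.

```python
import math

def binomial_log_likelihood(theta, xs, N):
    """Log-likelihood, up to an additive constant, of independent Binomial(N, theta) counts."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return sum(x * math.log(theta) + (N - x) * math.log(1.0 - theta) for x in xs)

def binomial_mle(xs, N):
    """Closed-form maximiser: total successes divided by total trials."""
    return sum(xs) / (N * len(xs))

# Hypothetical data: five independent counts, each out of N = 10 trials.
xs = [3, 5, 4, 6, 2]
N = 10

theta_hat = binomial_mle(xs, N)

# Sanity check: maximise the same log-likelihood over a fine grid in (0, 1).
grid = [k / 1000 for k in range(1, 1000)]
theta_grid = max(grid, key=lambda t: binomial_log_likelihood(t, xs, N))

print(theta_hat, theta_grid)
```

The binomial coefficients are dropped because they do not depend on θ, which is exactly what the proportionality sign in the likelihood allows.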
(2). Continuous approximation. Let x=(x1,...,xn) be a sample from a continuous random variable X, with distribution F and density f, observed with measurement error ϵ; that is, you observe the sets (xj−ϵ,xj+ϵ). Then
L(θ;x) ∝ ∏_{j=1}^{n} [F(xj+ϵ;θ) − F(xj−ϵ;θ)].
When ϵ is small, this can be approximated (using the Mean Value Theorem) by
L(θ;x) ∝ ∏_{j=1}^{n} f(xj;θ).
For an example with the normal case, take a look at this.
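To see how close the density approximation is, here is a small sketch for a normal model. It compares the interval-based log-likelihood, built from F(xj+ϵ;θ) − F(xj−ϵ;θ), with the density-based one plus the constant n·log(2ϵ). The data, the value of `eps`, and the parameter values are assumptions chosen only for illustration.

```python
import math

def normal_cdf(x, mu, sigma):
    """Normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def interval_log_likelihood(mu, sigma, xs, eps):
    """Log of the product of P(xj - eps < X < xj + eps) over the sample."""
    return sum(
        math.log(normal_cdf(x + eps, mu, sigma) - normal_cdf(x - eps, mu, sigma))
        for x in xs
    )

def density_log_likelihood(mu, sigma, xs):
    """Log of the product of densities f(xj; mu, sigma)."""
    return sum(math.log(normal_pdf(x, mu, sigma)) for x in xs)

# Hypothetical measurements, each recorded to within +/- eps.
xs = [1.2, 0.7, 1.9, 1.4, 0.8]
eps = 1e-3
mu, sigma = 1.0, 0.5

exact = interval_log_likelihood(mu, sigma, xs, eps)
# Each interval probability is roughly 2 * eps * f(xj), so the two
# log-likelihoods differ by about n * log(2 * eps), which is free of theta.
approx = density_log_likelihood(mu, sigma, xs) + len(xs) * math.log(2.0 * eps)

print(exact, approx)
```

Since the offset n·log(2ϵ) does not involve the parameters, both versions are maximised at essentially the same parameter values when ϵ is small.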
(3). Dependent and Markov model. Suppose that x=(x1,...,xn) is a set of possibly dependent observations and let f be the joint density of x, then
L(θ;x) ∝ f(x;θ).
If, additionally, the Markov property is satisfied, then
L(θ;x) ∝ f(x;θ) = f(x1;θ) ∏_{j=1}^{n−1} f(x_{j+1}|x_j;θ).
Take also a look at this.
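As a concrete time-series illustration of the Markov factorisation, the sketch below assumes a stationary Gaussian AR(1) model, x_{j+1} = φ·x_j + noise with noise ∼ N(0, σ²). This model is a choice made for the example, not something the argument above depends on. The log-likelihood is the log density of the first observation plus the sum of the one-step conditional log densities; φ is then estimated by a grid search with σ treated as known.

```python
import math
import random

def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def ar1_log_likelihood(phi, sigma, xs):
    """log f(x1) + sum_j log f(x_{j+1} | x_j) for a stationary Gaussian AR(1)."""
    if not -1.0 < phi < 1.0 or sigma <= 0.0:
        return -math.inf
    var_innov = sigma ** 2
    # First observation: stationary marginal N(0, sigma^2 / (1 - phi^2)).
    ll = normal_logpdf(xs[0], 0.0, var_innov / (1.0 - phi ** 2))
    # Remaining observations: each depends on the past only through its predecessor.
    for prev, curr in zip(xs, xs[1:]):
        ll += normal_logpdf(curr, phi * prev, var_innov)
    return ll

def simulate_ar1(phi, sigma, n, seed=0):
    """Simulate a dependent series from the assumed AR(1) model."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigma / math.sqrt(1.0 - phi ** 2))
    xs = [x]
    for _ in range(n - 1):
        x = phi * x + rng.gauss(0.0, sigma)
        xs.append(x)
    return xs

# Simulated (clearly dependent) data with true phi = 0.6 and sigma = 1.
xs = simulate_ar1(phi=0.6, sigma=1.0, n=2000)

# Maximise the likelihood over phi on a grid, treating sigma = 1 as known.
grid = [k / 100 for k in range(-99, 100)]
phi_hat = max(grid, key=lambda p: ar1_log_likelihood(p, 1.0, xs))

print(phi_hat)
```

The observations here are far from independent, yet the likelihood is still a product of terms, just conditional densities rather than marginal ones.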