Statistical learning when observations are not iid

As far as I know, statistical/machine learning algorithms always assume that the data are independent and identically distributed (iid).

My question is: what can we do when this assumption is clearly violated? For instance, suppose that we have a data set with repeated measurements on the same units, so that both the cross-sectional and the time dimensions matter (what econometricians call a panel data set and statisticians call longitudinal data, which is distinct from a time series).

An example could be the following. In 2002, we collect the prices (henceforth Y) of 1000 houses in New York, together with a set of covariates (henceforth X). In 2005, we collect the same variables on the same houses. The same happens in 2009 and 2012. Say I want to understand the relationship between X and Y. Were the data iid, I could easily fit a random forest (or any other supervised algorithm, for that matter), thus estimating the conditional expectation of Y given X. However, there is clearly some autocorrelation in my data. How can I handle this?

Answer

There is nothing in the theory of statistical learning or machine learning that requires samples to be i.i.d.

When samples are i.i.d., you can write the joint probability of the samples given some model as a product, namely $P(\{x\}) = \prod_i P_i(x_i)$, which makes the log-likelihood a sum of the individual log-likelihoods. This simplifies the calculation, but is by no means a requirement.
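To make the factorization concrete, here is a minimal sketch using two Gaussian observations. The function names, the observed values, and the correlation of 0.8 are all made up for illustration. It checks that the joint log-density equals the sum of the individual log-densities when the correlation is zero, and that it no longer does once the observations are correlated.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log-density of a univariate normal N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def bivariate_normal_logpdf(x1, x2, mu, sigma, rho):
    """Joint log-density of two normals sharing mean mu and sd sigma,
    with correlation rho between them."""
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    quad = (z1**2 - 2 * rho * z1 * z2 + z2**2) / (1 - rho**2)
    return -math.log(2 * math.pi * sigma**2) - 0.5 * math.log(1 - rho**2) - 0.5 * quad

# Two illustrative observations (assumed values, not real data).
x1, x2 = 1.2, 0.9

# Under independence the joint log-likelihood is a plain sum.
sum_of_logs = normal_logpdf(x1, 0.0, 1.0) + normal_logpdf(x2, 0.0, 1.0)
joint_independent = bivariate_normal_logpdf(x1, x2, 0.0, 1.0, rho=0.0)

# With correlation, the joint log-likelihood must be computed jointly.
joint_correlated = bivariate_normal_logpdf(x1, x2, 0.0, 1.0, rho=0.8)

print(sum_of_logs, joint_independent, joint_correlated)
```

The likelihood is still perfectly well defined in the correlated case; it just does not split into per-observation terms.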

In your case, you can for example model the distribution of a pair $(x_i, y_i)$ with some bivariate distribution, say $z_i = (x_i, y_i)^T$, $z_i \sim N(\mu, \Sigma)$, and then estimate the parameter $\Sigma$ from the likelihood $P(\{z\}) = \prod_i P(z_i \mid \mu, \Sigma)$.
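As a rough sketch of that estimation step: for a bivariate normal, the maximum-likelihood estimates of the mean vector and covariance matrix have a closed form (the sample mean, and the sample covariance with divisor n rather than n - 1). The helper name `fit_bivariate_normal` and the toy pairs below are hypothetical, chosen only to show the calculation.

```python
def fit_bivariate_normal(pairs):
    """Closed-form MLE of (mu, Sigma) for pairs z_i = (x_i, y_i)
    assumed to follow a bivariate normal distribution."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # The MLE of the covariance divides by n, not n - 1.
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    syy = sum((y - mean_y) ** 2 for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    mu = (mean_x, mean_y)
    sigma = [[sxx, sxy], [sxy, syy]]
    return mu, sigma

# Toy data, made up for illustration only.
pairs = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
mu_hat, sigma_hat = fit_bivariate_normal(pairs)
print(mu_hat)
print(sigma_hat)
```

The off-diagonal entry of the estimated covariance is what captures the dependence between the two components of each pair.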

It is true that many out-of-the-box algorithm implementations implicitly assume independence between samples, so you are correct that you will have a problem applying them to your data as is. You will either have to modify the algorithm or find one that is better suited to your case.
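One small, commonly used adjustment of this kind (an illustration, not something the answer itself prescribes) is to keep all repeated measurements of the same unit together when splitting data for validation, so that a model is never evaluated on a house it has already seen in another year. The sketch below is a plain-Python version with a hypothetical helper `group_kfold` and made-up house identifiers; scikit-learn offers a comparable splitter, `GroupKFold`.

```python
def group_kfold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs such that every group
    (e.g. one house measured in several years) lands entirely
    on one side of the split."""
    unique = sorted(set(groups))
    for k in range(n_splits):
        # Assign every n_splits-th group to the test side of fold k.
        test_groups = set(unique[k::n_splits])
        test_idx = [i for i, g in enumerate(groups) if g in test_groups]
        train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
        yield train_idx, test_idx

# Hypothetical panel: 6 houses, each observed in 4 survey years.
house_ids = [house for house in range(6) for _ in range(4)]
folds = list(group_kfold(house_ids, n_splits=3))

for train_idx, test_idx in folds:
    train_houses = {house_ids[i] for i in train_idx}
    test_houses = {house_ids[i] for i in test_idx}
    print(sorted(train_houses), sorted(test_houses))
```

This only addresses how the model is evaluated; it does not by itself model the correlation within a house over time.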

Attribution
Source : Link , Question Author : Plastic Man , Answer Author : J. Delaney
