I plan to include coordinates as covariates in the regression equation in order to adjust for the spatial trend that exists in the data. After that, I want to test residuals on spatial autocorrelation in random variation. I have several questions:

Should I perform linear regression in which only independent variables are x and y coordinates and then tests residuals on spatial autocorrelation, or should I rather include not only coordinates as covariates but also other variables and then test residuals.

If I expect to have quadratic trend, and then include not only x,y, but also , xy, x2 and y2, but then some of them (xy and y2) have the p-value higher than the threshold –should I exclude those variables with higher p-value as being nonsignificant? How should I then interpret the trend, it is certainly not quadratic anymore?

I guess I should treat x and y coordinates as any other covariates, and test them on having linear relationship with dependent variable by constructing partial residual plots … but then once I transform them (if they show they need transformation), that will not be that kind of trend any more (especially if I include xy, x2 and y2 for quadratic trend). It may show that x2, for example, needs transformation, while x does not or so? How should I react in these situations?

Thank you.

**Answer**

I think you might be better off fitting a linear mixed effects model with spatially-correlated random effects (sometimes called a *geostatistical* model). Assuming your data is Gaussian, you specify a model of the form:

Yi=μi+Si+ϵi,

for n observations 1≤i≤n, with ϵ∼N(0,τ2) representing iid errors and S∼MVN(0,σ2R) representing your spatial terms (where S={S1,...,Sn}). The mean μi could be a function of other covariates (i.e. μi=β0+β1xi1+β2xi2 etc.) or it could be just be a constant (might be best to start with the latter for simplicity).

The correlation matrix R for the spatial terms (which determines how correlated you think each observation should be) can be specified by looking at the empirical variogram. Generally the correlation between observations is chosen to only depend on the distance between them (this is where your coordinates come into the model).

Chapter 2 of *Model-based geostatistics* by Diggle and Ribeiro (2000) should give you a more detailed introduction. The R package geoR has many procedures for fitting geostatistical models, so you may find it useful (see http://cran.r-project.org/web/packages/geoR/geoR.pdf).

**Attribution***Source : Link , Question Author : Beka , Answer Author : Sam Livingstone*