# Predicting variance of heteroscedastic data

I am trying to do a regression on heteroscedastic data where I’m trying to predict the error variances as well as the mean values in terms of a linear model. Something like this:

\begin{align}\\ y\left(x,t\right) &= \bar{y}\left(x,t\right)+\xi\left(x,t\right),\\ \xi\left(x,t\right) &\sim N\left(0,\sigma\left(x,t\right)\right),\\ \bar{y}\left(x,t\right) &= y_{0}+ax+bt,\\ \sigma\left(x,t\right) &= \sigma_{0}+cx+dt. \end{align}

In words, the data consists of repeated measurements of $y(x,t)$ at various values of $x$ and $t$. I assume these measurements consist of a “true” mean value $\bar{y}(x,t)$ which is a linear function of $x$ and $t$, with additive Gaussian noise $\xi(x,t)$ whose standard deviation (or variance, I haven’t decided) also depends linearly on $x,t$. (I could allow more complicated dependencies on $x$ and $t$ – there isn’t a strong theoretical motivation for a linear form – but I’d rather not overcomplicate things at this stage.)

I know the search term here is “heteroscedasticity,” but all I’ve been able to find so far are discussions of how to reduce/remove it to better predict $\bar{y}$, but nothing in terms of trying to predict $\sigma$ in terms of the independent variables. I would like to estimate $y_0, a, b, \sigma_0, c$ and $d$ with confidence intervals (or Bayesian equivalents), and if there is an easy way to do it in SPSS so much the better! What should I do?
Thanks.

I think your first problem is that $N\left(0,\sigma\left(x,t\right)\right)$ is not longer a normal distribution, and how the data needs to be transformed to be homoscedastic depends on exactly what $\sigma\left(x,t\right)$ is. For example, if $\sigma\left(x,t\right)= ax+bt$, then the error is proportional type and the logarithm of the y data should be taken before regression, or, the regression adjusted from ordinary least squares (OLS) to weighted least squares with a $1/y^2$ weight (that changes the regression to minimized proportional type error). Similarly, if $\sigma\left(x,t\right)= e^{a x+b t}$, one would have to take the logarithm of the logarithm and regress that.
I think the reason why prediction of error types is poorly covered is that one first does any old regression (groan, typically ordinary least squares, OLS). And from the residual plot, i.e., $model-y$, one observes the residual shape, and one plots the frequency histogram of the data, and looks at that. Then, if the residuals are a fan beam opening to the right, one tries proportional data modeling, if the histogram looks like an exponential decay one might try reciprocation, $1/y$, and so on and so forth for square roots, squaring, exponentiation, taking exponential-y.