# Regression on the unit disk starting from “uniformly spaced” samples

I need to solve a complicated regression problem over the unit disk. The original question attracted some interesting comments, but unfortunately no answers. In the meantime, I have learned more about this problem, so I will try to split the original problem into subproblems and see if I have better luck this time.

I have 40 temperature sensors regularly spaced in a narrow ring inside the unit disk:

These sensors record temperature over time. However, since the variation in time is much smaller than the variation in space, let’s simplify the problem by ignoring time variability and assume that each sensor only gives me a time average. This means that I have 40 samples (one per sensor) and no repeated samples.

I would like to build a regression surface $T=f(\rho,\theta)+\epsilon$ from the sensor data. The regression has two goals:

1. I need to estimate a mean radial temperature profile $T_{mean}=g_1(\rho)+\epsilon$. Linear regression already gives me an estimate of the mean temperature surface, so I only need to integrate that surface with respect to $\theta$, right? If I use polynomials for the regression, this step should be a piece of cake.
2. I need to estimate a radial temperature profile $T_{95}=g_2(\rho)+\epsilon$, such that at each radial position, $P(T(\rho) < T_{95}(\rho)) = 0.95$.
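
To make the first goal concrete, here is a sketch of the $\theta$-average, assuming the fitted surface is expanded in a basis whose angular factors are $\cos(m\theta)$ and $\sin(m\theta)$. The coefficients $a_{nm}$, $b_{nm}$ and the radial functions $R_n^m$ are notation introduced here purely for illustration:

$$
f(\rho,\theta)=\sum_{n}\sum_{m\ge 0} R_n^m(\rho)\,\bigl[a_{nm}\cos(m\theta)+b_{nm}\sin(m\theta)\bigr]
$$

$$
g_1(\rho)=\frac{1}{2\pi}\int_0^{2\pi} f(\rho,\theta)\,d\theta=\sum_{n} a_{n0}\,R_n^0(\rho)
$$

The second equality holds because $\cos(m\theta)$ and $\sin(m\theta)$ integrate to zero over a full period for every $m\ge 1$, so only the $m=0$ terms survive.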

Given these two goals, which technique should I use for the regression on the unit disk? Of course, Gaussian processes are commonly used for spatial regression. However, defining a good kernel for the unit disk is not trivial, so I'd like to keep things simple and use polynomials, unless you feel it's a losing strategy. I've read about Zernike polynomials, which seem appropriate for regression over the unit disk, since they're periodic in $\theta$.
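
For reference, this is roughly what a Zernike regression basis could look like in Python with NumPy. It is only a sketch: the helper names `zernike_radial` and `zernike_design` are made up for illustration (they are not from any library), the un-normalized convention with $R_n^m(1)=1$ is assumed, and the sensor positions and temperatures are synthetic stand-ins rather than my real data.

```python
import numpy as np
from math import factorial

def zernike_radial(n, m, rho):
    """Radial part R_n^m(rho) for 0 <= m <= n with n - m even (un-normalized)."""
    rho = np.asarray(rho, dtype=float)
    out = np.zeros_like(rho)
    for k in range((n - m) // 2 + 1):
        coef = ((-1) ** k * factorial(n - k)
                / (factorial(k)
                   * factorial((n + m) // 2 - k)
                   * factorial((n - m) // 2 - k)))
        out = out + coef * rho ** (n - 2 * k)
    return out

def zernike_design(rho, theta, n_max):
    """Design matrix with one column per Z_n^m, n <= n_max, plus the (n, m) labels."""
    cols, index = [], []
    for n in range(n_max + 1):
        for m in range(-n, n + 1, 2):
            radial = zernike_radial(n, abs(m), rho)
            # m >= 0 uses cos(m*theta), m < 0 uses sin(|m|*theta)
            angular = np.cos(m * theta) if m >= 0 else np.sin(-m * theta)
            cols.append(radial * angular)
            index.append((n, m))
    return np.column_stack(cols), index

# Hypothetical layout: 40 sensors, 4 radii x 10 angles in a narrow ring.
rho = np.repeat(np.linspace(0.85, 0.95, 4), 10)
theta = np.tile(np.linspace(0.0, 2.0 * np.pi, 10, endpoint=False), 4)

Z, index = zernike_design(rho, theta, n_max=2)

# Synthetic, noise-free temperatures: T = 20 + 3 * rho * cos(theta).
T = 20.0 + 3.0 * rho * np.cos(theta)
coef, *_ = np.linalg.lstsq(Z, T, rcond=None)   # ordinary least squares
```

With this basis, only the columns with $m=0$ contribute to the $\theta$-averaged profile.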

Once the model is chosen, I need to choose an estimation procedure. Since this is a spatial regression problem, errors at different locations should be correlated. Ordinary Least Squares assumes uncorrelated errors, so I guess Generalized Least Squares would be more appropriate. GLS seems to be a relatively common statistical technique, given that there's a gls function in the standard R distribution. However, I've never used GLS, and I have doubts. For example, how do I estimate the covariance matrix? A worked-out example, even with just a few sensors, would be great.
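
For concreteness, this is the kind of calculation I have in mind, sketched with NumPy for eight sensors. Everything specific here is an assumption made up for illustration: the exponential covariance, its parameters `sigma2` and `length`, the sensor positions, the linear trend model, and the helper name `gls_fit`. It takes the covariance as known, so it does not address how to estimate it from the data.

```python
import numpy as np

def gls_fit(X, y, cov):
    """GLS estimate beta = (X' C^-1 X)^-1 X' C^-1 y, computed by Cholesky whitening."""
    L = np.linalg.cholesky(cov)              # cov = L @ L.T
    Xw = np.linalg.solve(L, X)               # whitened design matrix
    yw = np.linalg.solve(L, y)               # whitened response
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    beta_cov = np.linalg.inv(Xw.T @ Xw)      # covariance of beta, given cov
    return beta, beta_cov

# Hypothetical layout: 8 sensors on two rings.
rho = np.repeat([0.8, 0.9], 4)
theta = np.tile(np.linspace(0.0, 2.0 * np.pi, 4, endpoint=False), 2)
px, py = rho * np.cos(theta), rho * np.sin(theta)

# Assumed exponential covariance between sensor locations.
dist = np.hypot(px[:, None] - px[None, :], py[:, None] - py[None, :])
sigma2, length = 1.0, 0.5
cov = sigma2 * np.exp(-dist / length)

# Simple trend model T = b0 + b1 * x + b2 * y with correlated noise.
X = np.column_stack([np.ones_like(rho), px, py])
beta_true = np.array([20.0, 3.0, -1.0])
rng = np.random.default_rng(0)
T = X @ beta_true + np.linalg.cholesky(cov) @ rng.standard_normal(8)

beta_hat, beta_cov = gls_fit(X, T, cov)
```

With `cov` set to the identity matrix this reduces to ordinary least squares.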

PS I chose to use Zernike polynomials and GLS because it seems to me the logical thing to do here. However I'm no expert, and if you feel I'm going in the wrong direction, feel free to use a completely different approach.

I think you are on the right track in thinking about something like Zernike polynomials. As noted in the answer by jwimberly, these are an example of a system of orthogonal basis functions on a disk. I am not familiar with Zernike polynomials, but many other families of orthogonal functions (including Bessel functions) arise naturally in classical mathematical physics as eigenfunctions for certain partial differential equations (at the time of this writing, the animation at the top of that link even shows an example of a vibrating drum head).

Two questions come to my mind. First, if all you are after is the radial profile ($\theta$-averaged), then how much constraint on the spatial pattern do you need? Second, what types of variability occur in the spatio-temporal data?

In terms of the first question, two concerns come to mind. First, due to the polar coordinates, the support area of each sensor varies with $r$. The second concern is the possibility of aliasing, essentially a misalignment of your sensors relative to the phase of the pattern (to use a Fourier/Bessel analogy). Note that aliasing will likely be the primary uncertainty in constraining the peak temperatures (i.e. $T_{95}$).

In terms of the second question, variability in the data could actually help with any aliasing issues, essentially allowing any misalignment to average out over the different measurements (assuming no systematic bias, but that would be a problem for any method without, e.g., a physical model to give more information).

So one possibility would be to define your spatial orthogonal functions purely at the sensor locations. These "Empirical Orthogonal Functions" could be computed via PCA on your spatiotemporal data matrix. (Possibly you could use some weighting to account for the variable sensor support areas, but given the uniform polar grid and target of radial averages, this may not be required.)
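
As a rough illustration of that PCA step, here is a minimal NumPy sketch. The function name `empirical_orthogonal_functions`, the array shapes, and the synthetic data (one oscillating angular pattern plus noise) are all assumptions for the example, not your actual measurements.

```python
import numpy as np

def empirical_orthogonal_functions(data):
    """PCA of a (n_times, n_sensors) matrix via SVD of the time-centered data.

    Returns the time-mean per sensor, the EOFs (one spatial pattern per row),
    the principal-component time series, and the explained-variance fractions.
    """
    mean = data.mean(axis=0)
    anomalies = data - mean
    U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return mean, Vt, U * s, explained

# Synthetic stand-in: 200 time samples at 40 sensors (4 radii x 10 angles).
rng = np.random.default_rng(1)
n_times, n_sensors = 200, 40
theta = np.tile(np.linspace(0.0, 2.0 * np.pi, 10, endpoint=False), 4)
amplitude = rng.standard_normal(n_times)
data = (20.0
        + np.outer(amplitude, np.cos(theta))
        + 0.1 * rng.standard_normal((n_times, n_sensors)))

mean, eofs, pcs, explained = empirical_orthogonal_functions(data)
```

The rows of `eofs` are orthonormal patterns defined only at the sensor locations, and `mean + pcs @ eofs` reconstructs the original matrix.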

Note that if there is any physical modeling data available for "expected" variations in the temperature, available on a dense spatiotemporal computational grid, then the same PCA procedure could be applied to that data to derive orthogonal functions. (This would typically be called "Proper Orthogonal Decomposition" in engineering, where it is used for model reduction, e.g. an expensive computational fluid dynamics model can be distilled for use in further design activities.)

A final comment: if you were to weight the sensor data by support area (i.e. polar cell size), this would be a type of diagonal covariance in the framework of GLS. (That would apply more to your prediction problem, although weighted PCA would be closely related.)
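
A small NumPy sketch of that equivalence follows. It assumes that "weight proportional to support area" is read as "error variance proportional to 1/area", and that on a uniform polar grid the cell area is proportional to the radius. The helper `weighted_least_squares`, the sensor layout, and the data are hypothetical.

```python
import numpy as np

def weighted_least_squares(X, y, weights):
    """Minimize sum_i weights[i] * (y[i] - X[i] @ beta)**2."""
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Hypothetical layout: 40 sensors, 4 radii x 10 angles.
rho = np.repeat(np.linspace(0.85, 0.95, 4), 10)
theta = np.tile(np.linspace(0.0, 2.0 * np.pi, 10, endpoint=False), 4)

# Polar cell area ~ rho * d_rho * d_theta; with uniform spacing it scales with rho.
area = rho

X = np.column_stack([np.ones_like(rho), rho * np.cos(theta), rho * np.sin(theta)])
rng = np.random.default_rng(2)
T = X @ np.array([20.0, 3.0, -1.0]) + 0.2 * rng.standard_normal(rho.size)

beta_wls = weighted_least_squares(X, T, area)

# Same estimate from the GLS formula with diagonal covariance C = diag(1 / area).
C_inv = np.diag(area)
beta_gls = np.linalg.solve(X.T @ C_inv @ X, X.T @ C_inv @ T)
```

The two estimates agree because the GLS objective with a diagonal covariance is exactly a weighted sum of squared residuals.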

I hope this helps!

Update: Your new diagram of the sensor distribution changes things considerably in my view. If you want to estimate temperatures over the disk interior, you will need a much more informative prior than simply "set of orthogonal functions on the unit disk". There is just too little information in the sensor data.

If you indeed want to estimate the spatial temperature variation over the disk, the only reasonable way I can see would be to treat the problem as one of data assimilation. Here you would need to at least constrain the parametric form of the spatial distribution based on some physics-based considerations (these could be from simulations, or could be from related data in systems with similar dynamics).

I do not know your particular application, but if it is something like this, then I would imagine there is an extensive engineering literature that you could draw upon to choose appropriate prior constraints. (For that sort of detailed domain knowledge, this is probably not the best StackExchange site to ask on.)