# From a statistical perspective: Fourier transform vs regression with Fourier basis

I am trying to understand whether the discrete Fourier transform gives the same representation of a curve as a regression on a Fourier basis. For example:

```r
library(fda)
Y <- daily$tempav[,1]  ## my data
length(Y)              ## = 365

## create a Fourier basis and estimate the coefficients by regression
mybasis  <- create.fourier.basis(c(0, 365), 365)
basisMat <- eval.basis(1:365, mybasis)
regcoef  <- coef(lm(Y ~ basisMat - 1))

## using the Fourier transform
fftcoef <- fft(Y)

## compare
```

The FFT gives complex numbers, whereas the regression gives real numbers.

Do they convey the same information? Is there a one-to-one map between the two sets of numbers?

(I would appreciate if the answer is written from the statistician’s perspective instead of the engineer’s perspective. Many online materials I can find have engineering jargon all over the place, which makes them less palatable to me.)

They’re the same. Here’s how…

# Doing a Regression

Say you fit the model

$$Y_t = \sum_{j=1}^{n} A_j \cos(2 \pi t [j/N] + \phi_j),$$

where $t=1,\ldots,N$ and $n = \lfloor N/2 \rfloor$. This isn't suitable for linear regression, though, so instead you use some trigonometry ($\cos(a + b) = \cos(a)\cos(b) - \sin(a)\sin(b)$) and fit the equivalent model:

$$Y_t = \sum_{j=1}^{n} \left( \beta_{1,j} \cos(2 \pi t [j/N]) + \beta_{2,j} \sin(2 \pi t [j/N]) \right),$$

with $\beta_{1,j} = A_j \cos(\phi_j)$ and $\beta_{2,j} = -A_j \sin(\phi_j)$.

Running linear regression on all of the Fourier frequencies $\{j/N : j = 1, \ldots, n \}$ gives you a bunch ($2n$) of betas: $\{\hat{\beta}_{i,j}\}$, $i=1,2$. For any $j$, if you wanted to calculate the pair by hand, you could use

$$\hat{\beta}_{1,j} = \frac{\sum_{t=1}^N Y_t \cos(2 \pi t [j/N])}{\sum_{t=1}^N \cos^2(2 \pi t [j/N])}$$

and

$$\hat{\beta}_{2,j} = \frac{\sum_{t=1}^N Y_t \sin(2 \pi t [j/N])}{\sum_{t=1}^N \sin^2(2 \pi t [j/N])}.$$

These are standard regression formulas; because the regressors are orthogonal across Fourier frequencies, each coefficient can be computed one frequency at a time.
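To see this numerically, here is a small sketch (simulated data and an arbitrarily chosen frequency $j = 3$, both my own choices for illustration) comparing `lm()`'s coefficients to the hand formulas:

```r
## Sketch: verify the hand formulas against lm() on simulated data
set.seed(1)
N <- 16
t <- 1:N
Y <- rnorm(N)                      # made-up data
j <- 3                             # an arbitrary Fourier frequency j/N
cj <- cos(2 * pi * t * j / N)
sj <- sin(2 * pi * t * j / N)
fit <- coef(lm(Y ~ cj + sj - 1))   # (beta_hat_{1,j}, beta_hat_{2,j})
b1 <- sum(Y * cj) / sum(cj^2)      # hand formula for beta_hat_{1,j}
b2 <- sum(Y * sj) / sum(sj^2)      # hand formula for beta_hat_{2,j}
all.equal(unname(fit), c(b1, b2))  # TRUE
```

The multiple regression agrees with the two one-at-a-time formulas precisely because `cj` and `sj` are orthogonal at Fourier frequencies.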

# Doing a Discrete Fourier Transform

When you run a Fourier transform, you calculate, for $j=1, \ldots, n$:

$$d(j/N) = N^{-1/2} \sum_{t=1}^N Y_t e^{-2 \pi i t [j/N]} = N^{-1/2} \left( \sum_{t=1}^N Y_t \cos(2 \pi t [j/N]) - i \sum_{t=1}^N Y_t \sin(2 \pi t [j/N]) \right).$$

This is a complex number (notice the $i$). To see why that equality holds, keep in mind that $e^{ix} = \cos(x) + i\sin(x)$, $\cos(-x) = \cos(x)$ and $\sin(-x) = -\sin(x)$.
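As a sketch of what R's `fft()` actually returns (toy data; note that R sums from $t = 0$ to $N-1$ rather than $1$ to $N$, which changes $d(j/N)$ only by a phase factor of modulus one, and it omits the $N^{-1/2}$):

```r
## Sketch: fft(Y)[j+1] is sum_{t=0}^{N-1} Y[t+1] * exp(-2*pi*1i*t*j/N)
set.seed(1)
N <- 16
Y <- rnorm(N)                      # made-up data
j <- 3
d_fft <- fft(Y)[j + 1]             # fft entry for frequency j/N
d_sum <- sum(Y * exp(-2 * pi * 1i * (0:(N - 1)) * j / N))  # by hand
all.equal(d_fft, d_sum)            # TRUE
```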

For each $j$, multiplying $d(j/N)$ by its complex conjugate (that is, taking its squared modulus) gives you the "periodogram":

$$I(j/N) = |d(j/N)|^2 = \frac{1}{N} \left( \sum_{t=1}^N Y_t \cos(2 \pi t [j/N]) \right)^2 + \frac{1}{N} \left( \sum_{t=1}^N Y_t \sin(2 \pi t [j/N]) \right)^2.$$

In R, calculating this vector would be `I <- abs(fft(Y))^2/length(Y)`. The division by `length(Y)` is needed because R's `fft()` omits the $N^{-1/2}$ factor used in the definition above.

Also you can define the "scaled periodogram":

$$P(j/N) = \left( \frac{2}{N} \sum_{t=1}^N Y_t \cos(2 \pi t [j/N]) \right)^2 + \left( \frac{2}{N} \sum_{t=1}^N Y_t \sin(2 \pi t [j/N]) \right)^2.$$

Clearly $P(j/N) = \frac{4}{N}|d(j/N)|^2$. In R this would be `P <- (4/length(Y)) * I[1:floor(length(Y)/2)]`.
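A quick sketch (toy data again, frequency $j = 3$ chosen arbitrarily) confirming that the scaled periodogram computed from its definition matches $\frac{4}{N} I(j/N)$ computed via `fft()`:

```r
## Sketch: scaled periodogram P(j/N) equals (4/N) * I(j/N)
set.seed(1)
N <- 16
t <- 1:N
Y <- rnorm(N)
j <- 3
P_direct <- ((2 / N) * sum(Y * cos(2 * pi * t * j / N)))^2 +
            ((2 / N) * sum(Y * sin(2 * pi * t * j / N)))^2
I <- abs(fft(Y))^2 / N                   # periodogram; I[j + 1] is I(j/N)
all.equal(P_direct, (4 / N) * I[j + 1])  # TRUE
```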

# The Connection Between the Two

It turns out the connection between the regression and the two periodograms is:

$$\hat{\beta}_{1,j}^2 + \hat{\beta}_{2,j}^2 = P(j/N) = \frac{4}{N} I(j/N).$$

Why? Because the basis you chose is orthogonal. You can show, for each $j$ (excluding $j = N/2$ when $N$ is even), that $\sum_{t=1}^N \cos^2(2 \pi t [j/N]) = \sum_{t=1}^N \sin^2(2 \pi t [j/N]) = N/2$. Plug that into the denominators of your formulas for the regression coefficients and voilà.
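Putting it all together, a sketch (simulated data, arbitrary frequency $j = 3$) verifying the claimed identity numerically:

```r
## Sketch: beta_hat_{1,j}^2 + beta_hat_{2,j}^2 equals (4/N) * I(j/N)
set.seed(1)
N <- 16
t <- 1:N
Y <- rnorm(N)
j <- 3
cj <- cos(2 * pi * t * j / N)
sj <- sin(2 * pi * t * j / N)
b <- coef(lm(Y ~ cj + sj - 1))          # the two regression betas
I <- abs(fft(Y))^2 / N                  # periodogram
all.equal(sum(b^2), (4 / N) * I[j + 1]) # TRUE
```

So the regression coefficients and the Fourier transform carry the same information: the real and imaginary parts of (suitably scaled) `fft(Y)` are, up to sign and scaling, the sine and cosine betas.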