How to minimize residual sum of squares of an exponential fit?

I have the following data and would like to fit a negative exponential growth model to it:

Days <- c( 1,5,12,16,22,27,36,43)
Emissions <- c( 936.76, 1458.68, 1787.23, 1840.04, 1928.97, 1963.63, 1965.37, 1985.71)
plot(Days, Emissions)
fit <- nls(Emissions ~ a* (1-exp(-b*Days)), start = list(a = 2000, b = 0.55))
curve((y = 1882 * (1 - exp(-0.5108*x))), from = 0, to =45, add = T, col = "green", lwd = 4)

The code is working and a fitting line is plotted. However, the fit is visually not ideal, and the residual sum of squares seems to be quite huge (147073).

How can we improve our fit? Does the data allow a better fit at all?

We could not find a solution to this challenge on the net.
Any direct help or linkage to other websites/posts is greatly appreciated.

Answer

A (negative) exponential law takes the form y=exp(x). When you allow for changes of units in the x and y values, though, say to y=αy+β and x=γx+δ, then the law will be expressed as

αy+β=y=exp(x)=exp(γxδ),

which algebraically is equivalent to

y=1αexp(γxδ)β=a(1uexp(bx))

using three parameters a=β/α, u=1/(βexp(δ)), and b=γ. We can recognize a as a scale parameter for y, b as a scale parameter for x, and u as deriving from a location parameter for x.

As a rule of thumb, these parameters can be identified at a glance from the plot:

  • The parameter a is the value of the horizontal asymptote, a little less than 2000.

  • The parameter u is the relative amount the curve rises from the origin to its horizontal asymptote. Here, the the rise therefore is a little less than 2000937; relatively, that’s about 0.55 of the asymptote.

  • Because exp(3)0.05, when x equals three times the value of 1/b the curve should have risen to about 10.05 or 95% of its total. 95% of the rise from 937 to almost 2000 places us around 1950; scanning across the plot indicates this took 20 to 25 days. Let’s call it 24 for simplicity, whence b3/24=0.125. (This 95% method to eyeball an exponential scale is standard in some fields that use exponential plots a lot.)

Let’s see what this looks like:

plot(Days, Emissions)
curve((y = 2000 * (1 - 0.56 * exp(-0.125*x))), add = T)

Eyeball fit

Not bad for a start! (Even despite typing 0.56 in place of 0.55, which was a crude approximation anyway.) We can polish it with nls:

fit <- nls(Emissions ~ a * (1- u * exp(-b*Days)), start=list(a=2000, b=1/8, u=0.55))
beta <- coefficients(fit)
plot(Days, Emissions)
curve((y = beta["a"] * (1 - beta["u"] * exp(-beta["b"]*x))), add = T, col="Green", lwd=2)

NLS fit

The output of nls contains extensive information about parameter uncertainty. E.g., a simple summary provides standard errors of estimates:

> summary(fit)

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
a 1.969e+03  1.317e+01  149.51 2.54e-10 ***
b 1.603e-01  1.022e-02   15.69 1.91e-05 ***
u 6.091e-01  1.613e-02   37.75 2.46e-07 ***

We can read and work with the entire covariance matrix of the estimates, which is useful for estimating simultaneous confidence intervals (at least for large datasets):

> vcov(fit)
             a             b             u
a 173.38613624 -8.720531e-02 -2.602935e-02
b  -0.08720531  1.044004e-04  9.442374e-05
u  -0.02602935  9.442374e-05  2.603217e-04

nls supports profile plots for the parameters, giving more detailed information about their uncertainty:

> plot(profile(fit))

Here is one of the three output plots showing variation in a:

Profile plot

E.g., a t-value of 2 corresponds roughly to a 95% two-sided confidence interval; this plot places its endpoints around 1945 and 1995.

Attribution
Source : Link , Question Author : Strohmi , Answer Author : whuber

Leave a Comment