The paper in Science [1] infers change points in COVID-19 spread in Germany. The authors fit the number of daily cases assuming one (red), two (orange), and three (green) change points. Every change point adds two parameters to the model.

It is hard to believe that the three-change-point model captures some underlying physical reality missing in the one-change-point model. The conclusion that “three corresponding change points are detected” is based on a comparison of leave-one-out cross-validation (LOO-CV) scores:

| model | LOO log-score | standard error | effective number of parameters |
|---|---|---|---|
| three points | 787 | 15 | 13 |
| two points | 796 | 17 | 13 |
| one point | 819 | 17 | 13 |

`pymc3.compare(..., ic='LOO', scale='deviance')` returns the following (`d_loo` is a relative difference and `dse` is a standard error of the difference in score between each model and the top-ranked model):

| model | loo | p_loo | d_loo | weight | se | dse |
|---|---|---|---|---|---|---|
| three points | 786.543 | 13.3241 | 0 | 0.933612 | 15.2098 | 0 |
| two points | 795.797 | 12.5467 | 9.25366 | 0.0662461 | 16.6689 | 4.88424 |
| one point | 819.280 | 13.3403 | 32.737 | 0.000141764 | 17.106 | 8.25306 |

`pymc3.plot_elpd` shows this plot:

[figure: `pymc3.plot_elpd` output]

And I also plot a pointwise predictive accuracy:

[figure: pointwise predictive accuracy]

Is LOO-CV used correctly?

There is an eLetters exchange related to the paper [4] and a technical report from the authors [5].

[1] Dehning, J., Zierenberg, J., Spitzner, F. P., Wibral, M., Neto, J. P., Wilczek, M., & Priesemann, V. (2020). Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions. Science. http://dx.doi.org/10.1126/science.abb9789 (code and data: https://zenodo.org/record/3780722)

[2] Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413-1432. https://doi.org/10.1007/s11222-016-9696-4 (the same reference is used in PyMC3)

[3] `pymc3.loo` and `pymc3.compare` documentation and code: https://docs.pymc.io/api/stats.html and https://github.com/arviz-devs/arviz/blob/18797b81/arviz/stats/stats.py

[4] https://science.sciencemag.org/content/early/2020/05/14/science.abb9789/tab-e-letters

**Answer**

Overview quick remarks

- The model with three points does make a better fit.
- The fit with three points is only slightly better.
- The model with only one point is not *very* bad. The difference in LOO-CV score may indicate that the model with more points is a significant/probable/likely improvement, but the effect *size* is small.
- Even if the three-point model is a good fit, it need not represent physical reality.
- The better fit should be interpreted as confirmation that the null hypothesis, an SIR model with one change point, is likely not true (in the sense of ‘not *exactly* true’; it might still be a reasonably good description). It does not confirm that the alternative model, with three points, is *correct* (in a physical sense). The correct model (the true model) might in reality be a different model (e.g. a smooth transition instead of change points). It only confirms that the alternative model performs better.

“It is hard to believe that the three-change-point model captures some underlying physical reality missing in the one-change-point model.”

### The fit with three change points is indeed more accurate

It is not hard to believe that a model with three change points will do better. A simple SIR model (which assumes homogeneous mixing of all people) is not an exact fit to reality. The change points help to make up for that shortcoming by making the model more flexible and able to fit a wider range of curves.

### But it might not capture physical reality

However, you are right to doubt whether it captures a physical reality. An SIR model is designed as a *mechanistic* model. When it is not accurate enough, though, it *effectively* becomes just an *empirical* model.

The underlying parameters do not necessarily represent some physical reality. (If you like, you could even fit a mechanistic model that obviously has no physical reality at all.)

There are many ways to get a decrease in the rate of growth without changes in the epidemiological parameters. In spatial and networked SIR models this may be due to local saturation.

As a result:

- a fit with an SIR model will underestimate the R0 value (because lower R0 values tend to fit deflections in the curve better);
- when the SIR model is made more flexible with change points, the R0 might be higher initially, but the fit will indicate a decrease in the growth parameter β that might not exist in reality.

### One change point

So, are these change points fiction? I think not. The value of β in that model does change a lot.

I would not expect this drop in growth rate to be unreal, that is, merely a side effect of some strange adjustment to an SIR model that makes the rate drop automatically.

That said, if the true N is lower than assumed (N does not seem to be one of the model parameters and appears to be fixed), a drastic drop in growth rate may occur without a change in the epidemiological parameters.

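$$\frac{dI}{dt} = \overbrace{\frac{S}{N}}^{\substack{\text{If } N \text{ or } S = N-I \text{ are}\\ \text{over/under estimated}\\ \text{then the drop in this term}\\ \text{becomes underestimated}}} \; \underbrace{\beta}_{\substack{\text{In that case } \beta \text{ will get}\\ \text{underestimated in order to}\\ \text{correct for the wrong } S/N \text{ term}}} \; I - \mu I$$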

If the wrong N is used then the model will be pushed to correct for this. The same is true when we wrongly assume that all cases are being measured (and thus underestimate the number of cases, because we did not include underreporting).
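As a rough illustration of the equation above (a sketch with made-up numbers, not the authors' model): if the growth term is really (S/N) β I with a small effective population, a model that assumes a much larger N can only reproduce the same term by using a lower β. The function name `apparent_beta` and all numeric values below are hypothetical.

```python
def apparent_beta(beta_true, infected, n_true, n_assumed):
    """Beta that a model assuming n_assumed needs in order to reproduce
    the true growth term (S/N) * beta * I, using S = N - I."""
    s_over_n_true = (n_true - infected) / n_true
    s_over_n_assumed = (n_assumed - infected) / n_assumed
    return beta_true * s_over_n_true / s_over_n_assumed


BETA_TRUE = 0.4          # hypothetical true transmission rate
N_TRUE = 1_000_000       # hypothetical effectively mixing population
N_ASSUMED = 80_000_000   # hypothetical population size assumed in the fit

for infected in (1_000, 100_000, 500_000):
    beta_fit = apparent_beta(BETA_TRUE, infected, N_TRUE, N_ASSUMED)
    print(f"I = {infected:>7}: apparent beta = {beta_fit:.3f}")
```

With the true β held constant, the apparent β falls as I grows, which a change-point model could pick up as a drop in β.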

But anyway, I guess it is reasonable to say that there is a turning point/drop in β: there are many epidemiological curves that show a rapid decrease in growth rate. This is, I believe, not due to natural processes like saturation (growing immunity), but mostly due to the parameters changing.

### Two or three points

The effect of these extra change points is actually very subtle. What they do is make the change from growth to decrease smoother, and this only occurs over a short period. So instead of one big step you get three small steps between 8 and 22 March.
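The difference between one big step and three small steps can be sketched with a piecewise-constant rate. This is only an illustration: the helper `step_rate`, the intermediate change day, and all rate values are assumptions made for the example, and the paper's fitted values are not reproduced here. Days are counted as days of March.

```python
from bisect import bisect_right


def step_rate(day, change_days, rates):
    """Piecewise-constant rate: rates[k] applies once k change days have passed."""
    if len(rates) != len(change_days) + 1:
        raise ValueError("need exactly one more rate than change days")
    return rates[bisect_right(change_days, day)]


# Hypothetical values, chosen so both variants share the same start and end rates.
one_step = ([15], [0.4, 0.1])
three_steps = ([8, 15, 22], [0.4, 0.3, 0.2, 0.1])

for day in (5, 10, 18, 25):
    print(day, step_rate(day, *one_step), step_rate(day, *three_steps))
```

In this sketch the two variants differ only between day 8 and day 22; before and after that window they are identical.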

It is not hard to believe that you will get a smooth decrease in β (many mechanisms may create such change). More difficult is the interpretation. The change points are being related to particular events.

See for instance this quote in the abstract

“Focusing on COVID-19 spread in Germany, we detect change points in the effective growth rate that correlate well with the times of publicly announced interventions”

Or in the text

“A third change point … was inferred on March 24 (CI[21,26]); this inferred date matches the timing of the third governmental intervention”

But that is speculation and may be just fiction. This is especially the case since they placed priors exactly on these dates, with standard deviations that more or less match the size of the credible intervals. We have ‘posterior distribution ≈ prior distribution’, which means that the data did not add much information regarding the dates.

So it is not as if they fitted a three-change-point model and it turned out to *coincidentally* match the dates of particular interventions (this was my first interpretation after a quick scan of the article). They did not *detect* change points; rather, the model had a built-in tendency to correlate well with the particular interventions and to place the ‘detected’ points near the dates of the interventions. In addition, there is a free parameter for a reporting delay, which allows a few days of flexibility between the date of change in the curves and the date of change in the interventions, so the dates of the change points are not pinpointed/detected/inferred very precisely and the overall picture is fuzzier.
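Why ‘posterior ≈ prior’ signals uninformative data can be seen in a conjugate normal update, used here purely as a simplified stand-in for the paper's model. The function `posterior_sd` and the numbers are hypothetical: a prior standard deviation of 1 day on a change date is combined with data that pin the date down either weakly or strongly.

```python
import math


def posterior_sd(prior_sd, data_sd):
    """Posterior sd for a normal prior combined with a normal likelihood
    of width data_sd (precisions, i.e. inverse variances, add up)."""
    precision = 1.0 / prior_sd**2 + 1.0 / data_sd**2
    return math.sqrt(1.0 / precision)


PRIOR_SD = 1.0  # hypothetical prior sd on a change date, in days

weak = posterior_sd(PRIOR_SD, data_sd=10.0)   # data say little about the date
strong = posterior_sd(PRIOR_SD, data_sd=0.3)  # data pin the date down

print(f"weakly informative data:   posterior sd = {weak:.2f}")
print(f"strongly informative data: posterior sd = {strong:.2f}")
```

When the data carry little information, the posterior width stays close to the prior width, so a credible interval that matches the prior says more about the prior than about the data.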

### The leave-one-out cross-validation

“Is LOO-CV used correctly?”

I believe that LOO-CV is applied correctly (but the interpretation is tricky).

I would have to dig into the code to know for certain, but I have little reason to doubt it. What those scores mean is that the function with three change points did not overfit and was better able to capture the deterministic part of the model (but not that the model with three points is much better than the model with one point; it is only a small improvement).

- It is not so strange that the function did not overfit. There are enough data points to even out the noise and prevent the fitted function from capturing too much noise instead of the underlying deterministic trend.
- It is not so strange that the three change points are better able to capture the deterministic part. The standard SIR model is, out of the box, not really a good fit. Instead of the change points, you could get similar improvements with high-order polynomial fits or splines. That the change points improve the model need not have a mechanistic underlying reason.
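One rough way to put the reported differences on a common scale is to divide each `d_loo` by its `dse`. The sketch below does this for the values returned by `pymc3.compare` in the question. Reading the ratio like a z-score is only a heuristic, not something the paper or PyMC3 prescribes, and it relies on a normal approximation that may be optimistic when the differences are small.

```python
# d_loo and dse as reported by pymc3.compare in the question (deviance scale)
comparison = {
    "two points": {"d_loo": 9.25366, "dse": 4.88424},
    "one point": {"d_loo": 32.737, "dse": 8.25306},
}

# Difference to the top-ranked (three-point) model in units of its standard error
ratios = {name: row["d_loo"] / row["dse"] for name, row in comparison.items()}

for name, ratio in ratios.items():
    print(f"{name}: d_loo / dse = {ratio:.1f}")
```

The one-point model is about four standard errors behind the three-point model, whereas the two-point model is less than two standard errors behind, so the case for a third change point is much weaker than the case for more than one.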

You might think: hey, but what about the small differences between the three curves (red, orange, green)?

Yes, indeed the differences are small. The change points occur only over a short time period. While the differences in the LOO-CV scores, from 819 to 796 to 787, may indicate some significance, this need not relate to a ‘large’ effect, and neither does the effect in the alternative model need to relate to some realistic mechanism. See for instance the example below, where an additional x² term significantly improves a fit, but the difference in the fitted curve is small and the ‘true’ effect is an x³ term instead of the x² term. For that example the log-likelihood scores are nevertheless significantly different:

```
> lmtest::lrtest(mod1,mod2)
Likelihood ratio test

Model 1: y ~ x
Model 2: y ~ x + I(x^2)
  #Df LogLik Df  Chisq Pr(>Chisq)
1   3 15.345
2   4 19.634  1 8.5773   0.003404 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

The small differences might also be problematic. The improvement is likely not very significant, especially when you consider that the noise is likely correlated. With correlated noise, the neighbouring points still carry information about the noise at the left-out point, so some degree of overfitting might go unpunished in a leave-one-out CV.

Example image and code:

```
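# Simulate data whose true curve has a small cubic term
set.seed(1)
x <- seq(0, 1, 0.02)
ydeterministic <- x + 0.5 * x^3
y <- ydeterministic + rnorm(length(x), 0, 0.2)

# Fit a straight line and a (misspecified) quadratic
mod1 <- lm(y ~ x)
mod2 <- lm(y ~ x + I(x^2))

# Plot the data, both fits (dashed), and the true curve (solid)
plot(x, y, main = "small but significant effect",
     cex.main = 1, pch = 21, col = 1, bg = "white", cex = 0.7,
     ylim = c(-0.2, 1.7))
lines(x, mod1$fitted.values, col = "red", lty = 2)
lines(x, mod2$fitted.values, col = "blue", lty = 2)
lines(x, ydeterministic, lty = 1)
legend(0, 1.7, c("true model: y = x + 0.5x³", "fit 1: y = x", "fit 2: y = x + x²"),
       col = c("black", "red", "blue"), lty = c(1, 2, 2), cex = 0.6)

# Likelihood ratio test between the two fits (requires the lmtest package)
lmtest::lrtest(mod1, mod2)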
```

*This example is for a linear model, not a Bayesian setting, but it may help to see intuitively what a ‘significant but small effect’ looks like, and that a comparison in terms of log-likelihood values says little about the effect size.*

**Attribution**

*Source: Link, Question Author: slitvinov, Answer Author: Community*