# Statistical inference under model misspecification

I have a general methodological question. It might have been answered before, but I am not able to locate the relevant thread. I would appreciate pointers to possible duplicates.

(Here is an excellent one, but with no answer. This one is also similar in spirit, and it even has an answer, but the latter is too specific for my purposes. This one is also close; I discovered it after posting the question.)

The theme is how to do valid statistical inference when a model formulated before seeing the data fails to adequately describe the data generating process. The question is very general, but I will offer a particular example to illustrate the point. However, I expect the answers to focus on the general methodological question rather than nitpicking the details of the particular example.

Consider a concrete example: in a time series setting, I assume the data generating process to be
$$
y_t = \beta_0 + \beta_1 x_t + u_t \tag{1}
$$
with $u_t \sim i.i.N(0,\sigma_u^2)$. I aim to test the subject-matter hypothesis that $\frac{dy}{dx}=1$. I cast this in terms of model $(1)$ to obtain a workable statistical counterpart of my subject-matter hypothesis, and this is
$$
H_0\colon \ \beta_1 = 1.
$$

So far, so good. But when I observe the data, I discover that the model does not adequately describe the data. Let us say there is a linear trend, so that the true data generating process is
$$
y_t = \gamma_0 + \gamma_1 x_t + \gamma_2 t + v_t \tag{2}
$$
with $v_t \sim i.i.N(0,\sigma_v^2)$.

How can I do valid statistical inference on my subject-matter hypothesis $\frac{dy}{dx}=1$?

• If I use the original model, its assumptions are violated (the omitted trend ends up in the error term), and the estimator of $\beta_1$ no longer has the nice distribution it otherwise would. Therefore, I cannot test the hypothesis using the $t$-test.

• If, having seen the data, I switch from model $(1)$ to $(2)$ and change my statistical hypothesis from $H_0\colon \ \beta_1=1$ to $H'_0\colon \ \gamma_1=1$, model assumptions are satisfied and I get a well-behaved estimator of $\gamma_1$ and can test $H'_0$ with no difficulty using the $t$-test.
However, the switch from $(1)$ to $(2)$ is informed by the very data set on which I wish to test the hypothesis. This makes the estimator's distribution (and thus also the inference) conditional on the change in the underlying model, which is itself due to the observed data. Clearly, introducing such conditioning is not satisfactory.
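To make the first bullet concrete, here is a small Monte Carlo sketch in Python with NumPy. The specific regressor $x_t$, the coefficient values, and the sample size are my own illustrative assumptions (the question does not specify them). Data are generated from model $(2)$ with $\gamma_1 = 1$, so the subject-matter hypothesis is true, and the sketch tallies how often a nominal 5% $t$-test rejects under each model:

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_t_stat(X, y, coef_idx, null_value):
    """OLS fit of y on X; t-statistic for H0: beta[coef_idx] = null_value,
    using the usual homoskedastic-errors covariance estimator."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(cov[coef_idx, coef_idx])
    return (beta[coef_idx] - null_value) / se

n, reps = 100, 2000
t = np.arange(n, dtype=float)
x = 0.1 * t + rng.normal(size=n)  # illustrative regressor, mildly correlated with the trend
crit = 1.984                      # approx. two-sided 5% critical value for ~97 d.f.

rej_misspec = rej_correct = 0
for _ in range(reps):
    # True DGP is model (2) with gamma_1 = 1, so H0 holds.
    y = 0.5 + 1.0 * x + 0.2 * t + rng.normal(size=n)
    X1 = np.column_stack([np.ones(n), x])     # model (1): omits the trend
    X2 = np.column_stack([np.ones(n), x, t])  # model (2): includes the trend
    rej_misspec += abs(ols_t_stat(X1, y, 1, 1.0)) > crit
    rej_correct += abs(ols_t_stat(X2, y, 1, 1.0)) > crit

print(f"rejection rate under model (1): {rej_misspec / reps:.3f}")  # far above 0.05
print(f"rejection rate under model (2): {rej_correct / reps:.3f}")  # close to 0.05
```

With these (assumed) parameter values, the misspecified test rejects the true hypothesis nearly always, because the omitted trend induces omitted-variable bias in $\hat\beta_1$, while the test under model $(2)$ holds its nominal size. Of course, this only illustrates the first bullet; it does not address the conditioning problem in the second.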

Is there a good way out? (If not frequentist, then maybe some Bayesian alternative?)