# What is the difference between conditioning on regressors vs. treating them as fixed?

Sometimes we assume that regressors are fixed, i.e. non-stochastic. I think that means all our predictors, parameter estimates etc. are then unconditional, right? Might I even go so far as to say that they are no longer random variables?

If, on the other hand, we accept that most regressors in economics, say, are stochastic, because no outside force determined them with some experiment in mind, then econometricians condition on these stochastic regressors.

How is this different from treating them as fixed?

I understand what conditioning is. Mathematically, it means we make all observations and inference conditional on that particular set of regressors and have no ambitions to say that inferences, parameter estimates, variance estimates etc. would have been the same had we seen a different realization of our regressors (such is the crux in time series, where each time series is only ever seen once).

However, to really grasp the difference between fixed regressors vs. conditioning on stochastic regressors, I am wondering if anyone here knows of an example of an estimation or inference procedure that is valid for say fixed regressors but breaks down when they are stochastic (and will be conditioned on).

I am looking forward to seeing those examples!

Here I am on thin ice but let me try: I have a feeling (please comment!) that a main difference between statistics and econometrics is that in statistics we tend to consider the regressors as fixed, hence the terminology design matrix which obviously comes from design of experiments, where the supposition is that we are first choosing and then fixing the explanatory variables.

But for most data sets, most situations, this is a bad fit. We are really observing the explanatory variables, and in that sense they stand on the same footing as the response variables: both are determined by some random process outside our control. By considering the $x$'s as "fixed", we decide not to consider a lot of problems that this might cause.

By considering the regressors as stochastic, on the other hand, as econometricians tend to do, we open the possibility of models which try to address such problems. A short list of problems we might then consider, and incorporate into the modeling, is:

Probably, that should be done much more frequently than it is done today? Another point of view is that models are only approximations and inference should admit that. The very interesting paper "The Conspiracy of Random Predictors and Model Violations against Classical Inference in Regression" by A. Buja et al. takes this point of view and argues that nonlinearities (not modeled explicitly) destroy the ancillarity argument given below.

EDIT


I will try to flesh out an argument for conditioning on regressors somewhat more formally. Let $(Y,X)$ be a random vector, and suppose interest is in the regression of $Y$ on $X$, where regression is taken to mean the conditional expectation of $Y$ given $X$. Under multinormal assumptions that will be a linear function, but our argument does not depend on that. We start by factoring the joint density in the usual way
$$f(y,x) = f(y\mid x)\, f(x)$$
but those functions are not known so we use a parameterized model
$$f(y,x; \theta, \psi) = f_\theta(y \mid x)\, f_\psi(x)$$
where $\theta$ parameterizes the conditional distribution and $\psi$ the marginal distribution of $X$. In the normal linear model we can have $\theta=(\beta, \sigma^2)$, but that is not assumed. The full parameter space of $(\theta,\psi)$ is $\Theta \times \Psi$, a Cartesian product, and the two parameters have no part in common.
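To make the consequence of this product structure explicit (my addition; this is standard likelihood algebra, not part of the original post): for an i.i.d. sample $(y_i, x_i),\ i=1,\dots,n$, the log-likelihood splits into two terms,

$$\ell(\theta, \psi) = \sum_{i=1}^n \log f_\theta(y_i \mid x_i) + \sum_{i=1}^n \log f_\psi(x_i),$$

and maximization over $\theta$ involves only the first sum. Maximum-likelihood inference about $\theta$ based on the conditional likelihood alone therefore coincides with inference based on the full likelihood, which is what licenses treating the observed $x$'s as given.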

This can be interpreted as a factorization of the statistical experiment (or of the data-generating process, DGP): first $X$ is generated according to $f_\psi(x)$, and as a second step, $Y$ is generated according to the conditional density $f_\theta(y \mid X=x)$. Note that the first step does not use any knowledge about $\theta$; that enters only in the second step. The statistic $X$ is ancillary for $\theta$, see https://en.wikipedia.org/wiki/Ancillary_statistic.

But, depending on the results of the first step, the second step can be more or less informative about $\theta$. If the distribution given by $f_\psi(x)$ has very low variance, say, the observed $x$'s will be concentrated in a small region, and it will be more difficult to estimate $\theta$. So the first part of this two-step experiment determines the precision with which $\theta$ can be estimated. It is therefore natural to condition on $X=x$ in inference about the regression parameters. That is the conditionality argument, and the outline above makes its assumptions clear.
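A small simulation can make this concrete (a sketch of my own, not from the original post): holding $\theta = (\beta, \sigma^2)$ fixed, we vary only the spread of the marginal distribution of $X$ and watch how precisely the slope can be estimated.

```python
# Sketch: the realized spread of X determines the precision of the OLS slope,
# even though the conditional model f_theta(y | x) is identical in both cases.
import numpy as np

rng = np.random.default_rng(0)
beta, sigma, n, reps = 2.0, 1.0, 50, 2000

def slope_sd(sd_x):
    """Empirical standard deviation of the OLS slope when X ~ N(0, sd_x^2)."""
    slopes = []
    for _ in range(reps):
        x = rng.normal(0.0, sd_x, n)               # step 1: X from f_psi
        y = beta * x + rng.normal(0.0, sigma, n)   # step 2: Y | X = x from f_theta
        slopes.append(np.polyfit(x, y, 1)[0])      # [slope, intercept]
    return np.std(slopes)

print(slope_sd(0.1))  # x's concentrated: slope estimated imprecisely
print(slope_sd(1.0))  # x's spread out: slope estimated far more precisely
```

The conditional variance formula $\sigma^2 / \sum_i (x_i - \bar x)^2$ says the same thing analytically: the first-step draw of the $x$'s fixes the attainable precision.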

In designed experiments this assumption will mostly hold; with observational data it often will not. One example of a problem: regression with lagged responses as predictors, where conditioning on the predictors also conditions on the response! (I will add more examples.)

One book which discusses these problems in a lot of detail is Information and Exponential Families in Statistical Theory by O. E. Barndorff-Nielsen. See especially chapter 4. The author notes that the logic of this separation is, however, seldom explicated, but gives the following references: R. A. Fisher (1956), Statistical Methods and Scientific Inference, $\S 4.3$, and Sverdrup (1966), "The present state of the decision theory and the Neyman-Pearson theory".