I always struggle to get the true essence of identification in econometrics. I know that we state that a parameter (say $\hat{\theta}$) can be identified if, by simply looking at its (joint) distribution, we can infer the value of the parameter. In a simple case of $y = b_1 X + u$, where $E[u] = 0$ and $E[u \mid X] = 0$, we can state that $b_1$ is identified if we know that its variance $\operatorname{Var}(\hat{b}) > 0$. But what if $E[u \mid X] = a$, where $a$ is an unknown parameter? Can $a$ and $b_1$ be identified?

If I expand the model to $Y = b_0 + b_1 X + b_2 XD + u$, where $D \in \{0, 1\}$ and $E[u \mid X, D] = 0$, then to show that $b_0, b_1, b_2$ are identified, do I simply have to restate that the variance for all three parameters is greater than zero?

I appreciate all the help on clearing my mind concerning identification.

**Answer**

Let's first define the following objects. In a statistical model $M$ that is used to model $Y$ as a function of $X$, there are $p$ parameters, collected in the vector $\theta$. These parameters are allowed to vary within the parameter space $\Theta \subset \mathbb{R}^p$. We are not interested in estimating *all* of these parameters, but only a certain subset, say $q \le p$ of them, which we denote $\theta_0$ and which varies within the parameter space $\Theta_0 \subset \mathbb{R}^q$. In our model $M$, the variables $X$ and the parameters $\theta$ are mapped so as to explain $Y$. This mapping is defined by $M$ and the parameters.

Within this setting, identifiability says something about **Observational Equivalence**. In particular, if the parameters $\theta_0$ are identifiable w.r.t. $M$, then it holds that $\nexists\, \theta_1 \in \Theta_0 : \theta_1 \neq \theta_0,\ M(\theta_0) = M(\theta_1)$. In words, there does not exist a different parameter vector $\theta_1$ that would induce the same data generating process, given our model specification $M$.
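The same condition can be written as an injectivity statement. This is only a restatement of the definition above, reading $M(\cdot)$ as the map from parameter values to the data generating process they induce (that reading is an assumption about the notation):

```latex
\forall\, \theta_0, \theta_1 \in \Theta_0 : \quad
M(\theta_0) = M(\theta_1) \;\Longrightarrow\; \theta_0 = \theta_1 ,
\qquad \text{i.e. } \theta \mapsto M(\theta) \text{ is one-to-one on } \Theta_0 .
```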

To make these concepts more concrete, I give two examples.

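**Example 1**: For $\theta = (a, b)$, $X \sim N(\mu, \sigma^2 I_n)$ and $\varepsilon \sim N(0, \sigma_e^2 I_n)$, define the simple statistical model $M$:

$$Y = a + Xb + \varepsilon$$

and suppose that $(a, b) \in \mathbb{R}^2$ (so $\Theta = \mathbb{R}^2$).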

It is clear that whether $\theta_0 = (a, b)$ or $\theta_0 = a$, $\theta_0$ is always identifiable: the process generating $Y$ from $X$ has a 1:1 relationship with the parameters $a$ and $b$. Fixing $(a, b)$, it is not possible to find a second tuple in $\mathbb{R}^2$ describing the same data generating process.
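A small simulation can make Example 1 tangible. The sketch below is only an illustration: the helper names `simulate` and `ols`, the numeric values for $a$, $b$, $\mu$, $\sigma$, $\sigma_e$ and the sample size are all assumptions chosen for the demo, not part of the model above. It draws data from $Y = a + Xb + \varepsilon$ and shows that least squares lands near the single pair $(a, b)$ that generated the data.

```python
import numpy as np

def simulate(a, b, n=10_000, mu=2.0, sigma=1.0, sigma_e=1.0, seed=0):
    """Draw (X, Y) from Y = a + X*b + eps with normal X and normal eps."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=n)
    eps = rng.normal(0.0, sigma_e, size=n)
    return x, a + x * b + eps

def ols(x, y):
    """Least-squares estimates of (intercept, slope) in y = a + x*b."""
    design = np.column_stack([np.ones_like(x), x])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    return coef

# Illustrative "true" values (assumed for the demo).
x, y = simulate(a=1.0, b=0.5)
a_hat, b_hat = ols(x, y)
print(a_hat, b_hat)  # both should be close to the values used in simulate
```

Because each $(a, b)$ implies a different conditional mean line for $Y$ given $X$, a large sample points to one pair only.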

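**Example 2**: For $\theta = (a, b, c)$, $X \sim N(\mu, \sigma^2 I_n)$ and $\varepsilon \sim N(0, \sigma_e^2 I_n)$, define the more tricky statistical model $M'$:

$$Y = a + X\left(\frac{b}{c}\right) + \varepsilon$$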

and suppose that $(a, b) \in \mathbb{R}^2$ and $c \in \mathbb{R} \setminus \{0\}$ (so $\Theta = \mathbb{R}^3 \setminus \{(l, m, 0) \mid (l, m) \in \mathbb{R}^2\}$). While for $\theta_0 = a$ this would be an identifiable statistical model, this no longer holds once $\theta_0$ includes another parameter (i.e., $b$ or $c$). Why? Because for any pair $(b, c)$, there exist infinitely many other pairs in the set $B := \{(x, y) \mid x/y = b/c,\ (x, y) \in \mathbb{R} \times (\mathbb{R} \setminus \{0\})\}$, and every one of them induces the same data generating process. The obvious solution to the problem in this case would be to introduce a new parameter $d = b/c$ replacing the fraction to identify the model. However, one might be interested in $b$ and $c$ as separate parameters for theoretical reasons: the parameters could correspond to parameters of interest in an (economic) theory sense. (E.g., $b$ could be 'propensity to consume' and $c$ could be 'confidence', and you might want to estimate these two quantities from your regression model. Unfortunately, this would not be possible.)
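The observational equivalence in Example 2 can also be checked numerically. The following is a minimal sketch, assuming the slope enters the model as the ratio $b/c$ (as the set $B$ and the reparametrization $d = b/c$ indicate). The function name `simulate_ratio` and all numeric values are made up for illustration. Two different pairs $(b, c)$ with the same ratio produce exactly the same data, and a regression can only recover the ratio.

```python
import numpy as np

def simulate_ratio(a, b, c, n=10_000, mu=2.0, sigma=1.0, sigma_e=1.0, seed=0):
    """Draw (X, Y) from Y = a + X*(b/c) + eps with normal X and normal eps."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=n)
    eps = rng.normal(0.0, sigma_e, size=n)
    return x, a + x * (b / c) + eps

# Two different (b, c) pairs with the same ratio b/c = 0.5 (assumed values).
x1, y1 = simulate_ratio(a=1.0, b=1.0, c=2.0)
x2, y2 = simulate_ratio(a=1.0, b=3.0, c=6.0)

# With the same random draws, the generated data coincide.
same_data = np.allclose(x1, x2) and np.allclose(y1, y2)
print(same_data)

# Least squares recovers the intercept and the ratio d = b/c, nothing more.
design = np.column_stack([np.ones_like(x1), x1])
a_hat, d_hat = np.linalg.lstsq(design, y1, rcond=None)[0]
print(a_hat, d_hat)
```

No amount of data separates $(1, 2)$ from $(3, 6)$ here, which is what a failure of identification means; only $a$ and $d = b/c$ can be pinned down.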

**Attribution**
*Source: Link, Question Author: CharlesM, Answer Author: Jeremias K*