What is the difference between endogeneity and unobserved heterogeneity? I know that endogeneity can arise, for example, from omitted variables. But as far as I understand, unobserved heterogeneity causes the same problem. So where exactly does the difference between these two notions lie?
The terms endogeneity and unobserved heterogeneity often refer to the same thing, but usage varies somewhat, even within economics, the discipline I most associate with the terms.
In a regression equation, an explanatory variable is endogenous if it is correlated with the error term.
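In symbols, for a simple linear regression (the notation is mine, for illustration), endogeneity means the slope estimated by OLS does not converge to the true coefficient:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \operatorname{Cov}(x_i, \varepsilon_i) \neq 0$$

$$\operatorname{plim} \hat{\beta}_1^{\text{OLS}} = \beta_1 + \frac{\operatorname{Cov}(x_i, \varepsilon_i)}{\operatorname{Var}(x_i)} \neq \beta_1$$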
Endogeneity is often described as having three sources: omitted variables, measurement error, and simultaneity. Though it is often helpful to mention these “sources” separately, confusion sometimes arises because they are not truly distinct. Imagine a regression estimating the effect of education on wages. Perhaps our measure of education is simply the number of years someone spent in formal education, regardless of the type of education. If I have a clear idea of which type of education affects wages, I might describe this situation as measurement error in the education variable. Alternatively, I could describe it as an omitted variables problem (the omitted variables being those that indicate the type of education).
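To make the omitted-variables framing concrete, here is a small Python simulation. Everything in it is hypothetical: the binary `academic` indicator stands in for the unmeasured type of education, the coefficients are made up, and `ols_slope` and `simulate` are helper names I chose. It is a sketch, not an estimation routine. It uses the simple-regression slope Cov(x, y)/Var(x), and it "controls" for type of education by partialling it out of years of education (the Frisch-Waugh-Lovell theorem).

```python
import random


def ols_slope(x, y):
    """Slope of a simple regression of y on x with an intercept: Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=100_000, seed=1):
    """Hypothetical data: 'academic' is a type-of-education indicator we pretend not to measure."""
    rng = random.Random(seed)
    years, academic, wage = [], [], []
    for _ in range(n):
        t = 1 if rng.random() < 0.5 else 0              # type of education (omitted below)
        yrs = 12 + 2 * t + rng.gauss(0, 2)              # this type also means more years
        w = 10 + 1.0 * yrs + 3.0 * t + rng.gauss(0, 1)  # true effect of one year: 1.0
        years.append(yrs)
        academic.append(t)
        wage.append(w)
    return years, academic, wage


years, academic, wage = simulate()

# Short regression: type of education is omitted, so it sits in the error term.
short_slope = ols_slope(years, wage)

# Long regression via Frisch-Waugh-Lovell: partial 'academic' out of 'years',
# then regress wage on the part of 'years' that is left over.
g = ols_slope(academic, years)
a = sum(years) / len(years) - g * sum(academic) / len(academic)
years_resid = [yrs - (a + g * t) for yrs, t in zip(years, academic)]
long_slope = ols_slope(years_resid, wage)

print(f"omitting type:    {short_slope:.2f}")  # near 1.0 + 3.0 * 0.1 = 1.3
print(f"controlling type: {long_slope:.2f}")   # near the true 1.0
```

Dropping the type indicator pushes it into the error term, and because it is correlated with years of education, the short regression overstates the return to a year of schooling.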
Perhaps wages also affect education decisions. If wages and education are measured at the same time, this is an example of simultaneity, but it, too, might be reframed in terms of omitted variables.
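A similar sketch for simultaneity, again with made-up coefficients (`b` for the effect of education on wages, `c` for the effect of wages on education) and hypothetical helper names. Because both equations hold at once, each simulated person's values come from solving the two-equation system jointly. The simple OLS slope of wages on education then picks up the feedback from wages to education as well as the causal effect.

```python
import random


def ols_slope(x, y):
    """Slope of a simple regression of y on x with an intercept: Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(b=1.0, c=0.5, n=100_000, seed=2):
    """Hypothetical simultaneous system:
         wage = b * edu + u   (education affects wages)
         edu  = c * wage + v  (wages affect education)
    Both equations hold at once, so they are solved jointly for each person."""
    rng = random.Random(seed)
    edu, wage = [], []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        w = (b * v + u) / (1 - b * c)  # reduced form for wage
        ed = c * w + v                  # education from its own equation
        edu.append(ed)
        wage.append(w)
    return edu, wage


edu, wage = simulate()
slope = ols_slope(edu, wage)
# With b = 1.0 and c = 0.5, edu = 2v + u and wage = 2v + 2u, so the OLS slope
# converges to Cov(edu, wage) / Var(edu) = 6 / 5 = 1.2 rather than the causal 1.0:
# edu depends on wage, hence on u, so edu is correlated with the error term u.
print(f"OLS slope: {slope:.2f}")
```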
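Unobserved heterogeneity is simply variation (differences) among cases that is not measured. Whatever is not measured ends up in the error term, so whenever those unmeasured differences are correlated with an explanatory variable, that variable is endogenous. If you understand endogeneity, I think you understand the implications of unobserved heterogeneity in a regression context.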
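The link between the two notions can be illustrated with one more hypothetical simulation. Here "ability" is an unmeasured characteristic that raises wages, and `link` (a parameter I made up for illustration) controls whether education depends on it. With `link=0.0` the unobserved heterogeneity is just extra noise in the error term and the slope stays close to the true 1.0. With `link=1.0` it is correlated with education, education becomes endogenous, and the slope converges to 1.0 + 2.0 × Cov(edu, ability)/Var(edu) = 1.0 + 2.0 × 1/2 = 2.0.

```python
import random


def ols_slope(x, y):
    """Slope of a simple regression of y on x with an intercept: Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_with_hidden_ability(link, n=100_000, seed=3):
    """Hypothetical model: wage = 1.0 * edu + 2.0 * ability + noise, with ability unmeasured.
    'link' sets how strongly education depends on ability."""
    rng = random.Random(seed)
    edu, wage = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)                        # unobserved heterogeneity
        ed = 12 + link * ability + rng.gauss(0, 1)
        w = 1.0 * ed + 2.0 * ability + rng.gauss(0, 1)
        edu.append(ed)
        wage.append(w)
    return ols_slope(edu, wage)  # ability is left in the error term


uncorrelated = slope_with_hidden_ability(link=0.0)  # near the true 1.0
correlated = slope_with_hidden_ability(link=1.0)    # near 2.0
print(f"ability unrelated to education: {uncorrelated:.2f}")
print(f"ability drives education:       {correlated:.2f}")
```

In both runs the heterogeneity is equally "unobserved"; only in the second does it create an endogeneity problem for the education coefficient.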