I am trying to implement a method from the well-known paper "I Just Ran Two Million Regressions" (Sala-i-Martin, 1997). The basic idea is that there are cases where it is not obvious which controls should be included in the model. One thing you can do in such a case is to randomly draw controls, run millions of different regressions, and then see how your variable of interest behaves. If it generally has the same sign across all specifications, then we can consider it more robust than a variable whose sign flips across specifications.
Most of the paper is very clear. However, the paper weights all those different regressions in the following manner: The integrated likelihood of the given specification is divided by the sum of all the integrated likelihoods for all the specifications.
The trouble I am having is that I am not sure how the integrated likelihood relates to the OLS regressions that I would like to run (in Stata). Googling topics such as "stata integrated likelihood" has been a dead end, as I keep running into things like mixed-effects logistic regression. I confess that these models are too complex for me to grasp.
My current workaround is that there are other weighting schemes used in the literature that I do (kind of) understand. For example, it is possible to weight each regression by its likelihood ratio index, and there is even an R package that uses the LRI as weights. Naturally, though, I would also like to implement the original scheme.
For OLS, you can still compute the likelihood function (the exponentiated log likelihood, as Christoph Hanck mentions in the comments). It is just the good old

$$L=\prod_{i=1}^{n}(2\pi\sigma^2)^{-1/2}\exp\left(-\frac{(y_i-x_i'\beta)^2}{2\sigma^2}\right).$$

Stata stores its logarithm, $\ln L$, in e(ll) after running a regression with regress.
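Since this is just the Gaussian density evaluated at the residuals, the number Stata reports can be reproduced by hand. Here is a minimal Python sketch (not Stata, and on made-up simulated data) that computes the OLS log likelihood from the residual sum of squares, assuming the maximum-likelihood variance estimate $\hat\sigma^2 = \text{RSS}/n$ (not the unbiased $\text{RSS}/(n-k)$):

```python
import math
import random

# Simulated data (illustrative only): y = 1 + 2x + noise
random.seed(0)
n = 50
x = [random.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, 0.5) for xi in x]

# OLS slope and intercept via the closed-form bivariate formulas
xbar = sum(x) / n
ybar = sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
    / sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar

# Residual sum of squares and the ML estimate of sigma^2 (RSS/n)
rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
sigma2 = rss / n

# Gaussian log likelihood at the ML estimates:
# ln L = -(n/2) * ln(2*pi*sigma^2) - RSS / (2*sigma^2)
loglik = -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)
print(loglik)
```

Note that at $\hat\sigma^2 = \text{RSS}/n$ the second term collapses to $-n/2$, so the maximized log likelihood depends on the data only through RSS.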
Then you construct the weights as $w_i = \dfrac{L_i}{\sum_j L_j}$, where $L_i$ is the likelihood of specification $i$.
Finally, you construct weighted averages of your regression coefficients using the $w_i$ as weights.
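To make the whole recipe concrete, here is a hedged Python sketch (not Stata) of the pipeline: run several OLS specifications on simulated data, compute each maximized log likelihood, turn those into weights, and average the coefficient on the variable of interest. The data, specification list, and helper `ols` are all made up for illustration. Because the likelihoods themselves can underflow when exponentiated, the weights are computed from the log likelihoods with the log-sum-exp trick, which leaves $w_i = L_i / \sum_j L_j$ unchanged:

```python
import math
import random

random.seed(1)
n = 100
# Simulated data: x is the variable of interest; z1, z2 are candidate controls
x  = [random.gauss(0, 1) for _ in range(n)]
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
y  = [2.0 * xi + 0.5 * z1i + random.gauss(0, 1) for xi, z1i in zip(x, z1)]

def ols(y, cols):
    """OLS via the normal equations (Gauss-Jordan elimination).
    cols is a list of regressor columns; returns (beta, max log likelihood)."""
    k = len(cols)
    X = [[c[i] for c in cols] for i in range(len(y))]
    xtx = [[sum(X[i][a] * X[i][b] for i in range(len(y))) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(len(y))) for a in range(k)]
    # Solve (X'X) beta = X'y on the augmented matrix [X'X | X'y]
    M = [row[:] + [xty[a]] for a, row in enumerate(xtx)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    beta = [M[a][k] / M[a][a] for a in range(k)]
    rss = sum((y[i] - sum(beta[a] * X[i][a] for a in range(k))) ** 2
              for i in range(len(y)))
    sigma2 = rss / len(y)  # ML estimate of sigma^2
    # Maximized Gaussian log likelihood (matches Stata's e(ll) for regress)
    loglik = -0.5 * len(y) * (math.log(2 * math.pi * sigma2) + 1)
    return beta, loglik

ones = [1.0] * n
# Four specifications; x is always the second column after the constant
specs = [[ones, x], [ones, x, z1], [ones, x, z2], [ones, x, z1, z2]]
results = [ols(y, cols) for cols in specs]
betas_x = [beta[1] for beta, _ in results]   # coefficient on x in each spec
logliks = [ll for _, ll in results]

# Weights w_i = L_i / sum_j L_j, computed stably via log-sum-exp
m = max(logliks)
weights = [math.exp(ll - m) for ll in logliks]
s = sum(weights)
weights = [w / s for w in weights]

# Likelihood-weighted average of the coefficient on x
weighted_beta = sum(w * b for w, b in zip(weights, betas_x))
print(weighted_beta)
```

The weighted coefficient necessarily lies between the smallest and largest estimates across specifications, and the same weights can be applied to the sign of the coefficient to reproduce the robustness check from the question.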