This question is a follow-up that tries to clear up possible confusion about a topic that I, and many others, find a bit difficult: the difference between AIC and BIC. In a very nice answer by @Dave Kellen on this topic (https://stats.stackexchange.com/a/767/30589) we read:

> Your question implies that AIC and BIC try to answer the same question, which is not true. AIC tries to select the model that most adequately describes an unknown, high dimensional reality. This means that reality is never in the set of candidate models that are being considered. On the contrary, BIC tries to find the TRUE model among the set of candidates. I find it quite odd the assumption that reality is instantiated in one of the model that the researchers built along the way. This is a real issue for BIC.

In a comment below, by @gui11aume, we read:

> (-1) Great explanation, but I would like to challenge an assertion. @Dave Kellen Could you please give a reference to where the idea that the TRUE model has to be in the set for BIC? I would like to investigate on this, since in this book the authors give a convincing proof that this is not the case. – gui11aume May 27 ’12 at 21:47

It seems that this assertion comes from Schwarz himself (1978), although the assertion was not necessary. The same authors that @gui11aume links to write in their article “Multimodel inference: Understanding AIC and BIC in Model selection” (Burnham and Anderson, 2004):

> Does the derivation of BIC assume the existence of a true model, or, more narrowly, is the true model assumed to be in the model set when using BIC? (Schwarz’s derivation specified these conditions.) … The answer … no. That is, BIC (as the basis for an approximation to a certain Bayesian integral) can be derived without assuming that the model underlying the derivation is true (see, e.g. Cavanaugh and Neath 1999; Burnham and Anderson 2002:293-5). Certainly, in applying BIC, the model set need not contain the (nonexistent) true model representing full reality. Moreover, the convergence in probability of the BIC-selected model to a target model (under the idealization of an iid sample) does not logically mean that that target model must be the true data-generating distribution.

So, I think it is worth a discussion or some clarification (if more is needed) on this subject. Right now, all we have is a comment from @gui11aume (thank you!) under a very highly voted answer regarding the difference between AIC and BIC.

**Answer**

The information criterion of Schwarz (1978), abbreviated SIC here and more commonly known as BIC, was designed with the feature that it asymptotically chooses the model with the higher posterior probability, i.e. the model with the higher marginal likelihood given the data under equal prior model probabilities. So roughly

$$\frac{p(M_1|y)}{p(M_2|y)} > 1 \quad \overset{A}{\sim} \quad SIC(M_1) < SIC(M_2)$$

where $\overset{A}{\sim}$ denotes "asymptotically equivalent" and $p(M_j|y)$ is the posterior of model $j$ given data $y$. I do not see how this result would depend on model 1 being true (is there even a true model in a Bayesian framework?).
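A rough sketch of why this equivalence holds, assuming the usual regularity conditions under which a Laplace approximation to the marginal likelihood is valid. The symbols $k_j$ (number of parameters of model $j$), $T$ (sample size), $\ell$ (log-likelihood), and the scaling of SIC by $1/T$ are assumptions of this sketch, chosen to match the general criterion used later in this answer:

```latex
\begin{align*}
SIC(M_j) &= -\frac{2}{T}\,\ell(\hat{\theta}_j; y) + k_j \frac{\ln T}{T}, \\
\ln p(y \mid M_j) &= \ell(\hat{\theta}_j; y) - \frac{k_j}{2}\ln T + O_p(1)
                   = -\frac{T}{2}\, SIC(M_j) + O_p(1), \\
\ln \frac{p(M_1 \mid y)}{p(M_2 \mid y)}
  &= \ln \frac{p(y \mid M_1)}{p(y \mid M_2)}
     \quad \text{(equal prior model probabilities)} \\
  &= -\frac{T}{2}\left[ SIC(M_1) - SIC(M_2) \right] + O_p(1).
\end{align*}
```

Whenever the scaled difference $\frac{T}{2}\left[SIC(M_1) - SIC(M_2)\right]$ grows without bound, it dominates the bounded remainder, so the sign of the log posterior odds eventually agrees with the SIC ranking. As far as I can see, no step in this approximation uses the assumption that either model generated the data.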

What I think is responsible for the confusion is that the SIC has the other nice feature that, under certain conditions, it will asymptotically select the "true" model if the latter is within the model universe. Both AIC and SIC are special cases of the criterion

$$IC(k) = -\frac{2}{T} \ell(\hat{\theta};y) + k g(T)$$

where $\ell(\hat{\theta};y)$ is the log-likelihood evaluated at the parameter estimates $\hat{\theta}$, $k$ is the number of parameters and $T$ is the sample size. When the model universe consists of linear, Gaussian models, it can be shown that we need:

$$g(T) \to 0 \; \text{as} \; T \to \infty$$

so that, with probability approaching one, the IC does not select a model that is smaller than the true model, and

$$T g(T) \to \infty \; \text{as} \; T \to \infty$$

so that, with probability approaching one, the IC does not select a model that is larger than the true model.

We have that

$$g_{AIC}(T) = \frac{2}{T},\;\; g_{SIC}(T) = \frac{\ln{T}}{T}$$

So SIC fulfills both conditions, while AIC fulfills the first but not the second: $T g_{AIC}(T) = 2$ stays constant instead of diverging. For a very accessible exposition of these features and a discussion of practical implications, see Chapter 6 of Elliott and Timmermann (2016).
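The difference between the two penalties can be illustrated with a small Monte Carlo experiment. This is a minimal sketch under assumptions that are mine and not taken from the references: nested Gaussian linear models with standard normal regressors, unit coefficients, `k_true = 2` relevant regressors out of `k_max = 5` candidates, and hypothetical helper names (`info_criterion`, `selection_frequencies`). The error variance is not counted in `k`, since it is common to all candidates and does not affect the ranking.

```python
import numpy as np


def g_aic(T):
    """AIC penalty weight: g(T) = 2 / T."""
    return 2.0 / T


def g_sic(T):
    """SIC (BIC) penalty weight: g(T) = ln(T) / T."""
    return np.log(T) / T


def info_criterion(y, X, g):
    """IC(k) = -(2/T) * loglik + k * g(T) for a Gaussian linear model fit by OLS."""
    T, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / T  # maximum likelihood estimate of the error variance
    loglik = -0.5 * T * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return -2.0 / T * loglik + k * g(T)


def selection_frequencies(T, n_rep=500, k_true=2, k_max=5, seed=0):
    """Share of replications in which each nested model size 1..k_max is chosen.

    Entry i of each returned array is the selection frequency of the model
    that uses the first i + 1 regressors.
    """
    rng = np.random.default_rng(seed)
    counts = {"AIC": np.zeros(k_max, dtype=int), "SIC": np.zeros(k_max, dtype=int)}
    beta_true = np.ones(k_true)
    for _ in range(n_rep):
        X = rng.standard_normal((T, k_max))
        # Only the first k_true regressors enter the data-generating process.
        y = X[:, :k_true] @ beta_true + rng.standard_normal(T)
        for name, g in (("AIC", g_aic), ("SIC", g_sic)):
            ics = [info_criterion(y, X[:, :k], g) for k in range(1, k_max + 1)]
            counts[name][int(np.argmin(ics))] += 1
    return {name: c / n_rep for name, c in counts.items()}


if __name__ == "__main__":
    for T in (50, 500, 5000):
        freqs = selection_frequencies(T)
        print(f"T = {T}")
        for name, f in freqs.items():
            print(f"  {name}: {np.round(f, 3)}")
```

In this setup one would expect the share of replications in which AIC picks a model that is too large to stay clearly above zero as `T` grows, while the corresponding share for SIC shrinks. Note that the experiment places the true model inside the candidate set by construction, which is exactly the situation the consistency result is about.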

Elliott, G. and A. Timmermann (2016). Economic Forecasting. Princeton University Press.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6(2), 461-464.

**Attribution**

*Source: Link, Question Author: Erosennin, Answer Author: Matthias Schmidtblaicher*