Does the use of “variational” always refer to optimization via variational inference?

Examples:

- “Variational auto-encoder”
- “Variational Bayesian methods”
- “Variational renormalization group”

**Answer**

No, not always, but for at least the first two examples it does refer to variational inference.

In short, it’s a method for approximating maximum likelihood estimation (MLE) when the probability density is complicated, which makes direct MLE hard.

It uses the evidence lower bound (ELBO) as a proxy for the log-likelihood:

$\log p(x) \geq \mathbb{E}_q[\log p(x, Z)] - \mathbb{E}_q[\log q(Z)]$

Here $q$ is a simpler distribution over the hidden variables (denoted by $Z$); for example, variational autoencoders use a normal distribution on the encoder’s output.
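To make the bound concrete, here is a minimal Python sketch on a toy model that is not part of the original answer. It assumes $Z \sim N(0, 1)$ and $x \mid Z \sim N(Z, 1)$, so that $\log p(x)$ is available in closed form, and takes $q(Z) = N(m, s^2)$. The function names `log_evidence` and `elbo`, the observed value, and the search grid are all illustrative choices, and a real problem would use gradient-based optimization rather than a grid.

```python
import math


def log_evidence(x):
    """log p(x) for the assumed toy model: marginally, x ~ N(0, 2)."""
    return -0.5 * math.log(2 * math.pi * 2.0) - x**2 / 4.0


def elbo(x, m, s):
    """Closed-form ELBO for q(Z) = N(m, s^2) under the assumed toy model.

    ELBO = E_q[log p(Z)] + E_q[log p(x | Z)] - E_q[log q(Z)].
    """
    e_log_prior = -0.5 * math.log(2 * math.pi) - 0.5 * (m**2 + s**2)
    e_log_lik = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) ** 2 + s**2)
    entropy_q = 0.5 * math.log(2 * math.pi * math.e * s**2)  # equals -E_q[log q(Z)]
    return e_log_prior + e_log_lik + entropy_q


x = 1.3  # an arbitrary observed value, chosen only for illustration

# Crude "variational" search: try many candidate q's and keep the best ELBO.
candidates = [(m / 20.0, s / 20.0) for m in range(-40, 41) for s in range(1, 41)]
best_m, best_s = max(candidates, key=lambda ms: elbo(x, ms[0], ms[1]))

print("log p(x)      :", log_evidence(x))
print("best grid ELBO:", elbo(x, best_m, best_s), "at m =", best_m, ", s =", best_s)

# For this model the exact posterior is N(x / 2, 1 / 2); plugging it in as q
# makes the bound tight.
print("ELBO at exact posterior:", elbo(x, x / 2.0, math.sqrt(0.5)))
```

Every candidate $q$ gives an ELBO no larger than $\log p(x)$, and the best grid point sits next to the exact posterior's mean and standard deviation, which is the sense in which maximizing the ELBO over $q$ approximates the quantity we actually care about.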

The name ‘variational’ most likely comes from the fact that the method searches for the distribution $q$ that maximizes the ELBO. This setup resembles the calculus of variations, a field that studies optimization over functions (for example: among all curves in the plane joining two given points, find the shortest one).

There’s a nice tutorial on variational inference by David Blei that you can check out if you want a more concrete description.

EDIT:

Actually, what I described is only one type of VI; in general you could use a different divergence. The version I described corresponds to using the KL divergence $KL(q \,\|\, p)$ between $q$ and the posterior over $Z$. For details see this paper, section 5.2 (VI with alternative divergences).
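As a short worked step (a standard identity, written here with the posterior denoted $p(Z \mid x)$, a notation the answer itself does not use), the link between the ELBO and the KL divergence can be sketched as:

$$\log p(x) = \underbrace{\mathbb{E}_q[\log p(x, Z)] - \mathbb{E}_q[\log q(Z)]}_{\text{ELBO}} + KL\big(q(Z) \,\|\, p(Z \mid x)\big)$$

Since the KL term is non-negative, the ELBO can never exceed $\log p(x)$. And since $\log p(x)$ does not depend on $q$, maximizing the ELBO over $q$ is the same as minimizing this KL divergence. Swapping in another divergence changes the objective being optimized.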

**Attribution**

*Source: Link, Question Author: conner.xyz, Answer Author: Jakub Bartczuk*