# What does “vanilla” mean?

In machine-learning blogs I frequently encounter the word “vanilla”, for example “vanilla gradient descent” or “vanilla method”. I have never seen this term in any optimization textbook.

For instance, in this post, it says:

> This is the simplest form of gradient descent technique. Here, vanilla
> means pure / without any adulteration. Its main feature is that we
> take small steps in the direction of the minima by taking gradient of
> the cost function.

Pray tell, what does “adulteration” mean in this context? The author then contrasts vanilla gradient descent with gradient descent with momentum, so in this case “vanilla gradient descent” is just another name for plain gradient descent.
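To make the contrast concrete, here is a minimal sketch of the two update rules on the toy objective $f(x) = x^2$; the step size, momentum coefficient, and function names are illustrative assumptions, not taken from the post:

```python
def vanilla_gd(grad, x0, lr=0.1, steps=100):
    """Plain ("vanilla") gradient descent: step along the negative gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def momentum_gd(grad, x0, lr=0.1, beta=0.9, steps=100):
    """Gradient descent with momentum: accumulate a running direction."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v + grad(x)  # velocity blends past gradients with the new one
        x -= lr * v
    return x

grad = lambda x: 2 * x  # gradient of f(x) = x^2
print(vanilla_gd(grad, 5.0))   # both converge toward the minimum at 0
print(momentum_gd(grad, 5.0))
```

The only difference is the velocity term `v`; setting `beta=0` recovers the vanilla update, which is one way to read “without any adulteration”.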

In another post, it says,

Sadly I have never heard of batch gradient descent either. Oh boy.

Can someone clarify what “vanilla” means, and whether there is a firmer mathematical definition of it?

In stochastic gradient descent, each update uses the gradient at a single sample, $\theta \leftarrow \theta - \eta \, \nabla_\theta f(\theta; x^*)$, where $x^*$ is randomly sampled from our entire dataset. It is a variant of plain gradient descent, so it wouldn’t be vanilla gradient descent. However, since even stochastic gradient descent has many variants, you might call this “vanilla stochastic gradient descent” when comparing it to other, fancier SGD alternatives, for example SGD with momentum.
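A minimal sketch of this “vanilla SGD” loop, fitting a 1-D least-squares slope; the dataset, learning rate, and step count are illustrative assumptions:

```python
import random

def sgd(data, w0=0.0, lr=0.01, steps=2000, seed=0):
    """Vanilla SGD for minimizing the average of (w*x - y)^2 over data."""
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        x, y = rng.choice(data)       # x* drawn at random from the whole dataset
        grad = 2 * (w * x - y) * x    # gradient of (w*x - y)^2 at this one sample
        w -= lr * grad                # same plain update rule as vanilla GD
    return w

data = [(x, 3 * x) for x in range(1, 6)]  # noiseless data with true slope 3
print(sgd(data))  # converges close to 3
```

The update rule itself is unadorned gradient descent; only the gradient is estimated from one random sample per step, which is what distinguishes SGD from full-batch gradient descent.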