In deep learning papers, data augmentation is often presented as a type of regularization. For example, this is explored in Chiyuan Zhang et al.'s ICLR 2017 paper, *Understanding deep learning requires rethinking generalization*. Why is this classification given? Intuitively, I see data augmentation as a way of expanding a dataset, but regularization as a means of modifying a training algorithm to (hopefully) improve the generalization error.

**Answer**

Regularization (traditionally in the context of shrinkage) adds prior knowledge to a model; a prior, literally, is specified for the parameters. Augmentation is also a form of adding prior knowledge to a model; e.g., images are rotated, which **you** know does not change the class label. Increasing training data (as with augmentation) decreases a model's variance. Regularization also decreases a model's variance. They do so in different ways, but ultimately both decrease generalization error.
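As a minimal sketch of the two mechanisms, the toy example below (hypothetical data and names, using NumPy) encodes rotation invariance as augmented training examples with unchanged labels, and shows shrinkage as an explicit L2 penalty on the parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny toy "dataset": four 8x8 single-channel images with class labels.
images = rng.random((4, 8, 8))
labels = np.array([0, 1, 0, 1])

# Augmentation: each 90-degree rotation becomes a new training example,
# and the class label is copied unchanged -- the prior knowledge that
# rotation does not change the class is baked into the data itself.
aug_images = np.concatenate([np.rot90(images, k, axes=(1, 2)) for k in range(4)])
aug_labels = np.tile(labels, 4)

# Shrinkage instead states the prior directly on the parameters:
# add lam * ||w||^2 to the loss, pulling weights toward zero.
def l2_penalty(w, lam=1e-2):
    return lam * np.sum(w ** 2)

print(aug_images.shape)          # (16, 8, 8): 4x the data, labels preserved
print(l2_penalty(np.ones(10)))   # 0.1
```

Both routes lower variance: augmentation by enlarging the effective dataset, shrinkage by constraining the parameters.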

Section 5.2.2 of Goodfellow et al.'s *Deep Learning* proposes a much broader definition:

> Regularization is any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error.

There is a tendency to associate regularization with shrinkage because of the term "Lp-norm regularization"… perhaps "augmentation regularization" is equally valid, although it doesn't roll off the tongue.
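For concreteness, the shrinkage usage of the term corresponds to a penalized training objective of the (standard) form:

```latex
\tilde{J}(\theta) = J(\theta) + \lambda \lVert \theta \rVert_p^p
```

where $J(\theta)$ is the training loss, $\lambda \ge 0$ controls the penalty strength, and $p = 2$ recovers weight decay. Under the Goodfellow et al. definition, augmentation qualifies just as well, since it modifies training to reduce generalization error without targeting training error.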

**Attribution**
*Source: Link, Question Author: Gilly, Answer Author: Community*