In deep learning papers, data augmentation is often presented as a type of regularization. For example, this is explored in Chiyuan Zhang and coauthors' ICLR 2017 paper, Understanding deep learning requires rethinking generalization. Why is this classification given? Intuitively, I see data augmentation as a way of expanding a dataset, whereas regularization is a means of modifying the training algorithm to (hopefully) improve generalization error.
Regularization (traditionally, in the context of shrinkage) adds prior knowledge to a model; literally, a prior is specified for the parameters. Augmentation is also a form of adding prior knowledge to a model: e.g., images are rotated, which you know does not change the class label. Increasing the training data (as augmentation does) decreases a model's variance. Regularization also decreases a model's variance. They do so in different ways, but ultimately both decrease generalization error.
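To make the "prior knowledge via augmentation" point concrete, here is a minimal sketch in NumPy (the toy image, the choice of 90-degree rotations, and the label are arbitrary for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "image" and its class label; rotating the image is assumed
# not to change which class it belongs to.
image = rng.random((8, 8))
label = "cat"

# Augmentation: generate extra training pairs that encode the prior
# "rotation is label-preserving". The dataset grows, but no new
# information about the label is invented.
augmented = [(np.rot90(image, k), label) for k in range(4)]

print(len(augmented))  # 4 training examples from 1 original
```

The label is carried along unchanged for every rotated copy, which is exactly the prior being injected: the model sees four views it must treat identically.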
Section 5.2.2 of Goodfellow et al.'s Deep Learning proposes a much broader definition:
Regularization is any modification we make to a learning algorithm that
is intended to reduce its generalization error but not its training
error.
There is a tendency to associate regularization with shrinkage because of the term "ℓp-norm regularization"…perhaps "augmentation regularization" is equally valid, although it doesn't roll off the tongue.
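For contrast with the augmentation view, a minimal sketch of shrinkage in the classical sense, using the ridge (ℓ2-penalized) closed form on toy data (all data and the penalty strength are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear regression data.
X = rng.normal(size=(20, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + rng.normal(scale=0.5, size=20)

def ridge(X, y, lam):
    # Shrinkage: an l2 penalty lam * ||w||^2 added to the squared error,
    # solved in closed form: w = (X'X + lam*I)^-1 X'y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_ols = ridge(X, y, lam=0.0)    # no regularization
w_reg = ridge(X, y, lam=10.0)   # penalized fit

# The penalty pulls the coefficients toward zero, reducing variance.
print(np.linalg.norm(w_reg) < np.linalg.norm(w_ols))  # True
```

Here the prior lives in the penalty term rather than in extra training pairs, but both mechanisms restrict what the fitted model can do, which is why the broad Goodfellow-style definition covers both.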