Per Image Normalization vs overall dataset normalization

I am confused about whether standardization (subtract the mean and divide by the std) should be done on a per-image basis or across the overall dataset. While the overall dataset makes more sense to me, popular libraries like TensorFlow provide functions such as tf.image.per_image_standardization, which is documented as follows.

Linearly scales image to have zero mean and unit norm.

This op computes (x - mean) / adjusted_stddev, where mean is the average of all values in image, and adjusted_stddev = max(stddev, 1.0/sqrt(image.NumElements())).

stddev is the standard deviation of all values in image. It is capped away from zero to protect against division by 0 when handling uniform images.

Is this good enough?


Each method has its own purpose. In sequential data such as speech [1], the mean and covariance are calculated from a single utterance (a recording) and are then used to normalize all the observations in that utterance. This is done for each utterance separately.

In images, on the other hand, one image can be seen as a sequence of pixels. Therefore, the mean and variance of an image are calculated from the individual pixels in that image.
For pixel-wise or per-image normalization, the mean and variance are calculated for each image separately.
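
As a rough illustration, per-image standardization can be sketched in NumPy along the lines of the formula quoted in the question. This is not TensorFlow's implementation; the function name per_image_standardize and the random test image are made up for the example.

```python
import numpy as np

def per_image_standardize(image):
    """Standardize one image using only its own pixel statistics."""
    x = np.asarray(image, dtype=np.float64)
    mean = x.mean()
    # Floor on the standard deviation, as in the formula quoted in the
    # question, so a uniform image does not cause a division by zero.
    adjusted_std = max(x.std(), 1.0 / np.sqrt(x.size))
    return (x - mean) / adjusted_std

# Hypothetical data: one random 32x32 RGB image and one uniform image.
rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(32, 32, 3))
out = per_image_standardize(image)
flat = per_image_standardize(np.full((4, 4), 7.0))
```

Each image ends up with zero mean and (for non-uniform images) unit standard deviation, regardless of what the rest of the dataset looks like. The uniform image simply maps to all zeros.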

For overall (dataset-level) normalization, it is better to calculate the mean and variance from the training data only and then use those same statistics to normalize every split, including the training, validation, and test sets.
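
A minimal sketch of the dataset-level variant, again in NumPy. The helper names fit_stats and apply_stats, the choice of per-channel statistics, and the small 1e-8 floor on the std are assumptions made for the example, not something prescribed above.

```python
import numpy as np

def fit_stats(train_images):
    """Compute per-channel mean and std from the training set only."""
    x = np.asarray(train_images, dtype=np.float64)
    # Reduce over images, height and width; keep the channel axis.
    mean = x.mean(axis=(0, 1, 2))
    std = x.std(axis=(0, 1, 2))
    # Assumed guard against a constant channel (division by zero).
    return mean, np.maximum(std, 1e-8)

def apply_stats(images, mean, std):
    """Normalize any split with statistics fitted on the training set."""
    return (np.asarray(images, dtype=np.float64) - mean) / std

# Hypothetical data in (N, H, W, C) layout.
rng = np.random.default_rng(0)
train = rng.uniform(0, 255, size=(100, 8, 8, 3))
test = rng.uniform(0, 255, size=(20, 8, 8, 3))

mean, std = fit_stats(train)
train_n = apply_stats(train, mean, std)
test_n = apply_stats(test, mean, std)  # reuses the training statistics
```

The key point is that the test and validation sets are never used to compute the statistics, so their normalized values will be close to, but not exactly, zero mean and unit variance.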


Source : Link , Question Author : Karthik Hegde , Answer Author : PickleRick
