In gene expression studies using microarrays, intensity data have to be normalized so that intensities can be compared between individuals and between genes. Conceptually and algorithmically, how does "quantile normalization" work, and how would you explain it to a non-statistician?

**Answer**

The paper *A comparison of normalization methods for high density oligonucleotide array data based on variance and bias* by Bolstad et al. introduces quantile normalization for array data and compares it to other methods. It gives a clear description of the algorithm.

Conceptually, it is a transformation of array $j$ by the function $\hat{F}^{-1} \circ \hat{G}_j$, where $\hat{G}_j$ is an estimated distribution function and $\hat{F}^{-1}$ is the inverse of an estimated distribution function. The consequence is that the normalized distributions become identical across all the arrays. For quantile normalization, $\hat{G}_j$ is the empirical distribution function of array $j$, and $\hat{F}$ is the empirical distribution function of the quantiles averaged across arrays.

At the end of the day it is a method for transforming all the arrays to have a common distribution of intensities.
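To make the algorithm concrete, here is a minimal NumPy sketch of the procedure described above: sort each array, average the sorted values across arrays to get the reference quantiles, then put those reference values back in each array's original order. The function name `quantile_normalize` and the tie-breaking behavior (ties broken arbitrarily by `argsort`, whereas Bolstad et al. average over ties) are my own simplifications, not code from the paper.

```python
import numpy as np

def quantile_normalize(X):
    """Quantile-normalize the columns of X (rows = genes/probes,
    columns = arrays) so that every column ends up with the same
    empirical distribution of intensities.

    A minimal sketch; ties are broken arbitrarily rather than
    averaged as in the reference method.
    """
    X = np.asarray(X, dtype=float)
    # order[i, j] is the row index of the i-th smallest value in column j
    # (this records each column's empirical distribution G_j via ranks).
    order = np.argsort(X, axis=0)
    # Reference quantiles: sort each column, then average across columns
    # (the "quantiles averaged across arrays", i.e. F-hat).
    ref = np.sort(X, axis=0).mean(axis=1)
    # Apply F^{-1} o G_j: substitute the reference quantile for each
    # value, restoring every column's original ordering.
    out = np.empty_like(X)
    for j in range(X.shape[1]):
        out[order[:, j], j] = ref
    return out
```

For example, with a 4-gene, 3-array matrix, the normalized columns all contain exactly the same set of values (the reference quantiles), but each column keeps its own rank ordering.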

**Attribution**
*Source: Link, Question Author: Stephen Turner, Answer Author: NRH*