# How does quantile normalization work?

In gene expression studies using microarrays, intensity data has to be normalized so that intensities can be compared between individuals and between genes. Conceptually and algorithmically, how does “quantile normalization” work, and how would you explain it to a non-statistician?

Conceptually, quantile normalization transforms array $j$ by the function $\hat{F}^{-1} \circ \hat{G}_j$, where $\hat{G}_j$ is an estimated distribution function and $\hat{F}^{-1}$ is the inverse of an estimated distribution function. As a consequence, the normalized values have the same distribution on every array. For quantile normalization, $\hat{G}_j$ is the empirical distribution of array $j$ and $\hat{F}$ is the empirical distribution of the quantiles averaged across arrays. In plain terms: each intensity is replaced by the average, taken over all arrays, of the intensities that hold the same rank within their own array.
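A minimal sketch of this transformation in Python with NumPy may make the algorithm concrete. The function name `quantile_normalize`, the genes-by-arrays layout of the matrix, and the toy data are assumptions made for illustration. Ties are broken by position rather than averaged, which is a simplification and not necessarily what established implementations do.

```python
import numpy as np

def quantile_normalize(X):
    """Quantile-normalize a (genes x arrays) matrix, column by column.

    Illustrative sketch: ties are broken by row order, not averaged.
    """
    X = np.asarray(X, dtype=float)
    # Row indices that would sort each array (column) in ascending order.
    order = np.argsort(X, axis=0, kind="stable")
    sorted_vals = np.take_along_axis(X, order, axis=0)
    # Averaged quantiles: mean of the k-th smallest value across arrays.
    reference = sorted_vals.mean(axis=1)
    # Rank of each gene within its array (inverse of the sort permutation);
    # this plays the role of G_j, and indexing `reference` plays F^{-1}.
    ranks = np.argsort(order, axis=0)
    return reference[ranks]

# Hypothetical toy data: 4 genes (rows) measured on 3 arrays (columns).
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 6.0, 6.0],
              [4.0, 2.0, 8.0]])
normalized = quantile_normalize(X)
print(normalized)
```

After the transformation every column contains exactly the same set of values (the averaged quantiles), and only their assignment to genes differs from array to array, following each gene's rank within its array.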