What exactly does it mean to ‘pool data’?

I thought that ‘pooling data’ simply meant combining data that were previously split into categories: essentially, ignoring the categories and making the data set one giant ‘pool’ of data. I guess this is a question more about terminology than about the application of statistics.

For example: I want to compare 2 sites, and within each site I have two year-types (good and poor). If I want to compare the 2 sites ‘overall’ (that is, ignoring the year-types), is it correct to say that I’m pooling the data within each site? Further to that, since several years of data comprise the good and poor year-types, is it also correct to say that I am pooling the data among years to achieve the ‘good year’ and ‘poor year’ data set within each site?
Thanks for your help!


Yes, your examples are correct.

The Oxford English Dictionary defines ‘pool’ as:

pool, v.

1.1 trans. To throw into a common stock or fund to be distributed according to agreement; to combine (capital or interests) for the common benefit; spec. of competing railway companies, etc.: To share or divide (traffic or receipts).

Another example would be:

You measure blood levels of substance X in males and females. You don’t see a statistically significant difference between the two groups, so you pool the data, ignoring the sex of the experimental subjects.

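Whether it is statistically valid to do so depends very much on the specific case: failing to detect a difference does not show that the groups are the same, and pooling groups that really do differ can mask or distort an effect.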
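
To make the terminology concrete, here is a minimal Python sketch of the two levels of pooling described in the question. Everything in it is an assumption for illustration: the records, the site labels "A" and "B", the measurement values and the helper name `pool` are all invented, and it only shows what "ignoring a category" means, not whether doing so is appropriate for real data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (site, year, year_type, measurement).
# All values are invented purely for illustration.
records = [
    ("A", 2001, "good", 12.0),
    ("A", 2002, "poor", 7.0),
    ("A", 2003, "good", 14.0),
    ("A", 2004, "poor", 9.0),
    ("B", 2001, "good", 10.0),
    ("B", 2002, "poor", 5.0),
    ("B", 2003, "good", 11.0),
    ("B", 2004, "poor", 6.0),
]


def pool(records, key):
    """Group measurements by key(record), ignoring every other category."""
    pools = defaultdict(list)
    for rec in records:
        pools[key(rec)].append(rec[3])
    return dict(pools)


# Pooling among years: one data set per (site, year_type); the year is ignored.
by_site_and_type = pool(records, key=lambda r: (r[0], r[2]))

# Pooling within each site: both year and year_type are ignored.
by_site = pool(records, key=lambda r: r[0])

for site, values in sorted(by_site.items()):
    print(site, len(values), mean(values))
```

The only thing that changes between the two calls is which categories the grouping key keeps; whatever is left out of the key is what has been pooled over.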

