While reading a few papers, I came across the term “gold set” or “gold standard”. What I don’t understand is what makes a dataset a gold standard. Is it peer acceptance, citation count, or is it left to the researcher’s discretion and its relevance to the problem they are tackling?
Say that you want to measure a certain pollutant in drinking water: the gold standard will be the method that detects it with the highest sensitivity and accuracy. Any other method can then be compared to it, knowing that, under certain conditions, the gold standard is the best available. For example, if you need to measure the pollutant on site, the gold standard cannot be a method that requires huge machinery, as you will not be able to bring it with you.
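To make the "compare any other method to it" part concrete, here is a minimal Python sketch. It assumes each water sample gets a yes/no "pollutant detected" result from both the gold standard method and a cheaper candidate method. The function name `compare_to_gold` and the sample results are made up for illustration; they are not from any real study.

```python
import math


def compare_to_gold(gold, candidate):
    """Score a candidate method, treating the gold standard results as the truth.

    Both arguments are equal-length sequences of booleans
    (True = pollutant detected in that sample).
    """
    if len(gold) != len(candidate):
        raise ValueError("both methods must be run on the same samples")

    pairs = list(zip(gold, candidate))
    tp = sum(1 for g, c in pairs if g and c)          # both detect it
    fn = sum(1 for g, c in pairs if g and not c)      # candidate misses it
    tn = sum(1 for g, c in pairs if not g and not c)  # both report clean
    fp = sum(1 for g, c in pairs if not g and c)      # candidate false alarm

    def ratio(num, den):
        # Undefined (NaN) when there are no samples of that kind.
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Hypothetical results for eight water samples (illustrative only).
gold_results = [True, True, True, False, False, False, False, True]
field_kit_results = [True, False, True, False, True, False, False, True]

scores = compare_to_gold(gold_results, field_kit_results)
print(scores)
```

Note that these numbers only say how closely the candidate agrees with the gold standard. If the gold standard itself is wrong on a sample, the candidate is penalised for being right, which is why the choice of reference matters so much.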
I think the Wikipedia article explains it quite well:
In medicine and statistics, gold standard test refers to a diagnostic test or benchmark that is the best available under reasonable conditions. It does not have to be necessarily the best possible test for the condition in absolute terms. For example, in medicine, dealing with conditions that require an autopsy to have a perfect diagnosis, the gold standard test is normally less accurate than the autopsy. Anyway, it is obviously preferred by the patients.