# How to choose significance level for a large data set?

I am working with a data set having N around 200,000. In regressions, I am seeing very small significance values << 0.001 associated with very small effect sizes, e.g. r=0.028. What I’d like to know is, is there a principled way of deciding an appropriate significance threshold in relation to the sample size? Are there any other important considerations about interpreting effect size with such a large sample?

If you want to make some judgement as to whether to treat a particular coefficient as statistically significant or not, you might want to take Good’s (1982) suggestion as summarized in Woolley (2003): calculate the q-value as $q = p\sqrt{n/100}$, which standardizes p-values to a sample size of 100. With your N of about 200,000, $\sqrt{n/100} = \sqrt{2000} \approx 44.7$, so a p-value of exactly .001 converts to a q-value of about .045, which is still statistically significant at the .05 level. Your p-values well below .001 convert to correspondingly smaller q-values.
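Good’s conversion is a one-line calculation. The sketch below assumes n = 200,000 exactly (the question says "around 200,000"), and `good_standardized_p` is a hypothetical helper name, not part of any library. It applies the formula exactly as stated and does not cap the result, so large p-values can map to values above 1.

```python
import math


def good_standardized_p(p: float, n: int) -> float:
    """Standardize a p-value to a sample size of 100: q = p * sqrt(n / 100).

    Hypothetical helper following the formula quoted in the answer; the
    result is not capped, so it can exceed 1 when p is large.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return p * math.sqrt(n / 100)


# A p-value of exactly .001 at an assumed n of 200,000.
q = good_standardized_p(0.001, 200_000)
print(f"{q:.4f}")  # 0.0447
```

At n = 100 the factor is 1, so the q-value equals the original p-value; the correction only grows with the square root of the sample size.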
You do need to consider whether the relationship you’re seeing is practically significant, as commenters have noted. Converting the figures you quote from $r$ to $r^2$ (square the correlation to get the proportion of variance explained) gives just 3% and 6% variance explained, respectively, which doesn’t seem like much. The $r = 0.028$ example works out to $r^2 \approx 0.0008$, i.e. less than 0.1% of the variance explained.
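To see how a tiny effect can still produce a tiny p-value at this sample size, here is a minimal sketch for a simple bivariate correlation. It assumes n = 200,000 and uses the standard test statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with a standard-normal approximation to the t distribution (very close at this n). The p-value for a coefficient in a multiple regression will differ somewhat, and the helper names are hypothetical.

```python
import math


def variance_explained(r: float) -> float:
    """Proportion of variance explained by a correlation r (that is, r squared)."""
    return r * r


def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0 in a simple bivariate correlation.

    Computes t = r * sqrt(n - 2) / sqrt(1 - r^2) and approximates the
    t distribution with a standard normal, which is accurate for large n.
    Two-sided normal tail: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return math.erfc(abs(t) / math.sqrt(2))


r, n = 0.028, 200_000  # r from the question; n assumed to be exactly 200,000
print(f"r^2 = {variance_explained(r):.6f}")  # r^2 = 0.000784
print(f"p   = {correlation_p_value(r, n):.2e}")
```

The p-value printed is far below .001 even though less than 0.1% of the variance is explained, which is the gap between statistical and practical significance the answer describes.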