It seems like everyone just uses set.seed(1234) when they are doing random sampling. If so many people use just a select few integers for set.seed(), doesn’t that mean that everyone is drawing from the same state of the random number generator and therefore all results are not a true random sample?
An interesting question, though I don’t know whether it’s answerable here at CV. A few thoughts:
If you do an analysis involving random sampling, it’s always a good idea to re-run it with different seeds, just to assess whether your results are sensitive to the choice of seed. If your results vary “much”, you should revisit your analysis (and/or your code).
If everyone did this, I wouldn’t worry overly about the aggregate effect of everyone in the end using the same seed, because after this sanity check, everyone’s results don’t depend on it too much any more.
Given that random numbers are used in many, many, many different contexts, with different models used in different applications, transforming the pseudorandom numbers in different orders and in different ways, I wouldn’t worry too much about a possible systematic effect overall. That said, yes, such an effect could in theory be visible on an aggregate level even when it is not visible to each individual researcher after the check described in the previous point.
Finally, I personally never use 123 or 1234 as seeds. I use 1 😉 Or the year. Or the date. I really don’t think 123 or 1234 are all that prevalent as seeds. You could of course set up a poll somewhere.
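The seed-sensitivity check suggested above could look roughly like the following in R. This is only a minimal sketch under assumptions of my own: the "analysis" is a toy bootstrap estimate of a mean, and the names `bootstrap_mean`, `seeds` and `estimates`, as well as the particular seed values, are hypothetical stand-ins for whatever your real analysis uses.

```r
# Toy data, generated once with its own seed so that only the
# resampling step differs between the runs below.
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)

# Hypothetical stand-in for "an analysis involving random sampling":
# a bootstrap estimate of the mean, run under a given seed.
bootstrap_mean <- function(data, seed, n_boot = 1000) {
  set.seed(seed)
  boot_means <- replicate(n_boot, mean(sample(data, replace = TRUE)))
  mean(boot_means)
}

# Re-run the same analysis under several different seeds.
seeds <- c(1, 123, 1234, 2024, 42)
estimates <- sapply(seeds, function(s) bootstrap_mean(x, s))

# Compare the results; a spread that is large relative to the
# estimate itself would be a reason to revisit the analysis or the code.
print(data.frame(seed = seeds, estimate = estimates))
print(diff(range(estimates)))
```

If the spread across seeds is negligible compared with the uncertainty you report, the particular seed you end up publishing with matters little.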