Why Are Data Silos Important in Unit Tests?

I have a lot of strong opinions about this issue, but had a hard time finding official documents today to back up my position. A lot of times we say it is bad practice to use seeAllData=true, but we don’t really explain why. Maybe we can generate a more definitive and thorough answer here. So I want to ask the broader community:

Why is it important to create our own data when we write unit tests?

Answer

I think the answer is quite simply one of consistency. It’s also one of minimizing variations between the environments we test in. This makes our tests more consistent and repeatable, which helps should we need to diagnose or troubleshoot a problem during deployment.

  • When we generate our own data, that eliminates a variable between what happens in any sandbox vs a production environment. We KNOW we’ve used the same test data. We know what values we can expect to see at any point in our unit test.

  • We can create not only the database records, but also our User records. This allows us to control the context for running our unit tests with System.runAs().

  • Similarly, we can create users who become the owners of the records in our test data. Doing this creates even more consistency and reduces variance between environments (development org to development org, or development org vs production). Again, this allows us to eliminate another variable should we need to troubleshoot a deployment from one environment to another.

  • We don’t always know that the kind of data we need to run our tests will exist in an org if we don’t create it. This is especially true when testing in a sandbox, a fresh org, or one where records of the type our code uses have never been created. How can our test class possibly grab test data from the org if there are no records that meet the criteria to test all of the conditions our code uses? Clearly we can’t do that. It’s incumbent on us to ensure the data we need is available, which means we need to create valid data to test with.

  • Our code must also be bulk safe. In my view, testing using only one record per method doesn’t constitute an adequate unit test. All it does is allow code to “pass the minimum requirements” needed to deploy it. Meeting those minimums doesn’t ensure our code will function reliably. That’s what System.assert calls, positive and negative test cases, and bulk test cases, along with other test methods, are for. I strongly believe that our unit tests need to include those methods as well.

  • I want to add that what I’m speaking of doesn’t prohibit a developer from creating randomness in their data. It still allows for that. The randomness is created in a controlled and consistent manner so the behavior of the code can be predicted and results still asserted. Validity of the data will be known provided the environment doesn’t “do something” to it that’s unexpected.

  • It’s this consistency during creation that lets us diagnose and troubleshoot what’s “different” between two environments when there’s an issue with a deployment or the code doesn’t function as expected.
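To make the points above concrete, here is a minimal sketch of a test that builds its own user and its own records. It assumes an English-language org where the 'Standard User' profile exists and where Account has no extra required fields or validation rules; the class name, method name, and record count are made up for illustration.

```apex
@isTest
private class AccountDataSiloTest {
    // Hypothetical record count; 200 is the size of one trigger batch.
    private static final Integer RECORD_COUNT = 200;

    @isTest
    static void insertsBulkAccountsAsTestUser() {
        // Profiles are visible to tests even without seeAllData=true.
        // Assumption: the org has a profile named 'Standard User'.
        Profile p = [SELECT Id FROM Profile WHERE Name = 'Standard User' LIMIT 1];

        // Create our own user so the running context is the same in every org.
        User testUser = new User(
            Alias = 'silo',
            Email = 'silo.tester@example.com',
            EmailEncodingKey = 'UTF-8',
            LastName = 'SiloTester',
            LanguageLocaleKey = 'en_US',
            LocaleSidKey = 'en_US',
            ProfileId = p.Id,
            TimeZoneSidKey = 'America/Los_Angeles',
            // Usernames must be unique across all orgs, so add a timestamp.
            UserName = 'silo.tester.' + System.currentTimeMillis() + '@example.com'
        );
        insert testUser;

        System.runAs(testUser) {
            // Build the records in memory, then insert them in one DML call.
            List<Account> accounts = new List<Account>();
            for (Integer i = 0; i < RECORD_COUNT; i++) {
                accounts.add(new Account(Name = 'Silo Test Account ' + i));
            }
            Test.startTest();
            insert accounts;
            Test.stopTest();
        }

        // Without seeAllData=true, this query only sees records created in this test.
        List<Account> inserted = [SELECT Id, OwnerId FROM Account];
        System.assertEquals(RECORD_COUNT, inserted.size(),
            'Only the accounts created by this test should be visible');
        for (Account a : inserted) {
            System.assertEquals(testUser.Id, a.OwnerId,
                'Each account should be owned by the user we created');
        }
    }
}
```

Because the test never reads org data, the expected count and owner are known in advance, whichever sandbox or production org it runs in.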

This goes beyond the question that’s been asked, but I’m a big believer in Test Driven Development. Writing tests as I go helps me test functionality as I build my code. When I’m finished, my test class is also completed and it’s not an afterthought or chore. I like to create tests using lists of records where I can change a variable in my class that initially sets the list size to a length of 1. When I’m ready to bulk test, I’m then able to set it to a larger number, including raising it to 200 (or even more if appropriate), to ensure my code is bulk safe.

During deployment, I can lower that value to create one record per list to test with so time isn’t wasted. But the method is there in my unit test and I KNOW that my data has been bulk tested and can easily be retested at any time in production. If I want to, the value of that list size can easily be controlled through a custom setting via a test environment utility/helper class. This allows me to retest in bulk at any point in time I want to. I’ll add that I generally create a test utility class that creates all of my objects. This is a great way to reuse my code.
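A test utility class along those lines might look like the sketch below. The names TestDataFactory, listSize, and createAccounts are hypothetical, and the list size is held in a plain static variable rather than read from a custom setting.

```apex
@isTest
public class TestDataFactory {
    // Hypothetical knob: 1 for quick deployment runs, 200 or more for bulk runs.
    public static Integer listSize = 1;

    // Creates and inserts listSize accounts, then returns them to the caller.
    public static List<Account> createAccounts() {
        List<Account> accounts = new List<Account>();
        for (Integer i = 0; i < listSize; i++) {
            accounts.add(new Account(Name = 'Factory Account ' + i));
        }
        insert accounts;
        return accounts;
    }
}
```

A test method can then raise the size before asking for data:

```apex
TestDataFactory.listSize = 200;
List<Account> accounts = TestDataFactory.createAccounts();
System.assertEquals(200, accounts.size(), 'Factory should create the requested number of accounts');
```

Keeping the size in one place means every test that uses the factory can be switched between a single record and a bulk run without editing the tests themselves.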

So, my answer to the question is we do it so the developer of the code can achieve the maximum amount of control over the consistency of test conditions as he or she possibly can, regardless of the environment it runs in, each and every time the test method is executed. This is a discipline I was taught in engineering school for designing and creating tests. It was also expected when I was an engineer in the automotive industry (see this article by W. Edwards Deming, famous for “The Deming Way”). It only seems logical to me that it would apply to software as well, since it applies to any other kind of testing, including tests of humans using software and of the electronics running it.

Attribution
Source: Link, Question Author: Adrian Larson, Answer Author: crmprogdev
