When To Use Set vs. List In SOQL

I have always assumed that it is better to use a Set in my query filters instead of a List. For example:

Set<Id> parentIds = generateParentIds();
List<Child__c> children = [SELECT Id FROM Child__c WHERE Parent__c IN :parentIds];

My reasoning was:

  • Set can be more easily (and quickly) generated from a Map via keySet().
  • Set ought to consume less memory than List (no longer sure this is true).
  • null values can more easily be removed from a Set.
  • Set cannot contain any duplicate values.

What are the actual benefits to using Set instead of List? What are the drawbacks? What factors help decide? For example I know if the field is nillable, then the ability to remove null is much more important, because it helps avoid a table-scan.

Answer

(tl;dr at bottom)

Filter Speed

If you can use Map.keySet(), you’ll get a Set back in about 1/10th of the time versus a loop over those same records and adding all the values. Of course, the Map<Id, SObject>(List<SObject>) constructor only works on Id, not on related ID values, which means you’d have to fall back to a loop. However, in cases where you just want to use the ID of a list of records for the purpose of a query, you can just use the list directly:

Account[] accounts = [SELECT Id FROM Account];
Contact[] contacts = [SELECT AccountId FROM Contact WHERE AccountId IN :accounts];

This takes linear time that’s far smaller than constructing a Map/Set or a List of IDs. Alternatively, you can often use a semi join sub-selects to the same effect:

Contact[] contacts = [SELECT AccountId FROM Contact];
Account[] accounts = [SELECT Id FROM Account WHERE Id IN (SELECT AccountId FROM Contact WHERE Id IN :contacts)];

Assuming the semi join is not prohibited (e.g. you cannot use it with an OR filter), this is the fastest method available. If not, then binding to list directly is faster, followed by a Map Set, followed by a plain Set or List.

In other words, while generateParentIds() seems to be handy, it’s actually slower than simply baking the logic directly into the query, if possible. If you have to choose a List or a Set, there’s no clear benefit to either, as they’ll take the exact same amount of time to filter records.

Size

Given a List and a Set that each contain the same, unique values, both collections will be the same size, byte for byte. Once you start having duplicate values, then the set becomes more useful. However, for the volume of values you’d expect from a typical collection, there’s little meaningful difference between a list and a set. For example, in a trigger of 200 records, the maximum size of a list of ID values should be about 3,000 bytes. As you can see, a handful of ID values don’t use much storage. You will not experience any benefits at all until you have non-unique values, and even then, the savings don’t add up unless there’s just a few values repeated many times.

Storage Speed

Here’s where things get tricky. At very small sizes object sizes, and very small data sizes, a Set and a List are about equal in terms of speed, assuming you’re using native objects. For example, storing values in to a Set of ID values is about the same speed as using a List, assuming there’s not many values.

Consider the following code:

Integer counter = 0;
Set<Id> values = new Set<Id>();
// Id[] values = new Id[0];
while(counter++<100000) {
    String k = String.valueOf(counter), v = '005000000000000'.left(15-k.length()); 
    values.add(v+k); 
}

Building a list of 100,000 unique values in a list or a set shows a very similar time stamp (~4.8 seconds). It’s not until you get to absurd values (~1,000,000 values) that you start noticing a considerable penalty, and that’s usually moot because anything even as complex as the above code will CPU time out far before processing 1,000,000 values.

Using this very simple loop:

while(counter++<1000000) values.add(counter);

I get about 6 seconds for a List, and 7 seconds for a Set. And that’s over 1,000,000 values. The deviation in terms of time penalty is about 0.001 ms per item in the list. I promise, your users won’t notice your choice of using a Set over a List.

Retrieval Speed

Here’s where a Set really shines. Given these two blocks of code:

for(Case record:Trigger.new) {
    if(record.AccountId != null) {
        accountIds.add(record.AccountId);
    }
}

Or

for(Case record:Trigger.new) {
    accountIds.add(record.AccountId);
}
accountIds.remove(null);

The clear winner is the Set. While it’s harder to benchmark this sort of code, I do know that repeatedly checking a condition each iteration is linearly more expensive than having to check just once. The results can be pretty impressive on large lists (~100 ms or more in some extreme cases).

tl;dr

At normal volumes that you’d find within a trigger context and most types of page contexts, there’s no clear benefit to using either a List or a Set in terms of memory or speed, with the sole exception that a Set can identify and remove bad values in near-constant time (far better than a List can do). So, if you need to avoid a specific value, use a Set, if you need a specific order or possibly duplicate values, use a List, and if neither apply, just use whatever you feel like.

A query that uses IN, with the same value more than once, does not cause an error, so there’s no point in worrying about uniqueness in normal filters. However, if you’re truly concerned about query performance, consider using a semi join sub-select, which is far superior in terms of speed than either the List or the Set. Obviously, this doesn’t work on records not committed in the database, so often you will need to fall back to using either type of collection. I’m just advocating that you not worry about which you use as long as it gets the correct results.

Attribution
Source : Link , Question Author : Adrian Larson , Answer Author : sfdcfox

Leave a Comment