SOQL query selectivity changes with number of IN clause records

I have a custom object with a Master-Detail relationship to Opportunity. The relationship field to the parent Opportunity is automatically indexed.

There are 200,000 plus of these custom objects in production, so the selectivity of the any SOQL query from the custom object is important.

A trigger on Opportunity needs to find all the related child records from the custom object. Because the query is in a trigger context in needs to be bulkified to handle up to 200 records at a time. E.g.

List<OppChild__c> children = [Select Id, Name, SomeField__c from OppChild__c where Opportunity__c in :triggerOpportunityIds];

If there is only a single Opportunity in the trigger the query is fine and the Query Plan from the developer console is using the Index for a cost of 0.00024.

enter image description here

However, if I have two Opportunities the cost raises to 0.00508.

By the time you reach a full 200 Opportunity ids the relative cost of the index scan exceeds that of a TableScan!

I ran a whole lot of query plans for various numbers of Ids for the IN clause and came up with the following.

le

Up until 20 records the relative cost of using the index is lowest. But from 20 records onward the TableScan has the lowest cost and the query fails.

Is it expected that the selectivity with an IN clause over an index would degrade so quickly?


I don’t believe data skew is an issue here. Each Opportunity is expected to have exactly 3 child custom objects. So the SOQL query in the trigger is only expected to return 200 * 3 = 600 records. This should be well below the selectivity threshold of 10% for the first million records.


The cardinality for each query plan seems very high. Due to the data structure each Opportunity can only have up to 3 child custom objects. Yet both the Index and Tablescan are returning cardinalities that are 100 to 1000 times more records (rather than 3 times).

enter image description here

Answer

Further analysis done. This is a child object with total cardinality of 3.7 million records which is being filtered based on parent field.

enter image description here

Below are the observations:

  1. The query cost is directly proportional to the number of records retrieved from database for that particular type of field (parent lookup field in my case).
  2. IN operator is virtually OR conditions in terms of cost. ParentId IN ('id1','id2') is same as ParentId='id1' OR ParentId='id2' and so the cost simply adds up.
  3. As observed above each record retrieving cost is 0.00000427332165921 which is constant for that parent field filter. It is always the multiple of this number to the total records being retrieved from database either by = or IN operator.

Now coming to your use case:

From your screenshot,
enter image description here

Cost of each child record is 0.0000483956831 for given Parent Opportunity field AB2__Opportunity__c.

However, if I have two Opportunities the cost raises to 0.00508.

This makes sense as the number of child records for those 2 opportunities is 105 whereas the first opportunity has only 5 child records.

Considering this, I think it is better to check how many child records is being returned when you are using 200 opportunity ids. (It should be much more than 3 child records per opportunity)

Attribution
Source : Link , Question Author : Daniel Ballinger , Answer Author : salesforce-sas

Leave a Comment