Any way to get consistent test counts when parallel testing is used?

Moving to parallel testing has made one of our builds run 4x faster but the test result files returned from:

sfdx force:apex:test:report

after polling via:

sfdx force:apex:test:run

vary on every run. This is illustrated by the last section of this Jenkins trend chart:

Jenkins trend chart

or this individual Jenkins test summary (showing the difference from the last build):

Jenkins test summary

Is losing consistent test results the price you pay for parallel testing, or is there a way to get a consistent count?

PS

The actual code that is running is this https://github.com/claimvantage/sfdx-jenkins-shared-library/blob/master/vars/runApexTests.groovy

PPS

Adding the query that Daniel Ballinger suggests to our build confirms that the variation comes from entire classes not being tested, e.g.:

...
BenefitClaimedPaymentsControllerTest      Completed  (4/4)
BenefitReductionsTest                     Failed     Could not run tests on class 01p1k000000EEbF because: connection was cancelled here
BenefitReductionsWithLookbackTest         Completed  (22/22)
BenefitReductionGraphsControllerTest      Completed  (1/1)
...

Answer

I just completed a parallel test run. Note that all these tests currently pass when run individually or in the synchronous test mode. After the parallel run I then ran the same SOQL query you are running:

select Status, MethodsEnqueued, MethodsCompleted, MethodsFailed 
from ApexTestRunResult 
where AsyncApexJobId = '7071W00006mnz7p'


That came back with:

  • Status: Completed
  • MethodsEnqueued: 811
  • MethodsCompleted: 797
  • MethodsFailed: 13

This aligns somewhat with your findings, and it raises a question: why doesn’t MethodsCompleted + MethodsFailed equal MethodsEnqueued?

A query against ApexTestResult for the Apex Job showed only 797 records.

select 
Id, TestTimestamp, Outcome, ApexClassId, MethodName, AsyncApexJobId, QueueItemId, RunTime 
from ApexTestResult 
where AsyncApexJobId = '7071W00006mnz7p'

And that included the 13 marked as failures. So the first thing we can conclude is that ApexTestRunResult.MethodsFailed is actually included in the ApexTestRunResult.MethodsCompleted count, which leaves 14 test methods (811 enqueued minus 797 completed) unaccounted for.
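The arithmetic behind that conclusion can be sketched in a few lines of Python. The function name `unaccounted_methods` is hypothetical, and the sketch assumes, as the ApexTestResult row count suggests, that failed methods are already counted within MethodsCompleted.

```python
def unaccounted_methods(methods_enqueued, methods_completed):
    """Methods that were enqueued but never produced a result.

    Assumption: MethodsCompleted already includes MethodsFailed,
    so failures must not be subtracted a second time.
    """
    return methods_enqueued - methods_completed


# Figures from the ApexTestRunResult record quoted above.
missing = unaccounted_methods(811, 797)
print(missing)  # 14
```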

I think the important details are in the ApexTestQueueItem records with a Status of Failed.

select 
Id,ApexClassId, ApexClass.Name,Status,ExtendedStatus,ParentJobId,TestRunResultId,ShouldSkipCodeCoverage
from ApexTestQueueItem 
where ParentJobId = '7071W00006mnz7p'
order by Status desc

There are three test classes that outright failed to run. They failed with the statuses:

  • Status: Failed
  • ExtendedStatus: “Could not run tests on class 01p40000000HDZA because: connection was cancelled here”

I checked the number of test methods in the classes that failed. They were 2, 1, and 11 respectively. So that explains my missing 14 test methods. Those test methods were neither Completed nor Failed. They just didn’t run.
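As a rough sketch of how a build could surface these classes automatically, the Python below filters ApexTestQueueItem rows down to the ones that never ran. It assumes the rows have already been fetched as dictionaries keyed by the field names in the query above; the function name and the sample rows are illustrative only (the failed row reuses the message quoted above, the other ID is a placeholder).

```python
def classes_that_did_not_run(queue_items):
    """Return (ApexClassId, ExtendedStatus) for every queue item whose Status is 'Failed'."""
    return [
        (item["ApexClassId"], item.get("ExtendedStatus"))
        for item in queue_items
        if item["Status"] == "Failed"
    ]


# Illustrative rows only; "01pXXXXXXXXXXXX" is a placeholder ID.
sample_items = [
    {"ApexClassId": "01pXXXXXXXXXXXX", "Status": "Completed", "ExtendedStatus": "(4/4)"},
    {
        "ApexClassId": "01p40000000HDZA",
        "Status": "Failed",
        "ExtendedStatus": "Could not run tests on class 01p40000000HDZA "
                          "because: connection was cancelled here",
    },
]

for class_id, reason in classes_that_did_not_run(sample_items):
    print(class_id, "->", reason)
```

The methods in these classes produce no ApexTestResult rows at all, so the only place they show up is in this list.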

In my scenario, there is a data isolation issue when running the tests in parallel that manifests as a “connection was cancelled here” error. As a result, the test methods aren’t run and no results come back for them. Because it is a concurrency error, the number of failures varies from test run to test run. I’ve gone into some of the painful details of this in Speeding up Salesforce unit testing performance.

Your scenario is likely similar. You could try adding some sort of guard statement that ApexTestRunResult.ClassesEnqueued equals ApexTestRunResult.ClassesCompleted to make sure entire classes aren’t being excluded from the results.
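One possible shape for such a guard is sketched below in Python. It assumes the ApexTestRunResult record has already been fetched into a dictionary; the exception class, the function name, and the sample values are hypothetical and not part of any Salesforce tooling.

```python
class IncompleteTestRun(Exception):
    """Raised when some enqueued test classes never completed."""


def assert_all_classes_ran(run_result):
    """Fail the build if ClassesEnqueued and ClassesCompleted differ."""
    enqueued = run_result["ClassesEnqueued"]
    completed = run_result["ClassesCompleted"]
    if enqueued != completed:
        raise IncompleteTestRun(
            f"{enqueued - completed} of {enqueued} test classes did not complete"
        )


# Made-up values for illustration only.
assert_all_classes_ran({"ClassesEnqueued": 5, "ClassesCompleted": 5})  # passes silently

try:
    assert_all_classes_ran({"ClassesEnqueued": 5, "ClassesCompleted": 3})
except IncompleteTestRun as err:
    print("Build should fail:", err)
```

Failing loudly here keeps a partial run from being reported as a smaller but fully passing test suite.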

With this being a concurrency issue, it would also be useful to control the number of test cases that run at any one time. The current limit of 30 is way too high when there are issues siloing test data. Please consider voting for the idea: Control the degree of parallelism when running apex tests in parallel


I’ll need to gather some more data to prove it, but my suspicion is that this is caused by using the Streaming API to monitor and report on the results.

It will either be a timing issue between when the test job finishes and when the last CometD message is received or possibly an outright failure to receive a message. Either way, the SFDX CLI command is missing a number of ApexTestResult (07M) records.

I’d take your test ApexJobIds and directly query for the corresponding ApexTestResult records. E.g.

select 
    Id,TestTimestamp,Outcome,ApexClassId,MethodName,AsyncApexJobId,QueueItemId,RunTime 
from ApexTestResult 
where AsyncApexJobId = '7071W000052hsrXQAQ'

The other possibility is that the force:apex:test:run command isn’t queuing up all the existing test classes. If it is somehow not creating the ApexTestQueueItem records for a subset of Apex classes, then those tests would never run. I’d confirm that the expected number of records is created for the ApexJobId.

select Id,ApexClassId,Status,ExtendedStatus,ParentJobId,TestRunResultId,ShouldSkipCodeCoverage 
from ApexTestQueueItem 
where ParentJobId = '7071W000052hsrXQAQ'

There should be one record per Apex test class.
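A minimal Python sketch of that check follows. It assumes the build already knows which test class IDs it expects and has the ApexTestQueueItem rows as dictionaries; the function name and the sample IDs are placeholders. IDs are compared on their first 15 characters so that 15-character and 18-character forms of the same ID match.

```python
def missing_queue_items(expected_class_ids, queue_items):
    """Return the expected test class IDs that have no ApexTestQueueItem row."""

    def short_id(sf_id):
        # The first 15 characters of an 18-character ID are the case-sensitive ID.
        return sf_id[:15]

    queued = {short_id(item["ApexClassId"]) for item in queue_items}
    expected = {short_id(class_id) for class_id in expected_class_ids}
    return sorted(expected - queued)


# Placeholder IDs for illustration only.
expected_ids = ["01p000000000001AAA", "01p000000000002AAA"]
rows = [{"ApexClassId": "01p000000000001AAA"}]

print(missing_queue_items(expected_ids, rows))  # ['01p000000000002']
```

An empty list means every expected class was queued, which would point back at result collection rather than queuing as the source of the missing tests.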


Attribution
Source: Link, Question Author: Keith C, Answer Author: Daniel Ballinger
