EAI_AGAIN errors when pushing or running unit tests in a newly created scratch org from Jenkins CI using SFDX

We sometimes get this error reported from an sfdx force:source:push of a largish code base:

ERROR: Error: getaddrinfo EAI_AGAIN nosoftware-momentum-7459-dev-ed.cs9.my.salesforce.com:443

some six and a half minutes into the push. The push starts about 20 seconds after the sfdx force:org:create that creates the scratch org has completed. We are running Jenkins Pipeline CI on AWS.

Some Googling suggests this EAI_AGAIN error (raised by Node.js, which sfdx runs on) means:

Temporary failure in name resolution
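To make the distinction concrete, here is a minimal Node.js sketch that separates the temporary EAI_AGAIN code from the "name not found" code ENOTFOUND. The names classifyLookupError and probe are hypothetical helpers for illustration, not part of sfdx, and the lookup function is injectable only so the logic can be exercised without touching the network.

```javascript
const dns = require('dns');

// Map the error from a lookup to a coarse outcome.
// EAI_AGAIN is getaddrinfo's "temporary failure in name resolution";
// ENOTFOUND is what Node reports when the name could not be found.
function classifyLookupError(err) {
  if (!err) return 'ok';
  if (err.code === 'EAI_AGAIN') return 'temporary';
  if (err.code === 'ENOTFOUND') return 'not_found';
  return 'other';
}

// Resolve a hostname once and report the outcome instead of throwing.
// `lookup` defaults to dns.lookup, which uses the same getaddrinfo
// path that produces the error in the question.
function probe(hostname, lookup = dns.lookup) {
  return new Promise((resolve) => {
    lookup(hostname, (err, address) => {
      resolve({
        hostname,
        outcome: classifyLookupError(err),
        address: err ? null : address,
      });
    });
  });
}

// Example (not executed here):
// probe('nosoftware-momentum-7459-dev-ed.cs9.my.salesforce.com').then(console.log);
```

Running a probe like this from the build agent while the parallel builds are active might help show whether the failures are intermittent (temporary) rather than the hostname being unknown.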

Has anyone found a workaround for this? Note we are running on AWS.

PS

We are using parallel pipelines and running many builds at once: does this error get generated when the DNS service is overloaded with requests?

PPS

From the AWS docs this might be relevant:

Each Amazon EC2 instance limits the number of packets that can be sent
to the Amazon-provided DNS server to a maximum of 1024 packets per
second per network interface. This limit cannot be increased.

More…

Interesting to see some explicit checking for this error in yarn.js:

  async exec(args = []) {
    ...
    try {
      await this.fork(this.bin, args, options);
      debug('done');
    } catch (err) {
      // TODO: https://github.com/yarnpkg/yarn/issues/2191
      let networkConcurrency = '--network-concurrency=1';
      if (err.message.includes('EAI_AGAIN') && !args.includes(networkConcurrency)) {
        debug('EAI_AGAIN');
        return this.exec(args.concat(networkConcurrency));
      } else throw err;
    }
  }
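The same retry-on-EAI_AGAIN idea could be applied one level up, around the sfdx command itself. The following is only a sketch under assumptions: runCommand and runWithRetry are hypothetical helpers, the attempt count and delay are arbitrary, and it presumes the wrapped command is safe to re-run after a partial failure.

```javascript
const { execFile } = require('child_process');

// Default runner: resolves with stdout; on failure rejects with an Error
// whose message carries stdout and stderr so the caller can inspect them.
function runCommand(bin, args) {
  return new Promise((resolve, reject) => {
    execFile(bin, args, (err, stdout, stderr) => {
      if (err) reject(new Error(`${err.message}\n${stdout}\n${stderr}`));
      else resolve(stdout);
    });
  });
}

// Retry only when the failure mentions EAI_AGAIN; any other error is
// rethrown immediately. The wait grows linearly with the attempt number.
async function runWithRetry(bin, args, { attempts = 3, delayMs = 30000, run = runCommand } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await run(bin, args);
    } catch (err) {
      const transient = String(err.message).includes('EAI_AGAIN');
      if (!transient || attempt >= attempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
}

// Example (not executed here):
// runWithRetry('sfdx', ['force:source:push', '--targetusername', username]);
```

Unlike the yarn snippet, this does not reduce concurrency on retry; it simply waits and tries again, so it only helps if the DNS failure really is temporary.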

Answer

I tried staggering the parallel builds with delays of 3 minutes, 6 minutes, 9 minutes and so on, so that the builds would not be in the same phase at the same time. That resulted in one clean build, but it looks like that was just random good luck. Not a solution.

So I changed to polling, thanks to the answer to the question "Any way to use sfdx force:apex:test:report to poll?". That seems to work around the problem but requires ugly code. This is from a Jenkins pipeline, for the unit testing part that most frequently has the errors, though the push has them too:

def experiencingEaiAgainErrors = true

if (experiencingEaiAgainErrors) {

    // Use polling to work around EAI_AGAIN errors

    def r1 = shWithResult "sfdx force:apex:test:run --testlevel RunLocalTests --targetusername ${org.username} --json"
    def testRunId = r1.testRunId

    def totalSleeps = 0
    def status = ''
    // Stop on any terminal status so an aborted or failed run doesn't keep polling until the timeout
    while (!(status in ['Completed', 'Aborted', 'Failed']) && totalSleeps < 180) {

        sleep 60
        totalSleeps++

        def query = "select Status, MethodsEnqueued, MethodsCompleted, MethodsFailed from ApexTestRunResult where AsyncApexJobId = '${testRunId}'"
        def r2 = shWithResult "sfdx force:data:soql:query --usetoolingapi --query \"${query}\" --targetusername ${org.username} --json"
        // Safe navigation in case the query returns no row yet
        def record = r2.records[0]

        status = record?.Status ?: ''
        def enqueued = record?.MethodsEnqueued
        def completed = record?.MethodsCompleted
        def failed = record?.MethodsFailed

        echo "Test run status is \"${status}\" with ${completed} of ${enqueued} methods run (${failed} methods failed) after ${totalSleeps} one minute sleeps"
    }

    // Deliberately no status check so build doesn't fail immediately
    sh returnStatus: true, script: "sfdx force:apex:test:report --testrunid ${testRunId} --outputdir ${testResultsDir} --resultformat tap --targetusername ${org.username}"
} else {

    // Desired, simple approach

    // Deliberately no status check so build doesn't fail immediately
    sh returnStatus: true, script: "sfdx force:apex:test:run --synchronous --testlevel RunLocalTests --outputdir ${testResultsDir} --resultformat tap --targetusername ${org.username} --wait 180"
}

Attribution
Source : Link , Question Author : Keith C , Answer Author : Keith C
