I observed that if I use Database.QueryLocator in the batch start method rather than Iterable<sObject>, I get better performance. However, I could not figure out why.
In my case, Iterable<sObject> takes around 18-20 seconds, but if I use Database.QueryLocator the process completes within 5-6 seconds.
Can anyone explain how Database.QueryLocator works behind the scenes?
I’m not a Salesforce developer, but these are my thoughts on this observation…
Serialising. The result set returned by the start method is likely serialised into the batch subsystem of the platform regardless of how it's delivered. This thought is supported by the fact that the records subsequently passed back to the execute method are effectively stale, meaning changes made to them since the job was started are not reflected.
Execution driven by De-Serialising. This observation leads me to think that the platform serialises the result set regardless, and that the job is then driven by the platform de-serialising the result set in chunks, according to the scope size applied to the job.
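To make the two approaches concrete, here is a minimal sketch of one batch job whose start method returns a Database.QueryLocator, next to one that returns an Iterable<sObject>. The class names, the Account query, the LIMIT and the scope size of 200 are assumptions for illustration, not the code from the question.

```apex
// Hypothetical class: start() hands the platform a QueryLocator.
// Each top-level class would live in its own file.
public class AccountLocatorBatch implements Database.Batchable<sObject> {
    public Database.QueryLocator start(Database.BatchableContext bc) {
        // The query itself is handed to the platform.
        return Database.getQueryLocator('SELECT Id, Name FROM Account');
    }
    public void execute(Database.BatchableContext bc, List<sObject> scope) {
        // Called once per chunk; scope holds at most the requested scope size.
        System.debug('Locator chunk size: ' + scope.size());
    }
    public void finish(Database.BatchableContext bc) {}
}

// Hypothetical class: start() runs the query itself and returns the records.
public class AccountIterableBatch implements Database.Batchable<sObject> {
    public Iterable<sObject> start(Database.BatchableContext bc) {
        // The query runs inside start(), so it counts against the normal
        // SOQL row limit for that transaction. LIMIT keeps it under that.
        List<sObject> records = [SELECT Id, Name FROM Account LIMIT 10000];
        return records;
    }
    public void execute(Database.BatchableContext bc, List<sObject> scope) {
        System.debug('Iterable chunk size: ' + scope.size());
    }
    public void finish(Database.BatchableContext bc) {}
}
```

Either job would be started the same way from anonymous Apex, for example Database.executeBatch(new AccountLocatorBatch(), 200); where the second argument is the scope size that decides how many records each execute call receives.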
Serialisation Overheads vs Querying. In theory, one might expect the serialisation stage to show some improvement for the iterable, since the database work has already been performed outside of the job execution. This is likely to be masked by the platform's performance variance and the chunking behaviour of the job. So what I think you're observing, given the variance is not so great, is more than likely the usual fluctuation in the platform's performance, it being a shared service.
Further Profiling and Evidence to Present Salesforce Support? That said, if you are finding a pattern regardless of time of day and averaged over a series of tests, perhaps with a larger volume of data to stretch out a bigger, consistent and observable variance, then it may be worth seeing if you can get an explanation from Salesforce via a Support case. I think it's likely they will claim it's a bug, but you may get some useful further insight to share, perhaps? 🙂
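One way to collect such timings over a series of runs is to read them back from AsyncApexJob in anonymous Apex. This is only a rough sketch: the two class names in the filter are hypothetical stand-ins for your own batch classes, and CreatedDate is the submission time, so the elapsed figure also includes any time the job spent waiting in the queue.

```apex
// Anonymous Apex: list recent completed runs of two hypothetical batch
// classes and print how long each took from submission to completion.
for (AsyncApexJob job : [
    SELECT ApexClass.Name, CreatedDate, CompletedDate, TotalJobItems
    FROM AsyncApexJob
    WHERE JobType = 'BatchApex'
      AND Status = 'Completed'
      AND ApexClass.Name IN ('AccountLocatorBatch', 'AccountIterableBatch')
    ORDER BY CreatedDate DESC
    LIMIT 20
]) {
    // Datetime.getTime() returns milliseconds since the epoch.
    Long elapsedMs = job.CompletedDate.getTime() - job.CreatedDate.getTime();
    System.debug(job.ApexClass.Name + ': ' + elapsedMs + ' ms, '
        + job.TotalJobItems + ' chunks');
}
```

Averaging these figures across runs at different times of day should show whether the gap is consistent or just platform noise.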
In summary, I think what you're seeing is a combination of two things. Firstly, the natural platform performance variance (timings vary as the servers respond to different loads from their users). Secondly, the difference in serialisation times during the preparation phase of the job.
Regardless of whether you use a QueryLocator or an Iterable, the serialisation needs to be done before the items in the job can be processed. What you may find if you increase the number of items is that the time taken to serialise goes up; this will be most apparent in the start phase of the job rather than the overall execution. Or, stated another way, a variance in the start phase of the job for higher volumes is not that much of an issue compared with the performance of your code in the execute phase.
Note: Serialisation is the process that takes the output from your ‘start’ method and stores it in the internal server store for processing as the job runs. De-serialisation is the reverse of this process, and is done ahead of the ‘execute’ method being called.