How can updated SObjects/CObject be queried/retrieved without duplicate or missing records?

Another way to ask the question: are SOQL queries transactional/atomic at one-second granularity?

I am writing a data integration with Salesforce using the REST API. I want to retrieve the list of objects updated since the last time I pulled data from the API. This could be accomplished with a query like this:
SELECT Id, LastModifiedDate FROM Task WHERE LastModifiedDate > 2013-12-18T21:50:43.000+0000

The value “2013-12-18T21:50:43.000+0000” would be the greatest LastModifiedDate among the copies of the objects I already have.

But how do I know that I retrieved every record modified during the 43rd second when I last queried the database? Are SOQL queries transactional/atomic at one-second granularity? If they are, the query above will retrieve exactly the records I am interested in. If they are not, I would need a query like this:
SELECT Id, LastModifiedDate FROM Task WHERE LastModifiedDate >= 2013-12-18T21:50:43.000+0000

But now I will get duplicates of every record modified during the 43rd second that I already retrieved. I know this wouldn’t be a huge deal, but I would like to pull back as little data as possible.
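One way to keep the inclusive (>=) query without storing duplicates is to remember which Ids were already seen at the boundary second and drop them client-side. The sketch below is only an illustration under stated assumptions: the helper names (soql_datetime, build_query, merge_batch) are hypothetical, records are assumed to be dicts whose LastModifiedDate has already been parsed into a timezone-aware datetime, and no API call is made. It does not detect a record that is modified twice within the same second, and it does not address records committed late with an older timestamp.

```python
from datetime import datetime, timedelta, timezone


def soql_datetime(dt):
    """Format an aware datetime as a SOQL datetime literal in UTC."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000+0000")


def build_query(watermark):
    """Build the inclusive query from the question for a given watermark."""
    return ("SELECT Id, LastModifiedDate FROM Task "
            "WHERE LastModifiedDate >= " + soql_datetime(watermark))


def merge_batch(records, watermark, seen_at_watermark):
    """Drop rows already stored for the boundary second.

    Returns (fresh_rows, new_watermark, ids_seen_at_new_watermark).
    """
    if not records:
        return [], watermark, set(seen_at_watermark)
    fresh = [
        r for r in records
        if not (r["LastModifiedDate"] == watermark
                and r["Id"] in seen_at_watermark)
    ]
    new_watermark = max(r["LastModifiedDate"] for r in records)
    new_seen = {r["Id"] for r in records
                if r["LastModifiedDate"] == new_watermark}
    return fresh, new_watermark, new_seen


# Hypothetical batch: "A" was already pulled during the previous run.
wm = datetime(2013, 12, 18, 21, 50, 43, tzinfo=timezone.utc)
batch = [
    {"Id": "A", "LastModifiedDate": wm},
    {"Id": "B", "LastModifiedDate": wm},
    {"Id": "C", "LastModifiedDate": wm + timedelta(seconds=1)},
]
fresh, next_wm, next_seen = merge_batch(batch, wm, {"A"})
```

The duplicates are still transferred over the wire, so this trades a small amount of extra data for not having to rely on the query being atomic within a second.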

I am aware of the updated endpoint in the REST API. Documentation for that endpoint can be found here:
http://www.salesforce.com/us/developer/docs/api_rest/Content/resources_getupdated.htm
I do not want to use that endpoint for two reasons.
1) Only Ids are retrieved. (I can’t request the list of columns I am interested in; I would need additional API calls to get those.)
2) It ignores anything smaller than minutes in the start and end values. If I am going to hit the API, I don’t want to miss out on the last 1-59 seconds of activity.

Answer

The problem with LastModifiedDate is that the timestamp and the commit of the record do not happen at the same instant, which means a query can miss records that were “in flight” at the time of the last query, since they had not been committed yet (there’s a similar question that addresses this problem on this site). In order to be 100% accurate, you should use the replication API (getUpdated/getDeleted in the SOAP API, or updated/deleted in the REST API). It will always include the records that fall within the requested date range, up to the current date, and lets you accurately retrieve all records without duplication. You should take a look at those calls to see how they operate.
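As a rough sketch of how the REST updated resource might be driven, the code below builds the request path and reads the latestDateCovered value from a response so it can be used as the start of the next window. The API version string, the helper names (updated_path, next_start) and the sample response are assumptions for illustration; the sketch makes no HTTP call and floors both timestamps to the minute, since the question notes that the endpoint ignores seconds.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

API_VERSION = "v29.0"  # assumption: use whatever version your org supports


def floor_to_minute(dt):
    """Drop seconds and microseconds, mirroring what the endpoint ignores."""
    return dt.astimezone(timezone.utc).replace(second=0, microsecond=0)


def updated_path(sobject, start, end):
    """Build the path of the REST 'updated' resource for a time window."""
    fmt = "%Y-%m-%dT%H:%M:%S+00:00"
    params = urlencode({
        "start": floor_to_minute(start).strftime(fmt),
        "end": floor_to_minute(end).strftime(fmt),
    })
    return "/services/data/%s/sobjects/%s/updated/?%s" % (
        API_VERSION, sobject, params)


def next_start(response):
    """Parse latestDateCovered so it can seed the next request's start."""
    return datetime.strptime(response["latestDateCovered"],
                             "%Y-%m-%dT%H:%M:%S.%f%z")


path = updated_path(
    "Task",
    datetime(2013, 12, 18, 21, 50, 43, tzinfo=timezone.utc),
    datetime(2013, 12, 18, 22, 5, 10, tzinfo=timezone.utc),
)

# Hypothetical response body; the Id value is made up for illustration.
sample = {"ids": ["00T000000000001AAA"],
          "latestDateCovered": "2013-12-18T22:04:00.000+0000"}
resume_from = next_start(sample)
```

Since only Ids come back, the fields for those Ids would still have to be fetched in a follow-up request, which is the extra cost the question mentions.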

Attribution
Source: Link, Question Author: edgartheunready, Answer Author: sfdcfox
