Delta Sync

Overview

Typically, when you synchronize data between two systems, you have to move every record the first time (to fill up the target), and from that point forward you prefer to move only the records that have changed.

These smaller sips of data are called “delta” syncs, or sometimes “changes only” syncs. They are preferred because moving small amounts of data means faster and more efficient updates to the target system (and with Salesforce in particular, it’s nice for the LastModifiedDate field to be meaningful rather than holding the same value on every record).

Delta syncing behavior is baked into Valence, and all source Adapters are expected to support it if the data they are talking to can be filtered appropriately.

Note

Sometimes, a data source can’t accurately detect what has changed, and in this circumstance the fallback is simply to get all the records each time. Less efficient, but still quite effective.

Whenever a Link run occurs (generating a Sync Event), the timestamp at which that run occurred is stored on the Sync Event. Depending on the results of the run, that Sync Event is marked as successful or unsuccessful.

If successful, Valence will remember that timestamp, which we refer to as the “last successful sync” timestamp.

Note

Valence always runs with “partial success” allowed, meaning that records that succeed are written to the target even if other records in the same run fail. A Sync Event, however, is marked strictly pass/fail, so a failed Sync Event might still have 95% of its records successfully written. We don’t want to lose the stragglers that failed, so for safety the run as a whole is still considered failed.

During a Link run, the last successful sync timestamp is made available to the source Adapter so that it can filter the data it is about to fetch. The source Adapter simply attaches a filter of “last modified >= lastSuccessfulSync” to whatever records it is gathering for the run.
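As a rough illustration of that filter, here is a minimal sketch of how a source Adapter that reads from Salesforce might build its query. The class name DeltaFilterExample, the choice of the Account object, and the assumption that the timestamp arrives as an Apex Datetime (null when there has been no successful run yet) are all hypothetical and not part of the Valence API.

```apex
// Hypothetical helper, not part of Valence: builds a SOQL query with a delta filter.
public class DeltaFilterExample {

    // lastSuccessfulSync is assumed to be null when no run has succeeded yet
    public static String buildQuery(Datetime lastSuccessfulSync) {
        String query = 'SELECT Id, Name, LastModifiedDate FROM Account';
        if (lastSuccessfulSync != null) {
            // SOQL datetime literals are unquoted and expressed in UTC
            query += ' WHERE LastModifiedDate >= '
                + lastSuccessfulSync.formatGmt('yyyy-MM-dd\'T\'HH:mm:ss\'Z\'');
        }
        return query;
    }
}
```

With no timestamp the query has no WHERE clause, so every record is fetched, which matches the fallback behavior described earlier.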

Summary

By tracking the last successful sync and only updating it on a successful run, we ensure that no records are ever dropped on the floor and forgotten because they had an error or there was an issue with the run. As an example, let’s say you have a Link that runs daily. On Tuesday it fetches all the records that have changed since Monday. Uh oh! There’s a failure in the Tuesday run. It runs again on Wednesday, except this time it fetches 48 hours of changes instead of 24, because the lastSuccessfulSync timestamp is still from the successful Monday run. With each failed run, the window of records being fetched expands so that none are left behind.

This also means that Valence self-recovers from many kinds of typical integration issues. If your external system is offline, or there’s a server error, etc., a run might fail, but the next run will catch you up and things will be humming along again in no time. The combination of delta syncing and tracking the last successful timestamp is your first line of defense against integration issues. To learn about the others, check out Handling Errors.
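The Monday/Tuesday/Wednesday example can be modeled in a few lines. This is a toy sketch of the rule only, not Valence’s internal code; the class LastSuccessfulSyncTracker and its method names are made up for illustration.

```apex
// Toy model only: NOT Valence's implementation, just the rule described above.
public class LastSuccessfulSyncTracker {

    // Null until the first successful run, which means "fetch everything"
    private Datetime lastSuccessfulSync;

    // Start of the fetch window that the next run would receive
    public Datetime getWindowStart() {
        return lastSuccessfulSync;
    }

    // Record the outcome of a run; only a success moves the timestamp forward
    public void recordRun(Datetime runTimestamp, Boolean wasSuccessful) {
        if (wasSuccessful) {
            lastSuccessfulSync = runTimestamp;
        }
    }
}
```

If you record a successful Monday run and then a failed Tuesday run, getWindowStart() still returns Monday’s timestamp, so the Wednesday run covers both days.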

Warning

A common mistake people make is to let one bad record “gum up the works”. If you have a record that is malformed or broken in some way and causes a failed run every time it is included, the lastSuccessfulSync timestamp will not advance and your fetch window will get larger and larger over time. You might not notice right away if most of your records are still succeeding individually: data will still be flowing, you’ll just be missing some.

Solve this by keeping an eye on your Sync Events (checking from time to time, or setting up a Report or Dashboard or notification of your preference to monitor for Sync Event records where valence__Success__c == false and valence__Status__c == Completed).
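If you prefer to check from Apex, a sketch along these lines could build the monitoring query. The two field names come from the paragraph above; the class name SyncEventMonitorQuery is hypothetical, and the API name of the Sync Event object is deliberately left as a parameter because it is not stated here. The sketch also assumes valence__Success__c is a checkbox and valence__Status__c can be compared to a text value.

```apex
// Hypothetical helper: builds a SOQL query for completed-but-failed Sync Events.
public class SyncEventMonitorQuery {

    // syncEventObject: API name of the Sync Event object in your org (supplied by the caller)
    public static String build(String syncEventObject) {
        return 'SELECT Id, CreatedDate FROM ' + String.escapeSingleQuotes(syncEventObject)
            + ' WHERE valence__Success__c = false'
            + ' AND valence__Status__c = \'Completed\''
            + ' ORDER BY CreatedDate DESC';
    }
}
```

The resulting string can be passed to Database.query(), for example from a scheduled job that sends a notification when any rows come back.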

Delta Sync With Cursors

Not every system uses a last modified timestamp to track changed records. Perhaps you’re interacting with a message queue of some kind, where each message has a unique sequential number.

In these sorts of scenarios Valence still supports delta sync out of the box. Instead of working with LinkContext’s lastSuccessfulSync, use lastSuccessfulCursor. This is a value you tell Valence about, and Valence gives it back to you on the next run. For example, if you have read as far as message 200, the next time you fetch records you will receive “200” and can start from there. The cursor doesn’t have to be numeric; it could be a GUID or some other identifier. What’s important is that it makes sense to you and to the system you are fetching data from.

Syntax for setting the cursor: valence.RunContext.currentContext().setCursor(yourStringCursorValue);

Tip

The explanation above about successful and failed runs still applies. If run A finishes successfully at cursor 200, and run B then starts at 200 and finishes at 300 but fails, run C will receive “200”, not “300”, as its cursor value.

Example Usage

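public List<valence.RecordInFlight> fetchRecords(valence.LinkContext context, Object scopeObject) {

        // Start from message 0 on the first run, when no cursor has been stored yet
        Integer messageNumber = 0;
        if(context.lastSuccessfulCursor != null) {
                messageNumber = Integer.valueOf(context.lastSuccessfulCursor);
        }

        // Message, fetchMessagesSince() and buildRecordFromMessage() stand in for your own integration code
        List<Message> messages = fetchMessagesSince(messageNumber);
        List<valence.RecordInFlight> records = new List<valence.RecordInFlight>();
        for(Message message : messages) {
                records.add(buildRecordFromMessage(message));
                // setCursor() takes a String, so convert the message number
                valence.RunContext.currentContext().setCursor(String.valueOf(message.messageNumber));
        }

        return records;
}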

Runs and Full Runs

Delta syncs are automatic and you don’t have to do anything to keep them going. A few things to know:

Even though delta syncs are the default, the first run for a new Link always fetches all data (since it has no lastSuccessfulSync timestamp!). After that, each run grabs only small sips.

You can see the timestamp that was used as the start of the fetch window on the Sync Event Summary screen.

[Image: Sync Event Summary screen timestamps]

There are two buttons on the Link Dashboard screen that allow you to run the Link immediately.

Run Now - Do a normal Link run right now, which will be a delta sync
Full Run - Do a special Link run right now that ignores lastSuccessfulSync and fetches all records regardless of last modified date

We recommend doing a Full Run if:

You’ve made changes to metadata or mappings and need fields from the source records you haven’t fetched before
You think you’ve had some data loss, or you’ve inadvertently deleted records from your target
Your business has some kind of data reconciliation process where they re-sync all their records on some schedule (quarterly, yearly, etc)

Note

If you don’t see the Run Now and Full Run buttons on the Link Dashboard screen, it’s because this Link uses a data source that does not fetch records. Some Links are designed to accept real-time record pushes and don’t control when or how they receive data.