How to Design a Complementary Valence API

Valence offers a lot of flexibility in how it talks to external systems, but a few best practices make for a smoother build and help you avoid common pitfalls.

API Features

Must Haves

There are two important API features that will assist Valence in providing an efficient and scalable integration:

Last modified date as a timestamp filter
Being able to pass a timestamp (date + time) value to the API allows us to get only the records that have changed since the last time records were retrieved. Two timestamps can be used to demarcate the start and stop of a range to retrieve, or a single timestamp can be used, in which case the end of the range is implied to be right now. This makes the connection efficient in that we’re only moving records that have changed.
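As a rough illustration of the timestamp filter described above, here is a minimal sketch of how an API might select records by last modified date. The `RECORDS` list, the `changed_between` function, and the `last_modified` field name are all hypothetical, and treating the range as half-open (start included, end excluded) is our assumption rather than a Valence requirement.

```python
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical in-memory table; a real API would query its database instead.
RECORDS = [
    {"id": 1, "last_modified": datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)},
    {"id": 2, "last_modified": datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)},
    {"id": 3, "last_modified": datetime(2024, 1, 3, 9, 0, tzinfo=timezone.utc)},
]


def changed_between(records: List[dict], start: datetime,
                    end: Optional[datetime] = None) -> List[dict]:
    """Return records modified at or after start and before end.

    When end is omitted, the range is assumed to run up to right now.
    """
    if end is None:
        end = datetime.now(timezone.utc)
    return [r for r in records if start <= r["last_modified"] < end]
```

A half-open range means that back-to-back sync windows (where one window's end is the next window's start) never return the same record twice.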
Paginated result sets
Breaking large result sets into smaller batches is a standard way to handle large data volumes. This functionality is baked into Valence, for example in the two-step fetch process in the SourceAdapterForPull interface. The two big benefits to baking pagination into your API integration from the start are that you can move the entire data set (for an initial load or a reload) and you can handle unexpected spikes in volume. (The number of times we’ve seen someone dump a static file into a database and it blows up downstream integrations…) Most industry-standard pagination patterns will work with Valence. You can paginate on page numbers or on an offset value; either is fine. Some patterns we’ve seen (all fine with Valence):
1. One endpoint to call to get a count, another to call repeatedly to get pages of records based on the result from the first endpoint (determinate)
2. A single endpoint that returns metadata alongside a result, such as a total count or information about pages that can be used to fetch the remaining pages (determinate)
3. A single endpoint that returns a token or value of some kind alongside a result, indicating that more records exist (but not how many) and where/how to retrieve them (indeterminate)
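To show the difference between the determinate and indeterminate patterns listed above, here is a minimal client-side sketch. The function names and the `fake_*` stand-ins are hypothetical and take the place of real HTTP calls; this is not how Valence itself is implemented.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Record = dict


def pull_determinate(count: Callable[[], int],
                     fetch: Callable[[int, int], List[Record]],
                     page_size: int) -> Iterator[List[Record]]:
    """Patterns 1 and 2: the total is known up front, so every offset can be planned."""
    total = count()
    for offset in range(0, total, page_size):
        yield fetch(offset, page_size)


def pull_indeterminate(
    fetch: Callable[[Optional[str]], Tuple[List[Record], Optional[str]]],
) -> Iterator[List[Record]]:
    """Pattern 3: keep following the token until the API stops returning one."""
    token: Optional[str] = None
    while True:
        page, token = fetch(token)
        yield page
        if token is None:
            break


# Hypothetical stand-ins for real endpoints, backed by a five-record list.
DATA = [{"id": i} for i in range(5)]


def fake_count() -> int:
    return len(DATA)


def fake_fetch(offset: int, page_size: int) -> List[Record]:
    return DATA[offset:offset + page_size]


def fake_token_fetch(token: Optional[str]) -> Tuple[List[Record], Optional[str]]:
    # The token here is simply the next offset as a string (an assumption).
    start = int(token) if token else 0
    end = start + 2
    next_token = str(end) if end < len(DATA) else None
    return DATA[start:end], next_token
```

With a known total, the pages could also be requested in parallel; with a token, each request depends on the response before it.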

Should Haves

Valence really shines when paired with a dynamic API that can reveal its structure as well as retrieve different data tables based on passed parameters.

Self-describing schema
One thing an external API can offer that, together with Valence, will really empower data admins is an endpoint that describes what tables and fields are accessible. This allows an admin in Salesforce to pick and choose which tables and fields they are interested in, and change their mind down the road with minimal overhead.
Dynamic fetch endpoint
Complementary to a self-describing schema is an endpoint that accepts the table and fields that should be queried, alongside other filters such as timestamp.
Compact data encoding format
In order to fit as many records as possible into each page, it’s best to use a space-efficient exchange format. Some examples, in order from most compact to least compact:
1. CSV
2. JSON
3. XML
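To make the size difference tangible, the following sketch serializes the same set of records in all three formats and compares byte counts. The sample records and helper names are hypothetical, and exact sizes will vary with your data; the sketch only illustrates the relative ordering for simple tabular records.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data: 100 flat records with three string fields.
rows = [{"id": str(i), "name": f"Account {i}", "status": "active"} for i in range(100)]


def as_csv(records):
    """Field names appear once, in the header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue().encode("utf-8")


def as_json(records):
    """Field names repeat once per record; whitespace is stripped."""
    return json.dumps(records, separators=(",", ":")).encode("utf-8")


def as_xml(records):
    """Field names repeat twice per record (opening and closing tags)."""
    root = ET.Element("records")
    for record in records:
        element = ET.SubElement(root, "record")
        for key, value in record.items():
            ET.SubElement(element, key).text = value
    return ET.tostring(root, encoding="utf-8")


sizes = {"csv": len(as_csv(rows)), "json": len(as_json(rows)), "xml": len(as_xml(rows))}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```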
Compression
An ideal API supports the Accept-Encoding header so that results can be compressed: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding

Tip

Compressed requests coming from Salesforce to your API use gzip compression, and Salesforce only accepts gzip-compressed responses in return.
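On the API side, honoring Accept-Encoding can be as simple as the sketch below. The function names are hypothetical, and the header parsing is deliberately simplified (it only looks for gzip and a basic q-value); a production API would normally rely on its web framework or server for this.

```python
import gzip
from typing import Dict, Tuple


def accepts_gzip(accept_encoding: str) -> bool:
    """Simplified check: true if gzip is listed and not disabled with q=0."""
    for part in accept_encoding.split(","):
        coding, _, params = part.strip().partition(";")
        if coding.strip().lower() == "gzip":
            params = params.strip().lower()
            if params.startswith("q="):
                try:
                    return float(params[2:]) > 0
                except ValueError:
                    return False
            return True
    return False


def encode_response(body: bytes, accept_encoding: str) -> Tuple[Dict[str, str], bytes]:
    """Return (extra response headers, payload), compressing only when allowed."""
    if accepts_gzip(accept_encoding):
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    return {}, body
```

Repetitive payloads such as pages of records tend to compress well, which lets more records fit into each response.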

Nice to Haves

Additional filters beyond timestamp
It’s not uncommon for end users to want to filter source data on other fields, such as a type of some kind. Records can be filtered out on the Valence side after reception, but it’s always nice to not transmit them to begin with.

Authentication

Valence supports these authentication methods out of the box:

Standard username + password header authentication
OAuth using the “authorization code” grant type
AWS Signature Version 4
JWT
JWT Token Exchange

You can still use a custom authentication mechanism or one of the other OAuth grant types; it just requires some additional effort at the Adapter level to bake that in.

Example Ideal API for Fetching Data

To make these recommendations more concrete, here’s a breakdown of an example API written to expose data to Valence/Salesforce that supports our recommended best practices. This API would allow an admin to browse and select whichever tables they were interested in pulling into Salesforce. The entire table would be pulled over in a series of batches/pages, and then incremental delta sync updates would keep Salesforce up-to-date.

Endpoints:

/describe-tables
1. Parameters: None
2. Returns: A list of table representations (just names at a minimum, but always nice to have name, label, description, etc.)
/describe-fields
1. Parameters:
  tableName - The name of a table that was returned by describe-tables
2. Returns: A list of field representations (just names at a minimum, but always nice to have name, label, description, data type, default value, etc.)
/count-records
1. Parameters:
  tableName - The name of a table that was returned by describe-tables
  start - A timestamp for “last modified” start of range
  end - A timestamp for “last modified” end of range
2. Returns: A record count for how many records we would expect to receive were we to call fetch-records
/fetch-records
1. Parameters:
  tableName - The name of a table that was returned by describe-tables
  start - A timestamp for “last modified” start of range
  end - A timestamp for “last modified” end of range
  fieldList - A list of field names from describe-fields, likely a subset of the total fields available (as selected by an admin while building mappings)
  offset - The number of records to skip into the total result set
  pageSize - How many records to return per page
2. Returns: A single page of records from the <tableName> table, selecting the <fieldList> columns, with a last modified date on each record between <start> and <end>. No more than <pageSize> records returned per page, with an offset into the total result set of <offset> records.
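The four endpoints above can be sketched as plain functions over an in-memory table, followed by the loop a caller would run to pull a whole table. Everything here is hypothetical scaffolding (the `TABLES` data, the function names, the `last_modified` field, and the half-open date range), meant only to show how the parameters fit together, not to prescribe an implementation.

```python
from datetime import datetime, timezone

# Hypothetical in-memory "database" with a single table of seven records.
TABLES = {
    "accounts": [
        {"id": i, "name": f"Account {i}", "region": "EMEA",
         "last_modified": datetime(2024, 1, i, tzinfo=timezone.utc)}
        for i in range(1, 8)
    ],
}


def describe_tables():
    return [{"name": name} for name in TABLES]


def describe_fields(table_name):
    # Field names are read from the first record; a real API would use its schema.
    return [{"name": field} for field in TABLES[table_name][0]]


def _in_range(table_name, start, end):
    rows = [r for r in TABLES[table_name] if start <= r["last_modified"] < end]
    # A stable sort order keeps offsets consistent from one page to the next.
    return sorted(rows, key=lambda r: (r["last_modified"], r["id"]))


def count_records(table_name, start, end):
    return len(_in_range(table_name, start, end))


def fetch_records(table_name, start, end, field_list, offset, page_size):
    page = _in_range(table_name, start, end)[offset:offset + page_size]
    return [{field: row[field] for field in field_list} for row in page]


def pull_table(table_name, start, end, field_list, page_size):
    """Caller side: count first, then walk the offsets one page at a time."""
    total = count_records(table_name, start, end)
    records = []
    for offset in range(0, total, page_size):
        records.extend(
            fetch_records(table_name, start, end, field_list, offset, page_size)
        )
    return records
```

Passing the same start and end values to both count-records and fetch-records matters: a fixed end timestamp keeps the result set from shifting while pages are still being retrieved.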

Writing To Your API

Valence supports both reading from and writing to external APIs. When writing, it is best if there is a consistent pattern for specifying which table is being written to.

It is highly appreciated when an external API supports bulk writing, i.e. writing more than one record in a single HTTP invocation.

Here is an ideal endpoint for writing to:

/<table_name>
1. HTTP Method: POST
2. Parameters: An array of record objects (in whatever format your API uses: JSON, CSV, XML, etc.)
3. Returns:
  Success or failure of each record individually, and/or the success/failure of the entire operation
  Unique identifiers in your system for each record, especially ones that were just created because of this operation
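A bulk write handler along these lines might look like the sketch below. The `STORE` dictionary, the `bulk_write` function, the required `name` field, and the shape of the result objects are all assumptions made for illustration; the point is that each record gets its own success flag and identifier, alongside an overall result.

```python
import uuid

# Hypothetical storage: table name -> {record id -> record}.
STORE = {}


def bulk_write(table_name, records):
    """Write many records in one call and report on each one individually."""
    table = STORE.setdefault(table_name, {})
    results = []
    for index, record in enumerate(records):
        # Assumed validation rule for this sketch: every record needs a name.
        if not isinstance(record, dict) or "name" not in record:
            results.append({"index": index, "success": False, "id": None,
                            "created": False,
                            "error": "missing required field: name"})
            continue
        record_id = record.get("id")
        created = record_id is None or record_id not in table
        if record_id is None:
            record_id = uuid.uuid4().hex  # assign an identifier to new records
        table[record_id] = {**record, "id": record_id}
        results.append({"index": index, "success": True, "id": record_id,
                        "created": created})
    return {"success": all(r["success"] for r in results), "results": results}
```

Returning results in the same order as the submitted records (here also tagged with an index) lets the caller match each outcome, and each newly created identifier, back to the record that produced it.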