In the latest update to IBM® App Connect, we have added a powerful new area of functionality: batch processing. If you have data in one external system, such as a database, that you want to copy to another system, such as Salesforce or Marketo, then use the Batch process node. The node is optimized to retrieve a large number of records from an application and process them in parallel, and is available in instances of App Connect on IBM Cloud™.
For example, if your customer records are stored in a legacy database and you want to migrate them to a Salesforce system, App Connect can efficiently extract hundreds of thousands of database records and insert them into Salesforce for you, as leads, contacts, or whatever type of record you choose. Or if you want to copy a collection of lead records from one Salesforce system to another, you can use App Connect to extract and copy the leads. When the batch process is complete, you can easily view a report of the number of records that were processed.
You might remember that we already have a retrieve node in App Connect, which lets you retrieve multiple records from a system. You can then process these records by adding a For each node to your flow. The new Batch process node works differently: a batch process is an asynchronous process that is initiated by your flow, but does not run as part of it. A batch process completes at a different time from your main flow, and many instances of the batch process are scheduled in parallel to maximize throughput. It also has different operational limits from the main flow, so you are less likely to exceed memory constraints when processing many records if you use a batch process rather than a For each node. For more details on App Connect’s operational limits, see “What are the operational limits for App Connect?” Finally, the standard retrieve node has a limit of 1000 records, whereas the new Batch process node has no limit, other than any that you choose to set.
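To make the idea of parallel processing concrete, here is a minimal conceptual sketch in Python. It is not how App Connect is implemented; the page size, the worker count, and the function names (fetch_pages, process_page, run_batch) are all assumptions made up for illustration. It only shows the general pattern of splitting a large record set into pages, handling the pages in parallel, and reporting how many records were processed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List

PAGE_SIZE = 100  # hypothetical page size, chosen only for illustration


def fetch_pages(total_records: int, page_size: int = PAGE_SIZE) -> Iterator[List[int]]:
    """Yield record IDs in pages, the way a paged retrieval might."""
    for start in range(0, total_records, page_size):
        yield list(range(start, min(start + page_size, total_records)))


def process_page(page: List[int]) -> int:
    """Stand-in for the actions repeated for each record; returns how many were handled."""
    return sum(1 for _record in page)


def run_batch(total_records: int, workers: int = 4) -> int:
    """Process pages in parallel and return the number of records processed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_page, fetch_pages(total_records)))


processed = run_batch(2500)
print(f"Records processed: {processed}")  # prints "Records processed: 2500"
```

Because each page is handled independently, no single worker needs to hold the whole record set, which is the intuition behind the different operational limits described above.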
Note that if you are using the Lite plan, there is a limit on the number of batch processes that you can run concurrently. In the Professional plan there is no limit. Similarly, in the Lite plan there is a limit on the amount of data that you can send to external systems from flows. In the Professional plan, some data usage is included with your flow invocations charge, and you are charged for any additional data. For more details, look in the IBM Cloud catalog for App Connect’s pricing plans.
Using a batch process
To use a batch process, when you’re creating an event-triggered or API flow, just add an action and select “Batch process” from the “Toolbox” menu. For an event-triggered flow, you could use any application of your choice, but in this example I have chosen to use the Scheduler node, so that I can configure my batch process to run on a particular date or at a regular repeating interval.
You will be prompted to choose which application you would like to retrieve records from, and what type of record you want to retrieve. You can either retrieve all the records matching particular criteria in that system, or limit the number of records that you retrieve.
Then, add more actions or logic to the batch process. Any actions you add will be repeated for each record that was retrieved. For example, if you want to add a lead to a Salesforce system for every lead that you retrieve from a different Salesforce system, select the “Update or create lead” Salesforce action.
Finally, you can add actions to the main flow. In this example, I have added a Slack “Create message” action so that I know when my batch process has been initiated. These actions happen outside the batch process and could complete either before or after the batch process completes. Note that you can’t map data from the batch process back into the main flow.
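The relationship between the main flow and the batch process can be pictured with a small Python sketch. This is only an analogy under stated assumptions: the thread, the events list, and the function names are hypothetical and have nothing to do with App Connect's internals. The main flow starts the batch work, carries on with its own action without waiting, and receives no data back from the batch.

```python
import threading
import time
from typing import List

events: List[str] = []
_lock = threading.Lock()


def log(event: str) -> None:
    """Record an event in a thread-safe way."""
    with _lock:
        events.append(event)


def batch_process(records: List[int]) -> None:
    """Stand-in for the batch: handles every record, returns nothing to the flow."""
    for _record in records:
        time.sleep(0.001)  # simulate a call to an external system
    log("batch complete")


def main_flow(records: List[int]) -> threading.Thread:
    """Initiate the batch, then continue with the main flow's own action."""
    worker = threading.Thread(target=batch_process, args=(records,))
    worker.start()  # initiated by the flow, but not run as part of it
    log("message sent: batch initiated")  # main-flow action, does not wait
    return worker


worker = main_flow(list(range(50)))
worker.join()  # joined here only so the demo can print both events
print(events)
```

The order of the two events is not guaranteed in general, which mirrors the point that main-flow actions could complete either before or after the batch process.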
Choosing idempotent actions
When you’re choosing which actions to include in your batch process, consider what would happen if a request to the external system failed for some reason and App Connect retried it. Some application actions are idempotent (they can be carried out many times with the same outcome) and others are not (if they were carried out more than once, the system would be in a different state than if they were carried out only once). For example, if you use the Salesforce “Create lead” action in your batch process, and App Connect repeats this action for a particular record, say because the Salesforce system took too long to respond, you could find that you have some duplicate leads when the batch process is complete. Instead, we recommend that you choose an idempotent action such as “Update or create lead”, and configure the action so that it creates a lead only if an equivalent lead doesn’t already exist in your Salesforce account. This way, it doesn’t matter whether the action is tried once or five times: you will have exactly the number of leads that you retrieved from your source system.
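The difference between the two kinds of action can be shown with a short Python sketch. The FakeCrm class below is an in-memory stand-in invented for this illustration; it is not the Salesforce API, and matching leads on an email field is an assumption about what counts as an “equivalent lead”. The sketch delivers every lead three times, as if each request had been retried after already reaching the target system.

```python
from typing import Callable, Dict, List

Lead = Dict[str, str]


class FakeCrm:
    """In-memory stand-in for a target system (hypothetical, not a real API)."""

    def __init__(self) -> None:
        self.leads: List[Lead] = []

    def create_lead(self, lead: Lead) -> None:
        """Non-idempotent: every call adds another record."""
        self.leads.append(dict(lead))

    def update_or_create_lead(self, lead: Lead, key: str = "email") -> None:
        """Idempotent: update the matching record, or create it if none exists."""
        for existing in self.leads:
            if existing[key] == lead[key]:
                existing.update(lead)
                return
        self.leads.append(dict(lead))


def deliver_with_retries(action: Callable[[Lead], None], lead: Lead, attempts: int) -> None:
    """Simulate retries where every attempt actually reached the target system."""
    for _ in range(attempts):
        action(lead)


source = [
    {"email": "a@example.com", "name": "Ada"},
    {"email": "b@example.com", "name": "Bob"},
]

plain = FakeCrm()
upsert = FakeCrm()
for lead in source:
    deliver_with_retries(plain.create_lead, lead, attempts=3)
    deliver_with_retries(upsert.update_or_create_lead, lead, attempts=3)

print(len(plain.leads))   # 6: each retry created a duplicate
print(len(upsert.leads))  # 2: same as the number of source records
```

With the idempotent action, the final state depends only on the source records, not on how many times each request was attempted.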
Monitoring a batch process
The App Connect dashboard shows when a batch process is executing.
When a batch process is running or has completed, you can see its status by selecting “View batches” from the tile on the dashboard.
You can use the options in the Actions column to stop a batch process and to view the logs for a specific batch process. (For more information, see Viewing IBM App Connect logs in Kibana.)
To learn more about this new functionality, read the tutorial How to use batch processing in IBM App Connect.