Batch processes are optimized for handling much larger volumes of data than the standard retrieve action in IBM® App Connect. You can use a batch process to copy records from one external system to another, or to carry out an action that needs to be repeated many times. You might also want to use a batch process in your flow if you need to retrieve more records than the retrieve action allows.
A batch process differs from the standard retrieve action because it’s an asynchronous process that’s initiated by your flow but doesn’t run as part of it. A batch process completes at a different time from your main flow. It also has different operational limits from the main flow, so if you use a batch process rather than a For each node, you’re less likely to exceed memory constraints when processing many records. For more details on App Connect’s operational limits, see What are the operational limits for App Connect? The standard retrieve action has a limit of 1000 records, whereas a batch process has no limit other than any that you choose to set.
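To make the difference between the two approaches concrete, the following Python sketch models it with `asyncio`. It is a hypothetical model, not App Connect code: `standard_retrieve`, `batch_process`, and `main_flow` are illustrative names, and `asyncio.sleep(0)` simply stands in for a call to a target application. The retrieve function is capped at 1000 records, while the batch runs as a separate task that the main flow starts but doesn't wait for.

```python
import asyncio
from typing import List, Optional

RETRIEVE_LIMIT = 1000  # record cap of the standard retrieve action


def standard_retrieve(records: List[int]) -> List[int]:
    # Hypothetical model: the standard retrieve action returns at most 1000 records.
    return records[:RETRIEVE_LIMIT]


async def batch_process(records: List[int], done: List[int],
                        max_items: Optional[int] = None) -> str:
    # No built-in cap: slicing with None keeps every record,
    # unless you choose to set a maximum number of items.
    for record in records[:max_items]:
        await asyncio.sleep(0)  # stands in for a call to a target application
        done.append(record)
    return "completed"


async def main_flow(records: List[int]) -> dict:
    done: List[int] = []
    # The main flow starts the batch as a separate task...
    batch = asyncio.create_task(batch_process(records, done))
    # ...and continues immediately, before the batch has processed anything.
    processed_when_main_flow_continued = len(done)
    status = await batch  # awaited here only so this demo can report the outcome
    return {"retrieve": len(standard_retrieve(records)),
            "batch": len(done),
            "seen_by_main_flow": processed_when_main_flow_continued,
            "status": status}


if __name__ == "__main__":
    print(asyncio.run(main_flow(list(range(5000)))))
```

Running it with 5000 records reports 1000 records for the retrieve model, 5000 for the batch model, and 0 records processed at the moment the main flow carried on, which mirrors the statement that the batch completes at a different time from the main flow.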
We’ve provided a Batch process node that you can use with some applications, including Salesforce. This guide shows you how to use the Batch process node.
What should I consider first?
Here are some things to consider when you use the Batch process node:
- The Batch process node is available only in the App Connect on IBM Cloud™ (Bluemix) service. Because of this, you can use the logs that are available with IBM Cloud to monitor your batch processes.
- The number of batch processes that you can run depends on the plan that you’ve purchased. If you try to run more batch processes concurrently than your plan will allow, you’ll see a message in the IBM Cloud logs that says something like “The batch process could not be started because the maximum number of concurrent running batch processes for the instance has been reached”.
- You’ll be billed for the flow invocation, plus any outbound data that your flow sends to external applications.
- When you’re choosing which actions to include in your batch process, consider what would happen if the request to the external system failed for some reason and App Connect retried the request. Some application actions are idempotent (they can be carried out many times without changing the outcome) and others are not (if they were carried out more than once, the system would be in a different state than if the action was carried out only once). For example, if you use the Salesforce Create lead action in your batch process, App Connect could repeat this action for a particular record, for example because the Salesforce system took too long to respond. Because Create lead is not an idempotent action, you could find that you have some duplicate leads when the batch process is complete. Instead, we recommend that you choose an idempotent action such as Update or create lead, and configure the action so that it creates a lead only if an equivalent lead doesn’t already exist in your Salesforce account. This way, it doesn’t matter whether the action is tried once or five times: you end up with exactly one lead for each lead that you retrieved from your source system.
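The idempotency point above can be illustrated with a small simulation. The following Python sketch is hypothetical: it uses an in-memory list in place of a Salesforce account, `create_lead` and `update_or_create_lead` are illustrative stand-ins rather than Salesforce or App Connect APIs, and matching existing leads on `email` is an assumption about how you might configure the action. Each request is deliberately sent several times to mimic retries.

```python
from typing import Callable, List

# Hypothetical in-memory stand-in for the leads in a target account.
Store = List[dict]


def create_lead(store: Store, lead: dict) -> None:
    # Not idempotent: every call adds another lead.
    store.append(dict(lead))


def update_or_create_lead(store: Store, lead: dict) -> None:
    # Idempotent: update a matching lead (matched on email, an assumption),
    # and create a new one only if no match exists.
    for existing in store:
        if existing["email"] == lead["email"]:
            existing.update(lead)
            return
    store.append(dict(lead))


def copy_with_retries(action: Callable[[Store, dict], None],
                      source: List[dict], attempts: int) -> Store:
    # Simulates each request being sent `attempts` times, for example
    # because the target system took too long to respond.
    target: Store = []
    for lead in source:
        for _ in range(attempts):
            action(target, lead)
    return target


if __name__ == "__main__":
    source = [{"email": "a@example.com", "last_name": "Ng"},
              {"email": "b@example.com", "last_name": "Ito"}]
    print(len(copy_with_retries(create_lead, source, attempts=3)))            # 6
    print(len(copy_with_retries(update_or_create_lead, source, attempts=3)))  # 2
```

With three attempts per record, the non-idempotent action leaves six leads for two source records, while the update-or-create action leaves exactly two, however many times each request is retried.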
Find or create everything you need:
- An App Connect service in IBM Cloud.
- A Salesforce account with some data that you can retrieve.
If you want to create a free Salesforce account to test out App Connect, make sure that you create a Developer account rather than a Trial account. If you connect to App Connect with a Trial account, the Salesforce events don’t work.
- The user names and passwords for any other accounts that you want to access in your flow (if you haven’t already connected App Connect to your accounts).
You can connect to your accounts now on the Applications tab on the Catalog page, or you can connect as you add each application to your flow.
Some applications need some extra information to be able to connect to App Connect; if you need help finding this info, see “How to” guides for apps.
Next, create your flow:
(App Connect automatically saves your changes as you go. If you move away from the flow at any point, the flow is saved as a draft flow that you can come back to later.)
- Log in to your App Connect service in IBM Cloud.
- From the Dashboard, click New > Event-driven flow.
(You could also create an API flow, depending on how you want to invoke your batch process.)
- Enter a name that identifies the purpose of your flow.
- Select your first application (source), then select the event that’ll trigger the actions in the rest of your flow.
For example, you might want to copy leads from one Salesforce account to another, and to run this flow on a specific date. In this case, you would choose the Scheduler Schedule flow event to start your flow.
- To add a batch process action to your flow, click the plus icon, click Toolbox, then select Batch process.
This creates a branch off the main flow with a Batch process box, which will contain all the actions that make up your batch process.
- Select the application that you want to extract data from, then expand the object that you want to retrieve, and select the retrieve link for that object. For example, if you want to retrieve Salesforce leads, select Retrieve leads.
- If you want to retrieve records that meet certain criteria, add one or more conditions. Alternatively, if you want to retrieve all records, delete the condition by clicking the cross to the right of the condition field.
- Optional: To set a maximum number of records to retrieve, select Specify maximum number of items to retrieve. You can either type in a number or click the icon to set the limit to the maximum possible number of records.
- Optional: To define an ID that identifies each record in the batch in the log, select Specify a log ID for each record of the batch and define the ID.
Messages are written to the log if you’ve included a Log node in your flow or if an error occurs during the batch process. To identify specific records that haven’t been processed successfully, you can define the ID that appears in the log. Make sure that the ID is unique for each record in the batch and is no more than 256 characters long. (If the ID is longer than 256 characters, it’s truncated in the log.) The ID can consist of fields mapped from the source application, free text, and JSONata expressions. For more information about creating JSONata expressions, see JSONata.
In the following examples, free text has been used to identify the main batch process in a flow and a nested batch process, which happens to be a batch completion flow. The ID also contains mappings to the Lead ID and first and last name fields:
If you’ve used a Log node in your flow, or errors occur during batch processing, the ID that you’ve defined appears in the batch-record-id_str column of the Kibana log.
- To process the records that you’ve retrieved, you can add more actions or logic after the retrieve action. In the Batch process box, click the plus icon, expand a target application, then select an action. All actions that you add within the Batch process box will be completed for each record that has been retrieved at the beginning of the batch process.
For example, you might select the Salesforce Update or create lead action to create a lead in a second Salesforce account.
The Update or create lead action is a better choice than the Create lead action for this kind of data copy process because it’s idempotent and will therefore reduce the chances of duplicate leads being created.
- Map the appropriate information between your retrieved records and your action.
Applications that are part of the batch process are prefixed with “Batch process”.
- Optional: You can add one or more batch completion actions at the end of the batch process, which run based on the status of the batch process. Make sure that you add any batch completion actions after the batch completion icon in the batch process flow. In the following example, an If node has been added after the batch completion icon, with different actions that run depending on whether the batch process completed successfully, failed, was stopped, or timed out.
- You can also add actions to the main flow, outside the batch process. These actions happen after the batch process has started but run independently of it, so they could finish either before or after the batch actions complete. You can’t map data from the batch process branch into actions in the main flow.
For example, you could add a Slack Create message action that posts a message on your chosen Slack channel to tell you that a batch process has been triggered.
- When you’ve finished defining your flow, open the options menu [⋮] in the banner and click Start flow. Then click Dashboard to exit the flow.
Your flow is represented by a tile on the dashboard. The tile shows whether a batch is currently running.
You can use options in the Actions column to stop a running batch or to view the logs for a particular batch process in Kibana. (For more information, see Viewing IBM App Connect logs in Kibana.) Note that batch status information is cleared when you stop the flow. Stopping a flow also stops any running batch processes. If errors occur in a batch process, and you’ve defined an ID for each record in the batch process, you can find your defined ID in the batch-record-id_str column of the Kibana log, and therefore identify any specific records that weren’t processed successfully.
You can also use options in the Actions column to pause, and later resume, a running batch extraction process. The pause and resume actions apply only to extraction (retrieving records) from the source system; you can’t pause and resume the processing of the retrieved records. When you resume, the batch process continues extracting records from the point at which it was paused. If a batch process fails to extract a record, the extraction process is paused automatically, and after a set time it resumes automatically from the point at which it was paused. This is useful if, for example, the extraction failed because of network issues or because you hit rate limits on the application from which you extract. The extraction process can be paused and resumed repeatedly until the process times out. You can also manually resume an automatically paused batch extraction by using the menu on the dashboard tile.
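The automatic pause-and-resume behavior and the per-record log ID can be pictured with a simple model. The following Python sketch is hypothetical, not App Connect code: `BatchExtraction`, `fetch_page`, `make_log_id`, the `Id`/`FirstName`/`LastName` field names, and the default resume delay and timeout values are all assumptions. It keeps a cursor so that, after an extraction failure, extraction resumes from the point at which it paused; it logs processing failures against a log ID truncated to 256 characters; and it models only two outcomes, completed and timed out.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List

MAX_LOG_ID_LEN = 256  # longer IDs are truncated in the log


def make_log_id(record: dict) -> str:
    # Hypothetical ID: free text plus mapped fields, truncated to 256 characters.
    raw = f"Main batch: {record['Id']} {record['FirstName']} {record['LastName']}"
    return raw[:MAX_LOG_ID_LEN]


@dataclass
class BatchExtraction:
    fetch_page: Callable[[int], List[dict]]  # hypothetical: records from an offset
    process: Callable[[dict], None]          # per-record batch actions
    resume_delay: float = 0.0                # the "set time" before auto-resume (assumed)
    timeout: float = 60.0                    # seconds before the process times out (assumed)
    cursor: int = 0                          # point at which extraction resumes
    log: List[str] = field(default_factory=list)

    def run(self) -> str:
        deadline = time.monotonic() + self.timeout
        while True:
            if time.monotonic() > deadline:
                return "timed out"
            try:
                page = self.fetch_page(self.cursor)
            except ConnectionError as err:
                # Extraction failed: pause, then resume from the same cursor.
                self.log.append(f"paused at {self.cursor}: {err}")
                time.sleep(self.resume_delay)
                continue
            if not page:
                return "completed"
            for record in page:
                try:
                    self.process(record)
                except Exception as err:
                    # Processing errors are logged against the record's log ID.
                    self.log.append(f"{make_log_id(record)}: {err}")
            self.cursor += len(page)


if __name__ == "__main__":
    data = [{"Id": str(i), "FirstName": "A", "LastName": "B"} for i in range(5)]
    calls = {"n": 0}

    def flaky_fetch(offset: int) -> List[dict]:
        calls["n"] += 1
        if calls["n"] == 2:  # simulate one network failure or rate limit
            raise ConnectionError("rate limited")
        return data[offset:offset + 2]

    processed: List[dict] = []
    job = BatchExtraction(fetch_page=flaky_fetch, process=processed.append)
    print(job.run(), [r["Id"] for r in processed], job.log)
```

In the demo, the second fetch fails, extraction pauses at record 2, and then resumes from that same point, so every record is processed exactly once.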