How to Reprocess Solution Errors

Overview

Failures, errors, and outages are unavoidable parts of any technical system. As engineers, we should of course do our best to design solutions with failures in mind. Despite our best intentions and planning, situations we had not anticipated sometimes come up, which makes elegant recovery difficult. Often all we can do is retry and hope that connectivity is restored. The so-called heisenbugs are one example.

The Connect capability of TIBCO Cloud Integration provides the ability to reprocess failed records. When an execution fails with record errors, a copy of each source record with an error is stored, either in the cloud or locally in the on-premise agent database. This lets us retry the processing of these failed records.

In this article, we will show you how we can automate reprocessing of solution errors with the help of the Scribe Platform API Connector.

Short on time? Check out this video on how to reprocess solution errors!

Use Case

Consider a solution that has an unstable connection to one of its source or target systems. We want to automate the reprocessing of all failed records in this solution.

Prerequisites

As a prerequisite, you should have one unstable solution. For demo purposes, let's use a solution with a single map as follows:

This map will only succeed in 50% of cases. Let's see why:

We're using a fictional entity called SelectOne from the Scribe Labs Tools Connector. It just provides a single row with the current datetime in it. It can be very handy if you just want to start the map without querying an external data source.
The IF block checks the seconds part of the current datetime using the DATEPART function and compares it with 30 (this gives us the 50% success rate).
- You can replace 30 with another value if you want a different success rate.
- We're using the GETUTCDATETIME function to get the current datetime instead of the UtcNow property, because with UtcNow TIBCO Cloud Integration reuses the same datetime value during reprocessing, which leaves no chance of successful reprocessing. GETUTCDATETIME, however, always provides the current datetime.
In the ELSE clause, we put an Execute command with a Dates entity, which will always fail because we put invalid values into the target connection fields.
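The logic of the IF condition can be sketched in Python. This is only an illustration, not how the map is implemented: the function name `map_succeeds` is hypothetical, and the sketch assumes that the success branch is taken when the seconds value is below 30 (the map only tells us that the seconds part is compared with 30).

```python
from datetime import datetime, timezone

THRESHOLD_SECONDS = 30  # replace with another value for a different success rate


def map_succeeds(now: datetime, threshold: int = THRESHOLD_SECONDS) -> bool:
    """Return True when the seconds part of `now` is below the threshold.

    Assumption: the success branch corresponds to seconds < threshold.
    """
    return now.second < threshold


# Read the clock at evaluation time, as GETUTCDATETIME does, so that a
# reprocessed record is checked against a fresh value, not the original one.
print(map_succeeds(datetime.now(timezone.utc)))
```

With a threshold of 30, exactly half of the 60 possible seconds values pass the check, which is where the 50% success rate comes from.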

After you finish the map, note the Id and OrganizationId of this solution (you can get them from the URI). In this article, I will use the following values:

OrganizationId = 3531
SolutionId = "6c6bac38-4447-4ce3-a841-8621a3f72f9b"

Also, I encourage you to check out the Scribe Labs Tools Connector. It provides other useful blocks such as SHA1, which can help with GDPR compliance in some cases.

Iteration #1: Getting solutions with errors

The execution history of the solution can be retrieved either from the API directly or from an external system, as shown in a previous article. For simplicity, I will use the first approach since it doesn't require any additional connectors:

A few notes about the map above:

We want to reprocess only the latest solution history. That's why:
- The Query block sorts histories by the Start column in descending order
  - Possible values for the ExecutionHistoryColumnSort and SortOrder columns can be seen in the API tester
- We use a Map Exit block to guarantee that no more than one execution history is reprocessed
We want to reprocess only the histories that contain errors.
- For this reason, we're using an If/Else control block which filters histories by the Result value
- If you want to reprocess only fatal errors or only record errors, you can change the condition
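The selection logic described in these notes can be sketched in Python. The record layout (`id`, `start`, `result`) and the result values in `ERROR_RESULTS` are assumptions made for illustration; check the API tester for the values your organization actually returns.

```python
from typing import Optional

# Assumption: result values that indicate errors. Narrow this set if you
# only want fatal errors or only record errors.
ERROR_RESULTS = {"FatalError", "RecordErrors"}


def latest_history_with_errors(histories: list[dict]) -> Optional[dict]:
    """Return the most recent history if it contains errors, else None."""
    if not histories:
        return None
    # Taking the maximum by start time is equivalent to sorting by Start in
    # descending order and exiting after the first row (Query + Map Exit).
    latest = max(histories, key=lambda h: h["start"])
    # Mirror the If/Else block: keep the history only if it has errors.
    return latest if latest["result"] in ERROR_RESULTS else None
```

Note that an older history with errors is ignored when the latest one succeeded, which matches the intent of reprocessing only the latest execution.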

Iteration #2: Marking solution errors for reprocessing

To reprocess errors, first, we should mark all the errors for reprocessing. Scribe Platform API provides two REST resources to accomplish this task:

POST /v1/orgs/{orgId}/solutions/{solutionId}/history/{id}/mark
- Mark all errors from the solution execution history for reprocessing
POST /v1/orgs/{orgId}/solutions/{solutionId}/history/{historyId}/errors/{id}/mark
- Mark a particular error from the solution execution history for reprocessing

Currently, the Scribe Platform API connector supports only the first resource, via the MarkAllErrors command.
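For readers who want to see how the two resource paths are filled in, here is a minimal Python sketch that only builds the URLs and does not send any request. `BASE_URL` is a placeholder rather than the real API host, the function names are hypothetical, and the history and error ids in the example are made up.

```python
BASE_URL = "https://api.example.invalid"  # placeholder, not the real API host


def mark_all_errors_url(org_id: int, solution_id: str, history_id: int) -> str:
    """URL for marking all errors of one execution history."""
    return (f"{BASE_URL}/v1/orgs/{org_id}/solutions/{solution_id}"
            f"/history/{history_id}/mark")


def mark_error_url(org_id: int, solution_id: str, history_id: int,
                   error_id: int) -> str:
    """URL for marking a single error of one execution history."""
    return (f"{BASE_URL}/v1/orgs/{org_id}/solutions/{solution_id}"
            f"/history/{history_id}/errors/{error_id}/mark")


# Example with the ids used in this article and a made-up history id.
print(mark_all_errors_url(3531, "6c6bac38-4447-4ce3-a841-8621a3f72f9b", 42))
```

Both resources are called with POST; authentication and the request itself are left out of the sketch on purpose.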

Iteration #3: Reprocessing solution errors

The next step after marking all the errors is reprocessing. We will use the ReprocessAllErrors command block, which reprocesses all marked errors from a solution execution. An important note from the documentation: this command will be ignored if the solution is running.
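The order of the two commands, together with the note about a running solution, can be sketched as follows. The three callables are hypothetical stand-ins for whatever performs the status check, MarkAllErrors, and ReprocessAllErrors in your environment; the explicit guard is a design choice of this sketch, not something the connector requires.

```python
from typing import Callable


def mark_and_reprocess(
    is_running: Callable[[], bool],
    mark_all_errors: Callable[[], None],
    reprocess_all_errors: Callable[[], None],
) -> bool:
    """Mark all errors, then reprocess them. Return False if skipped."""
    if is_running():
        # The reprocess command would be ignored while the solution is
        # running, so skip this round and let the caller try again later.
        return False
    mark_all_errors()       # step 1: mark every error of the history
    reprocess_all_errors()  # step 2: reprocess everything that was marked
    return True
```

Returning a flag instead of silently doing nothing makes it easier for the caller to decide whether another attempt is needed.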

Iteration #4: Retries

If we want more than one attempt at resolving errors by reprocessing, we can add retry logic to the map itself. However, this requires refactoring our map a bit.

Notable changes:

We added a Loop with an If/Else control block which uses the SEQNUM function as a retry counter
- As an alternative to the SEQNUM function, you can try the Scribe Labs Variables Connector
On every retry, we want to work with the latest Execution History record. That's why the initial root block was split into two:
- The new root Query block, which works with Solutions
- The Lookup History block, which retrieves the latest available history record
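The shape of the refactored map can be sketched as a plain retry loop. The function and parameter names and the limit of three retries are assumptions for illustration; `lookup_latest_history` stands in for the Lookup History block and is expected to return `None` when the latest history has no errors.

```python
from typing import Callable, Optional

MAX_RETRIES = 3  # hypothetical limit


def reprocess_with_retries(
    lookup_latest_history: Callable[[], Optional[dict]],
    reprocess: Callable[[dict], None],
    max_retries: int = MAX_RETRIES,
) -> int:
    """Reprocess until the latest history has no errors or retries run out.

    Returns the number of reprocessing attempts that were made.
    """
    attempts = 0  # plays the role of the SEQNUM-based retry counter
    while attempts < max_retries:
        history = lookup_latest_history()  # re-read on every retry
        if history is None:  # nothing left to reprocess
            break
        reprocess(history)
        attempts += 1
    return attempts
```

Looking up the history inside the loop is the important part: each reprocessing run produces a new execution history, so the previous record would be stale on the next iteration.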

Iteration #5: Truncated Exponential Backoff

On the other hand, straightforward retries can become a source of accidental Denial-of-Service. It's a classic example of the "The road to hell is paved with good intentions" anti-pattern.

To avoid this pitfall, we can implement the truncated exponential backoff algorithm. It's not as hard as it sounds. The idea is to exponentially increase the delay between retries until we reach the maximum retry count or the maximum backoff time.

Optionally, we can add some randomness when we compute the delay, but it's not needed in our case.
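As a point of reference, here is a minimal Python sketch of the delay computation. The function name, the base of one second, and the cap of 60 seconds are assumptions for illustration; the optional `jitter` parameter shows where the randomness would go.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: float = 0.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based).

    The delay doubles on each attempt and is truncated at `cap`.
    `jitter` adds up to that many random seconds on top (0 disables it).
    """
    delay = min(base * 2 ** attempt, cap)
    return delay + random.uniform(0.0, jitter)


print([backoff_delay(n) for n in range(8)])
# → [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

The last two values show the truncation: once the doubled delay exceeds the cap, it stays at the cap.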

At the time of writing, the Connect capability of TIBCO Cloud Integration doesn't support a POW function (you can check that here). But we can emulate it with precomputed Lookup Table Values, since we know all the possible retry counter values. This is a simple form of so-called memoization: the results are computed once in advance and then only looked up.

And here's the updated map:

Notable changes:

We used the Sleep block from the Scribe Labs Tools Connector to suspend the work of the map
The SEQNUM function was replaced by the SEQNUMN function
- We created a named sequence, "RetryCounter", which we can work with in any further map blocks
With the help of SEQNUMNGET, we can peek at the current value of our named sequence without incrementing it (just as with a stack!)
The LookupTableValue2 function gets the precomputed power of 2 from the corresponding Lookup Table
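Put together, the updated map behaves roughly like the following Python sketch. It is an analogy, not the map itself: the table contents, `MAX_BACKOFF_SECONDS`, and the function names are assumptions, and the comments only indicate which map element each line is meant to resemble.

```python
import time
from typing import Callable

# Precomputed powers of 2 keyed by retry counter (stands in for POW).
POWERS_OF_TWO = {1: 2, 2: 4, 3: 8, 4: 16, 5: 32}
MAX_BACKOFF_SECONDS = 20  # hypothetical truncation limit


def run_with_backoff(attempt_once: Callable[[], bool],
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `attempt_once` until it succeeds or the table is exhausted.

    Returns the final value of the retry counter.
    """
    retry_counter = 0  # plays the role of the "RetryCounter" named sequence
    while retry_counter < len(POWERS_OF_TWO):
        if attempt_once():
            break
        retry_counter += 1  # increment the counter, as SEQNUMN would
        # Read the counter without changing it, as SEQNUMNGET would, and
        # look up the precomputed power of 2 instead of calling POW.
        delay = min(POWERS_OF_TWO[retry_counter], MAX_BACKOFF_SECONDS)
        sleep(delay)  # the Sleep block
    return retry_counter
```

Passing `sleep` as a parameter makes the loop easy to try out without actually waiting; with an attempt that always fails, the recorded delays are 2, 4, 8, 16, and 20 seconds.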

Summary

In this article we learned:

How to mark and reprocess all errors from a particular solution execution with the help of the Command block from the Scribe Platform API Connector
How to implement retries with exponential backoff to prevent accidental Denial-of-Service
- The Sleep block helped us pause the solution
- With Lookup Tables, we overcame the absence of a POW function
