  • How to Reprocess Solution Errors


    Manoj Chaurasia

    Screenshot2022-11-14at2_51_13PM.thumb.png.4bc61e95124fd66560e73b7d84f44447.png

    Overview

    Failures, errors, and outages are unavoidable parts of any technical system. As engineers, we should do our best to design solutions with failure in mind. Despite our best intentions and planning, situations we had not anticipated sometimes come up, which makes elegant recovery difficult. Sometimes all we can do is retry and hope that connectivity is restored. The so-called heisenbugs are one such example.

    The Connect capability of TIBCO Cloud Integration provides the ability to reprocess failed records. When an execution fails with record errors, a copy of each source record with an error is stored, either in the cloud or locally in the on-premise agent database. This lets us retry the processing of these failed records.

    In this article, we will show you how we can automate reprocessing of solution errors with the help of the Scribe Platform API Connector.

    Short on time? Check out this video on how to reprocess solution errors!


    Use Case

    Consider the case when you have an unstable connection to one of your source or target systems in a solution. We want to automate reprocessing of all failed records in this solution.

    Prerequisites

    As a prerequisite, you should have one unstable solution. For demo purposes, let's use a solution with a single map as follows:

    Screenshot2022-11-14at2_52_24PM.thumb.png.f6a9b4367b8ad6cb1e37d6bf972238f1.png

    This map will succeed in only 50% of cases. Let's see why:

    • We're using a fictional entity called SelectOne from the Scribe Labs Tools Connector. It provides a single row containing the current datetime, which is handy if you just want to start the map without querying an external data source.
    • The IF block extracts the seconds part of the current datetime with the DATEPART function and compares it with 30, which is where the 50% success rate comes from.
    • In the ELSE clause, we put an Execute command with a Dates entity, which will always fail because we pass invalid values to the target connection fields.
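
The 50% figure follows directly from the seconds check. Below is a minimal Python sketch of that reasoning. The direction of the comparison (seconds less than 30) is an assumption, since the article only says the value is compared with 30, and `map_succeeds` is a hypothetical helper, not part of any connector.

```python
from datetime import datetime


def map_succeeds(now: datetime) -> bool:
    """Mimic the IF block: take the successful branch when seconds < 30 (assumed)."""
    return now.second < 30


# Check every possible seconds value (0..59) and count the successful ones.
successes = sum(map_succeeds(datetime(2022, 11, 14, 14, 51, s)) for s in range(60))
success_rate = successes / 60
print(success_rate)
```

Half of the 60 possible seconds values fall on each side of the threshold, so a map started at a random moment fails about every second run.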

    After you finish with the map, note the Id and OrganizationId of this solution (you can get them from the URI). In this article, I will use the following values:

    • OrganizationId = 3531
    • SolutionId = "6c6bac38-4447-4ce3-a841-8621a3f72f9b"

    I also encourage you to check out the Scribe Labs Tools Connector. It provides other useful blocks such as SHA1, which can help with GDPR compliance in some cases.

    Iteration #1: Getting solutions with errors

    The execution history of the solution can be retrieved either from the API directly or from an external system, as shown in a previous article. For simplicity, I will use the first approach since it doesn't require any additional connectors:

    Screenshot2022-11-14at2_52_34PM.thumb.png.36172b35e74e50ca4508a9901fc0f07a.png

    A few notes about the map above:

    • We want to reprocess only the latest solution history. That's why:
      • The Query block sorts histories by the Start column in descending order
        • Possible values for the ExecutionHistoryColumnSort and SortOrder columns can be seen in the API tester
      • We use a Map Exit block to guarantee that no more than one execution history is reprocessed
    • We want to reprocess only the histories that contain errors.
      • For this reason, we're using an If/Else control block, which filters histories by the Result value
      • If you want to reprocess only fatal errors or only record errors, you can change the condition
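
The selection logic in the notes above can be sketched in plain Python. This is only an illustration under stated assumptions: the record fields (`id`, `start`, `result`) and the result values (`Success`, `RecordErrors`, `FatalError`) are hypothetical placeholders, not the real API schema, and the sketch assumes that only the single most recent history is considered.

```python
from typing import Dict, List, Optional, Set

# Hypothetical result values that count as "contains errors".
ERROR_RESULTS: Set[str] = {"RecordErrors", "FatalError"}

# Hypothetical, simplified execution history records.
histories: List[Dict[str, str]] = [
    {"id": "h1", "start": "2022-11-14T10:00:00", "result": "Success"},
    {"id": "h3", "start": "2022-11-14T12:00:00", "result": "RecordErrors"},
    {"id": "h2", "start": "2022-11-14T11:00:00", "result": "FatalError"},
]


def latest_history_with_errors(
    records: List[Dict[str, str]], error_results: Set[str]
) -> Optional[Dict[str, str]]:
    """Return the most recent history if it contains errors, else None."""
    if not records:
        return None
    # Equivalent to sorting by Start descending and taking the first row;
    # ISO 8601 timestamps in the same format sort correctly as strings.
    latest = max(records, key=lambda h: h["start"])
    # Equivalent to the If/Else filter on the Result value.
    return latest if latest["result"] in error_results else None


selected = latest_history_with_errors(histories, ERROR_RESULTS)
print(selected)
```

Narrowing `ERROR_RESULTS` to a single value corresponds to changing the If/Else condition to handle only fatal errors or only record errors.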

    Iteration #2: Marking solution errors for reprocessing

    To reprocess errors, we first need to mark them for reprocessing. The Scribe Platform API provides two REST resources to accomplish this task:

    • POST /v1/orgs/{orgId}/solutions/{solutionId}/history/{id}/mark
      • Mark all errors from the solution execution history for reprocessing
    • POST /v1/orgs/{orgId}/solutions/{solutionId}/history/{historyId}/errors/{id}/mark
      • Mark a particular error from the solution execution history for reprocessing

    Currently, the Scribe Platform API connector supports only the first resource, via the MarkAllErrors command.
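
To make the two resources concrete, here is a small Python sketch that only fills in the path templates listed above with the example OrganizationId and SolutionId from this article. The helper names and the history and error ids are hypothetical; the base URL, authentication, and the actual HTTP call are intentionally left out.

```python
def mark_all_errors_path(org_id: int, solution_id: str, history_id: str) -> str:
    """Path for marking all errors of one execution history."""
    return f"/v1/orgs/{org_id}/solutions/{solution_id}/history/{history_id}/mark"


def mark_single_error_path(
    org_id: int, solution_id: str, history_id: str, error_id: str
) -> str:
    """Path for marking one particular error of an execution history."""
    return (
        f"/v1/orgs/{org_id}/solutions/{solution_id}"
        f"/history/{history_id}/errors/{error_id}/mark"
    )


ORG_ID = 3531
SOLUTION_ID = "6c6bac38-4447-4ce3-a841-8621a3f72f9b"
HISTORY_ID = "12345"  # hypothetical history id, for illustration only
ERROR_ID = "67890"    # hypothetical error id, for illustration only

all_path = mark_all_errors_path(ORG_ID, SOLUTION_ID, HISTORY_ID)
one_path = mark_single_error_path(ORG_ID, SOLUTION_ID, HISTORY_ID, ERROR_ID)
print(all_path)
print(one_path)
```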

    Screenshot2022-11-14at2_52_44PM.thumb.png.8ab9aa5a5b5828d7007bc172e857c58f.png

    Iteration #3: Reprocessing solution errors

    The next step after marking all the errors is reprocessing. We will use the ReprocessAllErrors command block, which reprocesses all marked errors from the solution execution. An important note from the documentation: this command is ignored if the solution is running.

    Screenshot2022-11-14at2_52_53PM.thumb.png.c21e741f89690e25fc4a5e82bbedc2c6.png

    Iteration #4: Retries

    If you want more than one attempt at resolving errors by reprocessing, you can add retry logic to the map itself. However, this requires refactoring our map a bit.

    Screenshot2022-11-14at2_53_21PM.thumb.png.6b73d21a92b1b3af5acbbf9a3bef71ad.png

    Notable changes:

    • We added a Loop with an If/Else control block, which uses the SEQNUM function as a retry counter
    • On every retry, we want to work with the latest Execution History record. That's why the initial root block was split into two:
      • The new root Query block, which works with Solutions
      • The Lookup History block, which retrieves the latest available history record
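
The control flow of this retry loop can be sketched in Python as follows. It is a rough analogy rather than a translation of the map: `attempt` is a hypothetical callable standing in for "look up the latest history, mark its errors, reprocess them", and it returns True once no errors remain.

```python
from typing import Callable


def reprocess_with_retries(attempt: Callable[[], bool], max_retries: int) -> int:
    """Call `attempt` until it succeeds or the retry budget is used up.

    Returns the number of attempts that were actually made.
    """
    attempts = 0
    while attempts < max_retries:
        attempts += 1  # plays the role of the retry counter
        if attempt():
            break      # nothing left to reprocess, stop early
    return attempts


# Simulated outcomes: two failed attempts followed by a successful one.
outcomes = iter([False, False, True])
attempts_made = reprocess_with_retries(lambda: next(outcomes), max_retries=5)
print(attempts_made)
```

Because `attempt` is invoked afresh on every iteration, each retry works with the current state, which mirrors why the history lookup was moved inside the loop.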

    Iteration #5: Truncated Exponential Backoff

    On the other hand, straightforward retries can become a source of accidental Denial-of-Service. It's a classic example of the "road to hell is paved with good intentions" anti-pattern.

    To avoid this pitfall, we can implement the truncated exponential backoff algorithm. It's not as hard as it sounds. The idea is to exponentially increase the delay between retries until we reach the maximum retry count or the maximum backoff time.

    Screenshot2022-11-14at2_53_34PM.png.059cdd350e7c8685cf38949458f2255a.png

    Optionally, we can add some randomness when computing the delay, but it's not needed in our case.


    Screenshot2022-11-14at2_53_42PM.png.f85f3692fd78323d0fad47cf48e98e55.png
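
As an illustration of the idea, here is a minimal Python sketch of a truncated exponential backoff delay without the optional randomness. The base delay of 1 second and the cap of 32 seconds are assumed values chosen for the example, not settings taken from the map.

```python
def backoff_delay(retry: int, base_seconds: float = 1.0, max_seconds: float = 32.0) -> float:
    """Delay before the given retry: base * 2^retry, truncated at max_seconds."""
    return min(base_seconds * 2 ** retry, max_seconds)


# Delays for retries 0..6: they double each time until the cap is reached.
delays = [backoff_delay(n) for n in range(7)]
print(delays)
```

The `min` call is the "truncated" part: once the exponential term exceeds the cap, every further retry waits the same maximum time.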
     

    At the time of writing, the Connect capability of TIBCO Cloud Integration doesn't support a POW function (you can check that here). But since we know all the possible retry counter values, we can emulate it with precomputed Lookup Table values. This is essentially a precomputed lookup table, a technique closely related to memoization.

    Screenshot2022-11-14at2_53_51PM.thumb.png.0cde9ff54206bd4c17adbad1c1dd517c.png
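
The same trick looks like this in Python: a fixed table replaces the power computation. The range of counter values (0 to 5) is an assumption for the example, and `power_of_two` is a hypothetical helper.

```python
# Precomputed powers of 2 for every retry counter value we expect to see.
POWERS_OF_TWO = {0: 1, 1: 2, 2: 4, 3: 8, 4: 16, 5: 32}


def power_of_two(exponent: int) -> int:
    """Look up 2^exponent in the table instead of computing it."""
    if exponent not in POWERS_OF_TWO:
        raise ValueError(f"no precomputed value for exponent {exponent}")
    return POWERS_OF_TWO[exponent]


looked_up = [power_of_two(n) for n in range(6)]
print(looked_up)
```

This only works because the retry counter is bounded, so the table needs one row per possible counter value.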

    And here's the updated map:

    Screenshot2022-11-14at2_54_19PM.thumb.png.590859cfcb183e819d1e998e126850a9.png

    Notable Changes:

    • We used the Sleep block from the Scribe Labs Tools Connector to suspend the map
    • The SEQNUM function was replaced by the SEQNUMN function
      • We created a named sequence called "RetryCounter", which we can use in any subsequent map block
    • With the help of SEQNUMNGET, we can peek at the current value of our named sequence without incrementing it (just as with a stack!)
    • The LookupTableValue2 function fetches the precomputed power of 2 from the corresponding Lookup Table
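
Putting these pieces together, the updated map behaves roughly like the Python sketch below. Everything here is an analogy under stated assumptions: `NamedSequences.next` and `NamedSequences.peek` only imitate the increment and peek behavior described for SEQNUMN and SEQNUMNGET (the starting value of 1 is assumed), `DELAY_SECONDS` stands in for the Lookup Table, and `sleep` is passed in as a parameter so the example runs instantly.

```python
from typing import Callable, Dict, List


class NamedSequences:
    """Counters addressed by name, similar in spirit to named sequences."""

    def __init__(self) -> None:
        self._values: Dict[str, int] = {}

    def next(self, name: str) -> int:
        """Increment the named counter and return the new value."""
        self._values[name] = self._values.get(name, 0) + 1
        return self._values[name]

    def peek(self, name: str) -> int:
        """Return the current value without incrementing it."""
        return self._values.get(name, 0)


# Hypothetical lookup table: retry counter value -> delay in seconds.
DELAY_SECONDS = {1: 2, 2: 4, 3: 8}


def run_with_backoff(
    attempt: Callable[[], bool],
    sleep: Callable[[int], None],
    max_retries: int = 3,
) -> bool:
    """Retry `attempt`, sleeping for a table-driven delay after each failure."""
    seq = NamedSequences()
    while seq.next("RetryCounter") <= max_retries:
        if attempt():
            return True
        # Peek at the counter (no increment) to pick the delay from the table.
        sleep(DELAY_SECONDS[seq.peek("RetryCounter")])
    return False


outcomes = iter([False, False, True])
slept: List[int] = []  # records requested delays instead of really sleeping
succeeded = run_with_backoff(lambda: next(outcomes), slept.append)
print(succeeded, slept)
```

Passing `time.sleep` as the `sleep` argument would make the delays real; recording them in a list keeps the example fast and easy to check.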

    Summary

    In this article, we learned:

    • How to mark and reprocess all errors from a particular solution execution with the help of Command blocks from the Scribe Platform API Connector
    • How to implement retries with exponential backoff to prevent accidental Denial-of-Service
      • The Sleep block helped us pause the solution
      • With Lookup Tables, we worked around the absence of a POW function

