    Analyzing upgrade scenarios to determine best practice for upgrading LogLogic® Log Source Packages to minimize maintenance time


    Manoj Chaurasia



    TIBCO LogLogic® provides the industry's first enterprise-class, end-to-end log management solution. Using LogLogic® log management solutions, IT organizations can analyze and archive log and machine data for compliance and legal protection, decision support for security remediation, and improved performance and availability of the overall infrastructure.


    Introduction

    This article documents the primary methods of performing a Log Source Package (LSP) upgrade of a LogLogic® LMI High-Availability (HA) cluster and compares how long each method takes to complete and how much data loss occurs. Other considerations affecting the upgrade methodology are not the primary focus. As you'll notice, the scenarios are fairly similar: one stands out as the best with respect to how long the procedure takes, but they are all equal with respect to how much data is lost. There is no scenario in which log loss can be completely prevented, simply because log synchronization is not bi-directional, but this article shows the analysis supporting how to choose the best upgrade procedure. Further analysis is provided in the conclusion section below.

    Scenario list

    Here is the list of scenarios analyzed in this article:

    1. Upgrade simultaneously without disabling HA, keeping the same role for each node (recommended)
    2. Upgrade master first without disabling HA, keeping the same role for each node
    3. Upgrade vice master first without disabling HA, keeping the same role for each node
    4. Upgrade master first without disabling HA but switching roles (not applicable)
    5. Upgrade vice master first without disabling HA but switching roles
    6. Upgrade simultaneously without disabling HA but switching roles (not applicable)

    Of the scenarios above, scenario 1 is the procedure the Release Notes recommend, starting with LSP 34, for describing the upgrade process in an HA configuration.

    For this article, it is important to understand log loss: when it can occur and how to quantify it. Log loss can occur at different points in time when using LogLogic® LMI. It can occur during log collection, for example when the cluster's virtual IP address is not bound to any network interface because the system is unavailable. It can also occur during the initial data migration phase in a High-Availability cluster: despite the use of HA, log loss can occur because the initial data migration is not bi-directional, so LMI will delete any data on the vice master that the master node does not also possess. We call these two points of log loss "log loss at collection" and "log loss at synchronization", respectively. Keep this in mind as you read the scenarios below, especially when we calculate the amount of log loss based on how much time elapses. "Total log loss" as documented below includes both log loss at collection and log loss at synchronization.

    Note: This article cannot state the amount of data lost as measured in bytes or messages, because that depends on the characteristics of each environment; we can only quantify log loss as measured by time. These are estimations: any given version upgrade can introduce additional tasks that cause the upgrade to take longer, so it's best to view these duration estimates as relative to one another for any given LMI version rather than as absolute durations.
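    Given an environment's average ingest rate, the time-based loss windows above can be translated into an estimated message count. The sketch below is illustrative only; the ingest rate is a hypothetical input for the example, not a value reported by LogLogic® LMI.

    ```python
    # Minimal sketch: quantify log loss by time, then convert to an
    # estimated message count. The msgs_per_second rate is an assumed
    # environment characteristic, not a LogLogic-provided figure.

    def estimate_lost_messages(loss_minutes: float, msgs_per_second: float) -> int:
        """Convert a log-loss window (in minutes) into an estimated message count."""
        return round(loss_minutes * 60 * msgs_per_second)

    # Example: a 2.5-minute loss window at a sustained 1,000 msgs/s.
    print(estimate_lost_messages(2.5, 1000))  # 150000
    ```

    For instance, the 2.5-minute loss window common to the scenarios below would correspond to roughly 150,000 messages at a sustained 1,000 messages per second.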

    Conclusion

    As stated above, the analysis in this article focuses on log loss; however, any given organization's needs and preferences may dictate weighing other considerations as well. For example, log loss may matter, but continuity of operations may be the top priority. In that situation, the final recommendation below would not suffice, because if an upgrade failure affects both systems, no node remains healthy enough to maintain operations until the affected nodes are operational again.

    Because log loss occurs in every scenario, the first question is: which scenario allows the smallest amount of log loss? By that criterion, they are all equal.

    The next criterion is which scenario allows the smallest amount of downtime. By that ranking, the preferred order is scenario 1 (2.5 minutes), scenarios 2 and 5 (tied at 5 minutes), then scenario 3 (8.5 minutes).

    Scenario 1 stands out as the best because it has the shortest downtime period: the systems are upgraded in parallel rather than serially. The same parallelism benefit applies when upgrading LogLogic® LMI itself.

    Another criterion that can be used, if so desired, is the number of failovers. Ordered from fewest to most, scenario 1 requires 0 failovers, scenario 5 requires 1 failover, and scenarios 2 and 3 each require 2 failovers. Even though scenario 1 doesn't trigger failovers, it must be reiterated that data loss (at collection) still occurs because the VIP is not owned by any node during the upgrade period.
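    The two criteria can be combined into a single ranking by sorting on total maintenance time first and number of failovers second. A minimal sketch, using the downtime figures from the scenario summaries below and the failover counts assumed as: scenario 1 none, scenario 5 one, scenarios 2 and 3 two each (scenarios 4 and 6 are excluded as not applicable):

    ```python
    # Rank the applicable scenarios: shortest downtime first, then
    # fewest failovers as a tie-breaker.
    scenarios = {
        1: {"downtime_min": 2.5, "failovers": 0},
        2: {"downtime_min": 5.0, "failovers": 2},
        3: {"downtime_min": 8.5, "failovers": 2},
        5: {"downtime_min": 5.0, "failovers": 1},
    }

    ranked = sorted(scenarios, key=lambda s: (scenarios[s]["downtime_min"],
                                              scenarios[s]["failovers"]))
    print(ranked)  # [1, 5, 2, 3]
    ```

    The tie-break matters only for scenarios 2 and 5, which share the same downtime; scenario 5's single failover places it ahead of scenario 2.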

    In summary, scenario 1 is the best scenario both for minimizing the number of failovers and for minimizing the total amount of downtime.

    ** Collection Status, as used below, is the status after the step has finished executing. It is therefore the status going into the next step of the procedure unless otherwise noted.

    Additional Scenario Details

    Scenario 1 Upgrade simultaneously without disabling HA, keeping the same role for each node

    • Total log loss: About 2.5 minutes
    • Total maintenance time: About 2.5 minutes, not including the duration of LSP's pre-requisite checks, initial data migration, or rundbm

    The remaining scenarios simply document other upgrade procedures users have employed in the past and are listed here for comparison purposes.

    Scenario 2 Upgrade master first without disabling HA, keeping the same role for each node

    • Total log loss: About 2.5 minutes
    • Total maintenance time: About 5 minutes, not including the duration of LSP's pre-requisite checks, initial data migration, or rundbm

    Scenario 3 Upgrade vice master first without disabling HA, keeping the same role for each node

    • Total log loss: About 2.5 minutes
    • Total maintenance time: About 8.5 minutes, not including the duration of LSP's pre-requisite checks, initial data migration, or rundbm

    Scenario 4 Upgrade master first without disabling HA but switching roles

    This scenario is not applicable because the roles are switched only while HA is being re-enabled; since HA is never disabled here, there is no re-enable step. Upgrading the master first will not, in and of itself, force a role switch in the final state.

     


    • Total log loss: N/A
    • Total maintenance time: N/A

    Scenario 5 Upgrade vice master first without disabling HA but switching roles

    • Total log loss: About 2.5 minutes
    • Total maintenance time: About 5 minutes, not including the duration of LSP's pre-requisite checks, initial data migration, or rundbm

    Scenario 6 Upgrade simultaneously without disabling HA but switching roles

    This scenario is not applicable because each node automatically resumes the same role it held before leaving the HA cluster when it rejoins, so the roles will not switch automatically. Switching roles would require disabling HA, which contradicts the premise of this scenario.

     


    • Total log loss: N/A
    • Total maintenance time: N/A
