


Chris Christian 2


Background:

I wanted to share some scripts I wrote to solve a problem we faced from time to time on BW and BE engines. Upon starting up a BW or BE server, we needed a fast way to start up just the engines that had been running prior to taking the server (or the TIBCO components) down. Not every engine that is deployed is running: sometimes they are staged, sometimes they are not running for whatever reason, and sometimes we want to bring up only a specific list of engines (e.g. to support a small subset of apps that have signed up for DR testing). Obviously, there are many reasons you can't just do a select-all and then start, as this may start up an engine that was intentionally left in a non-running state. As a best-practice takeaway, it is true that an official policy of only keeping engines deployed to BW and BE that are actually running would resolve this, but that would be a hard sell here and in other shops.

 

Problems/Challenges:

 

During a maintenance procedure, at some point we needed to be able to quickly restore ALL BW/BE engines that had been running before. In the past, we would take screenshots of the TIBCO Admin screens for the server, showing the state of each engine.

During a break/fix outage of the server, for whatever reason, when it comes back up we needed to be able to quickly restore ALL BW/BE engines that were running before the server crashed.

During DR testing, we would know days in advance which apps (and thus which engines) would be brought up for testing. We needed a way to generate a list of engines (per BW/BE) that needed to be brought up for DR testing once the infrastructure was up.

 

 

Solution:

I wrote a couple of scripts to determine which engines were running (or to let us manually create our own list) and to start up everything in that list. Here is a high-level description of the 4 scripts:

 

generate a list of running BW or BE engines and output it to a file (this is scheduled via crontab to run every 4 minutes and handles multiple versions). The output of this script is a list of engines (full path and name)

generate an execution script that can be run to restart all of the engines (the default is to take the current version of the running list, but it can be given a path/filename to another list). The output of this script is a finished, executable script.

**NOTE** this script has additional things it can do for BE engines, such as look for the word "cache" and sort alphabetically so that cache engines start before inference engines, etc.; it can also add a delay of 1, 2, or more minutes if needed

a start script that starts up all of the engines in the list, using nohup and putting the processes in the background, and at the end killing all leftover instances of notty (and TIBCO) to clean up after the script is done (a rough sketch of this follows below)
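To make the idea concrete, here is a minimal sketch of what such a start script could look like, assuming the running list is a plain file with one engine start-script path per line. The list path and file names here are illustrative, not the actual ones.

    #!/bin/sh
    # Sketch of the start script: launch every engine in the running list in the
    # background, then clean up any leftover notty processes.
    RUN_LIST=${1:-/opt/TIBCO/scripts/running_engines.list}   # path is illustrative

    while read ENGINE_SH; do
        [ -x "$ENGINE_SH" ] || continue           # skip entries that no longer exist
        nohup "$ENGINE_SH" > /dev/null 2>&1 &     # background the engine start script
    done < "$RUN_LIST"

    # clean up the leftover "notty" processes the nohup'd starts leave behind
    for PID in $(ps -ef | grep notty | grep -v grep | awk '{print $2}'); do
        kill "$PID"
    done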

 

 

If there is any interest beyond this, I could be convinced to share the shell script code. ;-)


In contrast, now that I have this wonderful command, I am going to redo our running-list generator as well as the stop and start functions. Luckily, I have all of our scripts pointing to our main library and have the function calls pretty modular. I should be able to pull out my current techniques and still allow our several different task scripts to continue to work as intended ;-)

I'm writing a parser now to build some kind of output file that our scripts can utilize. Thanks again, Prashant; my whole team loves this new command. I'm wondering now how many other commands are already on our servers that we just haven't tapped into yet. Hmmmmmm, sounds like some future posts to write ;-)


As info, when I get a minute this week, I will write another post similar to this one. Once I had the above for starting, the next logical scripts were for stopping. This then led to a reusable framework of scripts for doing simple maintenance tasks that require bringing down all TIBCO components. I call it a framework because it has reusable components and gives a proper structure for performing maintenance.

Combine these scripts with the fact that all of our BW/BE is either load balanced or fault tolerant and all of our EMS is fault tolerant, and, if done correctly, you can achieve a "rolling maintenance" capability. By this I mean that there is no outage of an app or loss of messaging or services when performing simple maintenance. AND it orchestrates everything in a prescribed way, so that any admin can perform the tasks and get the same results.

At a high level, this framework (a skeleton sketch follows the list):

- preps the server for all TIBCO components going down

- determines the current state (what BW engines are running, what BE engines are running, what EMS instances are running, etc)

- stops all TIBCO components

- verifies all TIBCO components are down and waits for the user to validate that

- performs the maintenance you need to do (obviously this would not be changing major versions), then verifies the maintenance was done correctly

**This is the main place a new script is created; everything before and after this is reused as part of the framework**

- starts all TIBCO components back up, using the state captured before the maintenance to return the server to the same state

- verifies that all previously identified components are back in the same state they were in before, and waits for the user to validate that

- cleans up after the maintenance (moves output files to a central location, etc.)

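A skeleton of how such a framework driver might hang together; the function names and library path are assumptions for illustration, not the actual scripts:

    #!/bin/sh
    # Skeleton of the maintenance framework driver (names and paths are illustrative)
    . /opt/TIBCO/scripts/tibco_lib.sh      # shared function library

    prep_server                            # prep the server for all TIBCO components going down
    capture_state                          # record running BW/BE engines, EMS instances, etc.
    stop_all_components
    verify_all_down                        # check everything is actually down
    pause_for_operator "Confirm all TIBCO components are down"

    do_maintenance                         # the only task-specific piece of the framework
    verify_maintenance                     # confirm the maintenance was done correctly

    start_all_components                   # restart using the state captured above
    verify_state_restored                  # compare against the pre-maintenance state
    pause_for_operator "Confirm components match their pre-maintenance state"
    cleanup_outputs                        # move output files to the central location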



This is very interesting, indeed!

Since version 5.7.4, TIBCO TRA bundles the AppStatusCheck utility, which you can run to export the status of applications deployed in a domain to an xml file. It's located under tibco_home/tra/<version>/bin.

You can find more details about it at: TIBCO Administrator --> Admin User's Guide --> Chapter 6 Command Line Utilities --> AppStatusCheck

https://docs.tibco.com/pub/administrator/5.9.0/doc/html/wwhelp/wwhimpl/j...

As always, all suggestions are welcome to enhance this utility.

Thanks,

Prashant


Prashant, THANK YOU! We had racked our brains over how to get the Application Name from the command line so that we could use AppManage commands to stop and start. I just checked that AppStatusCheck command out, and it outputs EXACTLY what we were wanting: Application Name, Service Name (and multiples if there are any within the ear), the Deployment Status, the Service Instance Name/Machine Name/Status for each one, etc.

I'm writing a shell script now to parse through this output and build us an xml file or delimited file as output (a sketch of that parser appears after the example output below). This will run every couple of minutes (say 5) and build a file for us that can then be used to safely stop or start BW/BE apps using AppManage...

What I had done before this was to output the PIDs of all running BW or BE engines and then issue a straight kill command. When we were testing how to do this, we carefully performed these tasks in the Admin GUI and AppManage and noted the entries in the logs. Then we tried kill -9, kill -3, kill, etc. until we found one that did the "same thing" (as far as we could tell) as performing a "stop" in the Admin GUI, which turned out to be a straight kill. Then we tried the kill option in the Admin GUI and found it to be the same as a kill -3. And of course, from time to time there are instances we can't even stop or kill an app via the Admin GUI and have to resort to a kill -9.
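A small sketch of that stop-by-PID logic, with the plain kill first and kill -9 only as a last resort; the retry count and sleep interval here are assumptions:

    # Stop an engine by PID: a plain kill matched what the Admin GUI "stop" did;
    # fall back to kill -9 only if the process refuses to exit.
    stop_engine_pid() {
        PID=$1
        kill "$PID" 2>/dev/null
        for i in 1 2 3 4 5 6; do                     # retry window is an assumption
            kill -0 "$PID" 2>/dev/null || return 0   # process is gone; clean stop
            sleep 5
        done
        kill -9 "$PID"                               # last resort
    }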

To start the BW/BE engines, I was using the ps -ef command, grepping out /opt/TIBCO/bw or /opt/TIBCO/be, and then doing an awk on the next-to-last column, which just happens to be the full path and file name of the .tra file for the engine in memory. I used crontab to run this every 4 minutes on each server. I then replaced the .tra with .sh, which just happens to be the very script used to start the engine up. Next, I built a file with all of these entries, adding a nohup before each and an "&" at the end to make it a background instance. Then, as cleanup, I noticed there was a "NOTTY" PID still left over from the initial run of the scripts, which I kill. You are left with all of the same BW/BE engines that were running before.
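Pieced together from that description, the list-generation step might look roughly like this; the output file path is an assumption, and the /opt/TIBCO paths are the ones mentioned above:

    #!/bin/sh
    # Sketch of the running-list generator, run from crontab every 4 minutes.
    # For a running BW/BE engine, the next-to-last column of ps -ef is the full
    # path to its .tra file; swapping .tra for .sh gives the engine's start script.
    RUN_LIST=/opt/TIBCO/scripts/running_engines.list   # output path is an assumption

    ps -ef | grep -E '/opt/TIBCO/(bw|be)' | grep -v grep \
        | awk '{print $(NF-1)}' \
        | sed 's/\.tra$/.sh/' \
        > "$RUN_LIST.tmp" && mv "$RUN_LIST.tmp" "$RUN_LIST"

The matching crontab entry would be something along the lines of: */4 * * * * /opt/TIBCO/scripts/build_running_list.sh (the script name is also illustrative).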

The key was to create a library that all of our scripts use and then "componentize" each function call (such as stop, start, stop all, start all, stop <app>, start <app>, status, etc.). So now, when I pull out my old logic in our library and put in the new, there are several scripts that should function just as they currently do. I also use pull logic to make sure each .sh script always has the latest library file.
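The componentized library plus the pull logic might be structured something like this; all of the function names and paths are illustrative, not the real library:

    # tibco_lib.sh -- shared function library sourced by every script (names illustrative)
    start_engine() { nohup "$1" > /dev/null 2>&1 & }
    stop_engine()  { kill "$1"; }
    start_all()    { while read e; do start_engine "$e"; done < "$RUN_LIST"; }
    status_all()   { ps -ef | grep -E '/opt/TIBCO/(bw|be)' | grep -v grep; }

    # "pull" logic at the top of each .sh script: refresh the local copy of the
    # library from the central location before sourcing it (paths are assumptions)
    cp /shared/tibco/tibco_lib.sh /opt/TIBCO/scripts/tibco_lib.sh 2>/dev/null
    . /opt/TIBCO/scripts/tibco_lib.sh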

Here is an example of the output for one of our deployed apps across 4 BW servers and 4 BE servers in QA using this AppStatusCheck command:

Application Name: ITMS/Notifications

Service Name: ITMS-Notifications.par

Deployment Status: Success

Service Instance Name: ITMS-Notifications-pixie

Machine Name: pixie

Status: Running

Service Instance Name: ITMS-Notifications-genie

Machine Name: genie

Status: Running

Service Name: ITMS-SearchIndexCleanup.par

Deployment Status: Success

Service Instance Name: ITMS-SearchIndexCleanup-pixie

Machine Name: pixie

Status: Running

Service Instance Name: ITMS-SearchIndexCleanup-genie

Machine Name: genie

Status: Running

Service Name: ITMS-CleanupNotifications.par

Deployment Status: Success

Service Instance Name: ITMS-CleanupNotifications-genie

Machine Name: genie

Status: Running

Service Instance Name: ITMS-CleanupNotifications-pixie

Machine Name: pixie

Status: Running
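A minimal sketch of the parser mentioned above, assuming the plain-text layout shown in this example; the delimiter and file names are assumptions:

    # parse_appstatus.awk -- flatten the AppStatusCheck listing into one
    # pipe-delimited row per service instance: app|service|instance|machine|status
    /^Application Name:/      { app  = substr($0, index($0, ":") + 2) }
    /^Service Name:/          { svc  = substr($0, index($0, ":") + 2) }
    /^Service Instance Name:/ { inst = substr($0, index($0, ":") + 2) }
    /^Machine Name:/          { mach = substr($0, index($0, ":") + 2) }
    /^Status:/                { print app "|" svc "|" inst "|" mach "|" substr($0, index($0, ":") + 2) }

Run it as: awk -f parse_appstatus.awk appstatus.txt > engine_status.txt (file names illustrative). Note that the "Deployment Status" lines do not match the /^Status:/ pattern, so only the per-instance status rows are emitted.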


