  • Troubleshooting a TIBCO® EMS client (TIBCO ActiveMatrix BusinessWorks™) that stops processing messages from a TIBCO® EMS queue


    Manoj Chaurasia

    Basic TIBCO® EMS Checks

    The following are some basic checks that can be performed to determine why a client stops processing messages from a TIBCO EMS queue. Each case may vary depending on process design and other factors.

    Check that the TIBCO ActiveMatrix BusinessWorks™ client connection to the EMS server is healthy and that there are no reconnect issues caused by network failures or similar problems. You can run the following commands from the tibemsadmin tool to check this.

    show server

    Provides statistics on the total number of queues, topics, pending messages on the server, memory usage, etc.

    show connections

    Displays client connections and sessions.

    show connections version

    Shows the EMS client libraries used by the client.

    It is recommended that the client libraries be on the same version as the EMS server to take advantage of the latest defect fixes and enhancements.

    For issues specific to a queue

    show queue

    Gives queue statistics, such as the number of receivers on the queue, pending messages, and prefetch.

    show consumers

    Shows consumers for the queues and topics.

    For issues specific to a topic

    show topic

    Gives statistics on a topic.

    show durables

    Gives statistics on durable subscribers.

    show durable

    Gives statistics on a durable subscriber.

    Running these basic EMS commands will give some more information on the type of connections, the number of receivers, acknowledgment mode, etc.
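The checks above can be collected into a script file and run in one pass. The following is a minimal Python sketch, not a TIBCO-provided tool. It assumes tibemsadmin is on the PATH and accepts the -server, -user, and -script options; verify these against the documentation for your EMS version. The function names, queue name, server URL, and file name are hypothetical.

```python
def build_ems_check_script(queue_name=None):
    """Return the basic tibemsadmin checks from this article, one command per line."""
    commands = [
        "show server",
        "show connections",
        "show connections version",
        "show consumers",
        "show durables",
    ]
    if queue_name:
        # 'show queue' is given the name of the queue under investigation.
        commands.append(f"show queue {queue_name}")
    return "\n".join(commands) + "\n"


def build_tibemsadmin_command(server_url, user, script_path):
    """Build an argv list for a non-interactive tibemsadmin run.

    Assumption: the option names below match your EMS installation.
    """
    return [
        "tibemsadmin",
        "-server", server_url,
        "-user", user,
        "-script", script_path,
    ]
```

To use it, write the text returned by build_ems_check_script("orders.queue") to a file such as checks.txt, then pass the list from build_tibemsadmin_command("tcp://localhost:7222", "admin", "checks.txt") to subprocess.run and save the output alongside the other logs.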

    Check for design issues

    If you have multiple "Wait for JMS message" activities within the same process or the same application listening on the same queue, this will create multiple receivers for that queue. When the BusinessWorks engine starts, if one listener picks up the messages and stores them in process memory, the other listener cannot receive those messages. This is the default behavior when using the "Wait for JMS message" activity. Where you need multiple listeners on the same queue, the correct design is to use the "GetJMSQueueMessage" activity, for which the filtering is done by the EMS server: the EMS server determines whether the message can be delivered to a particular receiver.

    When using "JMS Queue Receivers" and "JMS Topic Subscribers" in one BusinessWorks engine, use separate JMS Connection resources for accessing the queues and the topics. Otherwise, the BW engine might create duplicate durable subscribers and the BusinessWorks Topic Subscriber may stop consuming.

    Check for configuration issues

    If the acknowledge mode of the receiver is "client" and max sessions is set to "1", then when one JMS queue message is received the session blocks until the message is acknowledged, and no further messages are received on that session. From within the Admin GUI, check whether any previous job is still at the "confirm activity" or has errored before reaching the confirm activity.
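The blocking behavior described above can be pictured with a toy model. This is a sketch for illustration only: the class and method names are hypothetical and it is not the BusinessWorks or JMS API. It models a single session in client acknowledge mode that will not hand out another message until the previous one is confirmed.

```python
from collections import deque


class ClientAckSession:
    """Toy model of one session in client acknowledge mode (hypothetical, not a real API)."""

    def __init__(self, messages):
        self._pending = deque(messages)
        self._unacknowledged = None

    def receive(self):
        # While a delivered message is unconfirmed, nothing more is delivered.
        if self._unacknowledged is not None or not self._pending:
            return None
        self._unacknowledged = self._pending.popleft()
        return self._unacknowledged

    def confirm(self):
        # Stands in for the confirm activity acknowledging the message.
        self._unacknowledged = None
```

In this model, a job that fails before calling confirm() leaves the session stuck, which is why the Admin GUI check for jobs stopped at or before the confirm activity is useful.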

    When using the Flow control mechanism, check the status of the process starter from within the Admin GUI. If the status is FLOW_CONTROLLED, the engine will not be able to process further messages until it comes out of the FLOW_CONTROLLED state.

    Check the version of the EMS client libraries

    Run show connections version or show connections full and check whether the client libraries need to be upgraded. The EMS client libraries are shipped as part of the TRA (TIBCOjms.jar). To upgrade these client libraries, you can manually copy the jars from your EMS installation into /tra/5.x/hotfix/lib or upgrade the TRA to the latest version.

    Check for exceptions in the BusinessWorks or EMS logs

    If further investigation is required, enable tracing on both the BusinessWorks engine and the EMS server and collect the logs.

    For the EMS server logs, set the logfile location in the tibemsd.conf file.

    You can enable detailed tracing by issuing the following commands using the tibemsadmin tool.

    To enable EMS server tracing:

     set server log_trace=DEFAULT,+CONNECT,+DEST,+AUTH,+PRODCONS,+MSG

     

    To enable EMS client tracing (the addprop command takes the name of the queue being investigated after the word queue):

     set server client_trace=enabled
     addprop queue  trace=body

     

    BusinessWorks

    In the deployed application .tra file, add:

     Trace.Task.*=true
     Trace.JC.*=true (if the issue is with the process starter)

     

    Run the BusinessWorks engine from the command line (not from the Administrator GUI) and redirect the output to a debug.out file. This is required because the EMS client tracing is written to the console and needs to be redirected to a file.

    Start the application using its .sh (for Unix) or .cmd (for Windows) script from /tra/domain//application/ as follows:

     .sh > debug.out 2>&1 
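The same redirection can be done from a small launcher, which also keeps the engine PID at hand for later thread dumps. This is a minimal Python sketch, assuming the caller supplies the start script path; the function name and the path in the usage note are hypothetical placeholders.

```python
import subprocess


def start_with_redirect(command, log_path="debug.out"):
    """Start command with stdout and stderr both written to log_path.

    Equivalent in effect to `command > debug.out 2>&1`.
    Returns the Popen handle so the caller can read its .pid.
    """
    with open(log_path, "wb") as log_file:
        # The child process gets its own copy of the file descriptor,
        # so closing log_file in the parent does not stop the logging.
        return subprocess.Popen(
            command,
            stdout=log_file,
            stderr=subprocess.STDOUT,
        )
```

For example, engine = start_with_redirect(["/path/to/your-application.sh"]) starts the script and engine.pid gives the process ID to note down.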

     

    Document References

    For details on checking the configuration, refer to the TIBCO ActiveMatrix BusinessWorks and TIBCO Enterprise Message Service documentation.

    Troubleshooting

    To isolate the cause of the problem:

    • Check the JMS activities used in the process.

    • Identify the activities that hang or appear to hang.

    • Use a separate test process to check whether messages can still be consumed from the queue while the BW engine is hung.

    • Check the Admin GUI for the execution time of these activities by collecting engine statistics.

       

      For each job and activity, the statistics show how much eval time and elapsed time is taken. This indicates which activity or activities could be causing the bottleneck.

    • Check Active Processes in the Admin GUI to see at which activity the jobs are hanging.

    • If the message rate is high and the process starter is hanging, check the application .tra file to see what maxjobs and Flowlimit are set to for the process starter(s).

    • Check Process Starters in the Admin GUI. If the Flowlimit is set and the process starter has reached the flow limit, the process starter will be in the FLOW_CONTROLLED state.

       

      This means the process starter is disabled and cannot accept any new jobs. The process starter comes out of the Flow controlled state after approximately half of the jobs are executed to completion.

    • Check the BW application logs for any exceptions thrown before the engine went into the hung state.

    • Check the EMS server logs for any exceptions.

    • Check the version of the EMS client libraries.

    • If the issue is reproducible, run the BW engine from the command line, redirecting its output to a file, and collect thread dumps as detailed in the next section to help find the root cause of the problem.

    Running the engine from the command line will also capture any errors if thrown on the console.
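The flow control behavior mentioned in the list above (the process starter stops accepting jobs at the flow limit and resumes after approximately half of the jobs complete) can be sketched as a small state model. This is an illustration only: the class is hypothetical, it is not BusinessWorks code, and the exact resume threshold of half the flow limit is an assumption based on the word "approximately".

```python
class ProcessStarterModel:
    """Toy model of a process starter with a flow limit (hypothetical)."""

    def __init__(self, flow_limit):
        self.flow_limit = flow_limit
        self.active_jobs = 0
        self.flow_controlled = False

    def try_start_job(self):
        # A flow-controlled starter is disabled and accepts no new jobs.
        if self.flow_controlled:
            return False
        self.active_jobs += 1
        if self.active_jobs >= self.flow_limit:
            self.flow_controlled = True
        return True

    def complete_job(self):
        if self.active_jobs > 0:
            self.active_jobs -= 1
        # Assumed threshold: resume once half of the jobs have completed.
        if self.flow_controlled and self.active_jobs <= self.flow_limit // 2:
            self.flow_controlled = False
```

The model shows why a starter can stay disabled indefinitely: if the running jobs are themselves hung, the active job count never falls and the state never clears.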

    Information to be sent to TIBCO Support

    1. Operating system and hardware configuration

    2. Confirm the Admin/TRA/BW/EMS versions with hotfixes if any.

    3. Details on message rate and message size

    4. The multi-file project and the deployed .ear file.

    5. The deployed application .tra file.

    6. The BusinessWorks engine logs.

    7. The EMS server logs and output of the EMS admin commands given previously in this article.

    8. The EMS server configuration files from \bin\*.conf.

    9. From the Admin GUI, collect engine statistics: go to Service Instance > Engine Control tab > Collect Statistics > Start, and save the result to a .csv file.

    10. From the Admin GUI, screenshots of the active processes (if any) showing the jobs that are hanging or stuck at a particular activity.

    11. Thread dump: if the subscriber is hanging, restart the BW engine from the command line, redirect the output to a file, and capture thread dumps as follows:

      1. For Unix Systems

        • Run:

           /tra/domain//application//.sh > debug.out 2>&1

           

          and note the PID for the engine

          When the issue occurs, execute:

           kill -3

           

          against the engine PID, 2 or 3 times at intervals of 5 minutes. Then send us the redirected output (debug.out).

      2. For Windows Systems

        • Run:

           \tra\domain\\application\\.cmd > debug.out 2>&1

           

          When the issue occurs, press the key combination "Ctrl + Break" while focus is on the running application, 2 or 3 times at intervals of 5 minutes. Then send us the redirected output (debug.out).
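For the Unix case above, the repeated kill -3 can be scripted so the dumps are evenly spaced. This is a minimal Python sketch for Unix systems only; the function name and its defaults are hypothetical. It sends SIGQUIT, the signal that kill -3 delivers, and the thread dumps appear in the engine's redirected output, not in this script's output.

```python
import os
import signal
import time


def request_thread_dumps(pid, count=3, interval_seconds=300,
                         send=os.kill, sleep=time.sleep):
    """Send SIGQUIT to pid `count` times, pausing between sends.

    send and sleep are parameters so the timing logic can be
    exercised without signalling a real process.
    """
    for attempt in range(count):
        send(pid, signal.SIGQUIT)
        # No need to wait after the final dump request.
        if attempt < count - 1:
            sleep(interval_seconds)
```

Calling request_thread_dumps(engine_pid) with the PID noted at startup requests three dumps five minutes apart, matching the procedure described above.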

