  • TIBCO EMS Timeouts


    Manoj Chaurasia

    I had an unusual experience with TIBCO EMS: requests were timing out in the queues before the queue receiver could pick them up.

    My setup is an SOA deployment with a gateway, a business server, and a technical server.

    Transport between these servers is via EMS.

    There was no logical reason for requests to time out before exhausting the configured timeout of 10-60 seconds between the services on the servers. To make troubleshooting harder, the issue only occurred at certain periods of the night.

    This was a serious issue, and several support cases were raised.

    We suspected backup jobs on the DB or VM snapshots. These were all stopped temporarily, but the issue persisted.

    Eventually, I found out that our NTP time was jumping ahead by 130 seconds between 0030 and 0125 hrs and again between 0330 and 0415 hrs.

    The servers were randomly getting a time mismatch of 130 seconds. The business servers would keep the correct time while the primary EMS server moved ahead by about 130 seconds (roughly 2 minutes). The secondary EMS server also kept the correct time.

    Deeper analysis led to the NTP server time and eventually to the NTP time provider.

    The NTP provider was changed, and all the timeouts cleared.
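
    A cross-host clock comparison is the kind of check that exposes a drift like this. The sketch below is only an illustration: the host names and readings are hypothetical, and it assumes you already have a way to sample each host's epoch time at roughly the same instant. It flags any host whose clock differs from the median by more than a tolerance.

```python
from statistics import median


def find_skewed_hosts(readings: dict[str, float], tolerance: float = 1.0) -> dict[str, float]:
    """Return hosts whose clock differs from the median reading by more than `tolerance` seconds."""
    reference = median(readings.values())
    return {
        host: round(ts - reference, 3)
        for host, ts in readings.items()
        if abs(ts - reference) > tolerance
    }


# Hypothetical epoch readings, assumed to be sampled at the same instant on each host.
readings = {
    "gateway": 1_700_000_000.2,
    "business": 1_700_000_000.0,
    "technical": 1_700_000_000.1,
    "ems-primary": 1_700_000_130.0,   # running about 130 s ahead
    "ems-secondary": 1_700_000_000.3,
}

# Prints a dict containing only the host(s) outside the tolerance.
print(find_skewed_hosts(readings))
```

    Using the median as the reference means a single bad clock cannot drag the baseline with it, which matches the situation here where only the primary EMS was off.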

    Conclusion:

    NTP time has a major effect on a BW-EMS setup.

    1. Timeouts

    Timeouts between the server and EMS occur when the clocks run out of sync: EMS drops requests that have an expiry period set. With the EMS clock about 130 seconds ahead and an expiry of 10-60 seconds, a request can look expired to EMS as soon as it arrives.
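
    The arithmetic behind this can be sketched with a simplified model. It assumes, as JMS does for JMSExpiration, that expiry is an absolute timestamp computed from the sender's clock (send time plus time-to-live), and that the broker compares that timestamp against its own clock. The class and function names are made up for illustration and are not EMS APIs.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sent_at: float  # sender's clock, seconds since epoch
    ttl: float      # time-to-live in seconds; 0 means the message never expires

    @property
    def expires_at(self) -> float:
        # Absolute expiry is stamped using the sender's clock.
        return self.sent_at + self.ttl


def is_expired_on_broker(msg: Message, true_time: float, broker_skew: float) -> bool:
    """Return True if a broker whose clock is off by `broker_skew` seconds sees `msg` as expired."""
    if msg.ttl == 0:
        return False
    broker_now = true_time + broker_skew
    return broker_now >= msg.expires_at


SKEW = 130.0  # seconds the primary EMS clock ran ahead, as described in the post

for ttl in (10.0, 60.0, 180.0):
    msg = Message(sent_at=1_000.0, ttl=ttl)
    # Evaluate at the instant of arrival, assuming no transit delay.
    print(ttl, is_expired_on_broker(msg, true_time=1_000.0, broker_skew=SKEW))
# 10.0 True
# 60.0 True
# 180.0 False
```

    In this model every expiry shorter than the skew is dead on arrival, which is consistent with 10-60 second timeouts failing while the clock was 130 seconds ahead.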

    2. Fault Tolerance

    If you have a fault-tolerant setup for your EMS, the clocks of the EMS servers can drift apart when NTP time runs out of sync. This can cause the secondary EMS server to attempt to write to the shared file store.

    If the NFS is not correctly configured to prevent writes to a locked file, the secondary EMS, on failing to get a heartbeat from the primary EMS, will write into the message store file that the primary still holds locked.

    The primary EMS will then trigger a shutdown when the integrity of the message store is affected. This may cause total downtime on your setup.

