We have installed TIBCO EMS 6.0 on our x86_64 GNU/Linux servers (kernel 3.10, Red Hat 7.4) because of a compatibility issue between the correspondence application and the EMS 8.4 queues it was using earlier. We obtained the EMS 6.0 installer from TIBCO and set it up in our production environment. Performance and compatibility with the other application have been fine, but after a while the EMS service started going down abruptly, and the logs capture no specific error that we can work from. Below are the scenarios we have observed, along with one error that we have recently started seeing but have not yet been able to resolve.
In the first scenario, when the primary service goes down it enters a hung state and does not release its lock on the shared store files. The secondary service therefore cannot take over where the primary left off, because it finds the files still locked:
2018-11-21 13:46:20.499 ERROR: Unable to open store file '/usr/tibco/data/emsft/prod/emsfiles/async-msgs.db', file may be locked.
2018-11-21 13:46:20.500 ERROR: Unable to open store file '/usr/tibco/data/emsft/prod/emsfiles/meta.db', file may be locked.
2018-11-21 13:46:20.500 ERROR: Unable to open store file '/usr/tibco/data/emsft/prod/emsfiles/sync-msgs.db', file may be locked.
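When the primary hangs, it would help to know which process still holds a lock on these store files. One way to check is to match the entries in /proc/locks against each file's device and inode numbers. The Python 3 sketch below does that under stated assumptions: that EMS 6.0 takes POSIX or flock-style locks that show up in /proc/locks (not confirmed for EMS), and that it runs on the host where the hung primary lives, because on a NAS/NFS mount each client's /proc/locks should list only locks held by its own processes. `parse_proc_locks` and `lock_holders` are hypothetical helper names.

```python
import os
import sys


def parse_proc_locks(text):
    """Parse /proc/locks text into (type, access, pid, major, minor, inode) tuples.

    Waiter lines (marked "->") and entries without a device:inode field
    are skipped, so only locks that are currently held are returned.
    """
    held = []
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: id: TYPE MODE ACCESS PID MAJ:MIN:INODE START END
        if len(fields) < 6 or "->" in fields:
            continue
        dev_ino = fields[5].split(":")
        if len(dev_ino) != 3:
            continue  # e.g. "<none>:0" when the kernel has no inode to show
        # Major and minor are printed in hex, the inode number in decimal.
        major, minor, inode = int(dev_ino[0], 16), int(dev_ino[1], 16), int(dev_ino[2])
        held.append((fields[1], fields[3], int(fields[4]), major, minor, inode))
    return held


def lock_holders(path, locks_text):
    """Return the sorted PIDs that hold a lock on `path` according to locks_text."""
    st = os.stat(path)
    key = (os.major(st.st_dev), os.minor(st.st_dev), st.st_ino)
    return sorted({pid for _, _, pid, major, minor, inode in parse_proc_locks(locks_text)
                   if (major, minor, inode) == key})


if __name__ == "__main__":
    # Hypothetical usage: python3 lock_holders.py /usr/tibco/data/emsft/prod/emsfiles/meta.db
    if len(sys.argv) > 1:
        with open("/proc/locks") as f:
            locks_text = f.read()
        for store in sys.argv[1:]:
            print(store, lock_holders(store, locks_text))
```

An empty list only means that no entry in this host's /proc/locks matches the file; a PID in the list can then be compared with the PID of the hung tibemsd64 process.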
In the second scenario, the primary service goes down completely and the secondary takes over where it left off, but after some time the secondary also goes into a hung state, logs no error, and does not proceed until it is restarted. The third scenario is the error below, which we see when the service goes down completely without hanging first.
On server1, Nov 28 01:14:
/usr/tibco/ems/6.0/bin/tibemsd64.sh: line 6: 59502 Segmentation fault (core dumped) ./tibemsd64 -config "/usr/tibco/cfgmgmt/ems/data/tibemsd-file-PROD_FT.conf"
We are trying to get the core dump so we can analyze it, but we have not been able to find where it is being created. We have also checked the NAS mount statistics, and they look correct, so we are not sure what exactly is happening every weekend.
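Where a core ends up is controlled by the kernel setting kernel.core_pattern; whether a plain core file is written also depends on the process's core-size limit. On RHEL 7 with abrt enabled, core_pattern typically pipes the core to abrt-hook-ccpp (the helper that appears in the Dec 1 log entries below), so no file appears next to the binary. The abrt-server entries below also show the problem directory being deleted because the executable does not belong to any package and ProcessUnpackaged is 'no'; as far as we know, on RHEL 7 that option is set in /etc/abrt/abrt-action-save-package-data.conf. The Python 3 sketch below reports both settings; the function names are illustrative, and it assumes the PID of the running server is passed as the optional first argument.

```python
import os
import sys


def describe_core_pattern(pattern):
    """Summarise where the kernel sends core dumps for a core_pattern value."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # The kernel pipes the core to a helper program instead of writing a file.
        parts = pattern[1:].split()
        return "piped to " + (parts[0] if parts else "<no handler>")
    if pattern.startswith("/"):
        return "written to absolute path pattern " + pattern
    # Relative names are created in the crashing process's working directory.
    return "written relative to the process working directory as " + pattern


def core_size_limit(limits_text):
    """Return (soft, hard) for 'Max core file size' from /proc/<pid>/limits text."""
    label = "Max core file size"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    return None


if __name__ == "__main__":
    pattern_file = "/proc/sys/kernel/core_pattern"
    if os.path.exists(pattern_file):
        with open(pattern_file) as f:
            print("core_pattern:", describe_core_pattern(f.read()))
    # Optional argument: PID of the running tibemsd64 process.
    if len(sys.argv) > 1:
        with open("/proc/%s/limits" % sys.argv[1]) as f:
            print("core size limit (soft, hard):", core_size_limit(f.read()))
```

If core_pattern turns out to be a relative name, the file would be created in the server's working directory, which for the wrapper script is wherever `./tibemsd64` is started from.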
Dec 1 04:40:34 server1 kernel: tibemsd64: segfault at 74 ip 000000000048bd74 sp 00007f205f5bceb0 error 4 in tibemsd64[400000+34b000]
Dec 1 04:40:35 server1 abrt-hook-ccpp: Process 93512 (tibemsd64) of user 2395 killed by SIGSEGV - dumping core
Dec 1 04:40:35 server1 abrt-hook-ccpp: Failed to create core_backtrace: waitpid failed: No child processes
Dec 1 04:40:35 server1 abrt-server: Executable '/usr/tibco/ems/6.0/bin/tibemsd64' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Dec 1 04:40:35 server1 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2018-12-01-04:40:35-93512' exited with 1
Dec 1 04:40:35 server1 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2018-12-01-04:40:35-93512'
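The kernel segfault line at 04:40:34 can be decoded even without a core. The Python 3 sketch below assumes the x86_64 message layout shown in the log and the standard x86 page-fault error-code bits (bit 0: page present, bit 1: write, bit 2: user mode, bit 4: instruction fetch); `decode_segfault` is a hypothetical helper. For this line it yields a user-mode read of address 0x74 from a page that is not mapped, which usually suggests a NULL pointer plus a small structure offset; that is an inference from the numbers, not a confirmed root cause.

```python
import re

# Layout of the x86_64 kernel "segfault at" message, as in the log above.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode a kernel segfault line; return None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    num = {k: int(m.group(k), 16) for k in ("addr", "ip", "err", "base", "size")}
    err = num["err"]  # x86 page-fault error code, printed in hex
    return {
        "object": m.group("obj"),
        "fault_addr": num["addr"],
        # Offset of the faulting instruction from the start of the mapping.
        "ip_offset": num["ip"] - num["base"],
        "page_present": bool(err & 1),            # bit 0: 0 means page not present
        "access": "write" if err & 2 else "read",  # bit 1
        "mode": "user" if err & 4 else "kernel",   # bit 2
        "instruction_fetch": bool(err & 16),       # bit 4
    }


if __name__ == "__main__":
    line = ("tibemsd64: segfault at 74 ip 000000000048bd74 sp 00007f205f5bceb0 "
            "error 4 in tibemsd64[400000+34b000]")
    info = decode_segfault(line)
    # prints "fault address 0x74, ip offset 0x8bd74"
    print("fault address 0x%x, ip offset 0x%x" % (info["fault_addr"], info["ip_offset"]))
```

Because tibemsd64 is mapped at 0x400000, ip 0x48bd74 is an address inside the executable itself. If the binary carries symbols, `addr2line -f -e /usr/tibco/ems/6.0/bin/tibemsd64 0x48bd74` may name the function; a stripped vendor binary may only report `??`.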
Can you tell us whether the application can be configured to produce a dump before it crashes?
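A core file is written by the kernel at the moment the process is killed, so what can be configured is whether a crash produces a core at all; for the hung cases, gdb's `gcore` command can snapshot a live process without killing it. In a shell wrapper such as tibemsd64.sh, enabling cores would normally be a `ulimit -c` line before the `./tibemsd64` call. The Python 3 sketch below does the equivalent for illustration only: `enable_core_dumps` and `run_with_core_dumps` are hypothetical helpers, and whether a core file actually lands on disk still depends on kernel.core_pattern and abrt.

```python
import resource
import subprocess
import sys


def enable_core_dumps():
    """Raise the soft core-file limit to the hard limit.

    Runs in the child between fork and exec, so only the launched
    server is affected. An unprivileged process cannot exceed the hard limit.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def run_with_core_dumps(cmd, cwd=None):
    """Start cmd with core dumps enabled and wait for it.

    Returns the exit status; a negative value -N means the child was
    killed by signal N (for example -11 for SIGSEGV).
    """
    proc = subprocess.Popen(cmd, cwd=cwd, preexec_fn=enable_core_dumps)
    return proc.wait()


if __name__ == "__main__":
    # Hypothetical usage mirroring line 6 of tibemsd64.sh, run from the bin directory:
    #   python3 run_ems.py ./tibemsd64 -config /usr/tibco/cfgmgmt/ems/data/tibemsd-file-PROD_FT.conf
    if len(sys.argv) > 1:
        status = run_with_core_dumps(sys.argv[1:])
        print("child exited with status", status)
```

Raising only the soft limit up to the existing hard limit keeps the sketch usable by the non-root user that runs the server; going beyond the hard limit would need a change to the user's limits configuration.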
Could you please help us with your expertise or any guidance on this issue? Are there other details we should check, or any configuration changes you would suggest? Please let us know if you need more details on the problem described above.
Thank you in advance!