PROBLEM
All Traverse component health status entries show as critical
CAUSE
This symptom is likely caused by a failure within the 'Correlation & Summary Engine' (CSE) or the 'Internal Communication Bus' (a.k.a. Java Message Service or JMS). Such a failure prevents processing of the periodic heartbeat updates sent by each Traverse component across all the Traverse servers. Note that components other than JMS and CSE may continue to function correctly even though their heartbeats are not updated on the 'Superuser->Health' page.
In addition, while in this state the Event Manager (Status->Events) might not be updated.
RESOLUTION
To allow us to determine the cause of the failed heartbeat status updates, kindly gather the following data prior to restarting ANY Traverse components.
Where 'TRAVERSE_HOME' is the installation directory for Traverse, kindly forward:
- a zip archive of the folder 'TRAVERSE_HOME\logs' from the BVE server
- a screenshot of the 'Superuser->Health' page showing the heartbeat statuses and their last update times
- a screenshot of the Traverse Service Controller launched on a Windows BVE, or the output of the Linux command 'service traverse status'
- a stack dump of the CSE component taken from a Windows command prompt on the BVE (launched using 'Run as Administrator'):
cd TRAVERSE_HOME
apps\jre\bin\java -classpath webapp\WEB-INF\lib\traverse-7.0.jar com.zyrion.traverse.utils.FullThreadDump localhost:7696 > cse_stack.dump
- a heap dump of the CSE component created according to the instructions outlined in Generating a heap dump using JConsole (via TCP port 7696)
- a stack dump of the JMS component taken from a Windows command prompt on the BVE (launched using 'Run as Administrator'):
cd TRAVERSE_HOME
apps\jre\bin\java -classpath webapp\WEB-INF\lib\traverse-7.0.jar com.zyrion.traverse.utils.FullThreadDump localhost:7697 > jms_stack.dump
- a heap dump of the JMS component created according to the instructions outlined in Generating a heap dump using JConsole (via TCP port 7697)
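The log archive requested in the list above can be produced by hand, or with a short script. The following is a minimal Python sketch, not a Traverse-supplied tool: the function name 'archive_logs' and the archive name 'traverse_logs.zip' are assumptions made for illustration, and Python is assumed to be available on the BVE.

```python
import shutil
from pathlib import Path


def archive_logs(traverse_home: str, dest_dir: str) -> Path:
    """Zip the 'logs' folder under TRAVERSE_HOME into dest_dir.

    Hypothetical helper: the archive is named traverse_logs.zip and
    contains the folder as 'logs/...'.
    """
    logs_dir = Path(traverse_home).resolve() / "logs"
    if not logs_dir.is_dir():
        raise FileNotFoundError(f"logs folder not found: {logs_dir}")
    # make_archive appends '.zip' to the base name it is given.
    base_name = Path(dest_dir).resolve() / "traverse_logs"
    archive = shutil.make_archive(
        str(base_name), "zip",
        root_dir=str(logs_dir.parent),  # archive paths are relative to TRAVERSE_HOME
        base_dir="logs",                # include only the logs folder
    )
    return Path(archive)
```

For example, archive_logs(r"C:\Traverse", r"C:\Temp") would write C:\Temp\traverse_logs.zip, where both paths are placeholders for the actual installation and output directories. Run it before restarting any component so that the logs reflect the failed state.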
After capturing the diagnostic information above, restart the CSE component to temporarily correct the issue.
Should the issue persist:
- Stop all Traverse components on the BVE
- Delete the folder 'TRAVERSE_HOME\database\jms\broker' on the BVE (it will be re-created when the JMS component starts)
- Start all Traverse components on the BVE
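The folder deletion in the middle step above can be sketched in Python as follows. This is an illustrative helper only (the name 'remove_jms_broker_store' is an assumption, not part of Traverse), and it deliberately does not stop or start any services: run it only after all Traverse components on the BVE have been stopped.

```python
import shutil
from pathlib import Path


def remove_jms_broker_store(traverse_home: str) -> bool:
    """Delete TRAVERSE_HOME/database/jms/broker if it exists.

    Returns True if the folder was removed, False if it was not present.
    Assumes all Traverse components are already stopped.
    """
    broker_dir = Path(traverse_home) / "database" / "jms" / "broker"
    if not broker_dir.is_dir():
        return False
    # Only the 'broker' folder is removed; its parent 'jms' folder is kept.
    shutil.rmtree(broker_dir)
    return True
```

The helper touches nothing outside the 'broker' folder, which is re-created when the JMS component starts.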
APPLIES TO
All versions of Traverse.
REFERENCE
None.