Traverse Component Health page shows non-zero time offsets



  • Rob Arends

    Here is a tip for anyone who wants an alert when the time sync of the Traverse servers is off.


    The info on the Traverse Component Health page is contained in the MySQL DB liveeventsdb on the BVE.

    So you can craft a sql_value/mysql test to do this.


    In the liveeventsdb, the componentId defines the various parts of Traverse.

    These are the valid "IDs" [up to the first space, the rest is my description]

    CSE_ID       - Summary engine / Live events
    DC_ID        - Data Collector (DGE Extension)
    DGE_ID       - Data Gathering Engine
    MSG_HNDLR_ID - Message Server
    RDC_ID       - Remote Distribution Client (filesync) [everything except BVE]
    RDS_ID       - Remote Distribution Server (filesync) [BVE]
    WEB_APP_ID   - Web App


    Below I have 'selected' DGE and DGE Extension as separate tests,
    but you can combine them by adjusting the SQL select.

    e.g. change "=" to "in" and provide a list in brackets.
    -> All DGE and DGE Extensions:
    "where componentId=\'DC_ID\' " becomes "where componentId in (\'DC_ID\', \'DGE_ID\') "
    -> All of Traverse (RDC runs on everything except the BVE, RDS runs on the BVE):
    "where componentId=\'DC_ID\' " becomes "where componentId in (\'RDS_ID\', \'RDC_ID\') "

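If you want to build the "in" list for an arbitrary set of component IDs, something like the following shell sketch may help. The function name build_where is made up for illustration, and the output is plain SQL; inside the test.create string each single quote still needs escaping as \' as shown in the examples above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Traverse): build a
# "where componentId in (...)" clause from the IDs given as arguments.
build_where() {
    list=""
    for id in "$@"; do
        # Join entries with ", " but never leave a trailing comma,
        # which MySQL rejects as a syntax error.
        if [ -n "$list" ]; then
            list="$list, "
        fi
        list="$list'$id'"
    done
    printf "where componentId in (%s)" "$list"
}

where_clause=$(build_where DC_ID DGE_ID)
echo "$where_clause"
# prints: where componentId in ('DC_ID', 'DGE_ID')
```

Passing RDS_ID and RDC_ID instead gives the clause that covers every Traverse server.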

    * Replace {bve_ip}, {user} and {pass} with the values for your environment.
    * Replace "BVE" with the device name of your BVE.
    * Replace the actionname of None with one that is valid for your environment - you can do this afterwards, via the GUI.
    The " >= 1 " is the threshold. Note that the query divides the timestamp difference by 60000, so if the timestamps are in milliseconds the threshold is in minutes rather than seconds; divide by 1000 instead if you want seconds.

    --host={bve_ip} --username={user} --password={pass} --exec 'test.create "actionname=None", "criticalthreshold=1", "database=liveeventsdb", "devicename=BVE", "", "interval=60s", "loginname=emerald", "password=mysql", "port=7663", "query=select count(*) from liveeventsdb.ComponentStatus where componentId=\'DC_ID\' and ABS((pingTimeStamp-receivedTimeStamp)/60000) >= 1;", "subtype=mysql", "testname=DGEx Time Sync needs checking", "testtype=sql_value", "thresholdtype=1", "units=", "warningthreshold=1"'

    --host={bve_ip} --username={user} --password={pass} --exec 'test.create "actionname=None", "criticalthreshold=1", "database=liveeventsdb", "devicename=BVE", "", "interval=60s", "loginname=emerald", "password=mysql", "port=7663", "query=select count(*) from liveeventsdb.ComponentStatus where componentId=\'DGE_ID\' and ABS((pingTimeStamp-receivedTimeStamp)/60000) >= 1;", "subtype=mysql", "testname=DGE Time Sync needs checking", "testtype=sql_value", "thresholdtype=1", "units=", "warningthreshold=1"'

    It only gives you a count of the matching servers that exceed the threshold, not which ones they are, but at least you know to look into your timesync.
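To see what the "/60000" in the query does to the threshold, here is a small arithmetic sketch. It assumes both timestamps are epoch milliseconds, which the post does not confirm, and the sample values are invented. Shell arithmetic truncates to whole numbers, whereas MySQL's "/" keeps the fraction, so the same offset would be 1.5 in the query.

```shell
#!/bin/sh
# Assumption: pingTimeStamp and receivedTimeStamp are epoch milliseconds.
# The two values below are made-up samples that are 90 seconds apart.
ping_ts=1700000090000
received_ts=1700000000000

# Absolute difference, mirroring ABS(pingTimeStamp-receivedTimeStamp).
diff_ms=$((ping_ts - received_ts))
if [ "$diff_ms" -lt 0 ]; then
    diff_ms=$((0 - diff_ms))
fi

offset_min=$((diff_ms / 60000))   # what the posted query compares against 1
offset_sec=$((diff_ms / 1000))    # what you would get dividing by 1000

echo "minutes=$offset_min seconds=$offset_sec"
# prints: minutes=1 seconds=90
```

Under that assumption, an offset has to reach a full minute before the posted query counts it; dividing by 1000 instead would count offsets of one second or more.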

    Hope that helps someone else.


