SUMMARY
On some appliances, the ipmiutil watchdog timer service (ipmiutil_wdt) has been left enabled, which can cause unexpected behavior such as a hard reset of the DPU.
ISSUE
The appliance reboots or resets without an apparent reason, and the BMC event log shows Watchdog_2 timer interrupts, each followed by a Hard Reset action, as in the following output.
[root@Recovery-933S ~]# ipmiutil sel -e
ipmiutil ver 2.97
isel: version 2.97
-- BMC version 1.86, IPMI version 2.0
SEL Ver 0 Support 03, Size = 512 records (Used=6, Free=506)
RecId Date/Time_______ SEV Src_ Evt_Type___ Sens# Evt_detail - Trig [Evt_data]
0001 05/05/16 00:29:31 MAJ BMC Watchdog_2 #ca Timer interrupt 6f [c8 04 ff]
0002 05/05/16 00:29:32 CRT BMC Watchdog_2 #ca Hard Reset action 6f [c1 04 ff]
0003 05/05/16 00:36:30 MAJ BMC Watchdog_2 #ca Timer interrupt 6f [c8 04 ff]
0004 05/05/16 00:36:31 CRT BMC Watchdog_2 #ca Hard Reset action 6f [c1 04 ff]
0005 05/05/16 00:44:31 MAJ BMC Watchdog_2 #ca Timer interrupt 6f [c8 04 ff]
0006 05/05/16 00:44:32 CRT BMC Watchdog_2 #ca Hard Reset action 6f [c1 04 ff]
ipmiutil sel, completed successfully
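To check quickly whether the event log contains this pattern, the output of `ipmiutil sel -e` can be filtered for watchdog hard resets. The following is a minimal sketch; `list_wdt_resets` and `count_wdt_resets` are hypothetical helper names, not part of ipmiutil, and the match assumes the event text looks like the records shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ipmiutil). Both read the text
# produced by `ipmiutil sel -e` on standard input.

# Print only the SEL records where the watchdog performed a hard reset.
list_wdt_resets() {
    grep 'Watchdog.*Hard Reset'
}

# Print the number of watchdog hard-reset records.
count_wdt_resets() {
    grep -c 'Watchdog.*Hard Reset'
}
```

On the appliance, this would be used as `ipmiutil sel -e | count_wdt_resets`; a non-zero count suggests the resets were triggered by the watchdog timer.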
RESOLUTION
First, verify that the watchdog timer service is running.
[root@Recovery-933S ~]# service ipmiutil_wdt status
ipmiutil ver 2.97
iwdt ver 2.97
-- BMC version 1.86, IPMI version 2.0
wdt data: 04 01 00 00 84 03 84 03
Watchdog timer is stopped for use with SMS/OS. Logging
pretimeout is 0 seconds, pre-action is None
timeout is 90 seconds, counter is 90 seconds
action is Hard Reset
ipmiutil wdt, completed successfully
ipmiutil_wdt is running...
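The `wdt data:` line in the status output holds the raw watchdog settings that the decoded lines beneath it summarize. As a rough sketch, assuming the eight bytes follow the IPMI 2.0 Get Watchdog Timer response layout (timer use, action, pre-timeout in seconds, expiration flags, then the initial countdown in 100 ms units, least significant byte first), they can be decoded as follows. `decode_wdt_data` is a hypothetical helper, not an ipmiutil command.

```shell
#!/bin/sh
# Hypothetical helper (not part of ipmiutil): decode the 8 hex bytes
# printed after "wdt data:", assuming the IPMI 2.0 Get Watchdog Timer
# response layout.
decode_wdt_data() {
    # Split the single argument into positional parameters $1..$8.
    set -- $1
    [ $# -eq 8 ] || { echo "expected 8 hex bytes" >&2; return 1; }

    # Byte 1, bits 2:0: timer use.
    case $(( 0x$1 & 7 )) in
        1) use="BIOS FRB2" ;;
        2) use="BIOS/POST" ;;
        3) use="OS Load" ;;
        4) use="SMS/OS" ;;
        5) use="OEM" ;;
        *) use="reserved" ;;
    esac

    # Byte 2, bits 2:0: action taken when the timer expires.
    case $(( 0x$2 & 7 )) in
        0) action="No action" ;;
        1) action="Hard Reset" ;;
        2) action="Power Down" ;;
        3) action="Power Cycle" ;;
        *) action="reserved" ;;
    esac

    # Byte 3: pre-timeout interval in seconds.
    pre=$(( 0x$3 ))

    # Bytes 5-6: initial countdown, low byte first, 100 ms per count.
    timeout=$(( (0x$6 * 256 + 0x$5) / 10 ))

    echo "use=$use action=$action pretimeout=${pre}s timeout=${timeout}s"
}
```

For example, `decode_wdt_data '04 01 00 00 84 03 84 03'` prints `use=SMS/OS action=Hard Reset pretimeout=0s timeout=90s`, which agrees with the decoded lines in the status output.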
Once you have verified that the ipmiutil_wdt service is running, as indicated by the last line of the status output above, stop the service and disable it from starting at boot with the following commands.
[root@Recovery-933S ~]# service ipmiutil_wdt stop
Stopping /usr/bin/ipmiutil wdt:
[root@Recovery-933S ~]# chkconfig ipmiutil_wdt off
Finally, verify that the service is stopped, because the init script does not print the result of the stop argument.
[root@Recovery-933S ~]# service ipmiutil_wdt status
ipmiutil ver 2.97
iwdt ver 2.97
-- BMC version 1.86, IPMI version 2.0
wdt data: 01 00 1e 00 b0 04 b0 04
Watchdog timer is stopped for use with BIOS FRB2. Logging
pretimeout is 30 seconds, pre-action is None
timeout is 120 seconds, counter is 120 seconds
action is No action
ipmiutil wdt, completed successfully
ipmiutil_wdt is stopped
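The stop, disable, and verify steps above can be sketched as a single script. This is only an illustration built from the commands shown in this article; `wdt_is_stopped` and `disable_wdt` are hypothetical function names, and the check assumes the status output ends with the `ipmiutil_wdt is stopped` line shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ipmiutil).

# Read `service ipmiutil_wdt status` output on standard input and
# succeed only if it reports the service as stopped.
wdt_is_stopped() {
    grep -q '^ipmiutil_wdt is stopped'
}

# Stop the service, disable it at boot, then confirm the result,
# since the init script prints nothing useful for the stop argument.
disable_wdt() {
    service ipmiutil_wdt stop
    chkconfig ipmiutil_wdt off
    if service ipmiutil_wdt status | wdt_is_stopped; then
        echo "ipmiutil_wdt is stopped and disabled at boot"
    else
        echo "ipmiutil_wdt still appears to be running" >&2
        return 1
    fi
}
```

Run `disable_wdt` as root on the appliance; a non-zero exit status means the service did not report itself as stopped and should be checked manually.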
CAUSE
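The ipmiutil watchdog timer service (ipmiutil_wdt) was left enabled on the appliance. While the service is running, the BMC watchdog is configured with a Hard Reset action and a 90-second timeout, as shown in the first status output under RESOLUTION. If the timer expires before it is reset, the BMC performs a hard reset of the DPU and logs the Watchdog_2 events shown under ISSUE.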