Agent offline status emails taking longer than expected

Problem:
Agent offline status emails are taking longer than expected. The check-in time is set to 2 minutes, but the alarms are generated after 5 minutes.
Cause:
Offline Alert (Behind the scenes)

On the Agent monitoring UI:

Agent has not checked in for _______ Min, Rearm alert after ___________

There is some confusion about how these two settings work:

(1) The "Has not checked in" entry determines when an alert is raised. The first agent offline alert is produced after the agent has been offline for the specified period of time.
(2) The rearm setting determines how long the alert stays disabled before it is automatically re-enabled. This means that after the first agent offline alert, you won't receive another offline alert for this agent until the rearm length plus your "has not checked in" period has elapsed.
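The interaction of the two settings can be sketched as a small calculation. This is only an illustration that reads the description above literally; the function name and parameters are hypothetical and are not part of the product, and the sketch ignores the check-in and email delays that also affect timing.

```python
def offline_alert_times(not_checked_in_min, rearm_min, count=3):
    """Minutes after the agent goes offline at which alerts would be raised.

    Hypothetical model: the first alert fires after the "has not checked in"
    period, and each later alert fires after the rearm length plus the
    "has not checked in" period has elapsed since the previous alert.
    """
    times = []
    t = not_checked_in_min
    for _ in range(count):
        times.append(t)
        t += rearm_min + not_checked_in_min
    return times


# Example: "has not checked in" = 5 min, rearm = 60 min
print(offline_alert_times(5, 60))  # [5, 70, 135]
```

With these example values, the agent would stay offline for more than an hour after the first alert before a second one is raised.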

Because agents and the KServer are distributed, any short network delay or network noise can prevent an agent from checking in properly, and the KServer then has no choice but to consider that agent offline. This can produce excessive alerts that you may not need. To prevent these false alarms, we have implemented a mechanism that waits for 2 agent check-ins before signaling an agent offline alert. So even if you enter 0 for both settings above, you won't get an offline alert any sooner than 2 x the agent check-in interval.
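As a rough illustration of this floor, the earliest possible alert time can be estimated as below. The function name is hypothetical, and combining the two values with max() is an assumption: the article only states that an alert cannot arrive sooner than 2 x the check-in interval, not exactly how the KServer combines the two.

```python
def earliest_offline_alert_min(not_checked_in_min, checkin_interval_min):
    """Estimate the earliest offline alert, in minutes after the agent goes offline.

    Assumption: the alert fires once both the configured "has not checked in"
    period and the 2-check-in wait have passed, so the later of the two wins.
    """
    two_checkin_floor = 2 * checkin_interval_min
    return max(not_checked_in_min, two_checkin_floor)


# Example: threshold set to 0, agent check-in interval of 2 minutes
print(earliest_offline_alert_min(0, 2))  # 4
```

Under this assumption, lowering the "has not checked in" value below 2 x the check-in interval has no effect on how quickly the first alert appears.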

After alerts are created, a background system process sends out the offline emails. It usually runs every 2 minutes or so. So in the worst case, under normal system load, an offline alert should be processed and emailed within about 2 minutes of being created.
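The extra delay added by the email process can be estimated with a short calculation. This is a sketch only: the function name is hypothetical, and the 2-minute default is the approximate figure quoted above, not a guaranteed value.

```python
def email_delay_window_min(alert_delay_min, mailer_period_min=2):
    """Return (best case, worst case) minutes from agent offline to email sent.

    Assumption: the email process runs about every mailer_period_min minutes,
    so an alert waits between 0 and mailer_period_min before it is emailed.
    """
    return (alert_delay_min, alert_delay_min + mailer_period_min)


# Example: the alert itself is raised 4 minutes after the agent goes offline
print(email_delay_window_min(4))  # (4, 6)
```

If the alert is raised 4 minutes after the agent goes offline, the email would be expected between 4 and 6 minutes after that point, which is in line with the roughly 5-minute delay described in the problem statement.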
