Agent Offline Status Emails Taking Longer Than Expected

Problem

Agent offline status emails arrive later than expected. The check-in interval is set to 2 minutes, but the alarms are generated after 5 minutes.

Cause

Offline Alert (Behind the scenes)

On the Agent monitoring UI:

Agent has not checked in for _______ Min, Rearm alert after ___________

There is often some confusion about how these two settings work:

(1) The "has not checked in" entry determines when an alert is raised. The first agent offline alert is produced once the agent has been offline for the specified period of time.
(2) The rearm setting determines how long the alert stays disabled before it is automatically re-enabled. This means that after the first agent offline alert, you will not receive another offline alert for the same offline agent until the rearm period plus your "has not checked in" period has elapsed.
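
The arithmetic in (2) can be sketched as follows. This is only an illustration of the description above, not the KServer's actual logic; the function name and parameters are hypothetical, and it assumes the wait is counted from the moment the first alert was raised.

```python
def earliest_repeat_alert(first_alert_at_min: float,
                          rearm_min: float,
                          not_checked_in_min: float) -> float:
    """Earliest time (minutes after the agent went offline) for a repeat alert.

    Assumption: the alert is disabled for `rearm_min` after the first alert,
    and the "has not checked in" period must then elapse again.
    """
    return first_alert_at_min + rearm_min + not_checked_in_min


# Example values (hypothetical): first alert at 4 minutes,
# rearm after 60 minutes, "has not checked in" set to 2 minutes.
print(earliest_repeat_alert(4, 60, 2))
```

With these example values, no second alert for the same offline agent would be expected before minute 66.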

Because agents and the KServer are distributed, any short network delay or network noise can prevent an agent from checking in properly, and the KServer then has no choice but to treat the agent as offline. This could produce excessive alerts that you do not need. To prevent these false alarms, we have implemented a mechanism that waits for 2 agent check-ins before signaling an agent offline alert. So even if you enter 0 for both settings above, you will not get an offline alert any sooner than 2 x the agent check-in interval.
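
As a rough sketch of the floor described above: the effective wait before the first offline alert can be modeled as the larger of the configured "has not checked in" value and twice the check-in interval. The function name is hypothetical, and using max() is an assumption about how the two values combine; the article only states that the alert cannot come sooner than 2 x the check-in interval.

```python
def effective_offline_threshold(not_checked_in_min: float,
                                check_in_interval_min: float) -> float:
    """Minutes an agent must be offline before the first alert is raised.

    Assumption: the configured value applies unless it is below the
    built-in floor of 2 x the agent check-in interval.
    """
    floor_min = 2 * check_in_interval_min
    return max(not_checked_in_min, floor_min)


# Setting 0 with a 2-minute check-in interval still waits 4 minutes.
print(effective_offline_threshold(0, 2))
# A configured value above the floor is used as-is.
print(effective_offline_threshold(10, 2))
```

Under this model, a 2-minute check-in interval gives a minimum of 4 minutes before any offline alert, whatever the UI setting.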

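After alerts are created, a background system process sends the offline emails. It usually runs every 2 minutes or so, so under normal system load an alert should wait at most about 2 minutes before its email is sent. Added to the 2 x check-in wait (4 minutes for a 2-minute check-in interval), this accounts for alarms arriving around 5 minutes after the agent goes offline.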
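
To put the email step into numbers, the sketch below estimates the window in which the offline email can be expected once an alert has been raised. The function name is hypothetical, and the 2-minute default is taken from the approximate figure above; the real cycle may vary with system load.

```python
def email_delivery_window(alert_raised_at_min: float,
                          email_job_interval_min: float = 2.0) -> tuple:
    """Return (earliest, latest) minutes after the agent went offline.

    Assumption: the email job picks up the alert on its next run, which
    happens anywhere from immediately to one full interval later.
    """
    earliest = alert_raised_at_min
    latest = alert_raised_at_min + email_job_interval_min
    return (earliest, latest)


# Alert raised after 4 minutes (2 x a 2-minute check-in interval).
earliest, latest = email_delivery_window(4.0)
print(f"Email expected between {earliest} and {latest} minutes")
```

For a 2-minute check-in interval this gives a window of roughly 4 to 6 minutes, which covers the 5-minute delay described in the Problem section.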
