Problem
Cause
On the Agent monitoring UI:
Agent has not checked in for _______ Min, Rearm alert after ___________
There is some confusion about how these two settings work.
(1) The "Has not checked in" entry determines when an alert is raised. The first agent offline alert is produced once the agent has been offline for the specified period of time.
Because agents and the KServer are distributed, a short network delay or network noise can prevent an agent from checking in properly, and the KServer then has no choice but to treat that agent as offline. This can produce excessive alerts that you do not need. To prevent these false alarms, we have implemented a mechanism that waits for 2 agent check-in intervals before signaling an agent offline alert. So even if you enter 0 for both settings above, you will not get an offline alert any sooner than 2 x the agent check-in interval.
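The arithmetic in the paragraph above can be sketched as follows. This is only an illustration, not KServer code: the function name, its parameters, and the choice of units are hypothetical, and combining the two values with a simple maximum is an assumption. The article only states that the alert cannot arrive sooner than 2 x the agent check-in interval.

```python
def earliest_offline_alert_seconds(threshold_minutes: float,
                                   checkin_interval_seconds: float) -> float:
    """Estimate how long an agent must be offline before the first alert.

    Assumption: the configured "has not checked in" threshold and the
    2 x check-in-interval guard combine as a simple maximum.
    """
    if threshold_minutes < 0 or checkin_interval_seconds <= 0:
        raise ValueError("threshold must be >= 0 and interval must be > 0")

    # Floor imposed by the false-alarm guard: 2 check-in intervals.
    guard_floor_seconds = 2 * checkin_interval_seconds

    # Whichever is longer decides when the alert can first be raised.
    return max(threshold_minutes * 60, guard_floor_seconds)


# A threshold of 0 with a hypothetical 30-second check-in interval
# still waits for 2 intervals.
print(earliest_offline_alert_seconds(0, 30))
# A 5-minute threshold dominates the same guard.
print(earliest_offline_alert_seconds(5, 30))
```

Under these assumptions, lowering the threshold below 2 x the check-in interval has no effect on how quickly the first alert appears.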
After alerts are created, a background system process sends out the offline emails. It usually runs every 2 minutes or so. So, in the worst case under normal system load, an offline alert should be processed within about 2 minutes of being created.
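To make the extra email delay concrete, here is a rough sketch of when an alert's email would go out. It assumes the background process runs on a fixed schedule at multiples of its period, which is a simplification: the article only says it runs "every 2 minutes or so" under normal load. The function name and parameters are hypothetical.

```python
import math


def email_send_time(alert_created_at: float,
                    mailer_period: float = 120.0) -> float:
    """Return the time (in seconds) of the first mailer run at or after
    the moment the alert was created.

    Assumption: the mailer runs at t = 0, period, 2 * period, ...
    """
    if mailer_period <= 0:
        raise ValueError("mailer_period must be > 0")
    if alert_created_at < 0:
        raise ValueError("alert_created_at must be >= 0")

    # Round the creation time up to the next scheduled run.
    return math.ceil(alert_created_at / mailer_period) * mailer_period


# An alert created just after a run waits almost a full period.
created = 241.0
print(email_send_time(created) - created)
```

In this model the added delay is always less than one mailer period, which matches the worst case of about 2 minutes described above.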