Any time an agent appears offline or does not connect to the platform, check the items below. The same checks apply if the device triggered an offline alert.
If the device has never connected to the platform (it does not exist in the portal):
- Which devices are affected? What do all the affected devices have in common?
- If only one device is affected, the cause may be on the device itself. If a whole site is affected, a connection issue is more likely.
- Screen-share to the device if possible.
Check the agent icon:
- If there is no icon, the agent may not have installed correctly. Check Program Files to confirm that all necessary files and logs are in place. If they are not, see Agent Won't Install (Web view).
- If the icon is red, the service may be stopped or failing to start. Check Services. If the services do not start, check the Event Log for error messages. Reinstalling may help in this case, but a .NET repair or upgrade may also be needed.
- If the icon shows "connecting to" the CC server, this suggests a connection issue (see below).
- If the icon shows "connected to" the CC server, the device IS connecting to the platform, so if the partner does not see it, they are looking in the wrong place. The device may be in a different portal, hidden by security level restrictions, or affected by some other oddity.
- Check the logs FIRST, before running the agent Health Check!
Normal agent connectivity looks like the example below. The "ping" is the agent telling the platform "I'm here and connected," and "200" is the platform's successful acknowledgement. If no 200 comes back, the message did not successfully reach the platform, so the platform never sees the "I'm here" message and treats the device as offline:
4.4.2206.2206|2024-02-02T00:01:34.743-05|INFO|12|CsControlConnection: Sending request dcdf670a-95a3-4a98-911f-4ca5c0d33c44 to CS - "ping"|{ "messageId": "dcdf670a-95a3-4a98-911f-4ca5c0d33c44", "messageMethod": "ping" }
4.4.2206.2206|2024-02-02T00:01:35.760-05|INFO|38|CsControlConnection: Response to dcdf670a-95a3-4a98-911f-4ca5c0d33c44 is Http - 200|{ "messageId": "dcdf670a-95a3-4a98-911f-4ca5c0d33c44", "responseStatus": 200 }
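The ping/acknowledgement pairing above can also be checked mechanically across a whole log. Below is a minimal Python sketch, assuming log.txt lines follow the pipe-delimited layout of the two samples (version|timestamp|level|thread|message|JSON) and that the message wording matches them exactly. The helper name `unanswered_pings` is hypothetical and not part of the agent.

```python
import re

# Assumed message formats, copied from the two sample log lines above.
SEND_RE = re.compile(r'Sending request (\S+) to CS - "ping"')
RESP_RE = re.compile(r"Response to (\S+) is Http - (\d+)")


def unanswered_pings(lines):
    """Return {message_id: status} for pings that never got an HTTP 200.

    status is None when no response line was logged for that ping,
    otherwise it is the non-200 status code that came back.
    """
    pending = {}  # message_id -> status (None until a response is seen)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 5:
            continue  # not a structured agent log line
        message = fields[4]
        sent = SEND_RE.search(message)
        if sent:
            pending[sent.group(1)] = None
            continue
        resp = RESP_RE.search(message)
        if resp and resp.group(1) in pending:
            pending[resp.group(1)] = int(resp.group(2))
    return {mid: status for mid, status in pending.items() if status != 200}
```

Usage: pass the open log file, for example `unanswered_pings(open("log.txt", encoding="utf-8", errors="replace"))`. A long run of entries mapped to None matches the "never made it there" case described above.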
- Check for On-Demand status. Devices in On-Demand sites normally appear offline and show "connecting to" on the agent icon; this is expected behavior, and the partner may simply have expected something different. Search log.txt for "demand" to find a status of True or False, where True means On-Demand and False means a regular Managed agent:
- 4.4.2240.2240|2025-02-19T00:02:27.495+01|INFO|32|CsConnection: On Demand = False|{ }
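The "demand" search can be scripted as well. A minimal Python sketch, assuming the status line always uses the "On Demand = True/False" wording shown in the sample; `on_demand_status` is a hypothetical helper name.

```python
import re

# Matches the status line shown above, e.g. "CsConnection: On Demand = False".
ON_DEMAND_RE = re.compile(r"On Demand = (True|False)", re.IGNORECASE)


def on_demand_status(lines):
    """Return True (On-Demand), False (regular Managed), or None if never logged.

    If the status is logged more than once, the most recent line wins.
    """
    status = None
    for line in lines:
        match = ON_DEMAND_RE.search(line)
        if match:
            status = match.group(1).lower() == "true"
    return status
```

A None result means the status line was not found, so check that you are reading the right log before drawing conclusions.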
- Check whether the account is over its license limit. If so, you will see this explicit error message:
CS returned error code 429 - This request was made too soon after a previously failed login request. This is normally due to the account being over its device limit.
- Look for these common errors with known fixes:
- "SSL Error: The remote certificate is invalid according to the validation procedure."
- "CS returned error code 429 - This request was made too soon after a previously failed login request - This is normally due to the account being over its device limit"
- This is likely caused by the account having more devices than 110% of its purchased licenses
- The Health Check will likely catch this, but the error should also appear frequently in log.txt
- The real fix is to purchase more licenses. If the account is over by only a couple of devices and a mission-critical device needs to come online quickly, the partner could, in a pinch, delete less important devices
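As a worked example of the 110% figure above (taken from this article, not from a published specification): an account with 100 licenses can have up to 110 devices, so a 111th device puts it over the limit. The Python sketch below uses integer arithmetic so the boundary comparison is exact; `over_device_limit` and `devices_to_remove` are hypothetical helpers.

```python
def over_device_limit(devices: int, licenses: int) -> bool:
    """True if devices exceed 110% of purchased licenses (threshold per this article).

    devices > 1.1 * licenses is rewritten as 10 * devices > 11 * licenses
    to avoid floating-point rounding at the boundary.
    """
    return 10 * devices > 11 * licenses


def devices_to_remove(devices: int, licenses: int) -> int:
    """Smallest number of devices to delete to get back within the threshold."""
    allowed = (11 * licenses) // 10  # floor(1.1 * licenses)
    return max(0, devices - allowed)


print(over_device_limit(111, 100), devices_to_remove(112, 100))  # prints "True 2"
```

For 25 licenses, 1.1 x 25 = 27.5, so 27 devices are allowed and the 28th trips the limit.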
- "SSL Error: A call to SSPI failed, see inner exception." OR
"Message": "SSL Error: The client and server cannot communicate, because they do not possess a common algorithm"
- Most often caused by the device not having the needed cipher suites (more common on older devices, especially servers)
- Full info and solution here: Kristopher Corpus: Windows Server devices failing to reconnect - Post v11.2 update...
- Note that when testing for cipher suites, the test needs to be done in Internet Explorer
- See also: https://rmm.datto.com/help/en/Content/1INTRODUCTION/Infrastructure/INFRASTRUCTUREANDSECURITY.htm#Supported_TLS_Cipher_Suites
- If all necessary cipher suite changes have been made and connections still fail, ask the partner to review their Group Policy. Policies sometimes override local settings and force the wrong TLS version or cipher suites
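The "do not possess a common algorithm" error means the TLS handshake found no cipher suite that both sides support. A minimal Python sketch of that comparison, assuming you already have two lists of suite names: the device's enabled suites (on Windows these come from the Schannel configuration, for example PowerShell's Get-TlsCipherSuite on newer versions) and the platform's supported suites from the documentation linked above. The suite names below are illustrative only, not the platform's actual list, and `common_cipher_suites` is a hypothetical helper.

```python
def common_cipher_suites(device_suites, platform_suites):
    """Return suites enabled on both sides, in the device's order.

    An empty result corresponds to the "client and server cannot
    communicate, because they do not possess a common algorithm" failure.
    Names are compared case-insensitively.
    """
    platform = {name.upper() for name in platform_suites}
    return [name for name in device_suites if name.upper() in platform]


# Illustrative suite names only; substitute the real lists.
device_enabled = ["TLS_RSA_WITH_AES_256_CBC_SHA256", "TLS_RSA_WITH_3DES_EDE_CBC_SHA"]
platform_supported = [
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
]
print(common_cipher_suites(device_enabled, platform_supported))  # prints []
```

Treat this as a first check only: a real handshake also depends on the TLS version and other negotiated parameters, and Group Policy can change what the device actually offers.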
- "An existing connection was forcibly closed by the remote host"
- This usually suggests packet inspection is interfering, especially when the connection is intermittent. Packet inspection is less likely if the agent has never connected, or lost connection and never reconnected. When it is the cause, packet inspection on the firewall may be killing "idle" connections between the agent and the platform
- It could also suggest security software is killing the connection. Check with the partner whether the security software can be bypassed for testing to confirm.
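The intermittent versus never-reconnected distinction above can be read from the log order. A minimal Python sketch, assuming log.txt is in chronological order, the error text appears exactly as quoted (English-language Windows), and successful acknowledgements use the "is Http - 200" wording from the sample ping lines; `classify_connection_resets` is a hypothetical helper.

```python
RESET = "An existing connection was forcibly closed by the remote host"
SUCCESS = "is Http - 200"


def classify_connection_resets(lines):
    """Classify "forcibly closed" errors in a chronological log.

    Returns "none" if the error never appears, "intermittent" if at least
    one successful (HTTP 200) response follows its last occurrence, and
    "persistent" otherwise.
    """
    last_reset = None
    last_success = None
    for index, line in enumerate(lines):
        if RESET in line:
            last_reset = index
        elif SUCCESS in line:
            last_success = index
    if last_reset is None:
        return "none"
    if last_success is not None and last_success > last_reset:
        return "intermittent"
    return "persistent"
```

An "intermittent" result points toward packet inspection on idle connections; "persistent" makes that cause less likely, per the reasoning above.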
- "No such host is known"
- This almost always signals a DNS issue: the partner's DNS cannot translate a URL into an IP address. The partner may be using a custom DNS and/or need to add the CentraStage URLs to their DNS.
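To confirm a DNS problem, check whether the platform hostnames resolve on the affected device. A minimal Python sketch, assuming Python is available there; it uses the operating system's resolver through `socket.getaddrinfo`. The hostnames to check are placeholders you would take from the allowlist documentation, and `resolves`/`unresolvable` are hypothetical helpers. The same check can be done by hand with nslookup.

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if the local resolver can turn hostname into an IP address.

    A False result corresponds roughly to the agent's
    "No such host is known" error.
    """
    try:
        socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False
    return True


def unresolvable(hostnames):
    """Return the subset of hostnames that fail DNS resolution."""
    return [name for name in hostnames if not resolves(name)]
```

Usage: `unresolvable([...])` with the platform hostnames from the allowlist; any name returned is one the partner's DNS cannot translate.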
- 4 bytes requested but only 0 read in BEBinaryReader (HTTPS filter)
- If we see other, less specific error messages (SSL, no response, etc.), double-check the scope of the affected devices. If it is a general connection issue, it is most likely a network issue.
- Sharing the allowlist is helpful, but we need to do more than this. Partners are often not thorough enough to find their own allowlist mistakes
- If there is evidence of a network issue, ask to review the firewall logs with the partner. There is often a traceable block in the firewall logs showing when our connection was blocked. Getting partners to see this is usually enough for them to take ownership of fixing their allowlist so that our connections get through
- If all else fails, try the agent Health Check, but beware of unrelated errors. One to watch for says that AEMAgent is out of date and to check dependencies, etc. AEMAgent is downloaded BY cagservice after cagservice installs, so if the service never connects to the platform, it never downloads any version of AEMAgent. That error is a symptom of the connection issue, not the cause. Don't waste time on it if the problem is the main service's connection to the platform
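The known errors above can be tallied from log.txt before reaching for the Health Check. A minimal Python sketch, assuming the message substrings appear verbatim (English-language Windows error text); the short next-step labels paraphrase this article, and `KNOWN_ERRORS`/`triage` are hypothetical names. Lines matching nothing are not counted; for those, fall back to checking the scope of affected devices.

```python
from collections import Counter

# Substrings from the known-error list above, paired with a short next step
# (paraphrased from this article). The first match wins for each log line.
KNOWN_ERRORS = [
    ("CS returned error code 429", "over device limit: compare devices to licenses"),
    ("A call to SSPI failed", "cipher suites / TLS: check device settings and Group Policy"),
    ("do not possess a common algorithm", "cipher suites / TLS: check device settings and Group Policy"),
    ("The remote certificate is invalid", "SSL certificate validation error"),
    ("An existing connection was forcibly closed", "packet inspection or security software"),
    ("No such host is known", "DNS: platform URLs do not resolve"),
    ("4 bytes requested but only 0 read in BEBinaryReader", "HTTPS filter"),
]


def triage(lines):
    """Return (next_step, count) pairs for known errors, most frequent first."""
    hits = Counter()
    for line in lines:
        for needle, next_step in KNOWN_ERRORS:
            if needle in line:
                hits[next_step] += 1
                break  # count each line under one category only
    return hits.most_common()
```

The most frequent category is usually the one to chase first, but compare it against the scope of affected devices before concluding.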
Flow: