SUMMARY
How to resolve a backup that fails with the message "Could not connect to BP Agent on client X"
ISSUE
Purpose
If a backup fails with the following message:
could not connect to BP Agent on client X
This indicates that the DPU was unable to connect to the agent service on the client machine at the time that the backup started.
Resolution
Client was not available
If you are looking at a backup that failed in the past, first check whether the connectivity issue is still present. You can check connectivity between the DPU and a client in the RRC by clicking Configure->Clients, clicking the name of the client, and then clicking Save without making any changes. This immediately re-tests the connection.
If the test succeeds, you can usually assume that subsequent backups will be able to connect.
If this message was encountered during the process of trying to initially register the client to the DPU (under Configure->Clients->Add Client in the RRC), then this may indicate that the agent software has not yet been installed on the target machine.
The Unitrends agent software is required for all backups except vProtect backups, so ensure that the customer has the software installed.
If the machine has backed up successfully in the past, this is not likely to be the issue.
Agent is not running
The agent service may have been stopped, or its process killed. Confirm that the agent is running on the client machine.
Agent is not listening
Testing
If the service appears to be running, confirm that it is listening. By default, the agent listens on TCP port 1743 and sends a binary handshake when a connection is made to that port.
In order to test this, run the following command from the DPU, replacing X with the client's hostname:
telnet X 1743
You should see output similar to the following. To exit, press Ctrl-] and then type 'quit':
Trying 172.21.4.62...
Connected to x.y.local (172.21.4.62).
Escape character is '^]'.
▒A,Connect1745^]
telnet> quit
Connection closed.
If the connection is refused, look into firewall issues, and check the agent logs for errors indicating that it could not open the port.
It can also be helpful to repeat the test on the client machine itself, having it connect to itself on TCP port 1743:
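Where telnet is not available, the same port test can be scripted. The sketch below assumes Python 3.9 or later is available on the machine running the test; the function name `probe_agent` and its status strings are hypothetical and not part of any Unitrends tool. Only the default port (1743) and the fact that the agent sends a handshake come from this article, and the handshake bytes are collected but not interpreted. It separates the outcomes discussed here: refused, timed out, unresolved name, or connected.

```python
import socket

AGENT_PORT = 1743  # default agent port, per this article


def probe_agent(host: str, port: int = AGENT_PORT, timeout: float = 5.0) -> tuple[str, bytes]:
    """Open a TCP connection to the agent port and classify the outcome.

    Returns (status, handshake_bytes). status is one of "connected", "refused",
    "timeout", "unresolved" or "error: <details>". Handshake bytes are only
    collected, not interpreted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            try:
                handshake = sock.recv(64)  # the agent should send a binary handshake
            except socket.timeout:
                handshake = b""  # connected, but nothing arrived within the timeout
            return "connected", handshake
    except socket.gaierror:
        return "unresolved", b""  # the hostname could not be resolved
    except ConnectionRefusedError:
        return "refused", b""  # host answered, but nothing accepted the connection
    except socket.timeout:
        return "timeout", b""  # no answer at all; often a firewall dropping packets
    except OSError as exc:
        return f"error: {exc}", b""  # e.g. host or network unreachable
```

For example, `print(probe_agent("x.y.local"))` run from the DPU and `print(probe_agent("localhost"))` run on the client give the two results compared in the list that follows.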
telnet localhost 1743
You should get the same result back as on the DPU.
- If the client can connect to itself and the DPU can also connect to it, there should be no connectivity issue. Re-check that the DPU is able to register the client.
- If the client can connect to itself but the DPU cannot connect to it, look into networking issues. Firewall and name resolution issues are the most common culprits.
- If neither the DPU nor the client itself can connect, look into port binding issues.
If telnet reports that the host is unreachable, confirm that the hostname resolves to the correct IP address.
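One way to confirm resolution is to ask the DPU's own resolver what the client name maps to and compare it with the client's real address. This is a minimal sketch assuming Python 3.9 or later on the DPU and IPv4 addressing; `resolve_all` and `resolves_to` are hypothetical helper names. It uses the system resolver, so whether the hosts file or DNS answers depends on the DPU's resolver configuration.

```python
import socket


def resolve_all(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses the system resolver gives for hostname.

    An empty list means the name could not be resolved at all.
    """
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def resolves_to(hostname: str, expected_ip: str) -> bool:
    """True if expected_ip is among the addresses that hostname resolves to."""
    return expected_ip in resolve_all(hostname)
```

For example, `resolves_to("x.y.local", "172.21.4.62")` returning False (with the client's actual IP substituted) points to a name resolution problem rather than an agent problem.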
Port binding issues
If the agent is running but not listening on port 1743, it may have been unable to bind to its service port at initialization. This is particularly common on Windows.
Check whether bug #9044 applies.
Check the BPNETD log (C:\PCBP\Logs.dir\BPNETD_X.log on Windows, /usr/bp/logs.dir/bpnetd_x.log on other systems) for messages indicating that it was unable to open the port.
If there is a port conflict, stop the agent service, ensure that all BPNETD and WBPS processes have exited, and then start the service again.
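Before restarting the agent, it can help to confirm that nothing else is still holding the port. The sketch below, assuming Python 3.9 or later on the client and run while the agent is stopped, simply tries to bind TCP port 1743 on all IPv4 addresses: if the bind fails with "address in use", some other socket still owns the port. `port_is_free` is a hypothetical helper name, and Windows exclusive-use sockets may report a different error, which this sketch re-raises rather than interprets.

```python
import errno
import socket

AGENT_PORT = 1743  # default agent port, per this article


def port_is_free(port: int = AGENT_PORT, host: str = "") -> bool:
    """Return True if a TCP socket can be bound to host:port right now.

    host="" means all IPv4 addresses. EADDRINUSE means another socket already
    holds the port; any other error (e.g. permission denied) is re-raised.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise
    return True
```

If `port_is_free()` returns False with the agent stopped, `netstat -ano` on Windows lists listening ports together with the owning process ID, which helps identify the process holding port 1743.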
Hosts file
If the DPU is unable to correctly resolve the name of the client into an IP address, ensure that the hostname -> IP mapping has been statically defined in the hosts table.
Under the RRC, this can be done in Configure->Networking->Hosts.
In the terminal, this can be done by editing the /etc/hosts file (these methods are equivalent).
A note about DNS: relying on the customer's DNS system to map hostnames to IP addresses does NOT work reliably. If you find connectivity issues, make sure the name and IP address are mapped explicitly in the hosts file, even if this has never been necessary in the past.
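A hosts entry is a single line with the IP address followed by one or more names, for example `172.21.4.62   x.y.local   x`. To check whether a static mapping for the client already exists, the hosts file can be scanned as in this sketch. It assumes Python 3.9 or later on the DPU, and `static_addresses` is a hypothetical helper; it only reads the file text and does not change anything.

```python
def static_addresses(hostname: str, hosts_text: str) -> list[str]:
    """Return every IP address that hosts_text maps to hostname (case-insensitive).

    hosts_text uses the usual hosts-file format: an IP address followed by
    whitespace-separated names, with '#' starting a comment.
    """
    wanted = hostname.lower()
    found = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if wanted in (name.lower() for name in names):
            found.append(ip)
    return found
```

For example, `static_addresses("x.y.local", open("/etc/hosts").read())` returning an empty list means the client name is not yet mapped statically on the DPU.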
Too many concurrent backups (SQL Bug)
If multiple VSS SQL database backups run concurrently, the DPU may intermittently report this message. The error will not necessarily occur on every run, or on the same database each time.
This is bug #10160. It is caused by the agent reaching its limit on how many log files it can have open at once: it attempts to recycle the oldest log file while that file is still in use.
To fix this, edit the client-side master.ini file. Add the following line in the [BProfessional] section:
NumberWBPSLogFiles=N
where N is the number of WBPS log files to keep. Set N to a value greater than the maximum number of parallel jobs that could be running on this client.
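When this change has to be made on several clients, a text-based edit keeps the rest of master.ini (including comments) untouched. This is a minimal sketch assuming Python 3.9 or later on the client; `set_wbps_log_files` is a hypothetical helper, and it assumes the agent reads the key from the [BProfessional] section exactly as described above. Back up master.ini before writing the result.

```python
import re

SECTION = "[BProfessional]"
KEY = "NumberWBPSLogFiles"


def set_wbps_log_files(ini_text: str, count: int) -> str:
    """Return ini_text with NumberWBPSLogFiles=count in the [BProfessional] section.

    An existing value in that section is replaced; otherwise the line is added
    directly after the section header, or the section is appended if missing.
    All other lines are kept as they are.
    """
    lines = ini_text.splitlines()
    new_line = f"{KEY}={count}"
    in_section = False
    header_index = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_section = stripped.lower() == SECTION.lower()
            if in_section:
                header_index = i
            continue
        if in_section and re.match(rf"\s*{KEY}\s*=", line, re.IGNORECASE):
            lines[i] = new_line  # replace the existing setting in place
            return "\n".join(lines) + "\n"
    if header_index is None:
        lines += [SECTION, new_line]  # section missing: append it
    else:
        lines.insert(header_index + 1, new_line)
    return "\n".join(lines) + "\n"
```

The count passed in should follow the rule above: greater than the maximum number of parallel jobs that could run on the client.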