SUMMARY
Concurrent VMware backups may exceed vCenter memory limitations.
ISSUE
A large number of concurrent backups against an ESXi or VMware host can exceed VMware's default read buffer size, resulting in intermittent backup failures.
This issue occurs when a substantial number of backups use the NBD or NBDSSL protocol. It does not impact backups that use the SAN protocol (used for SAN Direct backups on hardware appliances), nor the HotAdd protocol used by UB appliances whose hosts have direct access to VMware datastores.
These backup failures may produce errors like these in the Unitrends UI:
VixDiskLib: Detected DiskLib error 802 (NBD_ERR_INSUFFICIENT_RESOURCES)
VixDiskLib: VixDiskLib_Read: Read 8192 sectors at 0 failed. Error 2 (Memory allocation failed. Out of memory.) (DiskLib error 802: NBD_ERR_INSUFFICIENT_RESOURCES) at 5178.
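When reviewing exported job output, it can help to count how often this specific error appears. The following is a minimal sketch, assuming the messages have already been collected as plain text lines; the function names are hypothetical and are not part of any Unitrends or VMware tooling.

```python
import re

# Matches the DiskLib error code and name quoted in the messages above.
NBD_RESOURCE_ERROR = re.compile(
    r"DiskLib error 802\b.*NBD_ERR_INSUFFICIENT_RESOURCES"
)


def is_nbd_resource_error(line: str) -> bool:
    """Return True if a log line carries the NBD insufficient-resources error."""
    return NBD_RESOURCE_ERROR.search(line) is not None


def count_nbd_resource_errors(lines) -> int:
    """Count the lines in an iterable that carry the error."""
    return sum(1 for line in lines if is_nbd_resource_error(line))
```

A rising count across backup windows suggests host contention rather than a one-off failure.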
RESOLUTION
If you are experiencing this issue, Unitrends recommends addressing it in the following ways:
1) Ensure all ESXi hosts are expressly added as clients in the Unitrends UI under Configure > Assets, in addition to vCenter. If the hosts already exist, re-save each asset to confirm that the credential originally applied has not since been modified. ESXi credentials may differ from vCenter credentials (in 6.X this is guaranteed), and without this information the VDDK cannot connect directly to ESXi for backup. Direct connections to ESXi bypass the read buffer limitations on vCenter, which VMware currently does not allow to be changed. Without this configuration, all backup read traffic passes through vCenter, and its single 32MB read limit will impact the environment more dramatically than the individual ESXi limitations.
2) If the NBD memory issues persist after ensuring the above, investigate whether SAN Direct or HotAdd could be used where it is not already in use.
- SAN Direct requires a physical Unitrends appliance; HotAdd backups are used by UB appliances.
- vCenter is required to use SAN Direct or HotAdd access.
- Configuring an iSCSI or FC connection to the SAN when using a physical appliance is discussed further in this KB. HotAdd will be used automatically where possible.
- HotAdd can only be used for VMs located on SAN LUNs (FC or iSCSI) and for locally attached disks located on the same host as the UB appliance. Local disks on other hosts and NFS datastores are not compatible with VMware's HotAdd technology.
- Note that SAN backups that encounter communication difficulties on the storage network (especially connections that route through a gateway or firewall) may fall back to the NBD protocol mid-backup. Storage network failovers or unusual switch load balancing may also cause this issue. If this is occurring in your environment, speak to your storage or network vendor to resolve it.
3) Where backup methods other than NBD cannot be used, the ESXi host's NFC memory settings can be tuned. The relevant configuration can be located using the information below. Proper tuning of these values should be done after communicating with VMware Support. Unitrends cannot at this time recommend appropriate values, as they may differ based on environment configuration. The following is an example only, provided by VMware Support.
The configuration file of the host is at "/etc/vmware/hostd/config.xml". It contains an element for the NFC server:
    <!-- The nfc service -->
    <nfcsvc>
        <path>libnfcsvc.so</path>
        <enabled>true</enabled>
        <maxMemory>50331648</maxMemory>
        <maxStreamMemory>10485760</maxStreamMemory>
    </nfcsvc>
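To make the example values easier to read, the sketch below parses a copy of the fragment and converts the two limits to MiB. It assumes the values are byte counts and operates on a pasted string, not on a live host; the function name nfc_limits_mib is hypothetical. It is only a reading aid and does not suggest values to use.

```python
import xml.etree.ElementTree as ET

# Example fragment copied from the article; real values differ per environment.
NFCSVC_XML = """
<nfcsvc>
  <path>libnfcsvc.so</path>
  <enabled>true</enabled>
  <maxMemory>50331648</maxMemory>
  <maxStreamMemory>10485760</maxStreamMemory>
</nfcsvc>
"""

MIB = 1024 * 1024


def nfc_limits_mib(xml_text: str) -> dict:
    """Return the nfcsvc memory limits converted from bytes (assumed) to MiB."""
    root = ET.fromstring(xml_text.strip())
    return {
        name: int(root.findtext(name)) / MIB
        for name in ("maxMemory", "maxStreamMemory")
    }


limits = nfc_limits_mib(NFCSVC_XML)
print(limits)  # {'maxMemory': 48.0, 'maxStreamMemory': 10.0}
```

Under the byte-count assumption, the example sets a 48 MiB overall limit and a 10 MiB per-stream limit.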
4) If the memory errors persist after the above are reviewed, Unitrends can reduce the buffer size requested for each backup. In release 10.0 or higher, customers can access the read buffer size configuration under Configure > Appliances > Edit > Advanced > General Configuration. Filter the section to "vprotect" and locate the "mbsForBackup" value, which is 4 by default. ESXi limits the maximum concurrent buffer to 32MB by default, so at 4MB per thread, each host would be limited to 5 concurrent NBD protocol backups. Lowering this value to 2 would increase backup concurrency. NOTE: Reducing the read buffer per backup thread may decrease backup performance. The decrease will vary by environment, and Unitrends cannot quantify the overall impact of this setting in your environment.
5) Optionally, instead of limiting the read buffer size used for backup, you can limit the number of concurrent VMware backups the appliance performs. This option is only available in release 10.1 or higher. Go to Configure > Appliances > Edit > Advanced > General Configuration. Filter the section to "vprotect" and locate the "concurrentjobLimit" value. Set it to a value less than 10. Values as low as 5 may be required where a single host is used and all backups are NBD backups. The default value of 0 does not limit backups. Unitrends is developing a more advanced option that will limit backups per host rather than globally (the original VMware communication was that this issue was limited by vCenter itself, not ESXi), but that feature is not available in release 10.1.
6) If you have multiple Unitrends appliances, ensure that two or more appliances are not protecting guests on the same ESXi host at the same time, so that no more than 5 concurrent NBD protocol backups of the same ESXi host occur.
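The check described in step 6 can be sketched as a simple overlap count across appliance schedules. This is an illustration only: the Job record, its fields, and the function names are hypothetical, the schedule data must be supplied by hand, and the limit of 5 is the figure cited in this article, not a value read from any host.

```python
from collections import defaultdict
from typing import NamedTuple


class Job(NamedTuple):
    appliance: str
    host: str
    start: int       # minutes since midnight
    end: int         # minutes since midnight (exclusive)
    transport: str   # "nbd", "nbdssl", "san" or "hotadd"


NBD_TRANSPORTS = {"nbd", "nbdssl"}


def peak_nbd_concurrency(jobs):
    """Return {host: peak number of overlapping NBD/NBDSSL jobs}."""
    events = defaultdict(list)
    for job in jobs:
        if job.transport in NBD_TRANSPORTS:
            events[job.host].append((job.start, 1))
            events[job.host].append((job.end, -1))
    peaks = {}
    for host, host_events in events.items():
        running = peak = 0
        # Tuples sort so that an end (-1) precedes a start (+1) at the same
        # minute; back-to-back jobs are therefore not counted as overlapping.
        for _, delta in sorted(host_events):
            running += delta
            peak = max(peak, running)
        peaks[host] = peak
    return peaks


def hosts_over_limit(jobs, limit=5):
    """Return the hosts whose peak NBD concurrency exceeds the limit."""
    return sorted(h for h, p in peak_nbd_concurrency(jobs).items() if p > limit)
```

Hosts returned by hosts_over_limit are candidates for staggering schedules between appliances or for moving guests to SAN Direct or HotAdd.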
CAUSE
This issue is related to a VMware decision to limit the NBD protocol to a maximum 32MB read buffer by default on ESXi hosts. This decision impacts all backup vendors that use VMware's official VDDK APIs for backup and are required to use the NBD protocol. VMware itself is investigating changes to these default configurations.
This issue has historically been rare when VMware environments are designed and scaled to VMware's best practices. Over time, however, the number of guests per host has continued to increase as hardware capabilities have improved, making it more likely that many guests on the same host will be protected concurrently by a backup vendor. Customers who have multiple Unitrends appliances (or multiple backup products in the same environment) are more exposed to this issue, as are customers who perform backups more frequently per day, which increases the likelihood of host contention. Customers who use NFS datastores may also be more exposed, as NBD backups are the only option in those environments. Customers who use VMFS 5 SAN-connected host storage are less likely to encounter this issue, as HotAdd or SAN Direct configurations completely mitigate this VMware limitation.
In all cases, where backup methods other than NBD cannot be used, tuning of VMware ESXi settings should resolve this issue. The additional settings Unitrends has made available in 10.0 and 10.1 should further limit the impact for those who cannot make sufficient VMware adjustments.
We are continuing to work with VMware to better detect this issue, and are working on more advanced and adaptive technology that will limit a single appliance's ESXi backup threading more dynamically. At this time, the best resolution remains proper ESXi advanced performance tuning, which may require assistance from VMware to complete.