SUMMARY
How capacity, data reduction, and change rate may lead to insufficient space for backups
ISSUE
Insufficient space is being reported for backups. This article covers how the nature of the data protected by a Unitrends appliance may contribute to these conditions.
For more information on "Out of Space" conditions, see Error: No more space on device.
RESOLUTION
For simplicity, we will assume the appliance is performing backups only and is not serving as a replication or vaulting target. It is important to understand that incoming data from another unit in that role is calculated separately and added to the capacity information discussed below; both combined must be less than the unit's rated or recommended capacity, but how to compute that is beyond the scope of this article.
There are 3 basic factors that affect how much space your data uses on the disk inside an appliance: Capacity, Data Reduction, and Change Rate. It is important to understand how all 3 interact and how they may be impacting your appliance.
Capacity
Each Unitrends physical appliance is rated to protect an advertised "max recommended" backup set. For a UEB, the "Capacity" number is calculated from how much disk has been added to the UEB. Generally speaking, this is displayed in the Capacity Report in the UI as "Capacity" in GB near the bottom of the report. The Capacity Report can be accessed in the Satori UI under Reports > Appliance > Capacity Report. In the Legacy UI, it is located under Reports > Capacity.
Generally speaking, "Capacity" is the maximum amount of data that this appliance should protect. "Total Used" is the sum of current protected source data plus additional space that may be reserved from that by Instant Recovery settings. If "Capacity" is exceeded, various appliance functions will may disabled. In certain UEB license models if capacity is exceeded the unit will stop permitting new backups until this is resolved. The report displays the sum of each current master backup only, also referred to as the raw protected content. This report does not include multiple copies, incrementals, differentials, and other aspects that add to the total disk space used, it is simply a view of how much data in your environment has been configured to be protected.
It is important that Total Used never exceed Capacity. Doing so effectively sets you up to fail, as there will be insufficient space for normal operation.
Depending on the generation of the appliance being reviewed, and with a typical expectation of attaining 30 days of retention for backups, most customers will need to target a Total Used capacity between 50% and 70% of the Capacity maximum. Modern "S" series appliances using Advanced Adaptive Deduplication can operate normally at much higher values and still come close to 30 days of retention. The closer the protected data gets to Capacity, the lower retention will be. On units using post-process deduplication, the expectation is that at maximum capacity retention is less than 1 week and deduplication should not be in use. If Capacity is exceeded, there may be insufficient space to land new backups before older backups are deleted, which can cause "insufficient space" errors and lead to backup failures or delays in backup processing.
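As a quick way to sanity-check where you stand against this guidance, the arithmetic can be sketched in a few lines of Python. This is only an illustration; the capacity and usage values below are hypothetical placeholders, so substitute the "Capacity" and "Total Used" figures from your own Capacity Report.

    # Rough utilization check against the 50-70% sizing guidance above.
    # Values are hypothetical; read the real numbers from the Capacity Report
    # (Satori UI: Reports > Appliance > Capacity Report).
    capacity_gb = 6000.0     # "Capacity" from the report
    total_used_gb = 4500.0   # "Total Used" from the report

    utilization = total_used_gb / capacity_gb

    if utilization > 1.0:
        print(f"Over capacity ({utilization:.0%} used): expect space errors and sharply reduced retention.")
    elif utilization > 0.70:
        print(f"{utilization:.0%} used: above the typical 50-70% target, so retention will likely fall short of 30 days.")
    else:
        print(f"{utilization:.0%} used: within the typical 50-70% sizing target.")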
However, staying below this maximum capacity number alone will not ensure successful operation. The "Max Recommended" Capacity number is an estimate based on assumptions commonly seen in the industry. The 2 core assumptions behind recommended capacity are: 1) the data reduction rate is 2:1 or greater, and 2) the change rate over 7 days is 10% or less. If these numbers are not in line with common industry expectations, retention and maximum capacity can be negatively affected. So how do you confirm this?
Data Reduction Ratio
In the UI, the Unitrends appliance reports the data reduction ratio. In Satori, this is located under Reports > Storage > Data Reduction; in the legacy UI it is under Reports > Data Reduction. This is a global number that takes BOTH deduplication and compression into account. The numbers to the right are the base ratio. On a new unit with only a few backups, this will primarily represent a base compression ratio. Typically customers fall between 1.8:1 and 2.2:1 when compression alone is taken into account. Use of local backup encryption will reduce this number, usually to 1.25-1.5:1 or less, as encryption by nature obfuscates data and eliminates the ability to compress it reliably. It is unusual in the US or EU to require on-host encryption of backups when the backup appliance is located in the same secure datacenter as the original data, and we recommend against using encryption for backup data due to the performance and retention penalty; encryption on archives is all we typically expect to use.
Customers who have large amounts of already-compressed data such as video, audio, and PDF files will have lower compression ratios. Use of database dump scripts, as opposed to VSS-aware database backups or VM backups of database servers, will also reduce compression ratios and severely increase change rate. Finally, VMs will typically compress slightly better than agent backups of the same client. Thick-provisioned VMs will back up zeroed space covering the total size of the provisioned disk beyond the actual files, and may show artificially high compression ratios if the disk is grossly over-provisioned.
As a unit acquires more backups, deduplication ratios will begin to rise. Initially the space overhead of building the SIS reserve for block storage will lower overall data reduction, but once a unit is sustaining 2-3 concurrent full backups of each client, data reduction should sharply improve, and with each further data set deduplication will continue to improve this number. This will vary by appliance generation, as the deduplication models have improved over several generations. CentOS 5-based units have the largest SIS overhead; CentOS 6-based units support hole punching and compression in the SIS and will have roughly 50% less overhead than a CentOS 5 unit protecting the same data set. New "S" series chassis, as well as UEBs deployed on release 8.2.0 or later, use Advanced Adaptive Deduplication, eliminate a large part of the backup landing zone, and have the best deduplication ratios. In short, the newer the model, the less storage overhead and the better the data reduction. Simply upgrading an older unit to a new software release will not provide this advanced technology, as some features are hardware dependent.
When looking at an otherwise normal unit that has been in production for some time and is full, you should be looking for a data reduction ratio above 2:1. If the data reduction is not above 2:1, consider how that impacts Capacity. Divide the expected 2:1 ratio by your current data reduction ratio to get a multiplier. For example, if the data reduction on your unit is 1.7:1, the multiplier is 2 ÷ 1.7, or roughly 1.2x. Multiply the current "Total Used" number by this factor; that is your effective protected size. With lower data reduction, your unit will operate as if it were protecting that larger amount of data instead of the reported value. If this number exceeds Capacity, regardless of the reported values, you may have a unit that is undersized for the nature of your data set versus industry averages.
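As a minimal sketch of the adjustment just described, assuming hypothetical report values (a 1.7:1 reduction ratio and example capacity figures that are not from any specific appliance), the calculation looks like this:

    # Adjust "Total Used" for a data reduction ratio below the assumed 2:1.
    # All values are hypothetical; substitute the figures from your own
    # Capacity Report and Data Reduction report.
    expected_reduction = 2.0   # sizing assumption: 2:1 or better
    actual_reduction = 1.7     # from Reports > Storage > Data Reduction
    total_used_gb = 5500.0     # "Total Used" from the Capacity Report
    capacity_gb = 6000.0       # "Capacity" from the Capacity Report

    multiplier = expected_reduction / actual_reduction        # about 1.2 here
    effective_protected_gb = total_used_gb * multiplier

    print(f"Effective protected size: {effective_protected_gb:.0f} GB")
    if effective_protected_gb > capacity_gb:
        print("The unit behaves as if it were over capacity and may be undersized for this data set.")

In this hypothetical example the unit reports 5500 GB of protected data against a 6000 GB capacity, but with only 1.7:1 reduction it behaves as though it were protecting roughly 6500 GB, which exceeds Capacity.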
Change Rate
Most normal systems produce an average of 10% changed data per week. Most VMs will produce slightly more than agent backups of the same client when host-based backups are used, as changes such as the pagefile cannot be excluded from VM backups. Unlike capacity and data reduction, change rate is not simple to calculate; it requires some effort. The easiest way to determine whether you have a high change rate that could reduce backup retention is to use the Backup History report. In the Satori UI this is located under Reports > Backup > Backup History; in the legacy UI it is under Reports > Backups. Note that the default range for this report is 1 day; extend that range to 14 or 30 days.
You may also wish to open the column options (the triangle at the bottom in the legacy UI, or the column selector icon at the far right of the column list in Satori) and enable the "Database" column so you can review database- and VM-level granular options. From this report, click the Size column twice to sort with the largest backups at the top; spotting large incrementals or differentials in the list should then be relatively easy. Compare the incremental sizes for a client against that client's master. Systems whose combined backups total more than 15% of the master over the 7 days following that master may be a concern for high change rate. High change rate is often seen when database dump scripts are used, when databases are not excluded from file-level backups (SQL, Exchange, and Oracle databases that appear in our UI are auto-excluded; others are not), or when copy or deduplication processes in Windows or automated permission management on file shares generate churn. The contents of a file-level backup can be reviewed from the Backup History report to see which files are changing.
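As a rough illustration of the 7-day comparison above, assuming you have noted the relevant sizes from the Backup History report, the check could be scripted as follows. The sizes here are hypothetical examples, not output from the appliance:

    # Estimate a client's 7-day change rate from Backup History sizes.
    # All sizes are hypothetical (GB); use the values shown for the 7 days
    # following a master in Reports > Backup > Backup History.
    master_gb = 800.0
    followup_backups_gb = [35.0, 28.0, 41.0, 30.0, 26.0, 33.0]  # incrementals/differentials after the master

    weekly_change_gb = sum(followup_backups_gb)
    change_rate = weekly_change_gb / master_gb

    print(f"7-day change: {weekly_change_gb:.0f} GB ({change_rate:.0%} of the master)")
    if change_rate > 0.15:
        print("Above the ~15% guideline: this client may be a high change rate concern.")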
Combining this information, you can see trends in data change and compression that will have a notable impact on system retention. The higher the change rate, the more space each backup set uses. High change rates also typically mean low deduplication, because the newly created unique data is unlikely to match pre-existing data on the same or other systems and will deduplicate poorly.
CAUSE
The logical D2D device configured for the backup job is at capacity and is having difficulty purging older data in real time (or at all).
The backup appliance may not be large enough to support the unique data set being protected.