SUMMARY
This article describes the deduplication modes available on Unitrends Backup (UB) appliances and their impact on storage and performance.
ISSUE
By default, deduplication is set to LEVEL 1 on UB systems activated as trials and on Free edition units. Deduplication cannot be enabled on Free editions.
Deduplication is set to LEVEL 3 on UB systems deployed and activated with purchased activation codes that were not first activated as trials.
Based on the information below, choose the appropriate level of deduplication and set it BEFORE you begin taking backups.
Note: You can move up to a higher deduplication level at any time, provided storage IO allows it. Moving down a level, however, should only be done by reinstalling the unit. Existing deduplicated data is not re-expanded (un-deduplicated) when the setting changes, yet new backups will require non-deduplicated space, which typically results in storage-full conditions.
The deduplication levels available to licensed UB appliances are as follows:
These settings are not available on physical appliances: Unitrends designed all Generation 6 and later hardware appliances to operate at LEVEL 3, and earlier generations to operate at LEVEL 2. Reducing the deduplication level on a physical unit is still possible, but it requires review by Unitrends Support and is rarely needed.
RESOLUTION
LEVEL 1: No Deduplication (Compression and Hashing Only)
Use LEVEL 1 when the storage holding the Unitrends backups and database partition cannot sustain the IO needed for higher levels, or when only the absolute minimum retention (2 full backup rotations) will be kept, making the benefit of deduplication negligible. This setting trades storage capacity for performance, allowing customers to use slower, lower-cost storage where long-term retention on the backup device is not a concern. It is also the option to use if you intend to add more than one storage device under CONFIGURE > Storage, because it ensures that your choice of storage location is not compromised by deduplicated blocks that could reside on any of the storage devices.
Storage requirements are greatest at LEVEL 1, because every backup requires its full space allocation, less compression. The functional minimum storage is 2X the size of the protected data set, and 3X or more is typically required to store 30+ days of backups. Increasing retention further requires a linear increase in storage size. The appliance stores every block it receives independently.
Note that the largest consumer of IO in a Unitrends system is the database. On virtual appliances, the database can be relocated to separate, higher-performance storage without investing in high-performance storage for the entire unit. See the KB Move the Unitrends database off of stateless storage and onto a new partition. If your database can be moved to storage capable of at least 500 write IOPS (2K or more IOPS may be required for larger appliances), it is often possible to run at LEVEL 3 on backup storage that would otherwise only support LEVEL 1.
Note: On units that are not also replicating Hot Copy backups, customers using LEVEL 1 are advised to also disable file-content hashing: go to Configure > Appliances > select your appliance > Edit > Advanced tab > General Configuration, and set the entry Section InlineHashing > Name hashFileContents > Value = No. This further decreases database overhead and improves performance. Do not disable this setting if Hot Copy Replication is used.
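The IOPS guidance above can be expressed as a quick check. This is a minimal sketch, not a Unitrends tool: the function and constant names are hypothetical, and treating appliances that protect 5 TB or more as "larger appliances" is an assumption borrowed from the LEVEL 3 guidance in this article. The measured write IOPS must come from your own measurement (or one performed by Unitrends Support); the sketch does not benchmark anything.

```python
# Hypothetical helper (not part of Unitrends): compares a measured database
# write-IOPS figure against the rule of thumb in this article.

MIN_DB_WRITE_IOPS = 500               # minimum suggested for a relocated database
LARGE_APPLIANCE_DB_WRITE_IOPS = 2000  # "2K or more" for larger appliances
# Assumption: appliances protecting 5 TB or more count as "larger appliances".
LARGE_APPLIANCE_PROTECTED_TB = 5.0


def db_storage_supports_level3(measured_write_iops: float, protected_tb: float) -> bool:
    """Return True if the database storage meets the suggested write-IOPS floor."""
    if protected_tb >= LARGE_APPLIANCE_PROTECTED_TB:
        required = LARGE_APPLIANCE_DB_WRITE_IOPS
    else:
        required = MIN_DB_WRITE_IOPS
    return measured_write_iops >= required


if __name__ == "__main__":
    # 800 write IOPS clears the floor for a 2 TB appliance but not for a 10 TB one.
    print(db_storage_supports_level3(800, 2.0))
    print(db_storage_supports_level3(800, 10.0))
```

Passing this check only means the storage meets the rule of thumb; actual LEVEL 3 performance still depends on CPU, memory, and workload.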
LEVEL 2: Post Process Deduplication (Compression First, Deduplication Later)
In LEVEL 2, deduplication runs entirely as a passive background operation on backup sets other than the current one, and blocks that exist in more than one backup are moved to Single Instance Storage (SIS). To reach this depth of deduplication, your storage devices need to be at least 2.5X the size of your protected data set. Reaching 30 days of retention typically requires 3X your protected capacity, similar to LEVEL 1, but beyond that point additional retention only consumes space for new changed data and no longer grows linearly. This mode is best suited to customers who retain at least 3 full backup rotations, and the greatest benefit is seen by those retaining 6 or more full backup sets. The appliance typically stores 2 copies of each unique block, and post-process deduplication requires additional write overhead that can be as large as the largest individual backup. If database performance is constrained, additional backups may be kept in a non-deduplicated state, increasing storage requirements.
Use Level 2 to improve performance under the following conditions:
2) You want full system restore performance to be a primary focus, but cannot meet SLAs using replicas or other instant-recovery methods and must rely on traditional full restores. This mode ensures data is recovered fastest in those cases.
3) Your Cold Backup Copies (previously known as Archives) take too long to complete to meet offsite SLAs, other Cold Copy optimizations have already been reviewed and are still insufficient, and Hot Copy replication is not an option for meeting those offsite SLAs.
Though this mode greatly reduces an appliance's database IO requirements, it can actually increase general backup storage IO requirements somewhat. It also has the highest functional minimum storage size of any mode, due to the overhead unique to this mode.
Whenever the Unitrends database can be relocated to alternate storage to improve performance enough to allow LEVEL 3, that is recommended before choosing this mode. Relocating the database may still be recommended when using LEVEL 2 in scenarios where restore performance remains a customer priority.
LEVEL 3: Advanced Adaptive Deduplication (Deduplication on Arrival and Synthesis)
Deduplication occurs as data is received, and for backup types that support deduplication, all blocks are written directly to the SIS directory. This method has the lowest storage requirements but, as a trade-off, the highest system performance requirements. LEVEL 3 will typically sustain 30 days of retention with as little as 1.5X protected capacity, and additional retention only ever requires space for new unique changed data. The appliance only ever stores 1 copy of each unique block. This method may also slightly reduce backup and restore performance; the impact depends primarily on the IO speed, CPU, and memory available to the Unitrends database. Initial full backups typically take 2-4X as long as normally expected, but because deduplication reduces the overhead of later backups, UB systems deployed to best practices will typically complete subsequent fulls faster than without deduplication.
Customers wishing to use this mode are encouraged to deploy UB systems with hardware configurations equivalent to the Unitrends physical appliances that would be sold for similar data sets, and should follow Deployment Best Practices for Unitrends Backup. Customers protecting 5TB or more of data will get the best cost versus performance by ensuring high-performance storage is available to the Unitrends database engine. High-performance storage is not required for bulk block storage; in fact, excluding the database, LEVEL 3 block storage has the lowest IO needs of any mode.
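The storage multiples quoted for the three levels can be turned into a rough estimate. This is a minimal sketch under stated assumptions, not a Unitrends sizing tool: the function names and the daily_unique_change parameter (the fraction of protected data that arrives as new, unique data each day) are hypothetical; LEVEL 1 growth beyond 30 days is assumed to be proportional to retention, which is one reading of "linear increases"; and retention under 30 days is not modelled.

```python
# Rough storage sizing sketch based on this article's rules of thumb.
# All multipliers are multiples of the protected data set size.

# Typical multiple needed for ~30 days of retention at each level.
THIRTY_DAY_MULTIPLIER = {1: 3.0, 2: 3.0, 3: 1.5}
# Functional minimums stated in the article (none is given for LEVEL 3).
FUNCTIONAL_MINIMUM = {1: 2.0, 2: 2.5}


def estimate_storage_tb(level: int, protected_tb: float, retention_days: int,
                        daily_unique_change: float = 0.0) -> float:
    """Estimate backup storage (TB) for a level and 30+ days of retention.

    daily_unique_change is a hypothetical input: the fraction of the
    protected data that arrives as new, unique data each day.
    """
    if level not in THIRTY_DAY_MULTIPLIER:
        raise ValueError("level must be 1, 2, or 3")
    if retention_days < 30:
        raise ValueError("this sketch only models 30 or more days of retention")

    base_tb = protected_tb * THIRTY_DAY_MULTIPLIER[level]
    if level == 1:
        # Assumption: LEVEL 1 stores every block independently, so storage is
        # scaled in proportion to retention from the 30-day figure.
        return base_tb * retention_days / 30
    # LEVELS 2 and 3: beyond 30 days, only new unique changed data is added.
    extra_days = retention_days - 30
    return base_tb + protected_tb * daily_unique_change * extra_days


def meets_functional_minimum(level: int, protected_tb: float, storage_tb: float) -> bool:
    """Check the stated functional minimum; LEVEL 3 has none in the article."""
    return storage_tb >= protected_tb * FUNCTIONAL_MINIMUM.get(level, 0.0)


if __name__ == "__main__":
    for lvl in (1, 2, 3):
        estimate = estimate_storage_tb(lvl, protected_tb=10, retention_days=60,
                                       daily_unique_change=0.01)
        print(f"LEVEL {lvl}: ~{estimate:.1f} TB")
```

With these hypothetical inputs (10 TB protected, 60 days, 1% daily unique change), the sketch gives roughly 60 TB for LEVEL 1, 33 TB for LEVEL 2, and 18 TB for LEVEL 3. Real change rates vary widely, so treat the output as a starting point for a conversation with your Onboarding Engineer rather than a sizing commitment.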
This mode is required to use Unitrends Image backup types introduced in 10.3.0 and is required on Hot Copy target systems. It is the intended mode when using Long Term Retention beyond 90 days.
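The selection guidance across the three levels, together with the earlier note that moving down a level should only be done by reinstalling, can be summarized as a small decision helper. This is a simplified sketch, not an official rule set: the Requirements fields, recommend_level, and requires_reinstall are hypothetical names, and the order in which conflicting conditions are resolved is an assumption. Edge cases should still be reviewed with Unitrends Support.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical summary of an environment; field names are illustrative."""
    uses_image_backups: bool = False         # Image backup types (10.3.0 and later)
    is_hot_copy_target: bool = False         # appliance receives Hot Copy replication
    retention_days: int = 30
    storage_device_count: int = 1            # devices under CONFIGURE > Storage
    full_rotations_retained: int = 3
    storage_sustains_level3_io: bool = True  # e.g. after relocating the database
    restore_speed_priority: bool = False     # must rely on traditional full restores
    cold_copy_sla_missed: bool = False       # after other Cold Copy optimizations


def recommend_level(r: Requirements) -> int:
    """Simplified reading of this article's guidance; conflict order is assumed."""
    # LEVEL 3 is required for Image backups and on Hot Copy target systems.
    if r.uses_image_backups or r.is_hot_copy_target:
        return 3
    # LEVEL 1: multiple storage devices, minimal retention, or insufficient IO.
    if (r.storage_device_count > 1 or r.full_rotations_retained <= 2
            or not r.storage_sustains_level3_io):
        return 1
    # LEVEL 3 is the intended mode for Long Term Retention beyond 90 days.
    if r.retention_days > 90:
        return 3
    # LEVEL 2 when restore speed or Cold Copy SLAs are the concern.
    if r.restore_speed_priority or r.cold_copy_sla_missed:
        return 2
    return 3


def requires_reinstall(current_level: int, target_level: int) -> bool:
    """Moving down a level is recommended only by reinstalling the unit."""
    return target_level < current_level


if __name__ == "__main__":
    print(recommend_level(Requirements()))
    print(recommend_level(Requirements(storage_device_count=2)))
    print(recommend_level(Requirements(restore_speed_priority=True)))
    print(requires_reinstall(current_level=3, target_level=1))
```

The helper checks hard requirements first so that a LEVEL 3 prerequisite such as Image backups is never silently downgraded; how to reconcile that with, for example, multiple storage devices is a question for Support rather than code.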
Using a NAS as your Storage Device
Though many NAS devices may support reasonable IO performance internally, the network layers between your virtual appliance and the NAS typically degrade IO response times dramatically, even on 10G networks, so a NAS may not be suitable for running the Unitrends database engine at LEVEL 2 or LEVEL 3 except in the smallest environments. Your Unitrends Onboarding Engineer or a Support Customer Engineer can measure the performance of any intended storage system, advise where the Unitrends database is constrained, and help move it to alternate storage, or provide guidance on the proper deduplication mode and the other options above that would improve performance. For backup storage, Unitrends supports NAS devices connected to VM hosts when the hypervisor vendor has certified them as datastores and when virtual disks on that storage are attached to the UB following Unitrends best practices. Support for CIFS/NFS passthrough is clarified in this KB: Supported external storage vendors for use with Unitrends Backup appliances
TASKS
Deduplication settings are accessible on any Unitrends UB virtual appliance running v9.2 or newer that is capable of inline deduplication.
From the new HTML5 Administrative User Interface (http://<IP_of_DPU>/ui/):
1) Click the gear icon at the top right of the interface.
2) Click DEDUPLICATION SETTINGS.
3) The Deduplication Settings are displayed. Select the level of deduplication that best meets your needs.
NOTES
Never allow the storage device that holds the Unitrends database or deduplicated SIS data to go offline, enter maintenance mode, or power cycle while the Unitrends Backup appliance is still online. Ensure proper UPS management, including an automated, staged shutdown of the Unitrends UB, then the VM hosts, and then the storage. Failing to shut down safely while in-flight data is passing through the Unitrends database may result in catastrophic data loss.
Whenever storage will be manually rebooted or undergo maintenance, first stop all jobs on the attached UB and power it down gracefully.
As Unitrends primarily exists to restore data lost to hardware and power events, we are keenly aware that the very same events can destroy data on the Unitrends system as well. It is not a basic single server: in most environments it runs the most powerful database engine and holds the largest single data set of any system in the environment it protects, and because it is rarely if ever truly idle, it may face an even greater risk of data corruption. For this reason, Unitrends strongly encourages customers not to rely solely on the Unitrends appliance itself for disaster recovery or long-term retention compliance, but instead to implement at minimum a 3-2-1 backup copy architecture, or the best-practice 4-3-2 copy scheme, leveraging both Cold Copies and offsite Hot Copy replication. Unitrends sells a complete suite of solutions, including our own Cloud and DRaaS offerings, can also sell appliances configured for the same tasks in private or third-party data centers, and partners with many MSPs that offer these services as well.