Resolving duplicate Hot Backup Copy (Replication) source identities

SUMMARY

Provides detailed debugging and troubleshooting information to resolve duplicate source appliance identities.

ISSUE

Unitrends receives the following trap message: “Replication target has multiple sources configured with the same identity”

Or,

User receives the following message when attempting to add a replication source to a target: “Detail: [System] Replication cannot be configured because of a configuration conflict with system Test_Source1. Please contact Unitrends Support for further assistance.”

For more information on Hot Backup Copy (Replication) issues, see Unitrends KB 5030 - Hot Backup Copy (Replication) Overview, Setup and Error Troubleshooting.

RESOLUTION

First, determine whether the systems are valid, licensed systems and not clones:
If the two systems with duplicate identities are both UEBs, ensure that one was not cloned from the other. They each should have been deployed and licensed separately, so each should have a unique asset tag as found in /etc/unitrends-asset.
If the two UEBs have the same asset tag, inform the customer this is not supported and have them redeploy a UEB from the latest image.
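
If copies of /etc/unitrends-asset have been collected from both appliances, the comparison can be scripted. This is only a minimal sketch: the function name and the local file names are hypothetical, and it assumes that byte-identical files mean identical asset tags (the file format is not described in this article).

```shell
# Sketch: compare two local copies of /etc/unitrends-asset, one taken
# from each appliance. The function name and file names are hypothetical.
same_asset_tag() {
    # cmp -s is silent and returns 0 only when both files are identical.
    cmp -s "$1" "$2"
}
```

For example: same_asset_tag asset.ueb1 asset.ueb2 && echo "same asset tag: possible clone"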

Next, redeploy the image if possible:
If the newly added replication source is a new system where the loss of the backup data is acceptable to the customer, the quickest solution is to destroy this UEB and redeploy a new UEB from the latest image.

Steps to Repair System:
If the customer is unwilling to redeploy the source system, the source system must be given a new identity, and all references to the source system on the target must be repaired so that they refer back to this source system. In addition, if the source system has written archives that need to be kept in active rotation, the embedded system identities on those archive media must be changed as well.
The instructions below are required to perform this cleanup:

Before beginning, identify which source systems have archive sets, since those require a more extensive cleanup. The command below prints the name of each system, followed by how many archive sets exist on that system.

On the target:
for s in `psql -Upostgres bpdb -Atc "select name from bp.systems where role = 'Replication Source' order by system_id"`; do echo -n "=== ${s}: " ; psql -Upostgres bpdb -h $s -Atc "select count(*) from bp.archive_sets"; done

Example output:

=== SYSTEM-NAME-1: 3
=== SYSTEM-NAME-2: 1
=== SYSTEM-NAME-3: 0
=== SYSTEM-NAME-4: 2

If any system reports a non-zero number of sets, first consult with the customer to find out whether it's OK to purge the archives. You'll also need the script purge_archive_catalog.php to perform this operation. If purging is not acceptable, other measures must be performed (more details at the end of this document).
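
When there are many source systems, the "=== NAME: COUNT" lines can be filtered so that only the systems needing the archive cleanup remain. The following is a minimal sketch, not part of the official procedure; the function name is hypothetical, and it assumes that system names do not contain a colon.

```shell
# Sketch: read "=== NAME: COUNT" lines on stdin and print the names of
# the systems whose archive-set count is greater than zero.
# The function name is hypothetical.
systems_with_archives() {
    # Split each line at the colon; the last field is the count.
    awk -F': *' '/^===/ && $NF + 0 > 0 {
        name = $1
        sub(/^=== */, "", name)   # drop the leading "=== " marker
        print name
    }'
}
```

Save the command output to a file (or pipe it directly) and feed it to systems_with_archives to get one system name per line.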

Do the following steps for each source system. Any commands that are multi-line in this doc can be executed by copying and pasting all the lines at once (you don't have to copy a line at a time or splice them into one line). Also, any commands that have a comment on the same line can be copied and pasted with the comment; just get the whole line (the comments are included to help you find them when you need them later on the target). When comparing UUIDs, it's usually sufficient to compare the first few characters and the last few characters of a UUID string.
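
For the UUID comparisons, a small helper can print just the leading and trailing characters so that two values are easier to compare by eye. This is an illustrative sketch only; the function name and the choice of 8 leading and 4 trailing characters are assumptions, not part of the procedure.

```shell
# Sketch: print the first 8 and last 4 characters of a UUID string.
# The function name and the 8/4 split are arbitrary choices.
uuid_brief() {
    first=$(printf '%s' "$1" | cut -c1-8)
    last=$(printf '%s' "$1" | tail -c 4)
    printf '%s..%s\n' "$first" "$last"
}
```

Running uuid_brief on the value from the source and on the value from the target gives two short strings that can be compared at a glance.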

On the source:
1. Determine if archiving is affected. If either of these queries returns non-zero, archiving will also require cleanup.

psql -Upostgres bpdb -c "select count(*) from bp.archive_sets"
psql -Upostgres bpdb -c "select count(*) from bp.archive_profiles"

2. Stop vcd on the source
pkill -P1 vcd
3. Stop the appliance agent on the source
pkill applianceAgent
4. Then, run this command until nothing is returned (all children gone)
pgrep vcd
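
Instead of re-running pgrep by hand, the check in step 4 can be polled in a loop. The following is a minimal sketch; the function name and the default of 30 one-second attempts are assumptions chosen for illustration.

```shell
# Sketch: poll pgrep until no process with the given name remains.
# $1: process name; $2: maximum number of attempts (default 30).
# Returns 0 once the process is gone, 1 if it is still running
# after the last attempt. The function name is hypothetical.
wait_until_gone() {
    name=$1
    tries=${2:-30}
    while pgrep "$name" >/dev/null; do
        tries=$((tries - 1))
        [ "$tries" -le 0 ] && return 1
        sleep 1
    done
    return 0
}
```

For example: wait_until_gone vcd || echo "vcd is still running"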

5. If archiving is affected:
5a. Check if any archive jobs are currently running.
This command should return only the parent and no children (there should be just one row of output, with 1 in the PPID column). If there are children, you can do the next step to see if any archive schedules are enabled, but don't continue with any other steps until no archive jobs are running. If you're not sure what the output should look like when children are running, substitute fileDedup (or some other process known to have children) for uarchive to see the difference.

ps faxj | head -1 ; ps faxj | grep uarchive | grep -v grep
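
The same check can be reduced to a single number by counting the uarchive processes whose parent is not PID 1. This is only a sketch under the assumption that the child processes also carry the command name uarchive; the function name is hypothetical, and the ps faxj output above remains the reference.

```shell
# Sketch: read "PPID COMMAND" pairs on stdin and count the processes
# named $1 whose parent is not PID 1 (that is, the children).
# The function name is hypothetical.
count_children() {
    awk -v name="$1" '$2 == name && $1 != 1 { c++ } END { print c + 0 }'
}
```

For example: ps -e -o ppid= -o comm= | count_children uarchive
A result of 0 corresponds to the single parent row described above.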

5b. See if there are any archive schedules, and if they are enabled:

psql -Upostgres bpdb -c "select schedule_id,name,enabled
from bp.schedules where app_id=30"

On the systems I cleaned up, there were no archive schedules; all the archive sets were the result of on-demand archives. If there are schedules, at this point you'd want to disable any that are enabled, so that new jobs don't start during the cleanup. After the cleanup, you'd re-enable them (being sure not to enable any that were disabled in the first place). When disabling or enabling schedules, tasker must be signaled so that it knows about the change (/usr/bp/bin/dispatch reset). I can provide more detailed instructions for this step if archive schedules exist in an environment that requires cleanup.

5c. If archive jobs are running, don't continue until they complete.

6. Generate a new uuid and update state.install_id

newuuid=`uuidgen -t`
echo $newuuid # verify this is OK before proceeding
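
The visual check of $newuuid can be backed by a simple format test before the value is written to the database. This is a minimal sketch; the function name is hypothetical, and it checks only the 8-4-4-4-12 hexadecimal layout, not uniqueness.

```shell
# Sketch: succeed only when $1 has the usual 8-4-4-4-12 hexadecimal
# UUID layout. The function name is hypothetical.
is_uuid() {
    printf '%s\n' "$1" |
        grep -Eqi '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}
```

For example: is_uuid "$newuuid" || echo "newuuid does not look like a UUID; do not continue"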

Update the value in the database:
psql -Upostgres bpdb -c "update bp.state set install_id='$newuuid'"

Verify it's correct:
