Troubleshooting replication
This section covers information on troubleshooting your replication deployment.
When troubleshooting, check the log file associated with the subcommand that is producing the error. Learn more about Replication subcommand logs.
For replication issues related to certificate trust, see Repairing broken listener certificate trust in replication.
Discovering obsolete replicas
About this task
To avoid entering lockdown mode when upgrading servers in a replicated topology, check the replicationChanges database for any obsolete replicas before upgrading. To perform this check, run the check-replication-domains tool, which scans changelogDb for all known replication domains and identifies any obsolete replicas still listed as part of a topology.
As of PingDirectory version 10.2, the risk of lockdown due to obsolete replicas is minimal. If you have already upgraded all servers in a replicated topology to version 10.2 or later, these steps are optional.
Steps
- Run check-replication-domains --serverRoot <serverRootDirectory>.
  You can use the --serverRoot argument to specify the root directory where the server containing the replication data is installed. If you don’t supply this argument, check-replication-domains uses the default value of the server where you run the tool.
- Review the output for any replica IDs listed as OBSOLETE.
  Example:
  The following is an example output from the check-replication-domains tool:

    Server topo-1
    [pinguser@topo-1 ~]$ PingDirectory/bin/check-replication-domains --serverRoot PingDirectory/
    SERVER            DOMAIN DN                 ID
    ----------------  ------------------------  ------
    topo-1            cn=schema                 20693 (local)
    topo-1            dc=example,dc=com         23135 (local)
    topo-2            cn=schema                 8371
    topo-2            dc=example,dc=com         19233

    Server topo-2
    [pinguser@topo-2 ~]$ PingDirectory/bin/check-replication-domains --serverRoot PingDirectory/
    SERVER            DOMAIN DN                 ID
    ----------------  ------------------------  ------
    <unknown>         dc=example,dc=com         7403 OBSOLETE
    <unknown>         dc=example,dc=com         7406 DELETED
    topo-1            cn=schema                 20693
    topo-1            dc=example,dc=com         23135
    topo-2            cn=schema                 8371 (local)
    topo-2            dc=example,dc=com         19233 (local)

  Any replica marked DELETED has been deleted from the topology but is not yet obsolete.
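When the listing is long, it can help to filter it down to the replica IDs that are flagged as obsolete. The following is a minimal sketch, assuming the output layout shown in the example (the status is the last whitespace-separated field and the replica ID is the field just before it). The function name list_obsolete_replica_ids is a hypothetical helper, not part of the product.

```shell
#!/bin/sh
# Hypothetical helper: print the ID of every replica flagged OBSOLETE.
# Reads check-replication-domains output on stdin.
# Assumption: the status is the last field on the line and the replica ID
# is the field immediately before it, as in the example output.
list_obsolete_replica_ids() {
    awk '$NF == "OBSOLETE" { print $(NF - 1) }'
}
```

For example, pipe the tool's output into the helper: PingDirectory/bin/check-replication-domains --serverRoot PingDirectory/ | list_obsolete_replica_ids. Replicas marked DELETED are intentionally not reported, because they are not yet obsolete.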
Next steps
If you identified any obsolete replicas, purge the obsolete replicas.
Recovering a replica with missed changes
If a server has been offline for longer than the replication purge delay, you must run the dsreplication initialize command to bring the replica into sync with the topology.
Any missed changes are detected at the time of server startup. A missed change is a change that the replica detects it needs but that is not found in the replicationChanges backend of any other replication server. This backend is stored under changelogDb in the server root.
If missed changes are detected, the server enters lockdown mode, where only privileged clients can make requests. Any other server that is not missing changes can be used as a source for dsreplication initialize.
If the server requires a manual backup and restore, perform the following steps, which are equivalent to dsreplication initialize.
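The rule above reduces to a comparison between the time a server was offline and the replication purge delay. The following is a rough pre-check only, assuming you already know both values in seconds. The function name needs_initialize and its arguments are hypothetical, and the server's own missed-change detection at startup remains the authoritative check.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when a replica that was offline for
# $1 seconds has exceeded a purge delay of $2 seconds and therefore needs
# initialization. Both values are supplied by the caller; nothing is read
# from the server.
needs_initialize() {
    offline_seconds=$1
    purge_delay_seconds=$2
    [ "$offline_seconds" -gt "$purge_delay_seconds" ]
}
```

For example, needs_initialize 172800 86400 succeeds, because two days offline exceeds a one-day delay, while needs_initialize 3600 86400 does not.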
Performing a manual initialization
About this task
The PingDirectory server provides the tools necessary for backing up and restoring backends, which can be used to manually initialize a replica.
As detailed in the following procedure, you use <server-root>/bin/backup to create a backup of the backend containing the replicated base DN. If encryption is enabled for the backend containing the replicated base DN, then you must also make a backup of the encryption-settings backend.
When initializing a server that has been offline longer than the replication-purge-delay, you must also make backups of the replicationChanges and schema backends.
You then need to transfer all backup files to the target server(s) and restore them individually using <server-root>/bin/restore.
To preserve existing encryption settings, |
To manually initialize a server when an online initialization isn’t possible:
Steps
- From another server in the replication topology, back up the userRoot, schema, changelog, and replicationChanges backends to the <server-root>/bak directory.
  If data encryption is enabled, export the encryption-settings backend, because you might need to import one or more encryption settings IDs into the new replica.
  Example:

    $ <source-server-root>/bin/backup --backendID userRoot --backupDirectory \
        bak/userRoot
    $ <source-server-root>/bin/backup --backendID schema --backupDirectory \
        bak/schema
    $ <source-server-root>/bin/backup --backendID changelog --backupDirectory \
        bak/changelog
    $ <source-server-root>/bin/backup --backendID replicationChanges \
        --backupDirectory bak/replicationChanges
    $ <source-server-root>/bin/encryption-settings export --id <id> \
        --output-file bak/exported-key

- Copy the bak directory to the new replica.
  Example:

    $ scp -r <source-server-root>/bak/* \
        <user>@<destination-server>:<destination-server-root>/bak

- Stop the server on the new replica.
- Restore the userRoot, schema, changelog, and replicationChanges backends.
  If the encryption-settings backend was exported, import it before restoring any of the backends.
  Example:

    $ <destination-server-root>/bin/encryption-settings import --input-file \
        bak/exported-key --set-preferred
    Enter the PIN used to encrypt the definition:
    $ <destination-server-root>/bin/restore --backupDirectory bak/userRoot
    $ <destination-server-root>/bin/restore --backupDirectory bak/schema
    $ <destination-server-root>/bin/restore --backupDirectory \
        bak/changelog
    $ <destination-server-root>/bin/restore --backupDirectory \
        bak/replicationChanges

- Start the server using bin/start-server.
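Because the backup step repeats the same command for each backend, it can be generated with a loop. The following is a minimal dry-run sketch that only prints the commands so you can review them before running anything. The function name print_backup_commands is a hypothetical helper, and the backend list and bak/<backend> layout simply mirror the example in the first step.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) one backup command per backend.
# $1 is the source server root. The backend names and the bak/<backend>
# directory layout are taken from the procedure's example.
print_backup_commands() {
    server_root=$1
    for backend in userRoot schema changelog replicationChanges; do
        printf '%s/bin/backup --backendID %s --backupDirectory bak/%s\n' \
            "$server_root" "$backend" "$backend"
    done
}
```

The encryption-settings export is left out on purpose, because it requires an ID and prompts for a PIN, so it is better run by hand.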
Fixing replication conflicts
Replication conflicts occur when an incompatible change to an entry is made on two replicas at the same time. The change processes on one replica and then replicates to the other replica, which causes the conflict. While most conflicts resolve automatically, some require manual action.
To fix replication conflicts, initialize the replica containing the conflicts with the data from another replica that does not have conflicts. If the database is large and the number of conflicts is small, you can instead run ldapmodify against the server with the conflict, including the Replication Repair Control (OID 1.3.6.1.4.1.30221.1.5.2) in the command. The Replication Repair Control prevents the change from replicating and enables changing operational attribute values, which are not normally writable.
The following tasks use the Replication Repair Control to fix replication conflicts and apply changes only to the server with the conflict. Two examples are provided: one fixes a modify conflict using the ldap-diff tool, and the other fixes a naming conflict.
Fixing a modify conflict
Steps
- To isolate conflicting entries between two replicas, use the bin/ldap-diff tool.
  Replace the sourceHost value with the server that needs the adjustment.
  Example:
  The following example uses the tool to search across the entire base distinguished name (DN) for any difference in user attributes and reports the difference in difference.ldif.

    $ bin/ldap-diff --sourceHost austin02.example.com --sourcePort 1389 \
        --sourceBindDN "cn=Directory Manager" --sourceBindPassword pass \
        --targetHost austin01.example.com --targetPort 1389 \
        --targetBindDN "cn=Directory Manager" --targetBindPassword pass \
        --baseDN "dc=example,dc=com" --outputLDIF difference.ldif \
        --searchFilter "(objectclass=*)" --numPasses 3 "*" \
        "^userPassword"

- To apply changes to the server that contains conflicts, use the difference.ldif file, which is in a format compatible with ldapmodify.
  Run the ldap-diff command with the sourceHost value set to the server with conflicts.
  Example:
  The following is an example of the contents of the difference.ldif file.

    dn: uid=user.1,ou=people,dc=example,dc=com
    changetype: modify
    add: mobile
    mobile: +1 568 232 6789
    -
    delete: mobile
    mobile: +1 568 591 7372
    -

- To correct the entries only on the server with conflicts, run bin/ldapmodify.
  Example:

    $ bin/ldapmodify --bindPassword password -J "1.3.6.1.4.1.30221.1.5.2" \
        --filename difference.ldif
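Before applying difference.ldif, it can be useful to confirm that the number of conflicting entries really is small, since that is the case where a targeted repair makes more sense than re-initializing the replica. The following is a minimal sketch that counts change records in an LDIF file. The function name count_ldif_entries is a hypothetical helper, and it assumes each record starts with a line beginning with dn:.

```shell
#!/bin/sh
# Hypothetical helper: count the entries an LDIF change file would touch.
# Assumption: every record begins with a line that starts with "dn:"
# (this also matches base64-encoded DNs written as "dn::").
count_ldif_entries() {
    grep -c '^dn:' "$1"
}
```

For example, count_ldif_entries difference.ldif prints the number of entries that ldapmodify would change.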
Fixing a naming conflict
About this task
In this example, a naming conflict was encountered when the replica attempted to replay an ADD of uid=user.200,ou=people,dc=example,dc=com. Because of this conflict, the server returns a replication conflict message. See the following example message.
[18/Feb/2010:14:53:12 -0600] category=EXTENSIONS severity=SEVERE_ERROR msgID=1880359005 msg="Administrative alert type=replication-unresolved-conflict id=bbd2cbaf-90a4-42af-94a8-c1a42df32fc6 class=com.unboundid.directory.server.replication.plugin.ReplicationDomain msg='An unresolved conflict was detected for DN uid=user.200,ou=People,dc=example,dc=com. The conflicting entry has been renamed to entryuuid=69807e3d-ab27-43a3-8759-ec0d8d6b3107+uid=user.200,ou=People,dc=example,dc=com'"
The PingDirectory server prepends the entryUUID to the DN of the conflicting entry and adds a ds-sync-conflict-entry auxiliary object class to the entry to aid in search.
To resolve the conflict:
Steps
- Search for any entry that has the ds-sync-conflict-entry object class, returning only the DNs that match the filter.
  Example:

    $ bin/ldapsearch --baseDN dc=example,dc=com --searchScope sub \
        "(objectclass=ds-sync-conflict-entry)" "1.1"

  Result:
  The search results display the conflict entries for uid=user.200.

    dn: entryuuid=69807e3d-ab27-43a3-8759-ec0d8d6b3107+uid=user.200,ou=People,dc=example,dc=com
    dn: entryuuid=523c430e-a870-4ebe-90f8-9cd811946420+uid=user.200,ou=People,dc=example,dc=com

  Conflict entries are not returned unless objectclass=ds-sync-conflict-entry is present in the search filter.
- Compare the conflict entry with the target entry.
- Apply the difference in one of two ways:
  - Use the ldapmodify tool with the Replication Repair Control.
    You can also delete the conflict entry using this command.
  - Run bin/ldapmodify with the Replication Repair Control to make the fix.
    When making changes using the Replication Repair Control, the updates are not propagated through replication. Examine each replica individually, and apply the necessary modifications using the request control.
    Example:

      $ bin/ldapmodify -J "1.3.6.1.4.1.30221.1.5.2" \
          --filename difference.ldif
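To compare each conflict entry with its target entry, you first need the DN the entry had before it was renamed. The following is a minimal sketch that strips the entryuuid prefix from the DNs returned by the conflict search. The function name original_dn_of_conflict is a hypothetical helper, and it assumes the lowercase dn: entryuuid=<uuid>+ form shown in the search result.

```shell
#!/bin/sh
# Hypothetical helper: map conflict-entry DNs back to the DNs they collided
# with. Reads "dn: entryuuid=<uuid>+<original RDN>,..." lines on stdin and
# prints the DN with the "entryuuid=<uuid>+" prefix removed. Lines that do
# not match that form are skipped.
original_dn_of_conflict() {
    sed -n 's/^dn: entryuuid=[^+]*+//p'
}
```

Both conflict entries in the example map to the same target DN, uid=user.200,ou=People,dc=example,dc=com, which is the entry to compare them against.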
Fixing mismatched generation IDs
About this task
If you receive a warning message that multiple generation IDs were detected for a specific suffix, you must re-initialize one or more replicas. If a server presents the warning after an initialization, the post-external-initialization command might not have been run as part of a global change in data.
Try the following fixes as needed.
Steps
- To re-initialize replicas as part of a global change in data, run the post-external-initialization command.
- To fix mismatched generation IDs, re-initialize the affected replicas using the dsreplication command.
- To see a warning when any generation IDs differ across the topology, run the dsreplication tool with the status command.
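If you collect the generation IDs for a suffix yourself, a mismatch is simply more than one distinct value. The following is a minimal sketch, assuming you supply one generation ID per line with no blank lines. It does not parse dsreplication status output, whose format is not assumed here, and the function name generation_ids_match is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only when every line on stdin holds
# the same generation ID. Input is one ID per line, gathered by the caller.
generation_ids_match() {
    # Count distinct lines; the arithmetic expansion normalizes any padding
    # that wc adds around the number.
    distinct=$(( $(sort -u | wc -l) ))
    [ "$distinct" -le 1 ]
}
```

For example, printf '%s\n' 1234 1234 | generation_ids_match succeeds, while the same call with 1234 and 5678 fails, which indicates that at least one replica needs re-initializing.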