The transform-ldif tool reads data from one or more source LDIF files and writes the transformed data to a single output file.

Using this tool to scramble data lets you:

  • Obscure the values of certain attributes so that it's difficult to determine the original values in the source data.
  • Preserve the characteristics of the associated attribute syntax.

Where possible, the process is repeatable: if the same value appears multiple times, it yields the same scrambled representation each time. Boolean values, described below, are the exception. You can apply scrambling to both LDIF entries and LDIF change records.
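
One way to get this kind of repeatability is to derive the random generator's seed from the value itself, so identical inputs always drive identical random choices. The following Python sketch illustrates the idea using the generic-string rule described later in this section. The function name scramble_generic, the SHA-256-based seeding, and the seed parameter are assumptions for illustration, not the tool's actual implementation.

```python
import hashlib
import random
import string


def scramble_generic(value: str, seed: int = 0) -> str:
    """Scramble letters and digits; leave every other character unchanged.

    Hypothetical approach: the generator is seeded from a SHA-256 hash of
    the (seed, value) pair, so the same input always produces the same
    scrambled output.
    """
    digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    out = []
    for ch in value:
        if ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.digits:
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # symbols such as '.' are kept as-is
    return "".join(out)


first = scramble_generic("john.doe")
second = scramble_generic("john.doe")
print(first, first == second)
```

Because the seed depends only on the input, two occurrences of john.doe scramble identically, while the character classes (lowercase letters and the period) are preserved.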

Scrambling data is not the same as encrypting it; use scrambling only to provide simple obfuscation. Attribute values are scrambled according to the following rules:

  • If the attribute is userPassword and its value starts with a storage scheme name enclosed in curly braces, such as {SSHA256}XrgyNdl3fid7KYdhd/Ju47KJQ5PYZqlUlyzxQ28f/QXUnNd9fupj9g==, the scheme name is left unchanged and the rest of the value is treated like a generic string.
  • If the attribute is authPassword and its value contains at least two dollar signs, such as SHA256$QGbHtDCi1i4=$8/X7XRGaFCovC5mn7ATPDYlkVoocDD06Zy3lbD4AoO4=, the portion up to the first dollar sign (the name of the storage scheme) is preserved and the remainder of the value is treated like a generic string.
  • If an attribute has a Boolean syntax, the scrambled value is either TRUE or FALSE. Scrambling Boolean values is not repeatable because the choice between TRUE and FALSE is random. Randomizing the choice preserves the syntax while still obscuring the original value; with only two possible values, a repeatable mapping would be trivial to reverse.
  • If an attribute has a distinguished name syntax (or a related syntax, such as name and optional UID), scrambling is applied only to the values of relative distinguished name (RDN) components whose attribute types are themselves configured to be scrambled.

    For example, if you configure the tool to scramble both the member and uid attributes, and an entry has a member attribute with a value of uid=john.doe,ou=People,dc=example,dc=com, only the john.doe portion of that value is obscured. The attribute names and the values of non-scrambled attributes (ou=People,dc=example,dc=com) are left intact.

  • If an attribute has a generalized time syntax, its value is replaced with a randomized timestamp in the same format (the same number of digits and the same time zone indicator). The random timestamp is drawn from a time range that is twice the difference between the time the transform-ldif tool starts and the original timestamp. If that difference is less than one day, one day is added to it before it is doubled.
  • If an attribute has an integer, numeric string, or telephone number syntax, only numeric digits are scrambled and all other characters are left intact. If the value contains more than one digit, the first digit of the scrambled value is nonzero.
  • If an attribute has an octet string syntax, it's scrambled as follows:
    • Each byte that represents a lowercase ASCII letter is replaced with a randomly-selected lowercase ASCII letter.
    • Each byte that represents an uppercase ASCII letter is replaced with a randomly-selected uppercase ASCII letter.
    • Each byte that represents an ASCII digit is replaced with a randomly-selected ASCII digit.
    • Each byte that represents a printable ASCII symbol is replaced with a randomly-selected printable ASCII symbol.
    • Each byte that represents an ASCII control character is replaced with a randomly-selected ASCII letter, digit, or symbol.
    • Each non-ASCII byte is replaced with a randomly-selected non-ASCII byte.
  • If an attribute has a value that represents a valid JSON object, the resulting value is also a JSON object. All field names are left intact, and only the values of those fields can be scrambled. If the --scrambleJSONField argument is provided, only the specified fields have values scrambled. Otherwise, the values of all fields are scrambled. Field values are scrambled as follows:
    • Null values aren't scrambled.
    • Boolean values are replaced with randomly-selected Boolean values. As with attributes with a Boolean syntax, these values are non-repeatable.
    • Number values have only their digits replaced with randomly-selected digits; all other characters (minus sign, decimal point, exponent indicator) are left unchanged.
    • String values are replaced with a randomly-selected generic string.
    • Array values have scrambling applied as appropriate for each value in the array. If the array field itself should be scrambled, then all values in the array are scrambled. Otherwise, only JSON objects contained inside the array have scrambling applied to appropriate fields.
    • Nested JSON object values have scrambling applied to their fields using these same rules.
  • If an attribute value does not match any of the previous criteria, it is scrambled as a generic string:
    • Each lowercase ASCII letter is replaced with a randomly-selected lowercase ASCII letter.
    • Each uppercase ASCII letter is replaced with a randomly-selected uppercase ASCII letter.
    • Each ASCII digit is replaced with a randomly-selected ASCII digit.
    • All other characters are left unchanged.
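
The following Python sketch illustrates three of the rules above: preserving the storage scheme prefix of userPassword and authPassword values, and scrambling only the digits of integer, numeric string, or telephone number values. The function names are hypothetical, and the sketch takes an explicitly seeded random.Random instance rather than modeling how transform-ldif seeds its own generator.

```python
import random
import string


def scramble_chars(text: str, rng: random.Random) -> str:
    """Generic-string rule: letters keep their case, digits stay digits."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.digits:
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # symbols and non-ASCII characters are kept
    return "".join(out)


def scramble_user_password(value: str, rng: random.Random) -> str:
    """Keep a leading {SCHEME} prefix and scramble the rest."""
    if value.startswith("{"):
        end = value.find("}")
        if end > 1:  # at least one character between the braces
            return value[: end + 1] + scramble_chars(value[end + 1 :], rng)
    return scramble_chars(value, rng)


def scramble_auth_password(value: str, rng: random.Random) -> str:
    """With two or more '$', keep the scheme and first '$'; scramble the rest."""
    if value.count("$") >= 2:
        scheme, rest = value.split("$", 1)
        return scheme + "$" + scramble_chars(rest, rng)
    return scramble_chars(value, rng)


def scramble_digits(value: str, rng: random.Random) -> str:
    """Replace only digits; with several digits, the first one is nonzero."""
    digit_count = sum(ch in string.digits for ch in value)
    out = []
    seen_digit = False
    for ch in value:
        if ch not in string.digits:
            out.append(ch)  # '+', spaces, dashes, etc. stay in place
        elif not seen_digit and digit_count > 1:
            out.append(rng.choice("123456789"))
            seen_digit = True
        else:
            out.append(rng.choice(string.digits))
            seen_digit = True
    return "".join(out)


rng = random.Random(0)
print(scramble_user_password("{SSHA256}XrgyNdl3fid7KYdhd/Ju47KJQ5PYZqlUlyzxQ28f/QXUnNd9fupj9g==", rng))
print(scramble_auth_password("SHA256$QGbHtDCi1i4=$8/X7XRGaFCovC5mn7ATPDYlkVoocDD06Zy3lbD4AoO4=", rng))
print(scramble_digits("+1 512-555-0100", rng))
```

In each case the structural parts of the value (the scheme name, the dollar-sign separators, the telephone punctuation) survive, so the output still looks like a valid value of the same kind.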

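The JSON rules above amount to a recursive walk over the parsed object. In this hypothetical Python sketch, the fields argument stands in for the values of --scrambleJSONField, with None meaning that every field is scrambled. Because it works on Python's parsed values, the original textual formatting of numbers is not preserved, and a float whose scrambled exponent would overflow is simply re-scrambled; both are simplifications of this sketch, not documented tool behavior.

```python
import json
import math
import random
import string


def scramble_digits(text: str, rng: random.Random) -> str:
    """Replace each digit; keep '-', '.', 'e', and '+' as they are."""
    return "".join(rng.choice(string.digits) if c in string.digits else c for c in text)


def scramble_string(text: str, rng: random.Random) -> str:
    """Generic-string rule: letters keep their case, digits stay digits."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(scramble_digits(ch, rng))  # digit or unchanged char
    return "".join(out)


def scramble_json(obj: dict, rng: random.Random, fields=None) -> dict:
    """Scramble field values of a JSON object; field names never change."""
    return {
        name: _scramble_value(value, rng, fields is None or name in fields, fields)
        for name, value in obj.items()
    }


def _scramble_value(value, rng, selected, fields):
    if isinstance(value, dict):
        # Nested object: decide per field, by that field's own name.
        return scramble_json(value, rng, fields)
    if isinstance(value, list):
        # A selected array scrambles every element; otherwise only
        # objects inside the array are processed.
        return [_scramble_value(v, rng, selected, fields) for v in value]
    if not selected or value is None:
        return value  # unselected fields and nulls are left alone
    if isinstance(value, bool):  # check before int: bool subclasses int
        return random.choice([True, False])  # unseeded, so not repeatable
    if isinstance(value, int):
        return int(scramble_digits(str(value), rng))
    if isinstance(value, float):
        if not math.isfinite(value):
            return value
        while True:  # retry if a scrambled exponent overflows to infinity
            candidate = float(scramble_digits(repr(value), rng))
            if math.isfinite(candidate):
                return candidate
    if isinstance(value, str):
        return scramble_string(value, rng)
    return value


doc = json.loads('{"name": "Alice", "active": true, "age": 42, "note": null}')
print(json.dumps(scramble_json(doc, random.Random(0))))
print(json.dumps(scramble_json(doc, random.Random(0), fields={"name"})))
```

With fields={"name"}, only the name value changes; with no field list, every non-null value is scrambled.
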
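The following example reads from an LDIF file named original.ldif, scrambles the values of the telephoneNumber, mobile, and homeTelephoneNumber attributes, and writes the results to scrambled.ldif. The --randomSeed argument supplies a fixed seed for the random number generator used during scrambling.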

$ bin/transform-ldif --sourceLDIF original.ldif \
  --targetLDIF scrambled.ldif \
  --scrambleAttribute telephoneNumber \
  --scrambleAttribute mobile \
  --scrambleAttribute homeTelephoneNumber \
  --randomSeed 0