@PublicAPI(stability=PRIVATE)

Package org.opends.server.backends.jeb

Contains the code for the Directory Server backend that uses the Berkeley DB Java Edition as the repository for storing entry and index information.

On-disk Representation

First it is important to understand JE (Java Edition) terminology. A JE database environment has similarities to a database in the relational database world. Each environment can have multiple databases, which are similar to tables in a relational database. A JE environment is identified by the file system directory in which it is stored. A JE database is identified by a unique name within its environment. Multiple databases in the same environment may be updated within the same transaction, but transactions cannot span environments.

In this description, database means a JE database.

Each instance of this backend creates a single JE environment to store its data. Unlike previous versions of Directory Server, environments are not shared by backend instances. The backend does support multiple base DNs, so it is still possible for data under multiple suffixes to share the same database environment, by declaring those suffixes as base DNs of a single JE backend instance.

The data for a base DN is kept in a set of databases, so that a database contains data for only one base DN. Each database name is prefixed by the base DN it belongs to, where the DN is simplified by preserving only letters and digits.

For example, if you were to use the DbDump utility to list the databases in the environment corresponding to a backend instance containing the base DN dc=example,dc=com, you might see the following:

 dc_example_dc_com_cn.equality
 dc_example_dc_com_cn.presence
 dc_example_dc_com_cn.substring
 dc_example_dc_com_dn2id
 dc_example_dc_com_givenName.equality
 dc_example_dc_com_givenName.presence
 dc_example_dc_com_givenName.substring
 dc_example_dc_com_id2children
 dc_example_dc_com_id2entry
 dc_example_dc_com_id2subtree
 dc_example_dc_com_mail.equality
 dc_example_dc_com_mail.presence
 dc_example_dc_com_mail.substring
 dc_example_dc_com_member.equality
 dc_example_dc_com_sn.equality
 dc_example_dc_com_sn.presence
 dc_example_dc_com_sn.substring
 dc_example_dc_com_telephoneNumber.equality
 dc_example_dc_com_telephoneNumber.presence
 dc_example_dc_com_telephoneNumber.substring
 dc_example_dc_com_uid.equality
 

Database Relocation

The data is stored in a format which is independent of system architecture, and is also independent of file system location because it contains no pathnames. The backend, and its backups, can be copied, moved and restored to a different location, within the same system or a different system.

The Entry ID

Each entry to be stored in the backend is assigned a 64-bit integer identifier called the entry ID. The first entry to be created is entry ID 1, the second is entry ID 2, etc. This ensures that the ID for any given entry is always greater than its superiors. The backend takes care to preserve this invariant, in particular during Modify DN operations where an entry can be given a new superior. Clients have come to expect child entries to be returned after their parent in search results, and the backend can ensure this by returning entries in ID order.

On disk, an entry ID is stored in eight bytes in big-endian format (from most significant byte to least significant byte). This enables binary copy of the backend from one system to another, regardless of the system architecture.

Currently, IDs of deleted entries are not reused. The use of a 64-bit integer means it is implausible that the entry ID space will be exhausted.

The entry database (id2entry)

Entries are stored in the id2entry database. The key to the database is the entry ID, and the value is an ASN.1 encoding of the entry contents. The default JE btree key comparator is used for the entry database, such that cursoring through the database will return entries in order of entry ID. When the backend starts it is able to determine the last assigned entry ID by reading the last key value in the entry database.

The format of the entry on disk is described by the following ASN.1.

 DatabaseEntry ::= [APPLICATION 0] IMPLICIT SEQUENCE {
  uncompressedSize        INTEGER,      -- A zero value means not compressed.
  dataBytes               OCTET STRING  -- Optionally compressed encoding of
                                           the data bytes.
 }

 ID2EntryValue ::= DatabaseEntry
  -- Where dataBytes contains an encoding of DirectoryServerEntry.

 DirectoryServerEntry ::= [APPLICATION 1] IMPLICIT SEQUENCE {
  dn                      LDAPDN,
  objectClasses           SET OF LDAPString,
  userAttributes          AttributeList,
  operationalAttributes   AttributeList
 }
 

Entry compression is optional and can be switched on or off at any time. Switching on entry compression only affects future writes, therefore the database can contain a mixture of compressed and not-compressed records. Either record type can be read regardless of the configuration setting. The compression algorithm is the default ZLIB implementation provided by the Java platform.

The ASN1 types have application tags to allow for future extensions. The types may be extended with additional fields where this makes sense, or additional types may be defined.

The entry count record

Previous versions of Directory Server provide the current number of entries stored in the backend. JE does not maintain database record counts, requiring a full key traversal to count the number of records in a database, which is too time consuming for large numbers of entries.

For this reason the backend maintains its own count of the number of entries in the entry database, storing this count in the special record whose key is entry ID zero.

The DN database (dn2id)

Although each entry's DN is stored in the entry database, we need to be able to retrieve entries by DN. The dn2id database key is the normalized DN and the value is the entry ID corresponding to the DN. A normalized DN is one which may be compared for equality with another using a standard string comparison function. A given DN can have numerous string representations, due to insignificant whitespace, or insignificant case of attribute names, etc., but it has only one normalized form. Use of the normalized form enables efficient key comparison.

A custom btree key comparator is applied to the DN database, which orders the keys such that a given entry DN comes after the DNs of its superiors, and ensures that the DNs below a given base DN are contiguous. This ordering is used to return entries for a non-indexed subtree or single level search. The comparator is just like the default lexicographic comparator except that it compares in reverse byte order.

For example, a cursor iteration through a range of the DN database might look like this:

 dc=example,dc=com
 ou=people,dc=example,dc=com
 uid=user.1000,ou=people,dc=example,dc=com
 uid=user.2000,ou=people,dc=example,dc=com
 uid=user.3000,ou=people,dc=example,dc=com
 uid=user.4000,ou=people,dc=example,dc=com
 uid=user.100,ou=people,dc=example,dc=com
 uid=user.1100,ou=people,dc=example,dc=com
 uid=user.2100,ou=people,dc=example,dc=com
 

At first, it may seem strange that user.1100 comes after user.1000 but it becomes clear when considering the values in reverse byte order, since 0011.resu is indeed greater than 0001.resu.

Index Databases

Index databases are used to efficiently process search requests. The system indexes, id2children and id2subtree, are dedicated to processing one-level and subtree search scope respectively. Then there are configurable attribute indexes to process components of a search filter. Each index record maps a key to an Entry ID List.

Entry ID List

An entry ID list is a set of entry IDs, arranged in order of ID. On disk, the list is a concatenation of the 8-byte entry ID values, where the first ID is the lowest. The number of IDs in the list can be obtained by dividing the total number of bytes by eight.

Index Entry Limit

In some cases, the number of entries indexed by a given key is so large that the cost of maintaining the list during entry updates outweighs the benefit of the list during search processing. Each index therefore has a configurable entry limit. Whenever a list reaches the entry limit, it is replaced with a zero length value to indicate that the list is no longer maintained.

Children Index (id2children)

The children index is a system index which maps the ID of any non-leaf entry to entry IDs of the immediate children of the entry. This index is used to get the set of entries within the scope of a one-level search.

Subtree Index (id2subtree)

The subtree index is a system index which maps the ID of any non-leaf entry to entry IDs of all descendants of the entry. This index is used to get the set of entries within the scope of a subtree search.

Attribute Equality Index

An attribute equality index maps the value of an attribute to entry IDs of all entries containing that attribute value. The database key is the attribute value after it has been normalized by the equality matching rule for that attribute. This index is used to get the set of entries matching an equality filter.

Attribute Presence Index

An attribute presence index contains a single record which has entry IDs of all entries containing a value of the attribute. This index is used to get the set of entries matching an attribute presence filter.

Attribute Substring Index

An attribute substring index maps a substring of an attribute value to entry IDs of all entries containing that substring in one or more of its values of the attribute. This index is used to get a set of entries that are candidates for matching a subtring filter.

The length of substrings in the index is configurable. For example, let's say the configured substring length is three, and there is an entry containing the attribute value ABCDE. The ID for this entry would be indexed by the keys ABC BCD CDE DE E. To find entries containing a short substring such as DE, iterate through all keys with prefix DE. To find entries containing a longer substring such as BCDE, read keys BCD and CDE.

Attribute Ordering Index

An attribute ordering index is similar to an equality index in that it maps the value of an attribute to entry IDs of all entries containing that attribute value. However, the values are normalized by the ordering matching rule for the attribute rather than the equality matching rule, and the btree key comparator is set to the ordering matching rule comparator. This index is used to get the set of entries matching inequality filters (less-than-or-equal, greater-than-or-equal).