Package org.opends.server.backends.jeb
On-disk Representation
First it is important to understand JE (Java Edition) terminology. A JE database environment has similarities to a database in the relational database world. Each environment can have multiple databases, which are similar to tables in a relational database. A JE environment is identified by the file system directory in which it is stored. A JE database is identified by a unique name within its environment. Multiple databases in the same environment may be updated within the same transaction, but transactions cannot span environments.
In this description, database means a JE database.
Each instance of this backend creates a single JE environment to store its data. Unlike previous versions of Directory Server, environments are not shared by backend instances. The backend does support multiple base DNs, so it is still possible for data under multiple suffixes to share the same database environment, by declaring those suffixes as base DNs of a single JE backend instance.
The data for a base DN is kept in a set of databases, so that a database contains data for only one base DN. Each database name is prefixed by the base DN it belongs to, where the DN is simplified by preserving only letters and digits.
For example, if you were to use the DbDump utility to list the databases in the environment corresponding to a backend instance containing the base DN dc=example,dc=com, you might see the following:
dc_example_dc_com_cn.equality dc_example_dc_com_cn.presence dc_example_dc_com_cn.substring dc_example_dc_com_dn2id dc_example_dc_com_givenName.equality dc_example_dc_com_givenName.presence dc_example_dc_com_givenName.substring dc_example_dc_com_id2children dc_example_dc_com_id2entry dc_example_dc_com_id2subtree dc_example_dc_com_mail.equality dc_example_dc_com_mail.presence dc_example_dc_com_mail.substring dc_example_dc_com_member.equality dc_example_dc_com_sn.equality dc_example_dc_com_sn.presence dc_example_dc_com_sn.substring dc_example_dc_com_telephoneNumber.equality dc_example_dc_com_telephoneNumber.presence dc_example_dc_com_telephoneNumber.substring dc_example_dc_com_uid.equality
Database Relocation
The data is stored in a format which is independent of system architecture, and is also independent of file system location because it contains no pathnames. The backend, and its backups, can be copied, moved and restored to a different location, within the same system or a different system.
The Entry ID
Each entry to be stored in the backend is assigned a 64-bit integer identifier called the entry ID. The first entry to be created is entry ID 1, the second is entry ID 2, etc. This ensures that the ID for any given entry is always greater than its superiors. The backend takes care to preserve this invariant, in particular during Modify DN operations where an entry can be given a new superior. Clients have come to expect child entries to be returned after their parent in search results, and the backend can ensure this by returning entries in ID order.
On disk, an entry ID is stored in eight bytes in big-endian format (from most significant byte to least significant byte). This enables binary copy of the backend from one system to another, regardless of the system architecture.
Currently, IDs of deleted entries are not reused. The use of a 64-bit integer means it is implausible that the entry ID space will be exhausted.
The entry database (id2entry)
Entries are stored in the id2entry database. The key to the database is the entry ID, and the value is an ASN.1 encoding of the entry contents. The default JE btree key comparator is used for the entry database, such that cursoring through the database will return entries in order of entry ID. When the backend starts it is able to determine the last assigned entry ID by reading the last key value in the entry database.
The format of the entry on disk is described by the following ASN.1.
DatabaseEntry ::= [APPLICATION 0] IMPLICIT SEQUENCE { uncompressedSize INTEGER, -- A zero value means not compressed. dataBytes OCTET STRING -- Optionally compressed encoding of the data bytes. } ID2EntryValue ::= DatabaseEntry -- Where dataBytes contains an encoding of DirectoryServerEntry. DirectoryServerEntry ::= [APPLICATION 1] IMPLICIT SEQUENCE { dn LDAPDN, objectClasses SET OF LDAPString, userAttributes AttributeList, operationalAttributes AttributeList }
Entry compression is optional and can be switched on or off at any time. Switching on entry compression only affects future writes, therefore the database can contain a mixture of compressed and not-compressed records. Either record type can be read regardless of the configuration setting. The compression algorithm is the default ZLIB implementation provided by the Java platform.
The ASN1 types have application tags to allow for future extensions. The types may be extended with additional fields where this makes sense, or additional types may be defined.
The entry count record
Previous versions of Directory Server provide the current number of entries stored in the backend. JE does not maintain database record counts, requiring a full key traversal to count the number of records in a database, which is too time consuming for large numbers of entries.
For this reason the backend maintains its own count of the number of entries in the entry database, storing this count in the special record whose key is entry ID zero.
The DN database (dn2id)
Although each entry's DN is stored in the entry database, we need to be able to retrieve entries by DN. The dn2id database key is the normalized DN and the value is the entry ID corresponding to the DN. A normalized DN is one which may be compared for equality with another using a standard string comparison function. A given DN can have numerous string representations, due to insignificant whitespace, or insignificant case of attribute names, etc., but it has only one normalized form. Use of the normalized form enables efficient key comparison.
A custom btree key comparator is applied to the DN database, which orders the keys such that a given entry DN comes after the DNs of its superiors, and ensures that the DNs below a given base DN are contiguous. This ordering is used to return entries for a non-indexed subtree or single level search. The comparator is just like the default lexicographic comparator except that it compares in reverse byte order.
For example, a cursor iteration through a range of the DN database might look like this:
dc=example,dc=com ou=people,dc=example,dc=com uid=user.1000,ou=people,dc=example,dc=com uid=user.2000,ou=people,dc=example,dc=com uid=user.3000,ou=people,dc=example,dc=com uid=user.4000,ou=people,dc=example,dc=com uid=user.100,ou=people,dc=example,dc=com uid=user.1100,ou=people,dc=example,dc=com uid=user.2100,ou=people,dc=example,dc=com
At first, it may seem strange that user.1100 comes after user.1000 but it becomes clear when considering the values in reverse byte order, since 0011.resu is indeed greater than 0001.resu.
Index Databases
Index databases are used to efficiently process search requests. The system indexes, id2children and id2subtree, are dedicated to processing one-level and subtree search scope respectively. Then there are configurable attribute indexes to process components of a search filter. Each index record maps a key to an Entry ID List.
Entry ID List
An entry ID list is a set of entry IDs, arranged in order of ID. On disk, the list is a concatenation of the 8-byte entry ID values, where the first ID is the lowest. The number of IDs in the list can be obtained by dividing the total number of bytes by eight.
Index Entry Limit
In some cases, the number of entries indexed by a given key is so large that the cost of maintaining the list during entry updates outweighs the benefit of the list during search processing. Each index therefore has a configurable entry limit. Whenever a list reaches the entry limit, it is replaced with a zero length value to indicate that the list is no longer maintained.
Children Index (id2children)
The children index is a system index which maps the ID of any non-leaf entry to entry IDs of the immediate children of the entry. This index is used to get the set of entries within the scope of a one-level search.
Subtree Index (id2subtree)
The subtree index is a system index which maps the ID of any non-leaf entry to entry IDs of all descendants of the entry. This index is used to get the set of entries within the scope of a subtree search.
Attribute Equality Index
An attribute equality index maps the value of an attribute to entry IDs of all entries containing that attribute value. The database key is the attribute value after it has been normalized by the equality matching rule for that attribute. This index is used to get the set of entries matching an equality filter.
Attribute Presence Index
An attribute presence index contains a single record which has entry IDs of all entries containing a value of the attribute. This index is used to get the set of entries matching an attribute presence filter.
Attribute Substring Index
An attribute substring index maps a substring of an attribute value to entry IDs of all entries containing that substring in one or more of its values of the attribute. This index is used to get a set of entries that are candidates for matching a subtring filter.
The length of substrings in the index is configurable. For example, let's say the configured substring length is three, and there is an entry containing the attribute value ABCDE. The ID for this entry would be indexed by the keys ABC BCD CDE DE E. To find entries containing a short substring such as DE, iterate through all keys with prefix DE. To find entries containing a longer substring such as BCDE, read keys BCD and CDE.
Attribute Ordering Index
An attribute ordering index is similar to an equality index in that it maps the value of an attribute to entry IDs of all entries containing that attribute value. However, the values are normalized by the ordering matching rule for the attribute rather than the equality matching rule, and the btree key comparator is set to the ordering matching rule comparator. This index is used to get the set of entries matching inequality filters (less-than-or-equal, greater-than-or-equal).