Indexes store data that make it possible to quickly retrieve matching entries during a search. As the size of an ID set increases, so does the potential resource cost of accessing the ID set.
Each index record maps an index key to a list of the entry IDs for the entries that match that key. If you have an equality index for a given attribute, there is a key for each unique value of that attribute, and the value for each key is the list of IDs of the entries that contain that value. For substring indexes, there can be multiple keys for the same attribute value (one for each unique six-character substring within the value), and the same substring can apply to many different values of the same attribute.
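To make the substring case concrete, the following is a minimal sketch of how fixed-length substring keys could map to ID sets. It is an illustrative model rather than the server's implementation: the function names, the sample values, and the handling of values shorter than the key length are all assumptions.

```python
from collections import defaultdict

SUBSTRING_LENGTH = 6  # default key length described above


def substring_keys(value, length=SUBSTRING_LENGTH):
    """Return the set of unique fixed-length substrings of a value."""
    if len(value) < length:
        # Assumption: short values produce no keys in this model;
        # a real server may treat them differently.
        return set()
    return {value[i:i + length] for i in range(len(value) - length + 1)}


def build_substring_index(entries, length=SUBSTRING_LENGTH):
    """Map each substring key to the set of entry IDs whose value contains it."""
    index = defaultdict(set)
    for entry_id, value in entries.items():
        for key in substring_keys(value, length):
            index[key].add(entry_id)
    return index


# Hypothetical entries: entry ID -> attribute value
entries = {1: "johnson", 2: "johnston", 3: "johnsonville"}
index = build_substring_index(entries)

print(sorted(substring_keys("johnson")))  # one value, several keys
print(sorted(index["johnso"]))            # one key, several entry IDs
```

In this sample, the single value "johnson" produces two keys, and the key "johnso" is shared by entries 1 and 3, which is how substring keys accumulate large ID sets.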
For an attribute index, you can store ID sets in one of two ways that each affect performance differently:
- In the regular (non-exploded) case, each index key occurs once, and the value of this key is the entire list of IDs for entries that match the key. The ID set for a non-exploded index key can be retrieved quickly because only a single database read is required. However, as the size of the ID set grows, the cost of updating it grows because it is necessary to replace the entire set, which requires larger amounts of disk I/O and can place an increased burden on the database cleaner.
- In the exploded case, the same key can be stored multiple times (once for each entry that matches that key), with each instance of the key associated with a different entry ID. Updating the ID set for an exploded key is always fast because the writes are small, but the cost of reading the ID set increases with the number of IDs.
You can also use composite indexes. When they can be used, these offer many advantages over attribute indexes. Some of these advantages, such as the ability to configure a base DN or combinations of attributes, have no effect on performance with regard to large ID sets. The advantage that does matter here is how composite indexes store ID sets: they use a hybrid of the exploded and non-exploded approaches, such that a large set can be split into multiple pieces, but each piece can have up to 5000 IDs rather than just one. This means that retrieving a large ID set from a composite index can be thousands of times cheaper than retrieving the same ID set from an exploded attribute index. Updating a large ID set in a composite index should also be cheaper in terms of system resources than updating the same ID set in a non-exploded attribute index because the write is much smaller.
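The trade-offs between the three storage layouts can be approximated with a rough cost model. The sketch below counts database records read to fetch an ID set and IDs rewritten per update. The function names are hypothetical, and the assumption that a composite update touches only one piece is a simplification inferred from the description above, not a documented guarantee.

```python
import math

COMPOSITE_PIECE_SIZE = 5000  # maximum IDs per piece, as stated above


def reads_to_fetch(id_count, layout):
    """Approximate number of database records read to retrieve a whole ID set."""
    if layout == "non_exploded":
        return 1                      # the whole set is stored under one key
    if layout == "exploded":
        return id_count               # one record per entry ID
    if layout == "composite":
        return math.ceil(id_count / COMPOSITE_PIECE_SIZE)
    raise ValueError(f"unknown layout: {layout}")


def ids_rewritten_per_update(id_count, layout):
    """Approximate number of IDs written when one entry ID is added or removed."""
    if layout == "non_exploded":
        return id_count               # the entire set is replaced
    if layout == "exploded":
        return 1                      # a single small record is written
    if layout == "composite":
        # Assumption: only the one affected piece is rewritten.
        return min(id_count, COMPOSITE_PIECE_SIZE)
    raise ValueError(f"unknown layout: {layout}")


for layout in ("non_exploded", "exploded", "composite"):
    print(layout,
          reads_to_fetch(1_000_000, layout),
          ids_rewritten_per_update(1_000_000, layout))
```

Under these assumptions, a set of one million IDs needs 200 reads from a composite index compared with 1,000,000 from an exploded attribute index, while an update rewrites at most 5000 IDs instead of the full million required by a non-exploded index.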
In environments with performance problems related to very large index ID sets, you might consider the following options as a way to help improve performance:
- Consider reducing the index entry limit for that index. The index entry limit specifies the maximum number of IDs that an ID set can have for an index key.
  If a key matches more entries than this limit, the server stops maintaining the index for that key, and searches that use that key behave as if the key were unindexed. The index continues to be maintained for keys that match fewer entries. If you don’t have searches that depend only on the keys that exceed the limit, then this is an excellent way of eliminating the cost of maintaining large ID sets. It rarely makes sense to set the index entry limit to a value that is a large percentage of the total number of entries in the server. For keys that match that many entries, there might not be a significant difference between indexed and unindexed search performance, and leaving those keys unindexed removes the need to maintain the associated large ID sets.
- If there are cases in which you need a large index entry limit, then consider increasing the limit only for that index, rather than increasing the default limit for all indexes in the backend.
  The index-entry-limit property in the backend applies to all indexes that don’t specify their own limit, but each index also offers an index-entry-limit property that, if set, overrides the index entry limit configured for the backend. As such, if you need a higher index entry limit for a particular index, set a higher limit just for that one index instead of raising the default limit for all indexes in the backend.
- For attribute indexes with keys matching a large number of entries, consider converting them to composite indexes when possible.
  Composite indexes can completely replace equality and ordering attribute indexes, and they can support substring searches that have a “starts with” component, regardless of whether those searches also have “contains” or “ends with” components. Composite indexes can’t currently replace approximate match indexes, and they can’t support substring searches that don’t have a “starts with” component.
- Consider eliminating any unnecessary substring indexes.
  As previously noted, substring indexes are more likely to have large ID sets than equality, ordering, or approximate match indexes because substring keys are generally smaller and any given substring key can apply to multiple attribute values. It’s also not widely known that equality attribute indexes and equality composite indexes are used for substring searches with a “starts with” component. As such, a substring index is not used for substring searches with a “starts with” component unless there isn’t an equality index for that attribute.
  If a substring index is defined for an attribute that isn’t targeted by substring searches, or that is only targeted by substring searches that contain a “starts with” component (regardless of whether they also include “contains” or “ends with” components), then that substring index is not necessary and can be removed. You can use access logs to determine the types of searches that clients are performing, and the summarize-access-log and search-logs tools might help with that.
- If a substring index is needed for a given attribute, then consider increasing the substring length for that index.
  By default, the server creates a separate substring key for each unique six-character substring in an attribute value, and there might be cases in which the same six-character substring appears in several different values. If that occurs and causes substring keys to have large ID sets, then increasing the substring length for that index reduces the number of values that might share any given key and can reduce the number of IDs associated with that key.
- As a last resort, consider tuning the exploded index threshold for an index (the number of entries that an ID set needs to have for a given key before it will transition from a non-exploded set to an exploded one) based on the expected usage for that index.
  If search performance is more important than update performance for an attribute with large ID sets, then raising the exploded index threshold helps keep the ID set stored as a monolithic block of IDs. On the other hand, if update performance is more important than search performance, then lowering the exploded index threshold might help.
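The index entry limit behavior described in the options above can be illustrated with a small model. This is a sketch under stated assumptions, not the server's code: the class, the function names, and the sample limit values are hypothetical, and only the two rules taken from the text are modeled (a per-index limit overrides the backend default, and a key that exceeds the limit is no longer maintained).

```python
def effective_entry_limit(backend_default, index_override=None):
    """A per-index limit, if set, overrides the backend-wide default."""
    return backend_default if index_override is None else index_override


class LimitedIndex:
    """Toy index that stops maintaining any key whose ID set exceeds the limit."""

    def __init__(self, entry_limit):
        self.entry_limit = entry_limit
        self.id_sets = {}       # key -> set of entry IDs still being maintained
        self.exceeded = set()   # keys that are no longer maintained

    def add(self, key, entry_id):
        if key in self.exceeded:
            return              # no further maintenance cost for this key
        ids = self.id_sets.setdefault(key, set())
        ids.add(entry_id)
        if len(ids) > self.entry_limit:
            del self.id_sets[key]
            self.exceeded.add(key)

    def lookup(self, key):
        """Return the ID set, or None if the key must be treated as unindexed."""
        if key in self.exceeded:
            return None
        return self.id_sets.get(key, set())


# Hypothetical values: a backend default of 4000 and a tiny per-index override.
limit = effective_entry_limit(backend_default=4000, index_override=3)
index = LimitedIndex(limit)
for entry_id in range(1, 6):
    index.add("objectClass=person", entry_id)
index.add("uid=jdoe", 42)

print(index.lookup("objectClass=person"))  # key exceeded the limit
print(index.lookup("uid=jdoe"))            # key is still maintained
```

Here the key that matches five entries exceeds the limit of three and is treated as unindexed, while the key that matches a single entry continues to be served from the index.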