Indexes store data that make it possible to quickly retrieve matching entries during a search. Each index record maps an index key to a list of the entry IDs for the entries that match that key. For example, for an equality index for a given attribute, there is a key for each unique value for that attribute, and the values for each key of the IDs of the entries whcih contain that value. For substring indexes, there may be multiple keys for the same attribute value (one for each unique six-character substring within the value), and the same substring may apply to many different values of the same attribute.

As the size of an ID set increases, so does the potential resource cost of accessing the ID set. For an attribute index, ID sets can be stored in one of two ways, each with their own pros and cons for performance:
  • In the regular (non-exploded) case, each index key occurs once, and the value of this key is the entire list of IDs for entries that match the key. The ID set for a non-exploded index key can be retrieved very quickly because only a single database read is required. However, as the size of the ID set grows, the cost of updating it also grows since it is necessary to replace the entire set, which requires larger amounts of disk I/O and may place an increased burden on the database cleaner.
  • In the exploded case, the same key may be stored multiple times (one for each entry that matches that key), with each instance of the key associated with a different entry ID. Updating the ID set for an exploded key is always fast because the writes will be very small, but the cost of reading the ID set increases with the number of IDs.

Composite indexes can also be used. These offer many advantages over attribute indexes when they can be used. Some of these advantages, such as the ability to configure a base DN or combinations of attributes, do not really have any effect on performance with regard to large ID sets. However, they also use a hybrid of the exploded and non-exploded approaches to maintaining the ID set in that a large set may be split into multiple pieces, but each piece may have up to 5000 IDs rather than just one. This means that retrieving a large ID set from a composite index may be thousands of times cheaper than retrieving the same ID set from an exploded attribute index, and updating a large ID set in a composite index should be much cheaper in terms of systems resources than updating the same ID set in a non-exploded attribute index since the write will be significantly smaller.

In environments with performance problems related to very large index ID sets, you may consider the following options as a way to help improve performance:
  • Consider reducing the index entry limit for that index. The index entry limit specifies the maximum number of IDs that an ID set may have for an index key. If a key matches more entries than this limit, the server will stop maintaining the index for that key and attempts to access it will behave as if it were unindexed, while the index will continue to be maintained for keys matching smaller numbers of entries. If you don’t have searches that depend only on those keys, then this is an excellent way of eliminating the cost of maintaining large ID sets. Also note that it doesn’t make any sense to set the index entry limit to a value that is a large percentage of the total number of entries in the server; in such cases, there may not be a significant performance difference between indexed and unindexed search performance, but there would not be a need to maintain the associated large ID sets.
  • If there are cases in which you need a large index entry limit, then consider increasing the limit only for that index rather than increasing the default limit for all indexes in the backend. The index-entry-limit property in the backend applies to all indexes that don’t specify their own limit, but each index also offers an index-entry-limit property that, if set, will override the index entry limit configured for the backend. As such, if you need a higher index entry limit for a particular index, then it is better to set a higher limit just for that one index than to raise the default limit for all indexes in the backend.
  • For attribute indexes with keys matching a large number of entries, consider converting it to a composite index when possible. Composite indexes can completely replace equality and ordering attribute indexes, and they can also support “starts with” substring searches (regardless of whether they also have “contains” or “ends with” components). Composite indexes can’t currently replace approximate match indexes or substring searches that don’t have a “starts with” component.
  • Consider eliminating any unnecessary substring indexes. As previously noted, substring indexes are more likely to have large ID sets than equality, ordering, or approximate match indexes because substring keys are generally smaller and any given substring key may apply to multiple attribute values. It’s also not all that well known that equality attribute indexes (and equality composite indexes) will be used for substring searches with a “starts with” component. As such, a substring index will not be used for substring searches with a “starts with” unless there isn’t an equality index for that attribute. If a substring index is defined for an attribute that isn’t targeted by substring searches, or that is only targeted by substring searches that contain a “starts with” component (regardless of whether it also includes “contains” or “ends with” components), then that substring index is not necessary and can be removed. Access logs can be used to determine the types of searches that clients are performing, and the summarize-access-log and search-logs tools may help with that.
  • If a substring index is needed for a given attribute, then you may consider increasing the substring length for that index. By default, the server will create a separate substring key for each unique six-character substring in an attribute value, and there may be cases in which the same six-character substring appears in several different values. If that is the case, and if this causes substring keys to have large ID sets, then increasing the substring length for that index will reduce the number of values that may share the any given key, and therefore may reduce the number of IDs associated with that key.
  • As a last resort, you may consider tuning the exploded index threshold for an index (the number of entries that an ID set needs to have for a given key before it will transition from a non-exploded set to an exploded one) based on the expected usage for that index. If search performance is more important than update performance for an attribute with large ID sets, then raising the exploded index threshold will help keep the ID set stored as a monolithic block of IDs. On the other hand, if update performance is more important than search performance, then lowering the exploded index threshold may help.