Schema dictionary

The schema dictionary is an often-overlooked part of running a well-managed directory. It can be time-consuming to create from scratch, but the time you invest will be repaid.

A schema dictionary does not need to contain all of the attributes that you have defined in the schema. It should only contain those attributes that will get used by applications and other consumers of the directory and whose documentation is beneficial. You don’t need to thoroughly document operational attributes or attributes that are never used.

A schema dictionary should contain a thorough description of the attribute syntax and format in both the directory and in any data sources that feed that data into the directory.

For example, for HRemployeeID you might have something like this:

Directory: Multi-valued, Case-Insensitive Unicode, Min. Length 5, Max. Length 90
Peoplesoft: Single-valued, VARCHAR(30)
ContractorDB: Single-valued, NUM(45)

It can save a lot of time if you can get your developers to look at this document when they write their code. An attribute whose value has always been a five-digit number is dangerous to rely on if the source definition is VARCHAR, because nothing in that definition prevents a longer or non-numeric value from appearing later. Documenting the real data format prevents developers from using an implied data format and hard-coding something that will cause problems later.
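
If the schema dictionary is kept in a machine-readable form, developers can look up the documented format instead of guessing it. The following is a minimal sketch of one possible representation, using the HRemployeeID values from the example above. The class and field names (SourceFormat, DictionaryEntry, format_for) are illustrative assumptions, not part of any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceFormat:
    """How one system stores the attribute (field names are illustrative)."""
    system: str
    multi_valued: bool
    syntax: str


@dataclass(frozen=True)
class DictionaryEntry:
    """One attribute in a hypothetical machine-readable schema dictionary."""
    attribute: str
    formats: tuple

    def format_for(self, system: str) -> SourceFormat:
        """Return the documented format for the named system."""
        for fmt in self.formats:
            if fmt.system == system:
                return fmt
        raise KeyError(f"{self.attribute} has no documented format for {system}")


# Values copied from the HRemployeeID example in this section.
HR_EMPLOYEE_ID = DictionaryEntry(
    attribute="HRemployeeID",
    formats=(
        SourceFormat("Directory", True,
                     "Case-Insensitive Unicode, Min. Length 5, Max. Length 90"),
        SourceFormat("Peoplesoft", False, "VARCHAR(30)"),
        SourceFormat("ContractorDB", False, "NUM(45)"),
    ),
)

print(HR_EMPLOYEE_ID.format_for("Peoplesoft").syntax)
```

A developer who checks the Peoplesoft entry this way sees a VARCHAR definition and has no reason to assume the value is numeric.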


Let's say you have an attribute called HRemployeeID that is used as a user's uid. In this example, it has always been a six-digit number. Then you hire your millionth employee, and you're faced with some difficult questions. How many of your application entitlement databases use HRemployeeID as the key field? How many of those define that field as six characters long because IDs have never been longer than that? And how many define it as an integer because it has always been an integer? Keeping the six-character length and moving to alphanumeric values will probably break some of your apps, but so will moving to a seven-digit integer.
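
The difference between an implied format and a documented one can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the documented limits (5 to 90 characters, case-insensitive) are taken from the directory line of the HRemployeeID example earlier in this section.

```python
import re

# Implied format: a developer hard-codes "IDs are always six digits".
IMPLIED_PATTERN = re.compile(r"[0-9]{6}")

# Documented format: 5 to 90 characters, compared case-insensitively.
DOCUMENTED_MIN, DOCUMENTED_MAX = 5, 90


def valid_by_implied_format(employee_id: str) -> bool:
    """Accept only exactly six ASCII digits."""
    return IMPLIED_PATTERN.fullmatch(employee_id) is not None


def valid_by_documented_format(employee_id: str) -> bool:
    """Accept any value within the documented length limits."""
    return DOCUMENTED_MIN <= len(employee_id) <= DOCUMENTED_MAX


def same_id(left: str, right: str) -> bool:
    """Compare two IDs the way a case-insensitive attribute would."""
    return left.casefold() == right.casefold()


for candidate in ("999999", "1000000", "A00001"):
    print(candidate,
          valid_by_implied_format(candidate),
          valid_by_documented_format(candidate))
```

The implied check accepts 999999 but rejects both 1000000 and A00001, while the documented check accepts all three. Whichever way the ID scheme grows, the code written against the implied format is the code that breaks.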

The schema dictionary provides an easy means of preventing programmers and database administrators from making simple mistakes like this. It's not going to solve all of your issues, but most coders, database administrators, and application architects will read the document.

It's easy for a programmer to make a mistake with implied formats. Recovering from that mistake after it's been in production for a few years is hard. Simply making app designers aware that many of your attributes are case-insensitive can, by itself, repay your time investment.

Ownership and change process

Most of the data that's stored in your directory is not owned or managed by you, and you're probably the only person in your organization who knows that.

Adding some verbiage into the schema dictionary as to who the owner is for each attribute and what the change process is for that attribute will save you a lot of time.

This is also an opportunity to keep your directory clean. On a regular basis, you should reach out to the data owners for each attribute and have them certify the information that you have recorded about that attribute.

If you can't find an owner for an attribute, you should remove it from your directory (after giving advance notice to your app owners, especially the ones that might be using that attribute). The default owner of an unowned attribute that remains in your directory is you, which can lead to governance difficulties and audit compliance problems if you don't have accurate information about the attribute.

Once a year, you should provide attribute owners with a list of groups and accounts in the directory that have read and write access to their attributes. It's critical that decisions about access to potentially sensitive attribute data are approved by the owners of that data and not by an area (the directory support team) that does not own that data.
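
The yearly access review can be assembled by grouping access grants under each attribute's recorded owner. The sketch below assumes that the owner list comes from the schema dictionary and that the grants have already been extracted from the directory's access control configuration. All names, addresses, and DNs are made up for illustration.

```python
from collections import defaultdict

# Hypothetical inputs. Owners would come from the schema dictionary; grants
# would come from a review of the directory's access control configuration.
ATTRIBUTE_OWNERS = {
    "HRemployeeID": "hr-data@example.com",
    "streetAddress": "facilities@example.com",
}

ACCESS_GRANTS = [
    # (account or group, attribute, rights)
    ("cn=payroll-app", "HRemployeeID", "read"),
    ("cn=hr-sync", "HRemployeeID", "read,write"),
    ("cn=mailroom-app", "streetAddress", "read"),
    ("cn=legacy-app", "pagerNumber", "read"),
]


def build_owner_reports(owners, grants):
    """Group grants by attribute owner; unowned attributes are flagged."""
    reports = defaultdict(list)
    for principal, attribute, rights in grants:
        owner = owners.get(attribute, "UNOWNED")
        reports[owner].append((attribute, principal, rights))
    return {owner: sorted(rows) for owner, rows in reports.items()}


for owner, rows in sorted(build_owner_reports(ATTRIBUTE_OWNERS, ACCESS_GRANTS).items()):
    print(owner)
    for attribute, principal, rights in rows:
        print(f"  {attribute}: {principal} ({rights})")
```

Anything that lands under UNOWNED is a candidate for the owner search, or the removal process, described above.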

Attribute metadata

Directory users are constantly getting themselves into trouble by thinking they understand what the attributes stored in your directory mean.

For example, with a streetAddress attribute, you know that a user has at least two business addresses in HR: a physical location and a physical mailing address. And you know that about 5% of the time, those two are different. You also know which of those two got mapped to streetAddress in your directory, but not everyone using the attribute is aware of that.

The schema dictionary is a great place to store information about your attributes that your users need. Initially, you won't have much metadata to worry about, but as identities get more complicated and start coming from a larger and larger number of disparate sources, the volume of metadata (and the importance of having it easily available) will grow.

Metadata can include:

  • A description of what the attribute is
  • Level of assurance
  • Data classification
  • Appropriate usage (for example, streetAddress sourced from Database A was collected under a EULA that prohibits usage for advertising)
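
As a sketch of how the metadata items listed above might be recorded and consulted, the following uses a small record type with a usage check. The field names and the permits helper are assumptions for illustration, and the description, level of assurance, and classification values are placeholders rather than recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeMetadata:
    """Metadata for one attribute (field names are illustrative)."""
    attribute: str
    description: str
    level_of_assurance: str
    data_classification: str
    prohibited_uses: frozenset = frozenset()

    def permits(self, purpose: str) -> bool:
        """Return False when the purpose is listed as prohibited."""
        return purpose.lower() not in self.prohibited_uses


STREET_ADDRESS = AttributeMetadata(
    attribute="streetAddress",
    description="placeholder: state which HR business address is mapped here",
    level_of_assurance="placeholder",
    data_classification="placeholder",
    # From the example above: collected under a EULA that prohibits
    # usage for advertising.
    prohibited_uses=frozenset({"advertising"}),
)

print(STREET_ADDRESS.permits("advertising"))
print(STREET_ADDRESS.permits("shipping"))
```

Recording usage restrictions next to the attribute lets a consumer check them before they build a feature on data they are not allowed to use.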

To understand more about metadata documentation, you should read up on Master Data Management (MDM) methodologies. If you've been doing IDM for a while, you'll realize fairly rapidly that IDM is essentially MDM for identity.

Application onboarding

About the only time an application team will feel motivated to answer your questions and provide documentation about their application is while they're waiting for you to approve creation of their service account. While they’re waiting, here is some of the data you will want to collect:

  • Expected SLAs

    You'll want to collect data about how responsive they expect the directory to be. This can be used later as business requirements or justification when you want to request more server resources.

  • Change windows

    Knowing the change windows for your consuming applications can help you identify the best time to perform maintenance. When asking the application areas for this, it should be made clear that staying in their change windows will not be guaranteed and is just best effort.

  • Financial impact

    How much revenue an application generates and how much money is lost when it is not available is very useful information to have when you are building a business case for funding and resources.

    • Revenue generated per year

    • Revenue lost for a 1-minute, 10-minute, 30-minute, 1-hour, and full-day outage

  • Service account password change process

    Application support teams rotate into and out of their areas all of the time. If you ever get into a situation where you want to mandate a password change after an application has been in place for more than a couple of years, you might discover that no one knows how to safely change that password.

  • Contact information

    This can be useful to have documented somewhere if you notice that a specific application is creating issues in the directory (especially if the issues are only moderately impactful and don't rise to the level of an incident).

    You should send an email to all of your application contacts a few times a year so that you can keep the contact list updated.
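
If an application team can supply only an annual revenue figure, the outage losses requested under financial impact can be roughed out with simple arithmetic. The sketch below assumes that revenue accrues evenly across every minute of the year, which is a simplification (most applications have peak periods). The annual figure used is invented.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

# Outage durations from the onboarding list, in minutes.
OUTAGE_WINDOWS_MINUTES = {
    "1 minute": 1,
    "10 minutes": 10,
    "30 minutes": 30,
    "1 hour": 60,
    "full day": 24 * 60,
}


def estimated_outage_losses(annual_revenue: float) -> dict:
    """Estimate revenue lost per outage window, assuming even revenue flow."""
    per_minute = annual_revenue / MINUTES_PER_YEAR
    return {
        label: round(per_minute * minutes, 2)
        for label, minutes in OUTAGE_WINDOWS_MINUTES.items()
    }


# Invented figure: 52,560,000 per year works out to 100 per minute.
for label, loss in estimated_outage_losses(52_560_000).items():
    print(f"{label}: {loss:,.2f}")
```

With that invented figure, a 1-hour outage is estimated at 6,000 and a full-day outage at 144,000. Figures supplied by the application team, especially for peak periods, should replace this estimate whenever they are available.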

Application profiling

PingDirectory provides the ability to filter the data written to access logs based on connection criteria (see the appendix for a configuration example). This gives you a mechanism to create a custom access log that only contains operations performed by specific accounts.

This can be used with new or existing applications to collect data about how the application behaves. The bin/summarize-access-log utility can be used against this application-specific log file to generate an overview of the types of searches that are run by the application, index utilization, errors, and an operation response time histogram.

Having this information documented provides a useful comparison if the application begins to experience production issues and can greatly simplify troubleshooting.

Schema modification

The ability to extend the schema should not be delegated to any groups outside of the directory administrators.

If an application area needs to extend the schema, it must document its requirements. The documentation should include:

  • Attribute names
  • objectclass definitions
  • Single or multi-valued
  • Attribute syntaxes
  • Attribute indexing requirements
  • Who owns the data in the attribute
  • How the data is stored in the system of record (if the directory will not be the system of record)

The application area might need some help and guidance to answer these questions.

Before extending the schema, contact the data owners associated with the new data, then obtain and document their approval.
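
The documentation checklist and the approval step can be enforced with a simple completeness check before a request reaches the directory administrators. This is a minimal sketch: the key names are hypothetical and map one-to-one onto the items listed above, plus the owner approval.

```python
# Hypothetical request keys, one per item in the checklist above.
ALWAYS_REQUIRED = (
    "attribute_names",
    "objectclass_definitions",
    "single_or_multi_valued",
    "attribute_syntaxes",
    "indexing_requirements",
    "data_owner",
    "owner_approval",
)


def missing_fields(request: dict) -> list:
    """Return the checklist items that are absent or empty in the request.

    An empty string or empty list counts as missing, so "no indexing
    needed" should be stated explicitly (for example, "none").
    """
    required = list(ALWAYS_REQUIRED)
    # The source-system format only applies when the directory is not
    # the system of record for the new data.
    if not request.get("directory_is_system_of_record", False):
        required.append("system_of_record_format")
    return [name for name in required if not request.get(name)]


draft = {"attribute_names": ["appBadgeColor"], "data_owner": "facilities"}
print(missing_fields(draft))
```

A request is ready for review only when the returned list is empty.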

JSON attributes

If an application's data storage requirements cannot easily be met by a defined, structured schema (for example, largely non-homogeneous data), consider creating application-specific JSON attributes for the application.

Because JSON attributes can contain any JSON-formatted set of data, they're ideal candidates for storing complex data and relationships that cannot be easily defined or stored in a traditional hierarchical data model.

Because there could be little enforcement as to what gets populated, care should be taken with JSON attributes, and their usage should be reviewed on an annual basis.

Optionally, you can place restrictions on JSON attributes that control things such as allowable keys, must and may keys, and syntaxes. See the PingDirectory Administration Guide for details on how to implement JSON restrictions and JSON key indexing.
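
To illustrate what must and may key restrictions mean in practice, the following sketch performs the same kind of check on the application side. It is a conceptual illustration only and does not represent PingDirectory's configuration or API. The function name and the sample keys are hypothetical.

```python
import json


def check_json_value(raw: str, must_keys: set, may_keys: set) -> list:
    """Return a list of problems; an empty list means the value passes."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(value, dict):
        return ["top-level value must be a JSON object"]

    keys = set(value)
    # Required keys that are absent from the object.
    problems = [f"missing required key: {k}" for k in sorted(must_keys - keys)]
    # Keys that are neither required nor explicitly allowed.
    problems += [
        f"unexpected key: {k}" for k in sorted(keys - must_keys - may_keys)
    ]
    return problems


print(check_json_value('{"theme": "dark", "fontSize": 12}',
                       must_keys={"theme"}, may_keys={"fontSize"}))
print(check_json_value('{"fontSize": 12, "color": "red"}',
                       must_keys={"theme"}, may_keys={"fontSize"}))
```

A check like this can also support the annual review, by reporting which stored values have drifted from the keys the application originally documented.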