Import bulk data

The bulk import service lets you import large numbers of external entries over REST. You import entries from a comma-separated values (CSV) file to a specified managed object type in the IDM repository. Bulk import works as follows:

Loads bulk CSV entries and stores them temporarily (in the IDM repository) as JSON objects
Creates a temporary mapping between those entries and the managed object store in the repository
Performs a reconciliation between the JSON objects and the objects in the repository

The bulk import service assumes the CSV file is the authoritative data source. If you run an import more than once, the import overwrites all of the properties of the managed object (including timestamps) with the values in the CSV file.

The bulk import service assumes a singular type. If you submit an array of type attributes, the service sets the type to the last element of the array.

To import bulk CSV entries into the repository using the REST API, follow these steps:

Generate a CSV template

The first time you upload entries, you must generate a CSV template. The template is essentially an empty CSV file with one header row that matches the managed object type to which you are importing. In most cases, you will be importing data that fits the managed/user object model, but you can import any managed object type, such as roles and assignments.

To generate the CSV template, send a GET request to the openidm/csv/template endpoint. The following request generates a CSV template for the managed user object type:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--request GET \
"http://localhost:8080/openidm/csv/template?resourceCollection=managed/user&_fields=header&_mimeType='text/plain'"
{
  "_id": "template",
  "header": "\"userName\",\"givenName\",\"sn\",\"mail\",\"description\",\"accountStatus\",\"telephoneNumber\",
 \"postalAddress\",\"city\",\"postalCode\",\"country\",\"stateProvince\",\"preferences/updates\",
 \"preferences/marketing\""
}

The template is generated based on the specified resourceCollection, and includes a single header row. The names of each header column are derived from the schema of the managed object type. The template includes only a subset of managed user properties that can be represented by CSV fields.
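As a rough illustration of how the template response can be consumed, the following Python sketch parses the header string from a response shaped like the one above into a list of column names. The function name parse_template_header is hypothetical, and the sample response is shortened for brevity.

```python
import csv
import io
import json


def parse_template_header(response_text):
    """Return the column names from a csv/template JSON response."""
    header_line = json.loads(response_text)["header"]
    # The header value is itself one line of quoted CSV, so let the csv
    # module handle the quoting rather than splitting on commas by hand.
    return next(csv.reader(io.StringIO(header_line)))


# Shortened sample response (illustrative, not a full managed/user header).
sample = '{"_id": "template", "header": "\\"userName\\",\\"givenName\\",\\"sn\\",\\"mail\\""}'
columns = parse_template_header(sample)
print(columns)
```

The resulting list can then serve as the field names when you write your own CSV file.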

Only the following managed object properties are included in the header row:

Properties of type string, boolean, and number
Properties whose names do not start with an underscore (which excludes properties such as _id and _rev)
Properties whose scope is not private

If you are importing entries to managed/user, the bulk import facility assumes that self-service password reset is enabled. This is because the import does not support upload of hashed passwords.
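The selection rules above can be sketched as a filter over a schema. This is not IDM's implementation. The schema layout (a map of property names to entries that carry type and scope) and the function name csv_eligible are assumptions made for illustration.

```python
CSV_TYPES = {"string", "boolean", "number"}


def csv_eligible(properties):
    """Return names of properties that would appear in the header row."""
    return [
        name
        for name, spec in properties.items()
        if spec.get("type") in CSV_TYPES       # simple scalar types only
        and not name.startswith("_")           # skips _id, _rev, and so on
        and spec.get("scope") != "private"     # skips private properties
    ]


# Hypothetical schema fragment, for illustration only.
schema_properties = {
    "_id": {"type": "string"},
    "userName": {"type": "string"},
    "password": {"type": "string", "scope": "private"},
    "roles": {"type": "array"},
    "accountStatus": {"type": "string"},
}
eligible = csv_eligible(schema_properties)
print(eligible)
```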

Set the parameters _fields=header and _mimeType=text/csv to download the template as a CSV file.

When you have generated the template, export your external data to CSV format, using the headers in the generated template.
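One way to produce a file that matches the template is to write your records through a CSV writer that is pinned to the template's column names. The following is a minimal sketch. The function name records_to_csv, the column subset, and the sample record are all hypothetical.

```python
import csv
import io


def records_to_csv(columns, records):
    """Serialize records to CSV text with the template columns as header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # drop source fields the template lacks
        restval="",             # leave missing values empty
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical column subset and source record, for illustration only.
columns = ["userName", "givenName", "sn", "mail"]
records = [
    {"userName": "bjensen", "givenName": "Barbara", "sn": "Jensen",
     "mail": "bjensen@example.com", "internalNote": "not in template"},
]
csv_text = records_to_csv(columns, records)
print(csv_text)
```

Pinning the field names keeps the column order identical to the template and silently drops source fields that have no matching header.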

Upload a CSV file

You can use the bulk import service with a CSV file of up to 50 MB that contains fewer than 100,000 records. If you need to import a larger file or more records, divide your data into chunks and import each file separately.
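Dividing the data into chunks could look like the following sketch, which splits CSV text by record count and repeats the header row in every chunk. It handles only the record limit. Checking each chunk against the file size limit is left out. The function name split_csv and the sample data are hypothetical.

```python
import csv
import io

MAX_RECORDS = 99_999  # stays below the 100,000-record limit stated above


def split_csv(text, max_records=MAX_RECORDS):
    """Split CSV text into chunks that each repeat the header row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    chunks, rows = [], []

    def flush():
        # Write the header plus the buffered rows as one chunk.
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
        chunks.append(buffer.getvalue())
        rows.clear()

    for row in reader:
        rows.append(row)
        if len(rows) == max_records:
            flush()
    if rows:
        flush()
    return chunks


# Tiny illustrative input: five records split into chunks of two.
sample = "userName,mail\n" + "".join(
    f"user{i},user{i}@example.com\n" for i in range(5)
)
chunks = split_csv(sample, max_records=2)
print(len(chunks))
```

Reading through the csv module rather than splitting on newlines keeps quoted fields that contain line breaks inside a single record.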

You can increase the maximum file size by changing the value of the maxRequestSizeInMegabytes property in your conf/servletfilter-upload.json file.

You need to use a CSV template to perform a bulk import. For more information, refer to Generate a CSV template.

After formatting your CSV file to match your template’s structure, upload the file to the IDM repository with the following request:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--form upload=@/path/to/example-users.csv \
--request POST \
"http://localhost:8080/upload/csv/managed/user?uniqueProperty=userName"
{
  "importUUIDs": [
    "3ebd514f-bdd7-491f-928f-21b72f44e381"
  ]
}

--form (-F)

This option causes curl to POST data using the Content-Type multipart/form-data, which lets you upload binary files. To indicate that the form content is a file, prefix the file name with an @ sign.

To import more than one file at once, specify multiple --form options, for example:

--form upload=@/path/to/example-users-a-j.csv \
--form upload=@/path/to/example-users-k-z.csv \

uniqueProperty (required)

This parameter lets you correlate existing entries, based on a unique value field. This is useful if you need to upload the same file a number of times (for example, if data in the file changes, or if some entries in the file contained errors). You can specify any unique value property here. You can also correlate on more than one property by specifying multiple, comma-delimited unique properties.
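As an illustration of the comma-delimited form, the following sketch builds the upload URL from a list of unique properties. The function name upload_url is hypothetical, and the choice of userName and mail as the correlating properties is only an example.

```python
from urllib.parse import urlencode


def upload_url(base_url, resource, unique_properties):
    """Build the upload URL for a managed object type."""
    if not unique_properties:
        raise ValueError("uniqueProperty is required")
    query = urlencode(
        {"uniqueProperty": ",".join(unique_properties)},
        safe=",",  # keep the comma delimiter readable instead of %2C
    )
    return f"{base_url}/upload/csv/{resource}?{query}"


url = upload_url("http://localhost:8080", "managed/user", ["userName", "mail"])
print(url)
```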

A successful upload generates an array of importUUIDs. You need these UUIDs to perform other operations on the import records.
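Because later requests depend on these values, it can help to extract and validate them as soon as the upload returns. The following is a small sketch, assuming the response body has the shape shown above. The function name import_uuids is hypothetical.

```python
import json
import uuid


def import_uuids(response_text):
    """Return the import UUIDs from an upload response, validated as UUIDs."""
    values = json.loads(response_text).get("importUUIDs", [])
    # uuid.UUID raises ValueError on malformed input, so a truncated or
    # unexpected response fails here rather than in a later request.
    return [str(uuid.UUID(value)) for value in values]


sample = '{"importUUIDs": ["3ebd514f-bdd7-491f-928f-21b72f44e381"]}'
uuids = import_uuids(sample)
print(uuids)
```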

Note that the endpoint (upload/csv) is not an IDM endpoint.

Query bulk imports

A query on the csv/metadata endpoint returns the import ID, the data structure (header fields in the CSV file), a recon ID, and a number of fields indicating the status of the import:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--request GET \
"http://localhost:8080/openidm/csv/metadata/?_queryFilter=true"
{
  "result": [
    {
      "_id": "3ebd514f-bdd7-491f-928f-21b72f44e381",
      "_rev": "000000003e8ef4f7",
      "header": [
        "userName",
        "givenName",
        "sn",
        "mail",
        "description",
        "accountStatus",
        "country"
      ],
      "reconId": "2e2cf41a-c4b8-4dda-9d92-6e0af65a15fe-6528",
      "filename": "example-users.csv",
      "resourcePath": "managed/user",
      "total": 1000,
      "success": 1000,
      "failure": 0,
      "created": 1000,
      "updated": 0,
      "unchanged": 0,
      "begin": "2020-04-17T16:31:02.955Z",
      "end": "2020-04-17T16:31:09.861Z",
      "cancelled": false,
      "importDeleted": false,
      "tempRecords": 0,
      "purgedTempRecords": true,
      "purgedErrorRecords": false,
      "authId": "openidm-admin",
      "authzComponent": "internal/user"
    },
    {
      "_rev": "00000000d4392fc8"
    }
  ],
  ...
}
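The status fields in a response like this one can be condensed into a short per-import summary. The following sketch assumes the field names shown above and skips entries that lack timestamps, such as the second entry in the sample output. The function name summarize is hypothetical.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def summarize(entry):
    """Return a compact summary of one csv/metadata result entry."""
    begin = datetime.strptime(entry["begin"], TIMESTAMP_FORMAT)
    end = datetime.strptime(entry["end"], TIMESTAMP_FORMAT)
    return {
        "id": entry["_id"],
        "filename": entry["filename"],
        "seconds": (end - begin).total_seconds(),
        "has_failures": entry["failure"] > 0,
    }


# Subset of the fields from the sample response above.
results = [
    {
        "_id": "3ebd514f-bdd7-491f-928f-21b72f44e381",
        "filename": "example-users.csv",
        "total": 1000, "success": 1000, "failure": 0,
        "begin": "2020-04-17T16:31:02.955Z",
        "end": "2020-04-17T16:31:09.861Z",
    },
    {"_rev": "00000000d4392fc8"},
]
summaries = [summarize(e) for e in results if "begin" in e and "end" in e]
print(summaries)
```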

Query imports to a specific object type

Use a query filter to restrict your query to imports to a specific managed object type. The following example queries uploads to the managed user object:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--request GET \
'http://localhost:8080/openidm/csv/metadata/?_queryFilter=/resourcePath+eq+"managed/user"'
{
  "result": [
    {
      "_id": "82d9a643-8b03-4cec-86fc-3e09c4c2f01c",
      "_rev": "000000009b3ff60b",
      "header": [
        "userName",
        "givenName",
        "sn",
        "mail",
        "description",
        "accountStatus",
        "country"
      ],
      "reconId": "417dae3b-c939-4191-acbf-6eb1b9e802af-53335",
      "filename": "example-users.csv",
      "resourcePath": "managed/user",
      "total": 1001,
      "success": 1000,
      "failure": 1,
      "created": 0,
      "updated": 0,
      "unchanged": 1000,
      "begin": "2020-04-20T13:12:03.028Z",
      "end": "2020-04-20T13:12:05.222Z",
      "cancelled": false,
      "importDeleted": false,
      "tempRecords": 0,
      "purgedTempRecords": true,
      "purgedErrorRecords": false,
      "authId": "openidm-admin",
      "authzComponent": "internal/user"
    }
  ],
  ...
}
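If you build this query in code rather than on the command line, the filter expression needs URL encoding. The following sketch percent-encodes the spaces and double quotes, which is an equivalent encoding of the filter in the request above. The function name metadata_query_url is hypothetical.

```python
from urllib.parse import quote


def metadata_query_url(base_url, resource_path):
    """Build a csv/metadata query URL filtered on one resource path."""
    # A resource path that itself contains a double quote would break the
    # filter expression, so this sketch assumes plain paths only.
    query_filter = f'/resourcePath eq "{resource_path}"'
    encoded = quote(query_filter, safe="/")
    return f"{base_url}/openidm/csv/metadata/?_queryFilter={encoded}"


url = metadata_query_url("http://localhost:8080", "managed/user")
print(url)
```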

Handle failed import records

The previous examples showed the statistics that are returned when you query bulk imports. One of these statistics is the failure field. If the import was unsuccessful for any records, this field has a positive value. You can then download the failed records, examine the failures, correct them in the CSV file, and run the import again.

To download failed records, send a GET request to the export/csvImportFailures/importUUID endpoint, where importUUID is the UUID of the import:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--request GET \
--header "Accept-API-Version: resource=1.0" \
"http://localhost:8080/export/csvImportFailures/82d9a643-8b03-4cec-86fc-3e09c4c2f01c"
userName,  givenName,  sn,      mail,     ...,  _importError
emacheke,  Edward,     Macheke, emacheke, ...,  "{code=403, reason=Forbidden, message=Policy validation
   failed, detail={result=false, failedPolicyRequirements=[{policyRequirements=[
   {policyRequirement=VALID_EMAIL_ADDRESS_FORMAT}], property=mail}]}}"

The output lists the failed record or records, with the reason for each failure in the _importError field. In this example, the import failed because of a policy validation error: the email address is not in the correct format.
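When many records fail, it can be quicker to pull the policy details out of the _importError column programmatically. The following sketch assumes the downloaded file is standard CSV with the error text in a quoted _importError field, as the output above suggests. The function name failed_policies and the regular expressions are illustrative, not part of IDM.

```python
import csv
import io
import re

REQUIREMENT = re.compile(r"policyRequirement=(\w+)")
PROPERTY = re.compile(r"property=(\w+)")


def failed_policies(csv_text):
    """Return the failing properties and policy requirements per record."""
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        error = row.get("_importError") or ""
        failures.append({
            "userName": row.get("userName"),
            "properties": PROPERTY.findall(error),
            "requirements": REQUIREMENT.findall(error),
        })
    return failures


# Sample modeled on the output above, with the elided columns left out.
sample = (
    "userName,givenName,sn,mail,_importError\n"
    'emacheke,Edward,Macheke,emacheke,"{code=403, reason=Forbidden, '
    "message=Policy validation failed, detail={result=false, "
    "failedPolicyRequirements=[{policyRequirements=[{policyRequirement="
    'VALID_EMAIL_ADDRESS_FORMAT}], property=mail}]}}"\n'
)
failures = failed_policies(sample)
print(failures)
```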

IDM does not scan for possible CSV injection attacks on uploaded files. Do not edit the downloaded CSV file with Microsoft Excel, as this can expose your data to CSV injection.
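Because IDM does not perform this scan, you might choose to check files yourself before opening them in a spreadsheet application. The following sketch flags cells that begin with characters commonly treated as formula triggers. The prefix list and the function name suspicious_cells are assumptions, and the check also flags harmless values such as negative numbers.

```python
import csv
import io

FORMULA_PREFIXES = ("=", "+", "-", "@")


def suspicious_cells(csv_text):
    """Return (row_number, column, value) for cells that look like formulas."""
    flagged = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, value in row.items():
            # Short or overlong rows can yield None or list values; skip them.
            if isinstance(value, str) and value.lstrip().startswith(FORMULA_PREFIXES):
                flagged.append((row_number, column, value))
    return flagged


# Hypothetical file content, for illustration only.
sample = "userName,description\nbjensen,Regular user\nmallory,=1+1\n"
flagged = suspicious_cells(sample)
print(flagged)
```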

Cancel an import in progress

Cancel an import that is in progress by sending a POST request to the openidm/csv/metadata/importUUID endpoint with the cancel action. You might want to cancel an import if it is taking too long, or if you have noticed problems with the import data. For example:

curl \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--header "Accept-API-Version: resource=1.0" \
--request POST \
"http://localhost:8080/openidm/csv/metadata/92971c92-67bb-4ae7-b41b-96d249b0b2aa/?_action=cancel"
{
  "status": "OK"
}
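The same cancel request can be prepared from a script. The following sketch only builds the request object and does not send it. The function name cancel_request is hypothetical, and the credentials and UUID are the sample values used above.

```python
import urllib.request


def cancel_request(base_url, import_uuid, username, password):
    """Build (but do not send) the POST request that cancels an import."""
    return urllib.request.Request(
        f"{base_url}/openidm/csv/metadata/{import_uuid}/?_action=cancel",
        method="POST",
        headers={
            "X-OpenIDM-Username": username,
            "X-OpenIDM-Password": password,
            "Accept-API-Version": "resource=1.0",
        },
    )


request = cancel_request(
    "http://localhost:8080",
    "92971c92-67bb-4ae7-b41b-96d249b0b2aa",
    "openidm-admin",
    "openidm-admin",
)
print(request.get_method(), request.full_url)
# Sending is left to the caller, for example with urllib.request.urlopen(request).
```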

Change the HTTP request timeout

By default, the timeout for the bulk import servlets is 30 seconds (or 30000 milliseconds). This parameter is set in your resolver/boot.properties file, as follows:

openidm.servlet.timeoutMillis=30000

If you are importing a very large number of records, you might need to increase the HTTP request timeout to prevent requests timing out.
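If you script this change, a small helper that rewrites the property in the file's text is enough. The following is a sketch only. The function name set_property is hypothetical, and the value of 120000 milliseconds is an arbitrary example, not a recommendation.

```python
def set_property(text, key, value):
    """Return properties text with key set to value, appending it if absent."""
    lines, found = [], False
    for line in text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name == key:
            lines.append(f"{key}={value}")
            found = True
        else:
            lines.append(line)  # comments and other keys pass through
    if not found:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


original = "openidm.servlet.timeoutMillis=30000\n"
updated = set_property(original, "openidm.servlet.timeoutMillis", 120_000)
print(updated)
```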

In test environments, you can set this parameter to 0 to disable the request timeout. Do not disable the timeout in a production environment. Without a timeout, the server is exposed to denial-of-service attacks in which thousands of slow HTTP connections are opened and held.

For a list of all REST endpoints related to bulk import, refer to Bulk import.