Import bulk data
The bulk import service lets you import large numbers of external entries over REST. You import entries from a comma-separated values (CSV) file to a specified managed object type in the IDM repository. Bulk import works as follows:
- Loads bulk CSV entries and stores them temporarily (in the IDM repository) as JSON objects
- Creates a temporary mapping between those entries and the managed object store in the repository
- Performs a reconciliation between the JSON objects and the objects in the repository
The bulk import service assumes the CSV file is the authoritative data source. If you run an import more than once, the import overwrites all of the properties of the managed object (including timestamps) with the values in the CSV file.
The bulk import service assumes a singular type. If you submit an array of type attributes, the service sets the type as the last element of the array.
To import bulk CSV entries into the repository using the REST API, follow these steps:
Generate a CSV template
The first time you upload entries, you must generate a CSV template. The template is essentially an empty CSV file with one header row that matches the managed object type to which you are importing. In most cases, you will be importing data that fits the managed/user object model, but you can import any managed object type, such as roles and assignments.
To generate the CSV template, send a GET request to the openidm/csv/template endpoint. The following request generates a CSV template for the managed user object type:
curl \
 --header "X-OpenIDM-Username: openidm-admin" \
 --header "X-OpenIDM-Password: openidm-admin" \
 --header "Accept-API-Version: resource=1.0" \
 --request GET \
 "http://localhost:8080/openidm/csv/template?resourceCollection=managed/user&_fields=header&_mimeType='text/plain'"
{
  "_id": "template",
  "header": "\"userName\",\"givenName\",\"sn\",\"mail\",\"description\",\"accountStatus\",\"telephoneNumber\", \"postalAddress\",\"city\",\"postalCode\",\"country\",\"stateProvince\",\"preferences/updates\", \"preferences/marketing\""
}
The template is generated based on the specified resourceCollection, and includes a single header row. The names of each header column are derived from the schema of the managed object type. The template includes only a subset of managed user properties that can be represented by CSV fields.
Only the following managed object properties are included in the header row:
- Properties of type string, boolean, and number
- Properties that do not start with an underscore (such as _id or _rev)
- Properties whose scope is not private
If you are importing entries to managed/user, the bulk import facility assumes that self-service password reset is enabled. This is because the import does not support upload of hashed passwords.
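The three selection rules above can be illustrated with a short Python sketch. The schema dictionary used here is a simplified, hypothetical stand-in (property name mapped to its type and optional scope), not the actual IDM schema format, and the function is only an illustration of the rules, not how IDM implements them.

```python
# Hypothetical, simplified schema: property name -> attributes.
# Nested properties (such as preferences/updates) are not handled here.
CSV_TYPES = {"string", "boolean", "number"}


def template_columns(schema):
    """Return the property names that qualify for the CSV header row."""
    columns = []
    for name, attrs in schema.items():
        if name.startswith("_"):            # skip _id, _rev, and similar
            continue
        if attrs.get("type") not in CSV_TYPES:
            continue
        if attrs.get("scope") == "private":  # skip private properties
            continue
        columns.append(name)
    return columns


example_schema = {
    "_id": {"type": "string"},
    "userName": {"type": "string"},
    "accountStatus": {"type": "string"},
    "password": {"type": "string", "scope": "private"},
    "roles": {"type": "array"},
}
print(template_columns(example_schema))
```

In this sample, only userName and accountStatus pass all three checks.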
Set the parameters _fields=header and _mimeType=text/csv to download the template as a CSV file.
When you have generated the template, export your external data to CSV format, using the headers in the generated template.
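As a rough illustration of this step, the following Python sketch parses the header string from a template response and writes sample records in the same column order. The function names and the sample record are hypothetical, and the sketch assumes the JSON response shape shown in the earlier example.

```python
import csv
import io
import json


def header_columns(template_response):
    """Parse the quoted, comma-separated 'header' string into column names."""
    header = json.loads(template_response)["header"]
    # skipinitialspace tolerates a space after a comma in the header string.
    return next(csv.reader([header], skipinitialspace=True))


def rows_to_csv(columns, records):
    """Serialize dict records to CSV text using the template's column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Shortened sample response; a real template has more columns.
response = r'{"_id": "template", "header": "\"userName\",\"givenName\",\"sn\",\"mail\""}'
columns = header_columns(response)

# Sample data only; employeeNumber is dropped because it is not in the template.
csv_text = rows_to_csv(columns, [{
    "userName": "bjensen",
    "givenName": "Barbara",
    "sn": "Jensen",
    "mail": "bjensen@example.com",
    "employeeNumber": "42",
}])
print(csv_text)
```

Fields that are not in the template are ignored, and missing fields are written as empty values, so every row matches the header.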
Upload a CSV file
You can use the bulk import service with a CSV file of up to 50 MB that contains fewer than 100,000 records. If you need to import a larger file or more records, divide your data into chunks and import each file separately.
You can increase the maximum file size by changing the value of the maxRequestSizeInMegabytes property in your conf/servletfilter-upload.json file.
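One way to divide a large data set into chunks is sketched below in Python. This is a minimal example, not part of IDM: the function names are hypothetical, it enforces only the record count (not the 50 MB size limit), and it repeats the header row in every chunk so that each file can be uploaded on its own.

```python
import csv
import io

MAX_RECORDS = 99_999  # stay below the 100,000-record limit per file


def _render(header, rows):
    """Write a header row and data rows back to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def split_csv(text, max_records=MAX_RECORDS):
    """Split CSV text into chunks, repeating the header row in each chunk."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    chunks, rows = [], []
    for row in reader:
        rows.append(row)
        if len(rows) == max_records:
            chunks.append(_render(header, rows))
            rows = []
    if rows:  # remaining records that did not fill a whole chunk
        chunks.append(_render(header, rows))
    return chunks


sample = "userName,sn\na,A\nb,B\nc,C\n"
chunks = split_csv(sample, max_records=2)
print(len(chunks))
```

Each chunk can then be saved to its own file and uploaded in a separate request.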
You need to use a CSV template to perform a bulk import. For more information, refer to Generate a CSV template.
After formatting your CSV file to match your template’s structure, upload the file to the IDM repository with the following request:
curl \
 --header "X-OpenIDM-Username: openidm-admin" \
 --header "X-OpenIDM-Password: openidm-admin" \
 --header "Accept-API-Version: resource=1.0" \
 --form upload=@/path/to/example-users.csv \
 --request POST \
 "http://localhost:8080/upload/csv/managed/user?uniqueProperty=userName"
{
  "importUUIDs": [
    "3ebd514f-bdd7-491f-928f-21b72f44e381"
  ]
}
--form (-F)
    This option causes curl to POST data using the Content-Type multipart/form-data, which lets you upload binary files. To indicate that the form content is a file, prefix the file name with an @ sign.

    To import more than one file at once, specify multiple --form options, for example:

    --form upload=@/path/to/example-users-a-j.csv \
    --form upload=@/path/to/example-users-k-z.csv \

uniqueProperty (required)
    This parameter lets you correlate existing entries, based on a unique value field. This is useful if you need to upload the same file a number of times (for example, if data in the file changes, or if some entries in the file contained errors). You can specify any unique value property here. You can also correlate on more than one property by specifying multiple, comma-delimited unique properties.
A successful upload generates an array of importUUIDs. You need these UUIDs to perform other operations on the import records.
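The following Python sketch shows how a client script might assemble the upload URL with one or more unique properties and read the import UUIDs from the response. The helper names are hypothetical, and the response shape is assumed to match the example output above.

```python
import json
from urllib.parse import quote


def upload_url(base_url, resource_path, unique_properties):
    """Build the upload URL; multiple unique properties are comma-delimited."""
    unique = ",".join(unique_properties)
    return (
        f"{base_url}/upload/csv/{resource_path}"
        f"?uniqueProperty={quote(unique, safe=',')}"
    )


def import_uuids(response_body):
    """Return the list of import UUIDs from an upload response body."""
    return json.loads(response_body).get("importUUIDs", [])


url = upload_url("http://localhost:8080", "managed/user", ["userName", "mail"])
uuids = import_uuids('{"importUUIDs": ["3ebd514f-bdd7-491f-928f-21b72f44e381"]}')
print(url)
print(uuids)
```

Keeping the returned UUIDs is worthwhile because later operations on the import records are addressed by UUID.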
Note that the endpoint ( |
Query bulk imports
A query on the csv/metadata endpoint returns the import ID, the data structure (header fields in the CSV file), a recon ID, and a number of fields indicating the status of the import:
curl \
 --header "X-OpenIDM-Username: openidm-admin" \
 --header "X-OpenIDM-Password: openidm-admin" \
 --header "Accept-API-Version: resource=1.0" \
 --request GET \
 "http://localhost:8080/openidm/csv/metadata/?_queryFilter=true"
{
  "result": [
    {
      "_id": "3ebd514f-bdd7-491f-928f-21b72f44e381",
      "_rev": "000000003e8ef4f7",
      "header": [
        "userName",
        "givenName",
        "sn",
        "mail",
        "description",
        "accountStatus",
        "country"
      ],
      "reconId": "2e2cf41a-c4b8-4dda-9d92-6e0af65a15fe-6528",
      "filename": "example-users.csv",
      "resourcePath": "managed/user",
      "total": 1000,
      "success": 1000,
      "failure": 0,
      "created": 1000,
      "updated": 0,
      "unchanged": 0,
      "begin": "2020-04-17T16:31:02.955Z",
      "end": "2020-04-17T16:31:09.861Z",
      "cancelled": false,
      "importDeleted": false,
      "tempRecords": 0,
      "purgedTempRecords": true,
      "purgedErrorRecords": false,
      "authId": "openidm-admin",
      "authzComponent": "internal/user"
    },
    {
      "_rev": "00000000d4392fc8"
    }
  ],
  ...
}
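To show how the status fields might be used, the following Python sketch condenses a metadata record into a short summary. The function name and the summary keys are hypothetical, and the sketch assumes a completed import whose begin and end timestamps use the format shown above. Entries without an _id are skipped.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def summarize(record):
    """Condense one completed import record into a few status values."""
    begin = datetime.strptime(record["begin"], TIMESTAMP_FORMAT)
    end = datetime.strptime(record["end"], TIMESTAMP_FORMAT)
    return {
        "id": record["_id"],
        "seconds": (end - begin).total_seconds(),
        "has_failures": record["failure"] > 0,
        "cancelled": record["cancelled"],
    }


results = [
    {
        "_id": "3ebd514f-bdd7-491f-928f-21b72f44e381",
        "total": 1000,
        "success": 1000,
        "failure": 0,
        "begin": "2020-04-17T16:31:02.955Z",
        "end": "2020-04-17T16:31:09.861Z",
        "cancelled": False,
    },
    {"_rev": "00000000d4392fc8"},
]

# Skip entries that carry no import ID.
summaries = [summarize(r) for r in results if "_id" in r]
print(summaries)
```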
Query imports to a specific object type
Use a query filter to restrict your query to imports to a specific managed object type. The following example queries uploads to the managed user object:
curl \
 --header "X-OpenIDM-Username: openidm-admin" \
 --header "X-OpenIDM-Password: openidm-admin" \
 --header "Accept-API-Version: resource=1.0" \
 --request GET \
 'http://localhost:8080/openidm/csv/metadata/?_queryFilter=/resourcePath+eq+"managed/user"'
{
  "result": [
    {
      "_id": "82d9a643-8b03-4cec-86fc-3e09c4c2f01c",
      "_rev": "000000009b3ff60b",
      "header": [
        "userName",
        "givenName",
        "sn",
        "mail",
        "description",
        "accountStatus",
        "country"
      ],
      "reconId": "417dae3b-c939-4191-acbf-6eb1b9e802af-53335",
      "filename": "example-users.csv",
      "resourcePath": "managed/user",
      "total": 1001,
      "success": 1000,
      "failure": 1,
      "created": 0,
      "updated": 0,
      "unchanged": 1000,
      "begin": "2020-04-20T13:12:03.028Z",
      "end": "2020-04-20T13:12:05.222Z",
      "cancelled": false,
      "importDeleted": false,
      "tempRecords": 0,
      "purgedTempRecords": true,
      "purgedErrorRecords": false,
      "authId": "openidm-admin",
      "authzComponent": "internal/user"
    }
  ],
  ...
}
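If you build this query from a script rather than typing it, the filter expression needs to be URL-encoded. The Python sketch below is one way to do that. The function name is hypothetical, and it percent-encodes the filter (spaces as %20 rather than +), which should be an equivalent encoding of the same expression.

```python
from urllib.parse import quote, unquote


def metadata_query_url(base_url, resource_path):
    """Build a csv/metadata query restricted to one managed object type."""
    query_filter = f'/resourcePath eq "{resource_path}"'
    encoded = quote(query_filter, safe="")
    return f"{base_url}/openidm/csv/metadata/?_queryFilter={encoded}"


url = metadata_query_url("http://localhost:8080", "managed/user")
print(url)
# Decoding the query string recovers the readable filter expression.
print(unquote(url.split("_queryFilter=", 1)[1]))
```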
Handle failed import records
The previous example showed the statistics that are returned when you query bulk imports. One of these fields is failure (for example, "failure": 0). If the import was unsuccessful for any records, the failure field has a positive value. You can then download the failed records, examine the failures and correct them in the CSV file, then run the import again.
To download failed records, send a GET request to the endpoint export/csvImportFailures/importUUID:
curl \
 --header "X-OpenIDM-Username: openidm-admin" \
 --header "X-OpenIDM-Password: openidm-admin" \
 --request GET \
 --header "Accept-API-Version: resource=1.0" \
 "http://localhost:8080/export/csvImportFailures/82d9a643-8b03-4cec-86fc-3e09c4c2f01c"
userName, givenName, sn, mail, ..., _importError
emacheke, Edward, Macheke, emacheke, ..., "{code=403, reason=Forbidden, message=Policy validation failed, detail={result=false, failedPolicyRequirements=[{policyRequirements=[ {policyRequirement=VALID_EMAIL_ADDRESS_FORMAT}], property=mail}]}}"
The output indicates the failed record or records, and the reason for the failure, in the _importError field. In this example, the import failed because of a policy validation error: the email address is not in the correct format.
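When many records fail, it can help to summarize the _importError column programmatically instead of reading each row. The following Python sketch assumes the error text follows the pattern shown in the example above. The function name, the key_column parameter, and the regular expressions are illustrative assumptions, not a documented format.

```python
import csv
import io
import re


def failure_report(failures_csv, key_column="userName"):
    """Map each failed record to the properties and policies that failed."""
    report = {}
    reader = csv.DictReader(io.StringIO(failures_csv), skipinitialspace=True)
    for row in reader:
        error = row.get("_importError") or ""
        report[row[key_column]] = {
            "properties": re.findall(r"property=([\w/]+)", error),
            "requirements": re.findall(r"policyRequirement=(\w+)", error),
        }
    return report


# Sample modeled on the example output, with the elided columns left out.
sample = (
    "userName,givenName,sn,mail,_importError\n"
    'emacheke,Edward,Macheke,emacheke,"{code=403, reason=Forbidden, '
    "message=Policy validation failed, detail={result=false, "
    "failedPolicyRequirements=[{policyRequirements=[{policyRequirement="
    'VALID_EMAIL_ADDRESS_FORMAT}], property=mail}]}}"\n'
)
report = failure_report(sample)
print(report)
```

For this sample, the report shows that the mail property of the emacheke record failed the VALID_EMAIL_ADDRESS_FORMAT policy.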
IDM does not scan for possible CSV injection attacks on uploaded files. Do not edit the downloaded CSV file with Microsoft Excel, as this can expose your data to CSV injection.
Cancel an import in progress
Cancel an import that is in progress by sending a POST request to the openidm/csv/metadata/importUUID endpoint, with the cancel action. You might want to cancel an import if it is taking too long, or if you have noticed problems with the import data. For example:
curl \
 --header "X-OpenIDM-Username: openidm-admin" \
 --header "X-OpenIDM-Password: openidm-admin" \
 --header "Accept-API-Version: resource=1.0" \
 --request POST \
 "http://localhost:8080/openidm/csv/metadata/92971c92-67bb-4ae7-b41b-96d249b0b2aa/?_action=cancel"
{
  "status": "OK"
}
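The same request can be prepared from Python using only the standard library. This is a minimal sketch: the function name is hypothetical, the credentials are the sample values used throughout this page, and the request is only constructed here, not sent.

```python
from urllib.request import Request


def cancel_request(base_url, import_uuid, username, password):
    """Prepare (but do not send) a POST request with the cancel action."""
    url = f"{base_url}/openidm/csv/metadata/{import_uuid}/?_action=cancel"
    headers = {
        "X-OpenIDM-Username": username,
        "X-OpenIDM-Password": password,
        "Accept-API-Version": "resource=1.0",
    }
    return Request(url, method="POST", headers=headers)


request = cancel_request(
    "http://localhost:8080",
    "92971c92-67bb-4ae7-b41b-96d249b0b2aa",
    "openidm-admin",
    "openidm-admin",
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request) against a running server.
```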
Change the HTTP request timeout
By default, the timeout for the bulk import servlets is 30 seconds (or 30000 milliseconds). This parameter is set in your resolver/boot.properties file, as follows:
openidm.servlet.timeoutMillis=30000
If you are importing a very large number of records, you might need to increase the HTTP request timeout to prevent requests timing out.
In test environments, you can set this parameter to 0 to disable the request timeout. Do not disable the timeout in a production environment: without a timeout, the server is exposed to denial-of-service attacks in which thousands of slow HTTP connections are opened and held.
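If you script this change, a small guard against accidentally disabling the timeout can be useful. The Python sketch below rewrites the property in the text of a properties file. The function name and the allow_disable flag are hypothetical, and the sketch handles only the simple key=value form shown above.

```python
KEY = "openidm.servlet.timeoutMillis"


def set_servlet_timeout(properties_text, seconds, allow_disable=False):
    """Return properties text with the servlet timeout set, in milliseconds."""
    if seconds < 0:
        raise ValueError("timeout must not be negative")
    if seconds == 0 and not allow_disable:
        # 0 disables the timeout; require an explicit opt-in (test use only).
        raise ValueError("refusing to disable the timeout")
    new_line = f"{KEY}={int(seconds * 1000)}"
    lines = properties_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        if line.strip().startswith(KEY + "="):
            lines[index] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


updated = set_servlet_timeout("openidm.servlet.timeoutMillis=30000\n", 120)
print(updated)
```

Requiring an explicit flag for a value of 0 makes it harder for a test-only setting to reach a production configuration by mistake.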
For a list of all REST endpoints related to bulk import, refer to Bulk import.