---
title: Data Preparation
description: Once you have deployed Ping Autonomous Identity, you can prepare your dataset into a format that meets the schema.
component: autonomous-identity
version: 2022.11.12
page_id: autonomous-identity:admin-guide:chap-data-preparation
canonical_url: https://docs.pingidentity.com/autonomous-identity/2022.11.12/admin-guide/chap-data-preparation.html
section_ids:
  sec-data-collection: Data collection
  sec-csv-files-schema: CSV files and schema
---

# Data Preparation

After you deploy Ping Autonomous Identity, prepare your dataset so that it conforms to the schema.

The initial step is to obtain the data as agreed upon between Ping Identity and your company. The files contain a subset of user attributes from the HR database and entitlement metadata required for the analysis. Only the attributes necessary for analysis are used.

Several steps must be carried out before your production entitlement data is loaded into Ping Autonomous Identity. These steps are summarized below.

## Data collection

Typically, the raw client data is not in a form that meets the Ping Autonomous Identity schema. For example, a unique user identifier can have multiple names, such as `user_id`, `account_id`, `user_key`, or `key`. Similarly, entitlement columns can have several names, such as `access_point`, `privilege_name`, or `entitlement`.

To get the correct format, here are some general rules:

* Submit the raw client data in `.csv` file format. The data can be in a single file or multiple files. Data includes application attributes, entitlement assignments, entitlement descriptions, and identity data.

* Remove duplicate values.

* Add optional columns for additional training attributes, for example, `MANAGERS_MANAGER` and `MANAGER_FLAG`. You can add these additional attributes to the schema using the Ping Autonomous Identity UI. For more information, refer to [Set Entity Definitions](set-entity-definitions.html).

* Make a note of those attributes that differ from the Ping Autonomous Identity schema, which is presented below. This is crucial for setting up your attribute mappings. For more information, refer to [Set Attribute Mappings](set-attribute-mappings.html).
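
As a rough illustration of two of the rules above (removing duplicates and noting column names that differ from the schema), the following Python sketch uses only the standard library. The sample data, the function names, and the `EXPECTED_COLUMNS` set are assumptions made for this example. They are not part of Ping Autonomous Identity, and the sketch treats a "duplicate" as a row that repeats an earlier row exactly.

```python
import csv
import io

# Assumption: the column names expected for assignments.csv, copied from the
# schema table in this guide. Adjust the set for the file you are checking.
EXPECTED_COLUMNS = {"user_id", "ent_id"}


def dedupe_rows(rows):
    """Return the rows with exact duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def columns_needing_mapping(header, expected=EXPECTED_COLUMNS):
    """Return the header names that are not schema names, to note for mapping."""
    return [name for name in header if name not in expected]


# Hypothetical raw client export with non-schema column names and one duplicate row.
RAW = """account_id,privilege_name
u1,e1
u1,e1
u2,e3
"""

reader = csv.DictReader(io.StringIO(RAW))
rows = dedupe_rows(list(reader))
to_map = columns_needing_mapping(reader.fieldnames)

print(len(rows))  # number of rows left after removing duplicates
print(to_map)     # raw column names to record for attribute mapping
```

The list returned by `columns_needing_mapping` is only a note-taking aid; the mapping itself is still configured in the product as described in Set Attribute Mappings.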

## CSV files and schema

The required attributes for the schema are as follows:

**CSV Files Schema**

| Files            | Schema |
| ---------------- | ------ |
| applications.csv | This file depends on the attributes that the client wants to include. Here are some required columns:<br>• **app\_id**. Specifies the application's unique ID.<br>• **app\_name**. Specifies the application's name.<br>• **app\_owner\_id**. Specifies the ID of the application's owner. |
| assignments.csv  | • **user\_id**. Specifies the unique user ID to which the entitlement is assigned.<br>• **ent\_id**. Specifies the entitlement's unique ID. |
| entitlements.csv | • **ent\_id**. Specifies the entitlement's unique ID.<br>• **ent\_name**. Specifies the entitlement name.<br>• **ent\_owner\_id**. Specifies the entitlement's owner.<br>• **app\_id**. Specifies the application's unique ID. |
| identities.csv   | • **usr\_id**. Specifies the user's unique ID.<br>• **user\_name**. Specifies a human-readable username. For example, `John Smith`.<br>• **usr\_manager\_id**. Specifies the user's manager ID. |
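
Before submitting the files, it can help to confirm that each one contains the required columns listed in the table. The following Python sketch is one possible way to do that check. The `REQUIRED_COLUMNS` dictionary is transcribed from the table above, while the function name and sample data are hypothetical and not part of the product.

```python
import csv
import io

# Required columns per file, transcribed from the CSV Files Schema table.
REQUIRED_COLUMNS = {
    "applications.csv": ["app_id", "app_name", "app_owner_id"],
    "assignments.csv": ["user_id", "ent_id"],
    "entitlements.csv": ["ent_id", "ent_name", "ent_owner_id", "app_id"],
    "identities.csv": ["usr_id", "user_name", "usr_manager_id"],
}


def missing_columns(file_name, csv_text):
    """Return the required columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS[file_name] if col not in present]


# Hypothetical entitlements export that lacks the ent_owner_id column.
sample = "ent_id,ent_name,app_id\ne1,Read,a1\n"
print(missing_columns("entitlements.csv", sample))
```

To check a file on disk, read its contents with `open(path, newline="").read()` and pass the text to `missing_columns`. Extra columns are ignored, because the schema allows optional training attributes.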
