
Supported Partition Attribute Types

The following table lists the supported data types for partition attributes, along with an example databases.[n].collections.[n].dataSources.[n].path for each data type:

Note

When specifying the databases.[n].collections.[n].dataSources.[n].path, use the delimiter specified in stores.[n].delimiter.
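As a rough illustration of where the path and delimiter settings sit relative to each other, the following Python sketch assembles a configuration fragment and prints it as JSON. Only the stores.[n].delimiter and databases.[n].collections.[n].dataSources.[n].path fields come from the text above. The other field names and values (name, storeName, exampleStore, exampleDb) and the build_path helper are hypothetical and shown only for illustration.

```python
import json


def build_path(segments, delimiter="/"):
    """Join path segments with the store's delimiter.

    Hypothetical helper: the leading delimiter mirrors the example
    paths in this section and is an assumption, not a stated rule.
    """
    return delimiter + delimiter.join(segments)


# Hypothetical store definition; only "delimiter" is named in the text above.
store = {"name": "exampleStore", "delimiter": "/"}

config = {
    "stores": [store],
    "databases": [
        {
            "name": "exampleDb",  # hypothetical
            "collections": [
                {
                    "name": "employees",  # hypothetical
                    "dataSources": [
                        {
                            "storeName": store["name"],  # assumed field name
                            "path": build_path(
                                ["employees", "{phone string}"],
                                store["delimiter"],
                            ),
                        }
                    ],
                }
            ],
        }
    ],
}

print(json.dumps(config, indent=2))
```

Building the path from the store's own delimiter value keeps the two settings from drifting apart if the delimiter is later changed.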

Key  Data Type  Example
string

Parses the filename as a string. This is the default data type. If you do not specify a data type for a partition attribute, Data Lake interprets the partition attribute as a string.

filename: /employees/949-555-0195.json

path: /employees/{phone string}

OR

path: /employees/{phone}

In both path examples above, phone is interpreted as a string.

int

Parses the filename as an integer.

filename: /zipcodes/90210.json

path: /zipcodes/{zipcode int}

In the above example, zipcode is interpreted as an integer.

isodate

Parses a filename in RFC 3339 format as an ISO-8601 date.

filename: /metrics/2019-01-03T00:00:00Z.json

path: /metrics/{startTimestamp isodate}

In the above example, startTimestamp is interpreted as an ISODate. The isodate attribute type also supports partitions with the following date formats:

"2020-01-02T15:04:05Z07:00"
"2020-01-02T15:04:05.000000Z07:00"
"2020-01-02"
"2020-01-02T15:04:05.000000-0700"
"2020-01-02T15:04:05-0700"
"2020-01-02T15:04Z07:00"
"2020-01-02T15:04-0700"
"2020-01-02Z07:00"
"2020-01-02-0700"
"20200102T15:04:05.000000Z07:00"
"20200102T15:04:05.000000-0700"
"20200102T15:04:05Z07:00"
"20200102T15:04:05-0700"
"20200102T15:04Z07:00"
"20200102T15:04-0700"
"20200102Z07:00"
"20200102-0700"
"20200102"
epoch_secs

Parses the filename as a Unix timestamp in seconds.

filename: /metrics/1549046112.json

path: /metrics/{startTimestamp epoch_secs}

In the above example, startTimestamp is interpreted as a Unix timestamp in seconds.

epoch_millis

Parses the filename as a Unix timestamp in milliseconds.

filename: /metrics/1549046112000.json

path: /metrics/{startTimestamp epoch_millis}

In the above example, startTimestamp is interpreted as a Unix timestamp in milliseconds.

objectid

Parses the filename as an ObjectId.

filename: /metrics/507f1f77bcf86cd799439011.json

path: /metrics/{objid objectid}

In the above example, objid is interpreted as an ObjectId.

uuid

Parses the filename as a UUID of binary subtype 4.

filename: /metrics/3b241101-e2bb-4255-8caf-4136c566a962.json

path: /metrics/{myUuid uuid}

In the above example, myUuid is interpreted as a UUID of binary subtype 4.
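To make the interpretations in the table concrete, here is a minimal Python sketch that converts the example filenames into typed values. It is not Data Lake's implementation. The function names are hypothetical, the file extension is assumed to be stripped before parsing, the isodate branch handles only the RFC 3339 form ending in "Z", and an ObjectId is approximated as 12 raw bytes because the standard library has no ObjectId type.

```python
import re
import uuid
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def value_from_filename(filename):
    """Return the last path segment without its extension (assumed behavior)."""
    return filename.rsplit("/", 1)[-1].rsplit(".", 1)[0]


def parse_partition_value(raw, attr_type="string"):
    """Convert a raw filename value according to a partition attribute type."""
    if attr_type == "string":
        return raw
    if attr_type == "int":
        return int(raw)
    if attr_type == "isodate":
        # Only the "YYYY-MM-DDTHH:MM:SSZ" form is handled in this sketch.
        parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
        return parsed.replace(tzinfo=timezone.utc)
    if attr_type == "epoch_secs":
        return EPOCH + timedelta(seconds=int(raw))
    if attr_type == "epoch_millis":
        return EPOCH + timedelta(milliseconds=int(raw))
    if attr_type == "objectid":
        # Stand-in for an ObjectId: 24 hex characters decoded to 12 bytes.
        if not re.fullmatch(r"[0-9a-fA-F]{24}", raw):
            raise ValueError(f"not a 24-character hex string: {raw!r}")
        return bytes.fromhex(raw)
    if attr_type == "uuid":
        return uuid.UUID(raw)
    raise ValueError(f"unsupported attribute type: {attr_type!r}")


zipcode = parse_partition_value(value_from_filename("/zipcodes/90210.json"), "int")
started = parse_partition_value(
    value_from_filename("/metrics/1549046112000.json"), "epoch_millis"
)
print(zipcode, started.isoformat())
```

Note that the epoch_secs and epoch_millis examples in the table (1549046112 and 1549046112000) describe the same instant, which the sketch reproduces.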

Note

Atlas Data Lake supports the Package Syntax for regular expressions in the path to the filename.

Parsing Null Values from Filenames

If there is an empty string ("") in place of an attribute in the path to the file, Data Lake automatically parses it as a null value for all attribute types except string. For example, consider the following S3 data store:

/records/january/1.json
/records/february/1.json
/records//1.json

For the path /records/{month string}/*, Data Lake does not add any computed fields for the month attribute to documents generated from the third file (/records//1.json) in the above store.
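The outcome described for this store can be mimicked with a short Python sketch. How Data Lake matches paths internally is not described here, so the regular expression and the computed_fields function below are hypothetical; the sketch only mirrors the stated result that an empty month segment yields no month field.

```python
import re

# Hypothetical translation of /records/{month string}/* into a regex.
RECORDS_PATH = re.compile(r"^/records/(?P<month>[^/]*)/[^/]+$")


def computed_fields(file_path):
    """Return the fields derived from the path, or None if it does not match."""
    match = RECORDS_PATH.match(file_path)
    if match is None:
        return None
    month = match.group("month")
    # An empty segment produces no computed field for the string attribute.
    return {"month": month} if month != "" else {}


for path in (
    "/records/january/1.json",
    "/records/february/1.json",
    "/records//1.json",
):
    print(path, computed_fields(path))
```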

Parsing Padded Numbers from Filenames

For attribute types such as int, epoch_millis, and epoch_secs, Data Lake can correctly parse numeric values that are padded with leading zeros in the path to the file only if you specify the number of digits in the padded value using a regular expression. For example, consider an S3 store with the following files:

|--users
   |--001.json
   |--002.json
   ...

The following path syntax uses a regular expression with the int attribute type to specify the number of digits in the filename:

/users/{user_id int:\\d{3}}
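As a sketch of what the fixed-width pattern achieves, the Python snippet below matches exactly three digits and converts them to an integer, so 001 becomes 1. The parse_user_id function and the assumption that the .json extension is not part of the attribute value are hypothetical. The final lines also show one plausible reason for the doubled backslash in the path above: if the path is stored as a JSON string, a single backslash must be escaped. That reading is an assumption, not something the text states.

```python
import json
import re

# Hypothetical regex equivalent of /users/{user_id int:\d{3}}.
USER_FILE = re.compile(r"^/users/(?P<user_id>\d{3})\.json$")


def parse_user_id(file_path):
    """Return the padded user ID as an int, or None if the width is wrong."""
    match = USER_FILE.match(file_path)
    return int(match.group("user_id")) if match else None


print(parse_user_id("/users/001.json"))
print(parse_user_id("/users/0001.json"))

# Serializing the path as JSON doubles the backslash.
path_template = r"/users/{user_id int:\d{3}}"
print(json.dumps({"path": path_template}))
```

Fixing the width in the pattern means that files whose names have a different number of digits do not match this path at all in the sketch.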