
Supported Data Formats

Beta

The Atlas Data Lake is available as a Beta feature. The product and the corresponding documentation may change at any time during the Beta stage. For support, see Atlas Support.

Data Lake can read the following data formats:

Comma-Separated and Tab-Separated Value Data Files

Your CSV or TSV file must start with a header row. Atlas Data Lake uses the values in the header row as field names. Dot-delimited field names in the header row become nested fields (embedded objects) in the resulting JSON document: for each dot in a field name, Data Lake creates another level of nesting.

example

Suppose your Data Lake is reading a CSV file with content similar to the following:

company,location.state,location.city.name,location.city.street
"MongoDB","California","Palo Alto","Forest Ave"

From the data row in the example CSV file above, Data Lake creates a JSON document similar to the following:

{
   "company": "MongoDB",
   "location": {
      "state": "California",
      "city": {
         "name": "Palo Alto",
         "street": "Forest Ave"
      }
   }
}
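To make the header-to-document mapping concrete, here is a minimal Python sketch of the same transformation. It is only an illustration of the nesting rule described above, not Data Lake's actual implementation; the helper names `row_to_document` and `csv_to_documents` are hypothetical, and the sketch assumes the header already satisfies the uniqueness rule described below.

```python
import csv
import io
import json


def row_to_document(header, row):
    """Nest each value under the path given by its dot-delimited header field (hypothetical helper)."""
    doc = {}
    for field, value in zip(header, row):
        parts = field.split(".")
        node = doc
        # Each dot in the field name adds one level of nesting.
        for key in parts[:-1]:
            node = node.setdefault(key, {})
        node[parts[-1]] = value
    return doc


def csv_to_documents(text, delimiter=","):
    """Convert CSV (or TSV, with delimiter='\\t') text into a list of nested dicts."""
    # skipinitialspace tolerates a space after each delimiter, e.g. `, "California"`.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, skipinitialspace=True)
    header = next(reader)  # the first row supplies the field names
    return [row_to_document(header, row) for row in reader]


sample = (
    "company,location.state,location.city.name,location.city.street\n"
    '"MongoDB","California","Palo Alto","Forest Ave"\n'
)
print(json.dumps(csv_to_documents(sample)[0], indent=3))
```

Running this prints a document with the same shape as the JSON example above. Every value stays a string here because the sketch does no type inference; how Data Lake types CSV values is not covered by this section.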

Data Lake requires all field names at the same level of nesting to be unique. A header row violates this requirement in the following cases:

  • One field duplicates another field at the same level of nesting.

    example

    Consider the following:

    company,location,company

    In the header, company appears twice at the same level of nesting.

  • One dot-delimited field duplicates another field at the same level of nesting.

    example

    Consider the following:

    company,location,location.city

    In the header, location is both a standalone field and the parent of the dot-delimited field location.city at the same level of nesting.
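Both cases above can be detected before a file is uploaded. The following Python sketch is a hypothetical pre-check, not a Data Lake API: `find_header_conflicts` is an illustrative helper that reports exact duplicates and fields that are also used as the parent of a dot-delimited field.

```python
def find_header_conflicts(header):
    """Report header fields that break the uniqueness rule (hypothetical helper)."""
    problems = []
    seen = set()
    # Case 1: the same field name (or full dot-delimited path) appears more than once.
    for field in header:
        if field in seen:
            problems.append(f"duplicate field: {field}")
        seen.add(field)
    # Case 2: a field is also the parent of a dot-delimited field, e.g. location and location.city.
    for field in dict.fromkeys(header):  # unique fields, in original order
        parts = field.split(".")
        for i in range(1, len(parts)):
            prefix = ".".join(parts[:i])
            if prefix in seen:
                problems.append(f"'{prefix}' is both a field and the parent of '{field}'")
    return problems


print(find_header_conflicts(["company", "location", "company"]))
print(find_header_conflicts(["company", "location", "location.city"]))
```

The first call reports the duplicated company field and the second reports the location conflict; a valid header such as the one in the first example returns an empty list.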