Querying with SQL
The support for SQL format queries is available as a Beta feature. The feature and the corresponding documentation may change at any time during the Beta stage.
Atlas Data Lake supports SQL format queries through the JDBC driver for Atlas Data Lake and through an aggregation pipeline stage. To support SQL format queries, Atlas Data Lake automatically creates a JSON schema that maps to a relational schema of columns, tables, and databases. It creates this schema for all new collections and views in the Data Lake storage configuration, except wildcard (*) collections. To learn more about the schema, see SQL Schema Format.
Data Lake automatically generates a schema for a new non-wildcard collection or view in the storage configuration when you:
- Create the collection or view in the storage configuration.
- Rename a collection or view that does not already have a schema. If you rename a collection or view that already has a schema, Data Lake renames the existing schema along with it and does not generate a new one.
- Set the storage configuration.
Data Lake automatically generates schemas only for new collections and views in the storage configuration. Existing collections and views will not have auto-generated schemas. If you want Data Lake to automatically generate schemas for your existing non-wildcard collections and views in the storage configuration, remove the databases from your Data Lake storage configuration and then update your Data Lake storage configuration with the old configuration.
By default, Data Lake samples data from only one randomly selected document in your non-wildcard collection or view to generate a JSON schema. If your collection or view contains polymorphic data, you can provide a larger sampling size to Data Lake to generate a new schema or you can manually construct and set the schema.
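To illustrate why a one-document sample can miss polymorphic data, the following is a minimal sketch in plain Python. It is not Data Lake's actual sampling or schema-generation logic; the `infer_field_types` helper and the example documents are hypothetical and only show how the set of observed field types grows with the sample size.

```python
from collections import defaultdict


def infer_field_types(documents):
    """Map each top-level field to the sorted type names seen in the sample.

    Hypothetical helper for illustration only; not part of Data Lake.
    """
    seen = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}


# Hypothetical polymorphic collection: "price" holds an int, a string,
# or a float, and "tags" appears in only one document.
collection = [
    {"_id": 1, "price": 10},
    {"_id": 2, "price": "10 USD"},
    {"_id": 3, "price": 12.5, "tags": ["sale"]},
]

# A sample of one document sees a single type per field.
one_doc_schema = infer_field_types(collection[:1])

# A larger sample sees every type and the optional "tags" field.
full_schema = infer_field_types(collection)
```

With a single sampled document, `price` looks like an integer column and `tags` does not appear at all, which is the situation a larger sampling size or a manually constructed schema is meant to address.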
You can manually generate schemas for all collections and views using the sqlGenerateSchema command, set or update the schema for your collections or views using the sqlSetSchema command, and view the stored schema.
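The schema commands named above could be assembled as command documents along the following lines. This is a sketch under stated assumptions, not the definitive syntax: the field names `sampleNamespaces`, `sampleSize`, `setSchemas`, and `schema`, the command name `sqlGetSchema`, and the namespace `sales.orders` are not given in this section and are assumptions to verify against the command reference. The builder functions themselves are hypothetical helpers.

```python
def generate_schema_command(namespaces, sample_size, set_schemas=False):
    """Build a sqlGenerateSchema command document.

    Assumption: every field name other than "sqlGenerateSchema" is taken
    from memory of the command reference, not from this page.
    """
    return {
        "sqlGenerateSchema": 1,  # the command name must be the first key
        "sampleNamespaces": list(namespaces),
        "sampleSize": sample_size,
        "setSchemas": set_schemas,
    }


def set_schema_command(name, schema):
    """Build a sqlSetSchema command document (the "schema" key is assumed)."""
    return {"sqlSetSchema": name, "schema": schema}


def get_schema_command(name):
    """Build a command to read the stored schema.

    Assumption: the command is named "sqlGetSchema"; this page does not
    name it.
    """
    return {"sqlGetSchema": name}


# Hypothetical namespace and sample size, for illustration only.
generate_cmd = generate_schema_command(["sales.orders"], sample_size=100)
get_cmd = get_schema_command("orders")
```

If you use PyMongo, a document built this way could be passed to `Database.command`, for example `db.command(generate_cmd)`, on a connection to your Data Lake. Python dictionaries preserve insertion order, so the command name stays the first key.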
You can manually delete a schema for a collection or view by running the
sqlSetSchema command with an empty schema document. Data Lake
automatically removes the schema for a collection or view when you: