Querying with SQL
The support for SQL format queries is available as a Beta feature. The feature and the corresponding documentation may change at any time during the Beta stage.
Atlas Data Lake supports SQL format queries through the JDBC driver for Atlas Data Lake and through an
aggregation pipeline stage. To
support SQL format queries, Atlas Data Lake automatically creates a JSON
schema that maps to a relational schema of columns, tables, and
databases for all new collections and views in the Data Lake storage
configuration. To learn more about the schema, see
SQL Schema Format.
Data Lake automatically generates a schema for a collection or view in the storage configuration when you:
- Create the collection or view in the storage configuration.
- Rename a collection or view that does not already have a schema. If you rename a collection or view that already has a schema, Data Lake renames the existing schema along with it and does not generate a new one.
- Set the storage configuration.
In addition, for wildcard (*) collections, Data Lake generates a
schema when it discovers the collections in the namespace catalog for the wildcard (*) collection.
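The rename rule above can be pictured with a small toy model. This is only an illustrative sketch, not how Data Lake is implemented: the function name `rename_namespace`, the in-memory dictionary of schemas, and the `generate` callback are all hypothetical.

```python
def rename_namespace(schemas, old, new, generate):
    """Toy model of the rename rule (hypothetical, not Data Lake code).

    schemas  -- dict mapping a namespace name to its stored schema
    generate -- callback that stands in for automatic schema generation
    """
    updated = dict(schemas)  # leave the caller's dict untouched
    if old in updated:
        # An existing schema follows the rename; nothing is regenerated.
        updated[new] = updated.pop(old)
    else:
        # No schema yet, so one is generated for the renamed namespace.
        updated[new] = generate(new)
    return updated
```

In this model, renaming a namespace that already has a schema never calls `generate`, which mirrors the statement that no new schema is generated in that case.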
Data Lake automatically generates schemas for only new collections and
views in the storage configuration or namespace catalog. Existing namespaces will not have auto-generated
schemas. If you want Data Lake to automatically generate schemas for
your existing collections and views in the storage configuration, remove the
databases from your Data Lake storage configuration and
then update your Data Lake storage
configuration with the old configuration.
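The two-step reset can be sketched as a pair of configuration documents to apply in order. This is a minimal sketch that assumes the storage configuration is a document with a top-level `databases` array. The helper name `regeneration_steps` is hypothetical, and how you apply each document (for example, a command such as `storageSetConfig`) is also an assumption to verify against the command reference.

```python
import copy


def regeneration_steps(old_config):
    """Return the storage configurations to apply, in order (hypothetical helper).

    Assumes the configuration has a top-level "databases" array.
    """
    emptied = copy.deepcopy(old_config)
    emptied["databases"] = []  # step 1: a configuration with no databases
    restored = copy.deepcopy(old_config)  # step 2: the old configuration again
    return [emptied, restored]
```

Because the second document restores the old configuration, its collections and views are new again from Data Lake's point of view, which is what triggers automatic schema generation.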
Data Lake doesn't automatically generate or update schemas when you
update the storage configuration from the Atlas UI. If you
update your Data Lake storage configuration through the UI, you must
manually update the schemas with the schema commands described below.
By default, Data Lake samples only one randomly selected document from your collection or view to generate a JSON schema. If your collection or view contains polymorphic data, you can either give Data Lake a larger sample size to generate a new schema, or manually construct and set the schema.
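As an illustration of requesting a larger sample, the sketch below builds a command document for `sqlGenerateSchema`. The field names `sampleNamespaces`, `sampleSize`, and `setSchemas` are assumptions about the command's shape and should be checked against the command reference. The helper name and the namespace in the usage note are hypothetical.

```python
def generate_schema_command(namespaces, sample_size, set_schemas=True):
    """Build a sqlGenerateSchema command document (field names are assumed)."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    return {
        "sqlGenerateSchema": 1,                 # command name must come first
        "sampleNamespaces": list(namespaces),   # assumed field: namespaces to sample
        "sampleSize": sample_size,              # assumed field: documents to sample
        "setSchemas": set_schemas,              # assumed field: store the generated schema
    }
```

For example, `generate_schema_command(["sales.orders"], 100)` describes a request to sample 100 documents instead of one. With PyMongo, a document like this could be passed to `Database.command`.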
You can manually generate schemas for all collections and views using
the sqlGenerateSchema command, and set or update the schema for
your collections or views using the sqlSetSchema command.
You can also view the stored schema.
You can manually delete a schema for a collection or view by running
the sqlSetSchema command with an empty schema document.
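The difference between setting and deleting a schema can be sketched as two command documents. The empty schema document for deletion comes from the text above. The `schema` field name, the `version` and `jsonSchema` wrapper fields, and both helper names are assumptions made for illustration.

```python
def set_schema_command(name, json_schema):
    """Build a sqlSetSchema document that stores a schema (wrapper fields assumed)."""
    return {
        "sqlSetSchema": name,
        "schema": {"version": 1, "jsonSchema": json_schema},  # assumed layout
    }


def delete_schema_command(name):
    """Build a sqlSetSchema document with an empty schema, which deletes the schema."""
    return {"sqlSetSchema": name, "schema": {}}
```

Both helpers produce the same command. Only the contents of the schema document decide whether the stored schema is replaced or removed.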
Data Lake automatically removes the schema for a collection or view when you:
- Drop the collection or view from the storage configuration.
- Modify the storage configuration to remove the collection or view from the storage configuration.
- Drop the database that contains the collection or view from the storage configuration.
In addition, for a wildcard (*) collection, Data Lake deletes the
schema when it discovers that the collection has been removed from
the namespace catalog.