    Querying with SQL

    Beta

    Support for SQL format queries is available as a Beta feature. The feature and its documentation may change at any time during the Beta stage.

    Atlas Data Lake supports SQL format queries through the JDBC driver for Atlas Data Lake and through the $sql aggregation pipeline stage. To support these queries, Atlas Data Lake automatically creates a JSON schema that maps to a relational schema of columns, tables, and databases for every new collection and view in the Data Lake storage configuration, except wildcard (*) collections. To learn more about the schema, see SQL Schema Format.

    Data Lake automatically generates a schema for a new non-wildcard collection or view in the storage configuration when you:

    • Create the collection or view in the storage configuration.
    • Rename a collection or view that does not already have a schema. If the collection or view already has a schema, Data Lake renames the existing schema along with it and does not generate a new one.
    • Set the storage configuration.

    note

    Data Lake automatically generates schemas only for new collections and views in the storage configuration. Existing namespaces do not have auto-generated schemas. If you want Data Lake to automatically generate schemas for your existing non-wildcard collections and views, remove the databases from your Data Lake storage configuration and then update the storage configuration with the old configuration.

    By default, Data Lake generates the JSON schema by sampling only one randomly selected document from your non-wildcard collection or view. If your collection or view contains polymorphic data, you can either provide a larger sample size so that Data Lake generates a new schema, or manually construct and set the schema.

    You can manually generate schemas for all collections and views using the sqlGenerateSchema command, set or update the schema for your collections or views using the sqlSetSchema command, and view the stored schema using the sqlGetSchema command.
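
    The three commands above can be sketched as command documents. The following Python is a minimal illustration, not a definitive reference: the helper names are hypothetical, and the field names (sampleNamespaces, sampleSize, setSchemas, schema.version, schema.jsonSchema) and the namespace sampleDB.orders are assumptions that should be checked against the reference page for each command.

```python
from typing import Any, Dict, List


def generate_schema_command(namespaces: List[str], sample_size: int = 1,
                            set_schemas: bool = True) -> Dict[str, Any]:
    """Build a sqlGenerateSchema command document (field names are assumed)."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    # The command name must be the first key; dicts keep insertion order.
    return {
        "sqlGenerateSchema": 1,
        "sampleNamespaces": namespaces,  # assumed form: ["<db>.<collection>"]
        "sampleSize": sample_size,       # raise this for polymorphic data
        "setSchemas": set_schemas,       # assumed flag: store the generated schema
    }


def set_schema_command(name: str, json_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Build a sqlSetSchema command document for one collection or view."""
    return {
        "sqlSetSchema": name,
        "schema": {"version": 1, "jsonSchema": json_schema},  # assumed layout
    }


def get_schema_command(name: str) -> Dict[str, Any]:
    """Build a sqlGetSchema command document for one collection or view."""
    return {"sqlGetSchema": name}


# Hypothetical namespace; sample 100 documents instead of the default of one.
generate_cmd = generate_schema_command(["sampleDB.orders"], sample_size=100)
```

    With PyMongo, each document could be passed to Database.command, for example client["sampleDB"].command(get_schema_command("orders")), assuming client is a MongoClient connected to your Data Lake.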

    Once the SQL schema is set up, you can query your Atlas Data Lake collections or views through the JDBC driver for Atlas Data Lake or by using the $sql aggregation pipeline stage.
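
    As a rough sketch of the $sql route, the following Python wraps a SQL statement in a single-stage aggregation pipeline. The helper name is hypothetical, and the stage options other than the statement itself (format, formatVersion, dialect) and their values are assumptions to verify against the $sql stage reference.

```python
from typing import Any, Dict, List


def sql_pipeline(statement: str, dialect: str = "mongosql") -> List[Dict[str, Any]]:
    """Wrap a SQL statement in a one-stage $sql aggregation pipeline."""
    if not statement.strip():
        raise ValueError("statement must not be empty")
    return [{
        "$sql": {
            "statement": statement,
            "format": "jdbc",      # assumed option and value
            "formatVersion": 1,    # assumed option and value
            "dialect": dialect,    # assumed option; "mongosql" is a guess
        }
    }]


# Hypothetical table name; the collection must already have a SQL schema.
pipeline = sql_pipeline("SELECT * FROM orders LIMIT 2")
```

    With PyMongo, the pipeline could be run at the database level with client["sampleDB"].aggregate(pipeline), assuming client is a MongoClient connected to your Data Lake and sampleDB is a database in its storage configuration.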

    You can manually delete a schema for a collection or view by running the sqlSetSchema command with an empty schema document. Data Lake automatically removes the schema for a collection or view when you: