Navigation

Schema Generation and Management

Overview

Business intelligence tools connect to a data source and, given a fixed tabular schema, allow users to visually explore their data. MongoDB uses a flexible schema, so some business intelligence tools cannot use MongoDB as a native data source.

To use MongoDB with a relational business intelligence tool such as Tableau, you must define a relational mapping for your MongoDB data. MongoDB Connector for BI provides several methods for creating and managing a relational schema.

BI Connector‘s proxy server, mongosqld, has startup options which determine how it handles schema management. These are covered in detail in the mongosqld usage documentation.

Cached Sampling

New in version 2.3:

When mongosqld starts up, it can sample documents on a MongoDB instance to generate a relational representation of the schema, which it caches in memory. This is the default mode, which mongosqld uses unless you specify a schema file with the --schema option or a schema database with the --sampleSource option.

You can configure mongosqld to automatically resample on a fixed schedule with the --sampleRefreshIntervalSecs option. If you do not specify a resampling interval, mongosqld uses its initial schema for as long as the process runs.

If you need to manually edit your schema to ensure correct representation of your data, you can use a schema file instead.

Persist a Schema in MongoDB

The --sampleSource option directs mongosqld to use a particular database for schema storage. mongosqld can either create a new database to use for schema storage or use an existing database.

If you specify an existing database which has been used previously for schema storage, the --sampleMode option determines whether mongosqld writes new schema data to the specified database or only reads from it. If you specify a database which doesn’t currently exist, you must set --sampleMode to write.

The --sampleRefreshIntervalSecs option determines how frequently mongosqld resamples data to update the schema.

Load a Schema from a DRDL File

The --schema option allows you to specify a schema file for mongosqld to use.

The schema generation tool mongodrdl outputs a schema file in Document Relational Definition Language (DRDL) format, but you should verify that the schema it proposes matches your data requirements. Examine both the table structure and field types to ensure that you can generate your intended reports.

It is possible to manually edit the schema definition files to perform the following actions:

  • Add aggregation pipeline stages
  • Add fields that mongodrdl did not discover within the subset of documents that it sampled
  • Remove fields
  • Remove tables
  • Rename tables, so long as the collection field remains the same
  • Rename fields using the sqlname option

Important

To apply any changes you make to your DRDL files, you must restart mongosqld.