Navigation

Schema Generation and Management

Business intelligence tools connect to a data source and, given a fixed tabular schema, allow users to visually explore their data. MongoDB uses a flexible schema, so some business intelligence tools cannot use MongoDB as a native data source.

To use MongoDB with a relational business intelligence tool such as Tableau, you must define a relational mapping for your MongoDB data. MongoDB Connector for BI provides several methods for creating and managing a relational schema.

BI Connector‘s proxy server, mongosqld, has startup options which determine how it handles schema management. These are covered in detail in the mongosqld usage documentation.

Cached Sampling

New in version 2.3:

When mongosqld starts up, it can sample documents on a MongoDB instance to generate a relational representation of the schema, which it caches in memory. This is the default mode, which mongosqld uses unless you specify a schema file with the --schema option or a schema database with the --sampleSource option.

You can configure mongosqld to automatically resample on a fixed schedule with the --sampleRefreshIntervalSecs option. If you do not specify a resampling interval, mongosqld uses its initial schema for as long as the process runs.

If you need to manually edit your schema to ensure correct representation of your data, you can use a schema file instead.

User Permissions

To run BI Connector in cached sampling mode, a MongoDB user needs permission to create a schema from any database in the deployment. Give this user the readAnyDatabase role.

Example

Use the MongoDB shell to add a user to the admin database with the proper permissions to read any database and read and write the schema database.

db.getSiblingDB("admin").createUser(
  {
    user: "<username>",
    pwd: "<password>",
    roles: ["readAnyDatabase"]
  }
)

Resample Schema Data with FLUSH SAMPLE

When the mongosqld process starts it creates a schema, either from a schema file or by sampling data from a MongoDB instance. If the data in your MongoDB instance changes shape significantly with new fields or collections, you may wish to regenerate the schema BI Connector uses. You can regenerate the schema either by restarting mongosqld or by issuing the FLUSH SAMPLE command from within the MySQL shell.

The --sampleRefreshIntervalSecs option provides another way for mongosqld to regenerate the schema by resampling data at a user-specified interval.

Note

You cannot use the FLUSH SAMPLE command if mongosqld starts with a schema file via the --schema option.

See also

--sampleNamespaces

--sampleSource

--sampleMode

Persist a Schema in MongoDB

The --sampleSource option directs mongosqld to use a particular database for schema storage. mongosqld can either create a new database to use for schema storage or use an existing database.

If you specify an existing database which has been used previously for schema storage, the --sampleMode option determines whether mongosqld writes new schema data to the specified database or only reads from it. If you specify a database which doesn’t currently exist, you must set --sampleMode to write.

The --sampleRefreshIntervalSecs option determines how frequently mongosqld resamples data to update the schema.

To force a one-time update of the schema, use the FLUSH SAMPLE command from your SQL client.

User Permissions

To run BI Connector in persisted schema mode, a MongoDB user needs permission to create a schema from any database in the deployment and read and write entries into the schema database. Give this user the readAnyDatabase role and the readWrite role on the database that stores the schema (set using --sampleMode)

Example

Use the MongoDB shell to add a user to the admin database with the proper permissions to read any database and read and write the schema database.

db.getSiblingDB("admin").createUser(
  {
    user: "<username>",
    pwd: "<password>",
    roles: [
      "readAnyDatabase",
      { role: "readWrite", db: "<schemaDb>" }
    ]
  }
)

Load a Schema from a DRDL File

The --schema option allows you to specify a schema file for mongosqld to use.

The schema generation tool mongodrdl outputs a schema file in Document Relational Definition Language (DRDL) format, but you should verify that the schema it proposes matches your data requirements. Examine both the table structure and field types to ensure that you can generate your intended reports.

It is possible to manually edit the schema definition files to perform the following actions:

  • Add aggregation pipeline stages
  • Add fields that mongodrdl did not discover within the subset of documents that it sampled
  • Remove fields
  • Remove tables
  • Rename tables, so long as the collection field remains the same
  • Rename fields using the sqlname option

Important

To apply any changes you make to your DRDL files, you must restart mongosqld.