Map Relational Schemas to MongoDB¶

On this page

Overview
Cached Sampling
- User Permissions for Cached Sampling
Persist a Schema in MongoDB
- User Permissions for Persisted Schemas
Resample Schema Data with FLUSH SAMPLE
Managing Schema with MongoDB Views
- Examples
Load a Schema from a DRDL File
Geospatial Data
- Example

Overview¶

Business intelligence tools connect to a data source and, given a fixed tabular schema, allow users to visually explore their data. MongoDB uses a flexible schema, so some business intelligence tools cannot use MongoDB as a native data source.

To use MongoDB with a relational business intelligence tool such as Tableau, you must define a relational mapping for your MongoDB data. MongoDB Connector for BI provides several methods for creating and managing a relational schema.

BI Connector’s proxy server, mongosqld, has startup options which determine how it handles schema management. These are covered in detail in the mongosqld usage documentation.

Cached Sampling¶

New in version 2.3:

When mongosqld starts up, it can sample documents on a MongoDB instance to generate a relational representation of the schema, which it caches in memory. This is the default mode, which mongosqld uses unless you specify a schema file with the --schema option or a schema database with the --sampleSource option.

Note

If you have authentication enabled, ensure that your MongoDB user has the required permissions for cached sampling.

You can configure mongosqld to automatically resample on a fixed schedule with the --sampleRefreshIntervalSecs option. If you do not specify a resampling interval, mongosqld uses its initial schema for as long as the process runs.

If you need to manually edit your schema to ensure correct representation of your data, you can use a schema file instead.

See Sampling Mode Reference Chart for more information on sampling modes.

User Permissions for Cached Sampling¶

If your MongoDB instance uses authentication and you wish to use cached sampling, your BI Connector instance must also use authentication. The admin user that connects to MongoDB via the mongosqld program must have the following privileges:

listDatabases on the cluster
listCollections on each database in the cluster
find on each database you wish to sample

Alternatively, create a user with the built-in readAnyDatabase role to grant all of the required privileges:

copy

use admin

db.createUser(
  {
    user: "<username>",
    pwd: "<password>",
    roles: [
            { "role": "readAnyDatabase", "db": "admin"  }
           ]
  }
)

Note

Be aware of all privileges included with the readAnyDatabase role before granting it to a user.

To sample all namespaces, start mongosqld without the --sampleNamespaces option.

copy

mongosqld --auth --mongo-username <username> --mongo-password <password>

Persist a Schema in MongoDB¶

The --sampleSource option directs mongosqld to use a particular database for schema storage. mongosqld can either create a new database to use for schema storage or use an existing database.

If you specify an existing database which has been used previously for schema storage, the --sampleMode option determines whether mongosqld writes new schema data to the specified database or only reads from it. If you specify a database which doesn’t currently exist, you must set --sampleMode to write.

The --sampleRefreshIntervalSecs option determines how frequently mongosqld resamples data to update the schema.

To force a one-time update of the schema, use the FLUSH SAMPLE command from your SQL client.

See Sampling Mode Reference Chart for more information on sampling modes.

User Permissions for Persisted Schemas¶

When using a persisted schema, the required user permissions are similar to those for cached sampling, with the addition of a required write permission on the specified schema database. To create a user with the necessary permissions to run mongosqld with a persisted schema, execute the following command in the mongo shell, with the placeholder values <username>, <password>, and <dbname> filled in with your own values:

copy

db.getSiblingDB("admin").createUser(
  {
    user: "<username>",
    pwd: "<password>",
    roles: [
             { "role": "readAnyDatabase", "db": "admin"  },
             { "role": "readWrite", "db": "<dbname>"  }
           ]
  }
)

Resample Schema Data with `FLUSH SAMPLE`¶

When the mongosqld process starts it creates a schema, either from a schema file or by sampling data from a MongoDB instance. If the data in your MongoDB instance changes shape significantly with new fields or collections, you may wish to regenerate the schema BI Connector uses. You can regenerate the schema either by restarting mongosqld or by issuing the FLUSH SAMPLE command from within the MySQL shell.

The --sampleRefreshIntervalSecs option provides another way for mongosqld to regenerate the schema by resampling data at a user-specified interval.

Note

You cannot use the FLUSH SAMPLE command if mongosqld starts with a schema file via the --schema option.

Managing Schema with MongoDB Views¶

Another way to control the data visible to BI Connector without using a .drdl file is to read from a view. Views allow you to take a source collection and choose which fields to display, change the names of fields, sort data, and perform any kind of transformation that the aggregation pipeline can accomplish.

You can use the mongo shell method createView to define a view on a collection and then use that view as a specified namespace when starting mongosqld.

Examples¶

Consider a collection named salaries in the employees database. By creating a view on the salaries collection that excludes all personally identifiable information, BI Connector can safely facilitate data analysis on the view without compromising the private data in the source collection.

The following command starts mongosqld with data from a view named salaries_redacted in the employees database:

copy

mongosqld --sampleNamespaces "employees.salaries_redacted"

Load a Schema from a DRDL File¶

The --schema option allows you to specify a schema file for mongosqld to use.

The schema generation tool mongodrdl outputs a schema file in Document Relational Definition Language (DRDL) format, but you should verify that the schema it proposes matches your data requirements. Examine both the table structure and field types to ensure that you can generate your intended reports.

It is possible to manually edit the schema definition files to perform the following actions:

Add aggregation pipeline stages
Add fields that mongodrdl did not discover within the subset of documents that it sampled
Remove fields
Remove tables
Rename tables, so long as the collection field remains the same
Rename fields using the sqlname option

Important

To apply any changes you make to your DRDL files, you must restart mongosqld.

Geospatial Data¶

If a collection contains a 2d or 2dsphere geospatial index, BI Connector maps the indexed field into an array of numeric longitude-latitude coordinates.

Example¶

Given the following collection:

copy

db.points.createIndex( { pos : "2dsphere" } )
db.points.insert({
    pos : { type: "Point", coordinates: [ -73.97, 40.77 ] },
    name: "Central Park",
    category : "Parks"
})

BI Connector generates the following schema:

schema:
- db: test
  tables:
  - table: points
    collection: points
    pipeline: []
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: varchar
    - Name: category
      MongoType: string
      SqlName: category
      SqlType: varchar
    - Name: name
      MongoType: string
      SqlName: name
      SqlType: varchar
    - Name: pos.coordinates
      MongoType: geo.2darray
      SqlName: pos.coordinates
      SqlType: numeric[]

Note

BI Connector does not recognize geospatial fields when reading from a view.

← Install the JDBC Authentication Plugin MongoDB Connector for BI Components →

Map Relational Schemas to MongoDB¶

Overview¶

Cached Sampling¶

User Permissions for Cached Sampling¶

Persist a Schema in MongoDB¶

User Permissions for Persisted Schemas¶

Resample Schema Data with FLUSH SAMPLE¶

Managing Schema with MongoDB Views¶

Examples¶

Load a Schema from a DRDL File¶

Geospatial Data¶

Example¶

Resample Schema Data with `FLUSH SAMPLE`¶