Navigation
This version of the documentation is archived and no longer supported.

Map Relational Schemas to MongoDB

Business intelligence tools connect to a data source and, given a fixed tabular schema, allow the user to visually explore their data. As MongoDB uses a flexible schema, these tools currently cannot use MongoDB as a native data source.

To use MongoDB with a relational business intelligence tool such as Tableau, you must define a relational mapping for your MongoDB data.

The schema generation tool mongodrdl outputs such a schema in the Document Relational Definition Language (DRDL) format, but you should verify that the schema it proposes matches your data requirements. Examine both the table structure and field types to ensure that you can generate your intended reports.

You may have to manually edit the schema definition files to remove unneeded fields; add fields that mongodrdl did not discover within the subset of documents that it sampled; and modify erroneous field types.

You may edit DRDL files to:

  • Add aggregation pipeline stages,
  • Add fields,
  • Remove fields,
  • Remove tables,
  • Rename tables, so long as the collection field remains the same,
  • Rename fields using the sqlname option

Important

Restart mongosqld to apply any changes you make to your DRDL files.

Document Relational Definition Language

The Document Relational Definition Language (DRDL) defines a relational view of a MongoDB schema.

mongodrdl samples documents from your MongoDB collections, and derives a DRDL file from those documents. mongosqld then uses the schema defined in DRDL files to allow MySQL clients to query your MongoDB data.

File Format

The DRDL file lists your databases, tables, and columns in YAML format.

schema:
- db: <database name>
  tables:
  - table: <SQL table name>
    collection: <MongoDB collection name>
    pipeline:
    - <optional pipeline elements>
    columns:
    - Name: <MongoDB field name>
      MongoType: <MongoDB field type>
      SqlName: <mapped SQL column name>
      SqlType: <mapped SQL column type>

Example

Given documents of the following shape in the collection abc in the database test:

{
    "_id": ObjectId(),
    "close": 7.45,
    "detail": { "a": 2, "b": 3 }
}

Run mongodrdl to generate a schema based on this collection:

mongodrdl -d test -c abc -o schema.drdl

The generated schema file (schema.drdl) looks similar to the following:

schema:
- db: test
  tables:
  - table: abc
    collection: abc
    pipeline: []
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: varchar
    - Name: close
      MongoType: float64
      SqlName: close
      SqlType: numeric
    - Name: detail.a
      MongoType: float64
      SqlName: detail.a
      SqlType: numeric
    - Name: detail.b
      MongoType: float64
      SqlName: detail.b
      SqlType: numeric

Field Types

The MongoDB Connector for BI correctly maps fields that always contain the same data type into the relational model. Schema generation deals specially with the following cases:

Numeric

The MongoDB Connector for BI uses the most precise numeric type that matches the sampled documents.

If a field in a collection always has the same data type, the MongoDB Connector for BI uses that type.

If a field in a collection can contain either floating point values or integers, the MongoDB Connector for BI uses type numeric.

Dates The MongoDB Connector for BI treats any field of type data_date as the SQL timestamp type.
Timestamps The MongoDB Connector for BI ignores any field of type data_timestamp.
ObjectID The MongoDB Connector for BI treats any field of type data_oid as the SQL type varchar.
UUID The MongoDB Connector for BI treats any field of type UUID as the SQL type varchar.
Geospatial

If a collection contains a geospatial index, the connector maps the indexed field into an array of numeric longitude-latitude coordinates.

See Geospatial Data for an example.

mongodrdl will currently not recognize geospatial fields in a read-only view.

Heterogeneous Fields

If a field contains inconsistent types, the MongoDB Connector for BI chooses the most frequently sampled type.

If a field can contain either a type or an array of that type, the generated schema always specifies that the field contains an array.

Embedded Documents

The MongoDB Connector for BI maps embedded documents to simple fields that have a . separator character, making them appear similar to the way you would reference them using dot notation in a MongoDB query.

While Tableau properly quotes identifiers, within ad-hoc SQL expressions you must double-quote every identifier that contains . characters or mixed-case characters.

Example

Consider the following document:

{
    "_id": 1,
    "familyName": "Partridge",
    "hometown" : "Hollywood Hills",
    "address" : { "street": "123 Main Street",
                  "city" : "Hollywood",
                  "state" : "CA",
                  "zip" : "90210" },
    "members_since" : ISODate("2002-04-12T00:00:00Z")
}

Running mongodrdl on a collection containing this document results in the following fields in the generated schema:

_id numeric
familyName varchar
hometown varchar
address.street varchar
address.city varchar
address.state varchar
address.zip varchar
members_since timestamp

Arrays

The MongoDB Connector for BI exposes arrays to business intelligence tools using two collections: one without the array, and the other having one document per array element.

Example

If you run mongodrdl on a collection families with the following document:

{
   "_id": 1,
   "familyName": "Partridge",
   "hometown" : "Hollywood Hills",
   "familyMembers" : [
     {
         "firstname" : "Shirley",
         "age" : 42,
         "attributes" : [
             { "name" : "instrument", "value" : "singer" },
             { "name" : "role", "value" : "mom" }
         ]
     },
     {
         "firstname" : "Keith",
         "age" : 18,
         "attributes" : [
             { "name" : "instrument", "value" : "guitar" },
             { "name" : "role", "value" : "son" }
         ]
     },
     {
         "firstname" : "Laurie",
         "age" : 16,
         "attributes" : [
             { "name" : "instrument", "value" : "keyboard" },
             { "name" : "role", "value" : "sister" }
         ]
     }]
}

This results in the following three tables:

families
_id numeric
familyName varchar
hometown varchar
families_familyMembers
_id numeric
familyMembers.age numeric
familyMembers.firstname varchar
familyMembers_idx numeric
families_familyMembers_attributes
_id numeric
familyMembers.attributes.name varchar
familyMembers.attributes.value varchar
familyMembers.attributes_idx numeric
familyMembers_idx numeric

You can join these tables together to view the data in a denormalized format. For example, you can list the people named in the above schema together with their family information using the following query:

SELECT f.*, m.`familyMembers.firstname` FROM families_familyMembers m JOIN families f ON m._id = f._id;

Pre-Joining

If you provide the --preJoined option to mongodrdl, then the MongoDB Connector for BI adds the fields in the containing document to each array element’s document, thus “pre-joining” the table.

In the previous example, the tables would contain the following additional columns:

families_familyMembers
familyName varchar
hometown varchar
families_familyMembers_attributes
familyMembers.age numeric
familyMembers.firstname varchar
familyMembers_idx numeric
familyName varchar
hometown varchar

Custom Filters

You can add a column of type mongo.Filter to a collection in your DRDL file. This column type allows you to perform a custom $match query.

For example, given the following schema describing a cloud of points with up to three components:

schema:
- db: test
  tables:
  - table: points
    collection: points
    pipeline: []
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: varchar
    - Name: x
      MongoType: float64
      SqlName: x
      SqlType: numeric
    - Name: "y"
      MongoType: float64
      SqlName: "y"
      SqlType: numeric
    - Name: z
      MongoType: float64
      SqlName: z
      SqlType: numeric
    - Name: filter
      MongoType: mongo.Filter
      SqlName: filter
      SqlType: varchar

You can select only three-dimensional points using the following query:

SELECT x, y, z FROM points WHERE filter='{"z": {"$exists": true}}';

Aggregation Pipelines

Aggregation Pipelines using Views

MongoDB 3.4 introduces Read-Only Views that you can use to filter incompatible data.

For example, you can create a view in the test database that contains only documents containing a number in the grade field of a grades collection:

db.runCommand( { create: "numericGrades", viewOn: "grades", pipeline: [ { "$match": { "grade": { "$type": "number" } } } ] } )

You can then use mongodrdl to generate a schema from this view as you would a collection:

mongodrdl -d test -c numericGrades

Aggregation Pipelines in DRDL

The MongoDB Connector for BI can use aggregation pipelines as part of the schema to transform documents from the collection into the proper form for the relational tables.

For example, consider a simple document in the collection simpleFamilies:

{
   "_id": 1,
   "familyName": "Partridge",
   "familyMembers" : [ "Shirley", "Keith", "Laurie"]
}

mongodrdl will generate a schema with the tables simpleFamilies and simpleFamilies_familyMembers.

The table simpleFamilies_familyMembers enumerates each family members, and has the following pipeline:

pipeline:
- $unwind:
    includeArrayIndex: familyMembers_idx
    path: $familyMembers

This pipeline uses $unwind to create a new record for each member of familyMembers. The schema tracks the array index in the field familyMembers_idx.

Geospatial Data

New in version 1.1.0.

If a collection contains a geospatial index, the connector maps the indexed field into an array of numeric longitude-latitude coordinates.

Example

Given the following collection:

db.points.createIndex( { pos : "2dsphere" } )
db.points.insert({
    pos : { type: "Point", coordinates: [ -73.97, 40.77 ] },
    name: "Central Park",
    category : "Parks"
})

mongodrdl generates the following schema:

schema:
- db: test
  tables:
  - table: points
    collection: points
    pipeline: []
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: varchar
    - Name: category
      MongoType: string
      SqlName: category
      SqlType: varchar
    - Name: name
      MongoType: string
      SqlName: name
      SqlType: varchar
    - Name: pos.coordinates
      MongoType: geo.2darray
      SqlName: pos.coordinates
      SqlType: numeric[]

mongodrdl will currently not recognize geospatial fields in a read-only view.