Navigation
This version of the documentation is archived and no longer supported.

Schema Configuration

Business Intelligence tools connect to a data source and, given a fixed tabular schema, allow the user to visually explore their data. However, because MongoDB uses a flexible schema, these tools currently cannot use MongoDB as a native data source.

To use MongoDB with a relational business intelligence tool such as Tableau, you must define a relational mapping for your MongoDB data.

The schema generation tool mongodrdl will output such a schema in the Document Relational Definition Language (DRDL) format, but you should verify that the schema it proposes matches your data requirements. Examine both the table structure and field types to ensure that you will be able to generate your desired reports.

You may have to edit the schema definition files by hand to remove unneeded fields; add fields that mongodrdl did not discover within the subset of documents that it sampled; and modify erroneous field types.

You may edit DRDL files in the following ways:

  • Add aggregation pipeline stages,
  • Add fields,
  • Remove fields,
  • Remove tables,
  • Rename tables, so long as the collection field remains the same,
  • Rename fields using the sqlname option

New in version 1.1: Rename fields using the sqlname option.

Document Relational Definition Language

The Document Relational Definition Language (DRDL) defines a relational view of a MongoDB schema.

mongodrdl produces these files using one or more MongoDB collections as a guideline, and mongobischema imports them into the Business Intelligence Connector.

File Format

Using the YAML format, the DRDL file lists the databases, tables, and columns to be “exported” for use in Business Intelligence tools.

schema:
- db: <database name>
  tables:
  - table: <SQL table name>
    collection: <MongoDB collection name>
    pipeline:
    - <optional pipeline elements>
    columns:
    - name: <MongoDB field name>
      sqlname: <mapped SQL column name>
      mongotype: <MongoDB field type>
      sqltype: <mapped SQL column type>

Example

Given documents of the following shape in the collection abc in the database test:

{
    "_id": ObjectId("....")
    "close": 7.45,
    "detail": { "a": 2, "b": 3 }
}

You can use mongodrdl to generate a schema based on this collection by running the following command:

mongodrdl -d test -c abc -o schema.drdl

The generated schema file schema.drdl will look similar to the following:

schema:
- db: test
  tables:
  - table: abc
    collection: abc
    pipeline: []
    columns:
    - name: _id
      mongotype: float64
      sqltype: numeric
    - name: close
      mongotype: float64
      sqltype: numeric
    - name: detail.a
      mongotype: float64
      sqltype: numeric
    - name: detail.b
      mongotype: float64
      sqltype: numeric

Field Types

The MongoDB Connector for BI will correctly map fields that always contain the same data type into the relational model. Schema generation will deal specially with the following cases:

Dates
The connector will treat any field of type data_date as the SQL timestamp type.
Timestamps
The connector will treat any field of type data_timestamp as a varchar string type.
Geospatial

New in version 1.1.0.

If a collection contains a geospatial index, the connector will map the indexed field into an array of numeric longitude-latitude coordinates.

See Geospatial Data for an example.

Heterogeneous Fields

If a field can contain multiple types, mongodrdl will pick the type that appears most frequently. If a value does not match the type specified in the DRDL schema, the MongoDB Connector for BI will report an error. However, in the case where schema specifies a string, the MongoDB Connector for BI can convert the value into a string.

If a field can contain either a string or an array of strings, the generated schema will always specify that the field contains an array of strings.

Embedded Documents

The MongoDB Connector for BI will map embedded documents to simple fields that have a . separator character, making them appear similar to the way you would reference them using dot notation in a MongoDB query.

While Tableau properly quotes identifiers, within “ad-hoc” SQL expressions you must double-quote every identifier that contains . characters or mixed-case characters.

For example, consider the following document:

{
    "_id": 1,
    "familyName": "Partridge",
    "hometown" : "Hollywood Hills",
    "address" : { "street": "123 Main Street",
                  "city" : "Hollywood",
                  "state" : "CA",
                  "zip" : "90210" },
    "members_since" : ISODate("2002-04-12T00:00:00Z")
}

This will result in the following fields in the generated schema:

_id numeric
familyName varchar
hometown varchar
address.street varchar
address.city varchar
address.state varchar
address.zip varchar
members_since timestamp

Arrays

MongoDB Connector for BI exposes arrays to business intelligence tools using two collections: one without the array, and the other with the array “pre-joined” to its parent.

For example, consider the following document in the collection families:

{
   "_id": 1,
   "familyName": "Partridge",
   "hometown" : "Hollywood Hills",
   "familyMembers" : [
     {
         "firstname" : "Shirley",
         "age" : 42,
         "attributes" : [
             { "name" : "instrument", "value" : "singer" },
             { "name" : "role", "value" : "mom" }
         ]
     },
     {
         "firstname" : "Keith",
         "age" : 18,
         "attributes" : [
             { "name" : "instrument", "value" : "guitar" },
             { "name" : "role", "value" : "son" }
         ]
     },
     {
         "firstname" : "Laurie",
         "age" : 16,
         "attributes" : [
             { "name" : "instrument", "value" : "keyboard" },
             { "name" : "role", "value" : "sister" }
         ]
     }]
}

This will result in the following three tables:

families
_id numeric
familyName varchar
hometown varchar
families_familyMembers
_id numeric
familyMembers.age numeric
familyMembers.firstname varchar
familyMembers_idx numeric
familyName varchar
hometown varchar
families_familyMembers_attributes
_id numeric
familyMembers.age numeric
familyMembers.attributes.name varchar
familyMembers.attributes.value varchar
familyMembers.attributes_idx numeric
familyMembers.firstname varchar
familyMembers_idx numeric
familyName varchar
hometown varchar

Aggregation Pipelines

The MongoDB Connector for BI can use aggregation pipelines as part of the schema to transform documents from the collection into the proper form for the relational tables.

For example, consider a simple document in the collection simpleFamilies:

{
   "_id": 1,
   "familyName": "Partridge",
   "familyMembers" : [ "Shirley", "Keith", "Laurie"]
}

mongodrdl will generate a schema with the tables simpleFamilies and simpleFamilies_familyMembers.

The table simpleFamilies_familyMembers enumerates each family members, and has the following pipeline:

pipeline:
- $unwind:
    includeArrayIndex: familyMembers_idx
    path: $familyMembers

This pipeline uses $unwind to create a new record for each member of familyMembers. The schema will track the array index in the field familyMembers_idx.

Geospatial Data

New in version 1.1.0.

If a collection contains a geospatial index, the connector will map the indexed field into an array of numeric longitude-latitude coordinates.

For example, given the following collection:

db.points.createIndex( { pos : "2dsphere" } )
db.points.insert({
    pos : { type: "Point", coordinates: [ -73.97, 40.77 ] },
    name: "Central Park",
    category : "Parks"
})

mongodrdl will generate the following schema:

schema:
- db: test
  tables:
  - table: points
    collection: points
    pipeline: []
    columns:
    - name: _id
      mongotype: bson.ObjectId
      sqlname: _id
      sqltype: varchar
    - name: category
      mongotype: string
      sqlname: category
      sqltype: varchar
    - name: name
      mongotype: string
      sqlname: name
      sqltype: varchar
    - name: pos.coordinates
      mongotype: geo.2darray
      sqlname: pos.coordinates
      sqltype: numeric[]