
Cached Sampling

Overview

New in version 2.3:

By default, mongosqld samples each collection on the connected MongoDB instance and generates a relational representation of the schema, which it then caches in memory.

Note

If you have authentication enabled, ensure that your MongoDB user has the correct permissions. See User Permissions for Cached Sampling below.

By default, mongosqld does not automatically resample data after generating the schema. Specify the --sampleRefreshIntervalSecs option to direct mongosqld to automatically resample the data and regenerate the schema on a fixed schedule.
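For example, if you launch mongosqld from a Node.js script, scheduled resampling is one more command-line argument. The following is a minimal sketch that assembles the argument list. The helper name buildMongosqldArgs and its options object are hypothetical, and only flags that appear on this page are used.

```javascript
// Hypothetical helper: assembles command-line arguments for mongosqld.
// Only flags shown in this documentation are used; the validation rules
// below are assumptions, not mongosqld's own checks.
function buildMongosqldArgs({ username, password, sampleNamespaces, refreshIntervalSecs } = {}) {
  const args = [];
  if (username !== undefined) {
    if (password === undefined) {
      throw new Error("password is required when username is set");
    }
    // Authenticated sampling: enable --auth and pass MongoDB credentials.
    args.push("--auth", "--mongo-username", username, "--mongo-password", password);
  }
  if (sampleNamespaces !== undefined) {
    // Omit --sampleNamespaces to sample all namespaces.
    args.push("--sampleNamespaces", sampleNamespaces);
  }
  if (refreshIntervalSecs !== undefined) {
    if (!Number.isInteger(refreshIntervalSecs) || refreshIntervalSecs <= 0) {
      throw new RangeError("refreshIntervalSecs must be a positive integer");
    }
    // Without this flag, mongosqld does not resample after generating the schema.
    args.push("--sampleRefreshIntervalSecs", String(refreshIntervalSecs));
  }
  return args;
}

// Resample and regenerate the schema every hour (3600 seconds).
console.log(buildMongosqldArgs({ refreshIntervalSecs: 3600 }).join(" "));
// prints "--sampleRefreshIntervalSecs 3600"
```

The returned array could then be passed to Node's child_process.spawn("mongosqld", args). The checks are deliberately minimal.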

If the schema which mongosqld creates does not meet your BI workload needs, you can manually generate a schema file and edit it as necessary.

See Sampling Mode Reference Chart for more information on sampling modes.

User Permissions for Cached Sampling

If your MongoDB instance uses authentication and you wish to use cached sampling, your BI Connector instance must also use authentication. The admin user that connects to MongoDB via the mongosqld program must have permission to read from all the namespaces from which you want to sample data.

Sample All Namespaces

If you wish to sample all namespaces, the admin user requires the following privileges:

  • listDatabases on the cluster
  • listCollections on every database
  • find on every database

Alternatively, create a user with the built-in readAnyDatabase role:

use admin

db.createUser(
  {
    user: "<username>",
    pwd: "<password>",
    roles: [
      { "role": "readAnyDatabase", "db": "admin" }
    ]
  }
)

Note

Be aware of all privileges included with the readAnyDatabase role before granting it to a user.
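If readAnyDatabase grants more than you want, a custom role can be limited to what all-namespace sampling needs: listDatabases on the cluster, plus find and listCollections on every database. The minimal sketch below builds such a role document in plain JavaScript. The function name allNamespacesSamplingRole and the role name samplingReaderAll are hypothetical; the resource { db: "", collection: "" } denotes all non-system collections in all databases.

```javascript
// Hypothetical helper: builds a createRole() document granting only the
// actions needed to sample every namespace, as an alternative to readAnyDatabase.
function allNamespacesSamplingRole(roleName) {
  return {
    role: roleName,
    privileges: [
      // listDatabases is a cluster-level action, so it uses the cluster resource.
      { resource: { cluster: true }, actions: ["listDatabases"] },
      // db: "" and collection: "" together match all non-system collections
      // in all databases.
      { resource: { db: "", collection: "" }, actions: ["find", "listCollections"] },
    ],
    roles: [],
  };
}

console.log(JSON.stringify(allNamespacesSamplingRole("samplingReaderAll"), null, 2));
```

In the mongo shell, after use admin, the returned object could be passed to db.createRole() in the same way as the samplingReader role later on this page.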

To sample all namespaces, start mongosqld without the --sampleNamespaces option.

mongosqld --auth --mongo-username <username> --mongo-password <password>

Sample Specific Namespaces

If you wish to sample specific namespaces, the admin user requires the following privileges:

  • listCollections on each database whose collections are all sampled
  • find on each sampled collection, or on each database whose collections are all sampled
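The two rules above are mechanical enough to derive from a list of namespaces. The following minimal sketch, in plain JavaScript, assumes namespaces are written as "db.*" (every collection in a database) or "db.collection" (one collection); the function name samplingPrivileges is hypothetical. For ["test.*"] it produces the same privilege entry as the samplingReader role in the example below.

```javascript
// Hypothetical helper: derives minimal sampling privileges from namespace
// strings. "db.*" means every collection in db; "db.coll" means a single
// collection. Other wildcard forms are rejected rather than guessed at.
function samplingPrivileges(namespaces) {
  return namespaces.map((ns) => {
    // Database names cannot contain ".", so the first dot ends the db name;
    // collection names may themselves contain dots.
    const dot = ns.indexOf(".");
    if (dot <= 0 || dot === ns.length - 1) {
      throw new Error(`expected "db.collection" or "db.*", got "${ns}"`);
    }
    const db = ns.slice(0, dot);
    const coll = ns.slice(dot + 1);
    if (db.includes("*") || (coll !== "*" && coll.includes("*"))) {
      throw new Error(`unsupported wildcard in "${ns}"`);
    }
    if (coll === "*") {
      // Whole database: listCollections and find on the database resource.
      return { resource: { db, collection: "" }, actions: ["find", "listCollections"] };
    }
    // Single collection: find on that collection only.
    return { resource: { db, collection: coll }, actions: ["find"] };
  });
}

console.log(JSON.stringify(samplingPrivileges(["test.*", "sales.orders"])));
```

In the mongo shell, the resulting array could be supplied as the privileges field of db.createRole().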

Alternatively, create a user with the built-in readAnyDatabase role. For an example of creating a user with this role, see the Sample All Namespaces section.

Note

Be aware of all privileges included with the readAnyDatabase role before granting it to a user.

The following example creates a custom role in the mongo shell with the minimum required privileges to sample every collection in the test database:

1. Create a custom role with the required privileges.

use admin

db.createRole(
  {
    role: "samplingReader",
    privileges: [
      {
        resource: {
          db: "test",
          collection: ""
        },
        actions: [ "find", "listCollections" ]
      }
    ],
    roles: []
  }
)

2. Create a new user and assign the newly created role to them.

db.createUser(
  {
    user: "<username>",
    pwd: "<password>",
    roles: [ "samplingReader" ]
  }
)

Note

The user in the example above does not have the listDatabases privilege, so you must specify a database to sample data from with the --sampleNamespaces option when running mongosqld.

3. Start mongosqld with authentication enabled.

Run mongosqld with authentication enabled and use the --sampleNamespaces option to sample data from all collections in the test database:

mongosqld --auth --mongo-username <username> --mongo-password <password> \
  --sampleNamespaces 'test.*'
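As a rough illustration of which collections the pattern 'test.*' selects, the sketch below models namespace matching in plain JavaScript. It assumes a pattern has the form "<database>.<collection>" and that a segment consisting only of "*" matches any name. This is a simplified, hypothetical model (matchesNamespace is not part of the BI Connector), and mongosqld's actual pattern syntax may support more.

```javascript
// Simplified, hypothetical model of --sampleNamespaces matching: a pattern
// is "<db>.<collection>", and a segment equal to "*" matches any name.
// mongosqld's real pattern syntax may differ.
function matchesNamespace(pattern, namespace) {
  const split = (s) => {
    const dot = s.indexOf(".");
    // Database names cannot contain ".", so the first dot separates db from collection.
    return dot === -1 ? null : [s.slice(0, dot), s.slice(dot + 1)];
  };
  const p = split(pattern);
  const n = split(namespace);
  if (!p || !n) return false;
  return p.every((seg, i) => seg === "*" || seg === n[i]);
}

const namespaces = ["test.orders", "test.customers", "prod.orders"];
console.log(namespaces.filter((ns) => matchesNamespace("test.*", ns)));
// prints [ 'test.orders', 'test.customers' ]
```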