Navigation

Sampling Type Conflicts

Changed in version 2.6.

The flexible schema model of MongoDB allows a given field to contain data of multiple types, while relational databases restrict columns to a single data type. The BI Connector samples data from MongoDB to generate a relational schema model, but type conversion conflicts may occur when the BI Connector samples different data types from a field in MongoDB documents.

There are two main categories of type conversion conflicts that may occur in the presence of multiple data types: conflicts between scalar types and conflicts involving composite types like documents and arrays.

Note

The BI Connector only uses the sampled subset of your data to detect type conflicts during schema generation. If your data set contains type conflicts on a field and the conflicting types were not both present in the sample, the generated schema will not be prepared to resolve these conflicts and may result in increased query latency.

For more information on sampling configuration, such as setting the --samplesize and sampleRefreshIntervalSecs, see Schema Options.

Scalar-Scalar Conflicts

When a scalar-scalar confict occurs, the BI Connector uses the following lattice to determine the least upper bound of the two different data types:

Scalar-scalar type lattice

Example

If a given field contains int, double, and string types, the least upper bound is string.

When a scalar-scalar conflict occurs, the BI Connector inserts a row into the information_schema.COLUMNS table and lists the different sampled types in the COLUMN_COMMENT column.

Composite Conflicts

Composite conflicts are type conversion conflicts involving a document or an array. The following sections describe how the BI Connector resolves both types of composite conflicts.

Document Conflicts

When a conflict occurs involving a document, the BI Connector displays the fields of the document type as separate columns using dot notation. For example, the conflict collection contains the following documents:

{ _id: 0, a: "foo" } // "a" is scalar (string)
{ _id: 1, a: { a: "bar", b: "baz" } } // "a" is composite (document)

The BI Connector detects a scalar-composite conflict and generates the following relational schema:

Table: conflict

_id a a.a a.b
0 “foo” NULL NULL
1 NULL “bar” “baz”

When a conflict occurs involving a document, the BI Connector inserts a row into the information_schema.COLUMNS table and lists the different sampled types in the COLUMN_COMMENT column.

Array Conflicts

When a conflict occurs involving an array, the BI Connector creates a new table for the conflict field with a column for the foreign key, array index, and value. The BI Connector unwinds the array into multiple rows in new table and fills in the columns accordingly. For example, the conflict collection contains the following documents:

{ _id: 0, a: "foo" } // "a" is scalar (string)
{ _id: 1, a: ["bar", "baz"] } // "a" is composite (array)

The BI Connector renders the above into the following two tables:

Table: conflict

_id
0
1

Table: conflict_a

_id a_idx a
0 NULL “foo”
1 0 “bar”
1 1 “baz”

When a conflict occurs involving an array, the BI Connector:

  • Inserts a row into the information_schema.COLUMNS table and lists any different types sampled in the COLUMN_COMMENT column
  • Inserts a row into the information_schema.TABLES table containing information on how the array maps to the row in the information_schema.COLUMNS table