Analyze Your Data Schema¶
The Schema tab provides an overview of the data type and shape of the fields in a particular collection. Databases and collections are visible in the left-side navigation.
The overview is based on sampling the documents in the collection. The schema overview may include additional data about the contents of the fields, such as the minimum and maximum values of dates and integers, the frequency of occurrence of particular values, and the cardinality of the data.
MongoDB has a flexible schema model, which means that some fields may contain different types of data from one document to the next.
In the case of heterogeneous fields, the Schema tab shows a breakdown of the data types contained within the field, with the percentage of documents that each data type represents.
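The type breakdown described above can be approximated outside Compass. The following is a minimal sketch, not Compass's implementation: the helper `type_breakdown` and the sample documents are hypothetical, and Python type names (`str`, `int`) stand in for BSON type names. A field that is absent from a document is counted as `undefined`, matching how the Schema tab labels missing values.

```python
from collections import Counter


def type_breakdown(docs, field):
    """Return {type_name: percentage of docs} for `field` across `docs`."""
    counts = Counter(
        type(doc[field]).__name__ if field in doc else "undefined"
        for doc in docs
    )
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


# Hypothetical sampled documents with a mixed-type field.
sample = [
    {"phone_no": "555-0100"},
    {"phone_no": "555-0101"},
    {"phone_no": "555-0102"},
    {"phone_no": 5550103},
    {},  # the field is missing in this document
]

breakdown = type_breakdown(sample, "phone_no")
print(breakdown)
```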
The Schema tab shows size information about the
test.restaurants collection at the top, including the total
number of documents in the collection, the average document size,
and the total disk space occupied by the collection.
The following fields are shown with details:
- The _id field is an ObjectId. Each ObjectId contains a timestamp, so Compass displays the range of creation times for the sampled documents.
- The address field contains four nested fields. You can expand the field panel to see analyses of each of the nested fields.
- The borough field contains a string indicating the borough in which the restaurant is located. The cardinality is low enough that Compass can provide a graded bar of the field contents, with the most frequently occurring string on the left.
- The categories field contains arrays of strings. The analysis shows the minimum, maximum, and average array lengths.
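To make the array analysis concrete, the sketch below computes the same three figures (minimum, maximum, and average array length) for an array field. The helper name `array_length_stats` and the sample documents are hypothetical; this only illustrates the arithmetic, not how Compass computes it.

```python
from statistics import mean


def array_length_stats(docs, field):
    """Return min, max, and average length of the arrays stored in `field`."""
    lengths = [len(doc[field]) for doc in docs if isinstance(doc.get(field), list)]
    if not lengths:
        return None  # no document holds an array in this field
    return {"min": min(lengths), "max": max(lengths), "avg": mean(lengths)}


# Hypothetical sampled documents.
sample = [
    {"categories": ["Bakery"]},
    {"categories": ["Pizza", "Italian", "Delivery"]},
    {"categories": ["Cafe", "Breakfast"]},
]

stats = array_length_stats(sample, "categories")
print(stats)
```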
Using the query bar in the Schema tab, you can create a query filter to limit your result set. Click the Options button to specify query options, such as the particular fields to display and the number of results to return.
In the Schema tab, you can also use the Query Builder to enter a query into the query bar.
For each field, Compass displays summary information about the data type or types the field contains and the range of values. Depending on the data type and the level of cardinality, Compass displays histograms, graded bars, geographical maps, and sample data to provide a sense of the shape and scope of the data contained in each field.
Field with a Single Data Type¶
Below is an example of the data type summary for a field called
last_login which contains data of type
Field with Multiple Data Types¶
For fields that contain multiple data types,
Compass displays a percentage breakdown of
the various data types across documents. In the example below,
the chart shows the contents of a field called
phone_no in which
81% of documents are of type
string, and the remaining 19% are of
If a collection contains documents in which not all fields contain
a value, the missing values display as
undefined. In the example
below, the field
age has no recorded value in 40% of the sampled
Strings can appear in three different ways. If there are entirely unique strings in a field, Compass shows a random selection of string values from the specified field. Click the circular refresh icon to see a new set of randomly selected values from the field.
If there are only a few different string values, Compass shows the strings in a single graded bar which shows the percentage of the population of the string values.
If there are multiple string values with some duplicates, Compass shows a histogram indicating the frequency of each string found within the field.
Move the mouse over each bar to display a tooltip which shows the value of the string.
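The three string presentations above amount to a decision based on how many distinct values the sample contains. The sketch below is one way to express that decision. The cutoff `FEW_DISTINCT` is a hypothetical value chosen for illustration, since the threshold Compass uses is not stated here, and `summarize_strings` is not a Compass function.

```python
import random
from collections import Counter

FEW_DISTINCT = 5  # hypothetical cutoff for "only a few different values"


def summarize_strings(values, sample_size=3, seed=None):
    """Pick a presentation for a list of sampled string values."""
    counts = Counter(values)
    if len(counts) == len(values):
        # Every value is unique: show a random selection of values.
        rng = random.Random(seed)
        return "sample", rng.sample(values, min(sample_size, len(values)))
    if len(counts) <= FEW_DISTINCT:
        # Few distinct values: percentage per value, most frequent first.
        total = len(values)
        return "graded_bar", [(v, 100 * n / total) for v, n in counts.most_common()]
    # Many values with some duplicates: frequency of each value.
    return "histogram", counts.most_common()


print(summarize_strings(["Bronx", "Queens", "Bronx", "Bronx"]))
```

Calling the function again with a different `seed` mirrors clicking the refresh icon to draw a new random selection.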
Numbers are similar to strings in their representation. Unique numbers are shown in the following manner:
Duplicate numbers are shown in a histogram that indicates their frequency:
Dates and ObjectIDs¶
Fields that represent dates (and fields that contain the ObjectID data type, which includes a timestamp) are shown across multiple bar charts. The two charts on the top row represent the day of the week and time of day of the timestamp value.
The single chart on the bottom shows the first and last timestamp value, and the vertical lines represent the distribution of the timestamp across the range of first to last.
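An ObjectId can be charted like a date because its first four bytes store the creation time as seconds since the Unix epoch. The sketch below decodes that timestamp from the 24-character hexadecimal form and tallies the figures the charts summarize: day of week, hour of day, and first and last values. The helper names and the sample ObjectId strings are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def objectid_timestamp(oid_hex):
    """Decode the creation time (UTC) from the first 4 bytes of an ObjectId."""
    return datetime.fromtimestamp(int(oid_hex[:8], 16), tz=timezone.utc)


def timestamp_summary(timestamps):
    """Tally day of week and hour of day, and find the first and last values."""
    return {
        "weekday": Counter(DAYS[ts.weekday()] for ts in timestamps),
        "hour": Counter(ts.hour for ts in timestamps),
        "first": min(timestamps),
        "last": max(timestamps),
    }


# Hypothetical ObjectId values created exactly one day apart.
oids = ["5f0000000000000000000001", "5f0151800000000000000002"]
summary = timestamp_summary([objectid_timestamp(o) for o in oids])
print(summary["first"], summary["last"])
```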
Embedded Documents and Arrays¶
Fields that contain a sub-document or an array are displayed with a small triangle next to them and a visual representation of the data contained within the sub-document or array.
Click on the triangle to expand the field and view the embedded documents:
Fields that contain GeoJSON data or
[longitude,latitude] arrays are
displayed with interactive maps. For more information on interacting
with location data in Compass, see Analyze Location Data.
Third-party mapping services are not available in Compass Isolated Edition.
View Charts of Mixed Types¶
If a field has mixed types, you can view different charts of each type
by clicking on the
type field. In the example below, the
field shows the values that are strings:
Clicking on the
number type causes the chart to show its numeric
In the Schema tab, you can type the filter manually into the query bar or generate the filter with the Compass query builder. The query builder allows you to select data elements from one or more fields in your schema and construct a query matching the selected elements.
You can compose the initial query filter by using the clickable query builder and then manually edit the generated filter to your exact requirements.
The following procedure describes the steps involved in building a complex query with the query bar.
Click the chart value.¶
In the Schema view, you can click on a chart value to
build a query. For example, the following image shows the query
filter built by clicking the
EWR value for the
Optional. Select multiple values.¶
To select multiple values for a field, click and drag the cursor over a selection of values, or hold Shift and click each desired value.
Optional. Click on other field values to create a compound query.¶
For example, the following image shows the compound query
built by selecting a value in the
Optional. Deselect a value.¶
To deselect a previously selected value, shift+click on the selected value:
Run the query¶
To run the query, click Analyze. Click Reset to clear your query.
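The procedure above can be pictured as assembling a filter document from the selected chart values. The sketch below shows one plausible shape for that filter. It assumes that a single selected value becomes an equality match, that several selected values for one field become an `$in` condition, and that selections on different fields are combined in one document (an implicit AND). The function `build_filter` and the field names `departure_airport` and `carrier` are hypothetical and do not come from the example collection.

```python
def build_filter(selections):
    """Build a query filter from {field: [selected values]}."""
    query = {}
    for field, values in selections.items():
        if not values:
            continue  # a field with every value deselected adds no condition
        if len(values) == 1:
            query[field] = values[0]              # equality match
        else:
            query[field] = {"$in": list(values)}  # match any selected value
    return query


# One value clicked on one field.
print(build_filter({"departure_airport": ["EWR"]}))

# Several values on one field plus a value on a second field.
compound = build_filter({"departure_airport": ["EWR", "JFK"], "carrier": ["UA"]})
print(compound)
```

The resulting document can be pasted into the query bar and edited by hand, which matches the workflow of starting from clicked values and then refining the filter manually.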
Analyze Location Data¶
In the Schema tab, you can use interactive maps to filter
and analyze location data. If your field contains
GeoJSON data or
[longitude,latitude] arrays, the Schema tab displays
a map containing the points from the field. The data type
for location fields is
Apply a Location Filter¶
You can apply a filter to the map to only analyze a specific range of points. To define a location filter:
- Click the Circle button at the top-right of the map.
- Click and drag on the map to draw a circle containing the area of the map you want to analyze.
- Repeat this process as desired to include additional areas of the map in the schema analysis.
The query bar updates as you draw location filters to show the exact coordinates used in the $geoWithin query applied to the schema analysis.
If you specify multiple location filters, the query becomes
an $or query
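As a rough illustration of the filter that appears in the query bar, the sketch below builds a `$geoWithin` condition for each drawn circle and wraps several circles in `$or`. It assumes each circle is expressed with the `$centerSphere` operator, whose radius is given in radians (distance divided by the Earth's equatorial radius, about 6378.1 km). The helper names, the field name `coordinates`, and the sample circles are hypothetical.

```python
EARTH_RADIUS_KM = 6378.1  # equatorial radius used to convert km to radians


def circle_filter(field, lng, lat, radius_km):
    """Filter for points in `field` that lie inside one circle."""
    return {
        field: {
            "$geoWithin": {
                "$centerSphere": [[lng, lat], radius_km / EARTH_RADIUS_KM]
            }
        }
    }


def location_query(field, circles):
    """Combine circles given as (lng, lat, radius_km) into a single filter."""
    filters = [circle_filter(field, *circle) for circle in circles]
    if not filters:
        return {}            # no location filter drawn
    if len(filters) == 1:
        return filters[0]    # a single circle needs no $or
    return {"$or": filters}  # several circles: match points in any of them


query = location_query("coordinates", [(-73.99, 40.73, 5), (-74.17, 40.69, 10)])
print(query)
```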
Edit a Location Filter¶
To move or resize a location filter, click on the right side of the map. You will enter the filter editing mode, which looks like this:
- To move a filter
- Click and drag the square in the center of the circle.
- To resize a filter
- Click and drag the square at the edge of the circle.
After modifying your filters, click Save.
Delete a Location Filter¶
To delete a location filter from the map:
- Click on the right side of the map.
- Click a location filter to delete that filter, or click Clear All to delete all location filters.
- Click Save.
If the analysis of your schema times out, the collection you are
analyzing may be large enough that MongoDB stops the operation before
the analysis is complete. Increase the value of MAX TIME MS to give
the operation enough time to complete.
To increase the value of MAX TIME MS:
- In the query bar, expand Options.
- Increase the value of MAX TIME MS to accommodate your collection. MAX TIME MS defaults to 60000 milliseconds, or 60 seconds, but large collections might take longer than that to analyze.
Once you have increased the value of MAX TIME MS, retry your schema analysis by clicking Analyze.
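For readers who want to see where such a time limit sits in a request, the sketch below builds an `aggregate` database command that samples documents and carries a `maxTimeMS` limit. It illustrates the server-side option only, under the assumption that sampling is done with a `$sample` stage. The helper `sampling_command` and the sample size of 1000 are hypothetical and are not taken from this page.

```python
def sampling_command(collection, sample_size=1000, max_time_ms=60000):
    """Build an aggregate command that samples documents under a time limit."""
    return {
        "aggregate": collection,
        "pipeline": [{"$sample": {"size": sample_size}}],
        "cursor": {},
        "maxTimeMS": max_time_ms,  # the server aborts the operation after this many ms
    }


# Allow two minutes instead of the default 60 seconds.
command = sampling_command("restaurants", max_time_ms=120000)
print(command)
```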