Specify a Language for Text Index¶

This tutorial describes how to specify the default language for a text index and how to create text indexes for collections that contain documents in different languages.

Specify the Default Language for a `text` Index¶

The default language associated with the indexed data determines the list of stop words and the rules for the stemmer and tokenizer. The default language for the indexed data is english.

To specify a different language, use the default_language option when creating the text index. See Text Search Languages for the languages available for default_language.

The following example creates a text index on the content field and sets the default_language to spanish:

db.collection.ensureIndex(
  { content: "text" },
  { default_language: "spanish" }
)
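
One way to guard against a misspelled language name is to build the options document in a small helper before passing it to ensureIndex(). The following is a minimal sketch in plain JavaScript. The names buildTextIndexOptions and allowedLanguages are hypothetical, and the list holds only the languages mentioned on this page, not the full set from Text Search Languages.

```javascript
// Hypothetical helper for illustration; not part of the mongo shell API.
// `allowed` is supplied by the caller; take the real list from Text Search Languages.
function buildTextIndexOptions(defaultLanguage, allowed) {
  if (allowed.indexOf(defaultLanguage) === -1) {
    throw new Error("unsupported default_language: " + defaultLanguage);
  }
  return { default_language: defaultLanguage };
}

// Illustrative subset only: the languages that appear on this page.
var allowedLanguages = ["english", "portuguese", "spanish"];

var options = buildTextIndexOptions("spanish", allowedLanguages);
// In the mongo shell, the result could then be passed along:
// db.collection.ensureIndex( { content: "text" }, options )
```

Passing the allowed list as an argument keeps the helper independent of any particular server version.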

Create a `text` Index for a Collection in Multiple Languages¶

Specify the Index Language within the Document¶

If a collection contains documents in different languages, include a field in each document that contains the language to use:

If you include a field named language in the document, then by default the ensureIndex() method uses the value of this field to override the default language.
To use a field with a name other than language, pass the name of that field to the ensureIndex() method in the language_override option.

See Text Search Languages for a list of supported languages.
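
The override rule described in this section can be modeled in a few lines of plain JavaScript. This is a sketch of the behavior as stated on this page, not MongoDB's implementation. The name effectiveLanguage is hypothetical, and the fallbacks to "language" and "english" mirror the defaults described above.

```javascript
// Hypothetical model of the rule described above; not MongoDB source code.
function effectiveLanguage(doc, indexOptions) {
  var opts = indexOptions || {};
  // Field that may carry a per-document language; "language" unless overridden.
  var field = opts.language_override || "language";
  // Language used when the document does not carry that field.
  var fallback = opts.default_language || "english";
  return typeof doc[field] === "string" ? doc[field] : fallback;
}

var a = effectiveLanguage({ _id: 1, language: "portuguese", quote: "A sorte protege os audazes" });
var b = effectiveLanguage(
  { _id: 2, idioma: "spanish", quote: "Nada hay más surreal que la realidad." },
  { language_override: "idioma" }
);
var c = effectiveLanguage({ _id: 3, quote: "is this a dagger which I see before me" });
// a is "portuguese", b is "spanish", c is "english"
```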

Include the `language` Field¶

Include a field named language that specifies the language to use for each document.

For example, the documents of a multi-language collection quotes contain the field language:

{ _id: 1, language: "portuguese", quote: "A sorte protege os audazes" }
{ _id: 2, language: "spanish", quote: "Nada hay más surreal que la realidad." }
{ _id: 3, language: "english", quote: "is this a dagger which I see before me" }

Create a text index on the field quote:

db.quotes.ensureIndex( { quote: "text" } )

For the documents that contain the language field, the text index uses that language to determine the stop words and the rules for the stemmer and the tokenizer.
For documents that do not contain the language field, the index uses the default language, which is English, to determine the stop words and rules for the stemmer and the tokenizer.

For example, because the Spanish word que is a stop word, the following text command does not match any document:

db.quotes.runCommand( "text", { search: "que", language: "spanish" } )
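
To see why the search comes back empty, it can help to picture stop-word removal as a filter applied to the search terms. The sketch below is a toy model in plain JavaScript, assuming whitespace tokenization and tiny hand-picked stop-word lists. The names searchTerms and STOP_WORDS are hypothetical, and the real stop-word lists, stemmer, and tokenizer do considerably more.

```javascript
// Toy stop-word lists, hand-picked for illustration; not MongoDB's actual lists.
var STOP_WORDS = {
  spanish: ["que", "la", "el", "de"],
  english: ["is", "this", "a", "which"]
};

// Hypothetical helper: lowercase, split on whitespace, drop stop words.
function searchTerms(search, language) {
  var stops = STOP_WORDS[language] || [];
  return search.toLowerCase().split(/\s+/).filter(function (term) {
    return term.length > 0 && stops.indexOf(term) === -1;
  });
}

var none = searchTerms("que", "spanish");          // no terms left to match
var some = searchTerms("que realidad", "spanish"); // only "realidad" remains
```

In this model a search made up entirely of stop words leaves nothing to look up, which is consistent with the empty result described above.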

Use any Field to Specify the Language for a Document¶

Include a field that specifies the language to use for the individual documents. To use a field with a name other than language, include the language_override option when creating the index.

For example, the documents of a multi-language collection quotes contain the field idioma:

{ _id: 1, idioma: "portuguese", quote: "A sorte protege os audazes" }
{ _id: 2, idioma: "spanish", quote: "Nada hay más surreal que la realidad." }
{ _id: 3, idioma: "english", quote: "is this a dagger which I see before me" }

Create a text index on the field quote with the language_override option:

db.quotes.ensureIndex( { quote : "text" },
                       { language_override: "idioma" } )

For the documents that contain the idioma field, the text index uses that language to determine the stop words and the rules for the stemmer and the tokenizer.
For documents that do not contain the idioma field, the index uses the default language, which is English, to determine the stop words and rules for the stemmer and the tokenizer.

For example, because the Spanish word que is a stop word, the following text command does not match any document:

db.quotes.runCommand( "text", { search: "que", language: "spanish" } )
