Specify a Language for Text Index¶
This tutorial describes how to specify the default language associated with the text index and also how to create text indexes for collections that contain documents in different languages.
Specify the Default Language for a text Index¶
The default language associated with the indexed data determines the list of stop words and the rules for the stemmer and tokenizer. The default language for the indexed data is english.
To specify a different language, use the default_language option when creating the text index. See Text Search Languages for the languages available for default_language.
The following example creates a text index on the content field and sets the default_language to spanish:
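A minimal sketch of such an index definition, written for the mongo shell. The collection name articles is an assumption made for illustration; the field name and the default_language value come from the text above. The typeof guard is also an addition, so that the snippet loads without error outside the shell.

```javascript
// Index key: build a text index over the "content" field.
var indexKeys = { content: "text" };

// Index options: apply Spanish stop words, stemming, and tokenization by default.
var indexOptions = { default_language: "spanish" };

// `db` exists only inside the mongo shell, so guard the call.
// The collection name "articles" is hypothetical.
if (typeof db !== "undefined") {
  db.articles.ensureIndex(indexKeys, indexOptions);
}
```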
Create a text Index for a Collection in Multiple Languages¶
Specify the Index Language within the Document¶
If a collection contains documents in different languages, include a field in each document that specifies the language to use:
- If you include a field named language in the document, the ensureIndex() method by default uses the value of this field to override the default language.
- To use a field with a name other than language, you must specify the name of this field to the ensureIndex() method with the language_override option.
See Text Search Languages for a list of supported languages.
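The override rules above can be modeled with a small helper. This is only an illustrative sketch of the lookup order described in this section, not the server's implementation; the function name effectiveLanguage and its parameters are hypothetical.

```javascript
// Hypothetical helper: returns the language a text index would apply to `doc`,
// following the rules described above.
//   overrideField   - name of the per-document language field (defaults to "language")
//   defaultLanguage - the index's default language (defaults to "english")
function effectiveLanguage(doc, overrideField, defaultLanguage) {
  var field = overrideField || "language";
  var fallback = defaultLanguage || "english";
  // A string value in the override field wins; otherwise fall back to the default.
  return typeof doc[field] === "string" ? doc[field] : fallback;
}

// Document with a "language" field: its value is used.
var fromDocument = effectiveLanguage({ language: "spanish", quote: "hola" });
// Document without the field: the index default applies.
var fromDefault = effectiveLanguage({ quote: "hello" });
// Custom override field, as with the language_override option.
var fromCustomField = effectiveLanguage({ idioma: "portuguese", quote: "olá" }, "idioma");
```

With these inputs, fromDocument is "spanish", fromDefault is "english", and fromCustomField is "portuguese".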
Include the language Field¶
Include a field language that specifies the language to use for the individual documents.
For example, the documents of a multi-language collection quotes contain the field language:
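A sketch of what such documents might look like. The _id values and quote texts are illustrative sample data assumed for this example; only the collection name quotes and the field names language and quote come from the text.

```javascript
// Illustrative sample documents; each one names its own language.
var quotes = [
  { _id: 1, language: "portuguese", quote: "A sorte protege os audazes" },
  { _id: 2, language: "spanish", quote: "Nada hay más surrealista que la realidad." },
  { _id: 3, language: "english", quote: "is this a dagger which I see before me" }
];

// `db` exists only inside the mongo shell, so guard the inserts.
if (typeof db !== "undefined") {
  quotes.forEach(function (doc) {
    db.quotes.insert(doc);
  });
}
```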
Create a text index on the field quote:
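A minimal mongo shell sketch of that index. No options are passed, so the index relies on the language field of each document and falls back to English. The typeof guard is an addition so the snippet also loads outside the shell.

```javascript
// Index key: build a text index over the "quote" field.
var quoteIndexKeys = { quote: "text" };

// `db` exists only inside the mongo shell, so guard the call.
if (typeof db !== "undefined") {
  db.quotes.ensureIndex(quoteIndexKeys);
}
```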
- For the documents that contain the language field, the text index uses that language to determine the stop words and the rules for the stemmer and the tokenizer.
- For documents that do not contain the language field, the index uses the default language, which is English, to determine the stop words and rules for the stemmer and the tokenizer.
For example, the Spanish word que is a stop word. So the following text command would not match any document:
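A sketch of such a query, assuming a server version that still provides the text command and that it is invoked through runCommand() on the collection. The search term and language follow the sentence above; the variable name is hypothetical.

```javascript
// Options for the text command: search for "que" using Spanish language rules.
var textCommandOptions = { search: "que", language: "spanish" };

// `db` exists only inside the mongo shell, so guard the call.
if (typeof db !== "undefined") {
  // "que" is a Spanish stop word, so the search contains no usable terms.
  db.quotes.runCommand("text", textCommandOptions);
}
```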
Use any Field to Specify the Language for a Document¶
Include a field that specifies the language to use for the individual documents. To use a field with a name other than language, include the language_override option when creating the index.
For example, the documents of a multi-language collection quotes contain the field idioma:
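A sketch of documents that use idioma as the language field. As before, the _id values and quote texts are illustrative sample data assumed for this example.

```javascript
// Illustrative sample documents; the language is stored in "idioma".
var quotesWithIdioma = [
  { _id: 1, idioma: "portuguese", quote: "A sorte protege os audazes" },
  { _id: 2, idioma: "spanish", quote: "Nada hay más surrealista que la realidad." },
  { _id: 3, idioma: "english", quote: "is this a dagger which I see before me" }
];

// `db` exists only inside the mongo shell, so guard the inserts.
if (typeof db !== "undefined") {
  quotesWithIdioma.forEach(function (doc) {
    db.quotes.insert(doc);
  });
}
```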
Create a text index on the field quote with the language_override option:
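A minimal mongo shell sketch of that index. The language_override option names idioma as the field that holds each document's language. The typeof guard is an addition so the snippet also loads outside the shell.

```javascript
// Index key: build a text index over the "quote" field.
var overrideIndexKeys = { quote: "text" };

// Index options: read each document's language from "idioma" instead of "language".
var overrideIndexOptions = { language_override: "idioma" };

// `db` exists only inside the mongo shell, so guard the call.
if (typeof db !== "undefined") {
  db.quotes.ensureIndex(overrideIndexKeys, overrideIndexOptions);
}
```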
- For the documents that contain the idioma field, the text index uses that language to determine the stop words and the rules for the stemmer and the tokenizer.
- For documents that do not contain the idioma field, the index uses the default language, which is English, to determine the stop words and rules for the stemmer and the tokenizer.
For example, the Spanish word que is a stop word. So the following text command would not match any document:
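A sketch of the query against the idioma-based index, again assuming a server version that provides the text command. The query itself names its language through the command's language option, so it does not refer to the idioma field; the variable name is hypothetical.

```javascript
// Options for the text command: the query language is given directly,
// independent of which document field the index uses as its override.
var idiomaQueryOptions = { search: "que", language: "spanish" };

// `db` exists only inside the mongo shell, so guard the call.
if (typeof db !== "undefined") {
  db.quotes.runCommand("text", idiomaQueryOptions);
}
```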