Docs Home → PHP Library Manual
Collation
On this page
New in version 1.1.
Overview
MongoDB 3.4 introduced support for collations, which provide a set of rules to comply with the conventions of a particular language when comparing strings.
For example, in Canadian French, the last accent in a given word determines the sorting order. Consider the following French words:
cote < coté < côte < côté
The sort order using the Canadian French collation would result in the following:
cote < côte < coté < côté
If collation is unspecified, MongoDB uses simple binary comparison for strings. As such, the sort order of the words would be:
cote < coté < côte < côté
Usage
You can specify a default collation for collections and indexes when they are created, or specify a collation for CRUD operations and aggregations. For operations that support collation, MongoDB uses the collection's default collation unless the operation specifies a different collation.
Collation Parameters
'collation' => [ 'locale' => <string>, 'caseLevel' => <boolean>, 'caseFirst' => <string>, 'strength' => <integer>, 'numericOrdering' => <boolean>, 'alternate' => <string>, 'maxVariable' => <string>, 'normalization' => <boolean>, 'backwards' => <boolean>, ]
The only required parameter is locale
, which the server parses as an ICU
format locale ID. For example, set
locale
to en_US
to represent US English or fr_CA
to represent
Canadian French.
For a complete description of the available parameters, see Collation Document in the MongoDB manual.
Assign a Default Collation to a Collection
The following example creates a new collection called contacts
on the
test
database and assigns a default collation with the fr_CA
locale.
Specifying a collation when you create the collection ensures that all
operations involving a query that are run against the contacts
collection
use the fr_CA
collation, unless the query specifies another collation. Any
indexes on the new collection also inherit the default collation, unless the
creation command specifies another collation.
$database = (new MongoDB\Client)->test; $database->createCollection('contacts', [ 'collation' => ['locale' => 'fr_CA'], ]);
Assign a Collation to an Index
To specify a collation for an index, use the collation
option when you
create the index.
The following example creates an index on the name
field of the
address_book
collection, with the unique
parameter enabled and a default
collation with locale
set to en_US
.
$collection = (new MongoDB\Client)->test->address_book; $collection->createIndex( ['first_name' => 1], [ 'collation' => ['locale' => 'en_US'], 'unique' => true, ] );
To use this index, make sure your queries also specify the same collation. The following query uses the above index:
$collection = (new MongoDB\Client)->test->address_book; $cursor = $collection->find( ['first_name' => 'Adam'], [ 'collation' => ['locale' => 'en_US'], ] );
The following queries do NOT use the index. The first query uses no
collation, and the second uses a collation with a different strength
value
than the collation on the index.
$collection = (new MongoDB\Client)->test->address_book; $cursor1 = $collection->find(['first_name' => 'Adam']); $cursor2 = $collection->find( ['first_name' => 'Adam'], [ 'collation' => [ 'locale' => 'en_US', 'strength' => 2, ], ] );
Operations that Support Collation
All reading, updating, and deleting methods support collation. Some examples are listed below.
find()
with sort
Individual queries can specify a collation to use when matching and sorting
results. The following query and sort operation uses a German collation with the
locale
parameter set to de
.
$collection = (new MongoDB\Client)->test->contacts; $cursor = $collection->find( ['city' => 'New York'], [ 'collation' => ['locale' => 'de'], 'sort' => ['name' => 1], ] );
findOneAndUpdate()
A collection called names
contains the following documents:
{ "_id" : 1, "first_name" : "Hans" } { "_id" : 2, "first_name" : "Gunter" } { "_id" : 3, "first_name" : "Günter" } { "_id" : 4, "first_name" : "Jürgen" }
The following findOneAndUpdate()
operation on the collection does not
specify a collation.
$collection = (new MongoDB\Client)->test->names; $document = $collection->findOneAndUpdate( ['first_name' => ['$lt' => 'Gunter']], ['$set' => ['verified' => true]] );
Because Gunter
is lexically first in the collection, the above operation
returns no results and updates no documents.
Consider the same findOneAndUpdate()
operation but with a collation
specified, which uses the locale de@collation=phonebook
.
Note
Some locales have a collation=phonebook
option available for use with
languages which sort proper nouns differently from other words. According to
the de@collation=phonebook
collation, characters with umlauts come before
the same characters without umlauts.
$collection = (new MongoDB\Client)->test->names; $document = $collection->findOneAndUpdate( ['first_name' => ['$lt' => 'Gunter']], ['$set' => ['verified' => true]], [ 'collation' => ['locale' => 'de@collation=phonebook'], 'returnDocument' => MongoDB\Operation\FindOneAndUpdate::RETURN_DOCUMENT_AFTER, ] );
The operation returns the following updated document:
{ "_id" => 3, "first_name" => "Günter", "verified" => true }
findOneAndDelete()
Set the numericOrdering
collation parameter to true
to compare numeric
strings by their numeric values.
The collection numbers
contains the following documents:
{ "_id" : 1, "a" : "16" } { "_id" : 2, "a" : "84" } { "_id" : 3, "a" : "179" }
The following example matches the first document in which field a
has a
numeric value greater than 100 and deletes it.
$collection = (new MongoDB\Client)->test->numbers; $document = $collection->findOneAndDelete( ['a' => ['$gt' =-> '100']], [ 'collation' => [ 'locale' => 'en', 'numericOrdering' => true, ], ] );
After the above operation, the following documents remain in the collection:
{ "_id" : 1, "a" : "16" } { "_id" : 2, "a" : "84" }
If you perform the same operation without collation, the server deletes the
first document it finds in which the lexical value of a
is greater than
"100"
.
$collection = (new MongoDB\Client)->test->numbers; $document = $collection->findOneAndDelete(['a' => ['$gt' =-> '100']]);
After the above operation is executed, the document in which a
was equal to
"16"
has been deleted, and the following documents remain in the collection:
{ "_id" : 2, "a" : "84" } { "_id" : 3, "a" : "179" }
deleteMany()
You can use collations with all the various CRUD operations which exist in the MongoDB PHP Library.
The collection recipes
contains the following documents:
{ "_id" : 1, "dish" : "veggie empanadas", "cuisine" : "Spanish" } { "_id" : 2, "dish" : "beef bourgignon", "cuisine" : "French" } { "_id" : 3, "dish" : "chicken molé", "cuisine" : "Mexican" } { "_id" : 4, "dish" : "chicken paillard", "cuisine" : "french" } { "_id" : 5, "dish" : "pozole verde", "cuisine" : "Mexican" }
Setting the strength
parameter of the collation document to 1
or 2
causes the server to disregard case in the query filter. The following example
uses a case-insensitive query filter to delete all records in which the
cuisine
field matches French
.
$collection = (new MongoDB\Client)->test->recipes; $collection->deleteMany( ['cuisine' => 'French'], [ 'collation' => [ 'locale' => 'en_US', 'strength' => 1, ], ] );
After the above operation runs, the documents with _id
values of 2
and
4
are deleted from the collection.
Aggregation
To use collation with an aggregate()
operation, specify a collation in the
aggregation options.
The following aggregation example uses a collection called names
and groups
the first_name
field together, counts the total number of results in each
group, and sorts the results by German phonebook order.
$collection = (new MongoDB\Client)->test->names; $cursor = $collection->aggregate( [ ['$group' => ['_id' => '$first_name', 'name_count' => ['$sum' => 1]]], ['$sort' => ['_id' => 1]], ], [ 'collation' => ['locale' => 'de@collation=phonebook'], ] );