Docs Menu

Docs HomePHP Library Manual

Collation

On this page

  • Overview
  • Usage
  • Collation Parameters
  • Assign a Default Collation to a Collection
  • Assign a Collation to an Index
  • Operations that Support Collation
  • find() with sort
  • findOneAndUpdate()
  • findOneAndDelete()
  • deleteMany()
  • Aggregation

New in version 1.1.

MongoDB 3.4 introduced support for collations, which provide a set of rules to comply with the conventions of a particular language when comparing strings.

For example, in Canadian French, the last accent in a given word determines the sorting order. Consider the following French words:

cote < coté < côte < côté

The sort order using the Canadian French collation would result in the following:

cote < côte < coté < côté

If collation is unspecified, MongoDB uses simple binary comparison for strings. As such, the sort order of the words would be:

cote < coté < côte < côté

You can specify a default collation for collections and indexes when they are created, or specify a collation for CRUD operations and aggregations. For operations that support collation, MongoDB uses the collection's default collation unless the operation specifies a different collation.

'collation' => [
'locale' => <string>,
'caseLevel' => <boolean>,
'caseFirst' => <string>,
'strength' => <integer>,
'numericOrdering' => <boolean>,
'alternate' => <string>,
'maxVariable' => <string>,
'normalization' => <boolean>,
'backwards' => <boolean>,
]

The only required parameter is locale, which the server parses as an ICU format locale ID. For example, set locale to en_US to represent US English or fr_CA to represent Canadian French.

For a complete description of the available parameters, see Collation Document in the MongoDB manual.

The following example creates a new collection called contacts on the test database and assigns a default collation with the fr_CA locale. Specifying a collation when you create the collection ensures that all operations involving a query that are run against the contacts collection use the fr_CA collation, unless the query specifies another collation. Any indexes on the new collection also inherit the default collation, unless the creation command specifies another collation.

<?php
$database = (new MongoDB\Client)->test;
$database->createCollection('contacts', [
'collation' => ['locale' => 'fr_CA'],
]);

To specify a collation for an index, use the collation option when you create the index.

The following example creates an index on the name field of the address_book collection, with the unique parameter enabled and a default collation with locale set to en_US.

<?php
$collection = (new MongoDB\Client)->test->address_book;
$collection->createIndex(
['first_name' => 1],
[
'collation' => ['locale' => 'en_US'],
'unique' => true,
]
);

To use this index, make sure your queries also specify the same collation. The following query uses the above index:

<?php
$collection = (new MongoDB\Client)->test->address_book;
$cursor = $collection->find(
['first_name' => 'Adam'],
[
'collation' => ['locale' => 'en_US'],
]
);

The following queries do NOT use the index. The first query uses no collation, and the second uses a collation with a different strength value than the collation on the index.

<?php
$collection = (new MongoDB\Client)->test->address_book;
$cursor1 = $collection->find(['first_name' => 'Adam']);
$cursor2 = $collection->find(
['first_name' => 'Adam'],
[
'collation' => [
'locale' => 'en_US',
'strength' => 2,
],
]
);

All reading, updating, and deleting methods support collation. Some examples are listed below.

Individual queries can specify a collation to use when matching and sorting results. The following query and sort operation uses a German collation with the locale parameter set to de.

<?php
$collection = (new MongoDB\Client)->test->contacts;
$cursor = $collection->find(
['city' => 'New York'],
[
'collation' => ['locale' => 'de'],
'sort' => ['name' => 1],
]
);

A collection called names contains the following documents:

{ "_id" : 1, "first_name" : "Hans" }
{ "_id" : 2, "first_name" : "Gunter" }
{ "_id" : 3, "first_name" : "Günter" }
{ "_id" : 4, "first_name" : "Jürgen" }

The following findOneAndUpdate() operation on the collection does not specify a collation.

<?php
$collection = (new MongoDB\Client)->test->names;
$document = $collection->findOneAndUpdate(
['first_name' => ['$lt' => 'Gunter']],
['$set' => ['verified' => true]]
);

Because Gunter is lexically first in the collection, the above operation returns no results and updates no documents.

Consider the same findOneAndUpdate() operation but with a collation specified, which uses the locale de@collation=phonebook.

Note

Some locales have a collation=phonebook option available for use with languages which sort proper nouns differently from other words. According to the de@collation=phonebook collation, characters with umlauts come before the same characters without umlauts.

<?php
$collection = (new MongoDB\Client)->test->names;
$document = $collection->findOneAndUpdate(
['first_name' => ['$lt' => 'Gunter']],
['$set' => ['verified' => true]],
[
'collation' => ['locale' => 'de@collation=phonebook'],
'returnDocument' => MongoDB\Operation\FindOneAndUpdate::RETURN_DOCUMENT_AFTER,
]
);

The operation returns the following updated document:

{ "_id" => 3, "first_name" => "Günter", "verified" => true }

Set the numericOrdering collation parameter to true to compare numeric strings by their numeric values.

The collection numbers contains the following documents:

{ "_id" : 1, "a" : "16" }
{ "_id" : 2, "a" : "84" }
{ "_id" : 3, "a" : "179" }

The following example matches the first document in which field a has a numeric value greater than 100 and deletes it.

<?php
$collection = (new MongoDB\Client)->test->numbers;
$document = $collection->findOneAndDelete(
['a' => ['$gt' =-> '100']],
[
'collation' => [
'locale' => 'en',
'numericOrdering' => true,
],
]
);

After the above operation, the following documents remain in the collection:

{ "_id" : 1, "a" : "16" }
{ "_id" : 2, "a" : "84" }

If you perform the same operation without collation, the server deletes the first document it finds in which the lexical value of a is greater than "100".

<?php
$collection = (new MongoDB\Client)->test->numbers;
$document = $collection->findOneAndDelete(['a' => ['$gt' =-> '100']]);

After the above operation is executed, the document in which a was equal to "16" has been deleted, and the following documents remain in the collection:

{ "_id" : 2, "a" : "84" }
{ "_id" : 3, "a" : "179" }

You can use collations with all the various CRUD operations which exist in the MongoDB PHP Library.

The collection recipes contains the following documents:

{ "_id" : 1, "dish" : "veggie empanadas", "cuisine" : "Spanish" }
{ "_id" : 2, "dish" : "beef bourgignon", "cuisine" : "French" }
{ "_id" : 3, "dish" : "chicken molé", "cuisine" : "Mexican" }
{ "_id" : 4, "dish" : "chicken paillard", "cuisine" : "french" }
{ "_id" : 5, "dish" : "pozole verde", "cuisine" : "Mexican" }

Setting the strength parameter of the collation document to 1 or 2 causes the server to disregard case in the query filter. The following example uses a case-insensitive query filter to delete all records in which the cuisine field matches French.

<?php
$collection = (new MongoDB\Client)->test->recipes;
$collection->deleteMany(
['cuisine' => 'French'],
[
'collation' => [
'locale' => 'en_US',
'strength' => 1,
],
]
);

After the above operation runs, the documents with _id values of 2 and 4 are deleted from the collection.

To use collation with an aggregate() operation, specify a collation in the aggregation options.

The following aggregation example uses a collection called names and groups the first_name field together, counts the total number of results in each group, and sorts the results by German phonebook order.

<?php
$collection = (new MongoDB\Client)->test->names;
$cursor = $collection->aggregate(
[
['$group' => ['_id' => '$first_name', 'name_count' => ['$sum' => 1]]],
['$sort' => ['_id' => 1]],
],
[
'collation' => ['locale' => 'de@collation=phonebook'],
]
);
←  CodecsExecute Database Commands →