Replica Set Protocol Versions

MongoDB provides replica set protocol version 0 (pv0) and replica set protocol version 1 (pv1):

  • pv0 prioritizes lowering the likelihood of rollbacks of w:1 writes. However, pv0 can lead to the loss of confirmed w: "majority" writes in certain network partition scenarios.

  • pv1 guarantees the preservation of confirmed w: "majority" writes. While pv1, by default, prioritizes faster failovers over the preservation of w:1 writes, it can be configured to prioritize preservation of most w:1 writes at the expense of slower failover time.

    Note

    pv1 is the default for all new replica sets created with MongoDB 3.2 or later.

The following outlines some differences between pv0 and pv1.

Availability

  • pv0 is available in all MongoDB versions.
  • pv1 is available in MongoDB version 3.2 or later and is the default for all new replica sets created with version 3.2 or later.

MongoDB Versions | pv0 | pv1
-----------------|-----|----
3.2+             | Yes | Yes
< 3.2            | Yes | No

Read Concern

Read Concern pv0 pv1
"local"
"majority"  
"linearizable"

Arbiters

For replica sets with an arbiter, pv1 increases the likelihood of rollback of w:1 writes compared to pv0.

Vetoes

pv0 allows members to veto elections based on a member’s optime and priority values.

pv1 does not use vetoes. Individual members can vote for or against a candidate in a particular election, but cannot individually veto (abort) an election unilaterally.
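As a rough illustration of voting without vetoes, the sketch below counts votes and lets a candidate win on a strict majority of voting members, so a single "no" vote cannot abort the election by itself. The function name `winsElection` and the boolean vote array are hypothetical; this is a simplified model, not MongoDB's election code.

```javascript
// votes: one boolean per voting member (true = vote for the candidate).
// Hypothetical helper: the candidate wins only with a strict majority.
function winsElection(votes) {
  var yes = votes.filter(function (v) { return v; }).length;
  return yes > votes.length / 2;
}

// One dissenting member out of three does not block the candidate.
var oneAgainst = winsElection([true, true, false]);
// Two dissenting members out of three do.
var twoAgainst = winsElection([true, false, false]);
```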

Detection of Simultaneous Primaries

In some circumstances, two nodes in a replica set may transiently believe that they are the primary, but at most one of them will be able to complete writes with { w: "majority" } write concern. The node that can complete { w: "majority" } writes is the current primary, and the other node is a former primary that has not yet recognized its demotion, typically due to a network partition. When this occurs, clients that connect to the former primary may observe stale data despite having requested read preference primary, and new writes to the former primary will eventually roll back.

pv0 relies on clock synchronization to disambiguate when two members both think they are primary. Reliance on clock synchronization can lead to the loss of confirmed w: "majority" writes.

Instead of clock synchronization, pv1 uses the concept of a term. Terms allow faster detection of simultaneous primaries and allow multiple successful elections in a short period of time. By contrast, pv0 can leave a replica set with no primary if multiple elections are needed in a short period of time.
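The idea of a term can be sketched with a small model in which each node remembers the highest term it has seen and steps down when it learns of a higher one. The names `makeNode` and `observeTerm` are hypothetical, and the model assumes terms are plain increasing integers; it illustrates the concept only and is not how MongoDB implements it.

```javascript
// Simplified node state: a name, the highest term seen, and a primary flag.
function makeNode(name, term, isPrimary) {
  return { name: name, term: term, isPrimary: isPrimary };
}

// When a node learns of a higher term (for example, from another member),
// it adopts that term and gives up any belief that it is primary.
function observeTerm(node, remoteTerm) {
  if (remoteTerm > node.term) {
    node.term = remoteTerm;
    node.isPrimary = false;
  }
  return node;
}

// A former primary from term 3 hears from a primary elected in term 4.
var stale = makeNode("A", 3, true);
var current = makeNode("B", 4, true);
observeTerm(stale, current.term); // A steps down and adopts term 4
observeTerm(current, 3);          // B ignores the older term
```

No clock comparison is involved: the node with the lower term yields as soon as it sees the higher one.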

Back to Back Elections

To maximize write availability, pv1 does not consider priority when conducting an election. Instead, after a replica set has a stable primary, pv1 makes a “best-effort” attempt to have the secondary with the highest priority available call an election. This could lead to back-to-back elections as eligible members with higher priority can call an election. But unlike pv0, which must include a 30-second buffer between back-to-back elections, the use of terms in pv1 allows back-to-back elections to occur more quickly.

Both the increased frequency and the lack of a time buffer between back-to-back elections with pv1 increase the likelihood of rollback of w:1 writes. However, you can reduce the number of rollbacks by raising the catchUpTimeoutMillis setting.

During an election, pv0 allows nodes to veto based on priority values. As such, after a replica set has a stable primary, pv0 would lead to fewer back-to-back elections than pv1. Because pv0 relies on clock synchronization to detect multiple primaries, pv0 includes a 30-second buffer between back-to-back elections as a precaution against poor clock synchronization.

Warning

Reliance on clock synchronization can lead to the loss of confirmed w: "majority" writes.

Double Voting

With its use of terms, pv1 prevents double voting in one member’s call for election.

pv0 lessens the likelihood of double voting via the 30-second buffer, but cannot guarantee that a member will not double vote if an election exceeds 30 seconds.
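A minimal sketch of how a term can rule out double voting: a member records the last term in which it voted and refuses any further vote request for that term or an earlier one. The names `makeVoter` and `requestVote` are hypothetical, and the rule shown is a simplified assumption about term-based voting rather than MongoDB's actual code.

```javascript
// Hypothetical voter state: the last term in which this member cast a vote.
function makeVoter() {
  return { lastVotedTerm: -1 };
}

// Grant at most one vote per term; requests for the same or an older term are refused.
function requestVote(voter, candidateTerm) {
  if (candidateTerm <= voter.lastVotedTerm) {
    return false;
  }
  voter.lastVotedTerm = candidateTerm;
  return true;
}

var voter = makeVoter();
var first = requestVote(voter, 5);  // true: first vote in term 5
var second = requestVote(voter, 5); // false: already voted in term 5
var next = requestVote(voter, 6);   // true: term 6 is a new term
```

Unlike a time buffer, this check does not depend on how long an election takes.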

Summary

                       | pv0                                      | pv1
-----------------------|------------------------------------------|-----------------------------------------------
w: 1 Writes            | Prioritized preservation of w: 1 writes. | Increased likelihood of w: 1 rollbacks, but behavior can be adjusted through catchUpTimeoutMillis
w: "majority" Writes   | Can lose confirmed w: "majority" writes. | Guarantees the preservation of confirmed w: "majority" writes
No Primary             | More likely                              | Less likely
Vetoes                 | Supported                                | Not needed
Back-to-back Elections | Less frequent, 30-second buffer          | More likely, no buffer
Arbiter                | Less likely to lose w: 1 writes          | More likely to lose w: 1 writes

Modify Replica Set Protocol Version

To change the replica set protocol version, reconfigure (rs.reconfig) the replica set with the new protocolVersion. For example, to upgrade to pv1, connect a mongo shell to the current primary and perform the following sequence of operations:

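// Fetch the current replica set configuration.
cfg = rs.conf();
// Switch the replica set to protocol version 1.
cfg.protocolVersion = 1;
// Apply the updated configuration (run against the primary).
rs.reconfig(cfg);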

To reduce the likelihood of w:1 rollbacks, you can also reconfigure the replica set with a higher settings.catchUpTimeoutMillis value.
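One way to raise the setting is sketched below. The helper `withCatchUpTimeout` is hypothetical (it is not a built-in shell function), the 5000 ms value is an arbitrary example rather than a recommendation, and `sampleCfg` is a made-up stand-in for the document that rs.conf() returns.

```javascript
// Return a copy of a replica set config with settings.catchUpTimeoutMillis set.
// Hypothetical helper; the original config object is left unmodified.
function withCatchUpTimeout(cfg, millis) {
  var updated = Object.assign({}, cfg);
  updated.settings = Object.assign({}, cfg.settings, { catchUpTimeoutMillis: millis });
  return updated;
}

// In a mongo shell connected to the primary, usage might look like:
//   rs.reconfig(withCatchUpTimeout(rs.conf(), 5000));
// Outside the shell, a made-up config document shows the effect:
var sampleCfg = { _id: "rs0", protocolVersion: 1, settings: { electionTimeoutMillis: 10000 } };
var result = withCatchUpTimeout(sampleCfg, 5000);
```

Building a new object instead of mutating the one from rs.conf() keeps the original configuration available for comparison before the reconfiguration is applied.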