
Known Issues in the MongoDB Enterprise Kubernetes Operator

Update Google Firewall Rules to Fix WebHook Issues

When you deploy the Kubernetes Operator to GKE private clusters, creation of MongoDB resources or of the MongoDBOpsManager resource can time out. The following message might appear in the logs:

Error setting state to reconciling: Timeout: request did not complete within requested timeout 30s

Google configures its firewalls to restrict access to your Kubernetes Pods. To use the webhook service, add a new firewall rule to grant GKE control plane access to your webhook service.

The Kubernetes Operator webhook service runs on port 443.
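As a minimal sketch of such a firewall rule, the helper below prints (but does not run) a gcloud command that allows ingress on TCP port 443 from the control plane to the cluster nodes. The function name and all four arguments (rule name, network, control plane CIDR, node network tag) are placeholders for illustration, not values from this page; check them against your own cluster before running the printed command.

```shell
# Print a gcloud command that opens TCP 443 from the GKE control plane
# to the cluster nodes. Nothing is executed; review the output first.
# All arguments are placeholders that you supply for your own cluster.
print_webhook_firewall_rule() {
  rule_name="$1"           # hypothetical name for the new rule
  network="$2"             # VPC network the cluster runs in
  control_plane_cidr="$3"  # control plane IPv4 CIDR block of the private cluster
  node_tag="$4"            # network tag applied to the cluster nodes
  printf 'gcloud compute firewall-rules create %s --network %s --direction INGRESS --allow tcp:443 --source-ranges %s --target-tags %s\n' \
    "$rule_name" "$network" "$control_plane_cidr" "$node_tag"
}

# Example (values are made up):
#   print_webhook_firewall_rule allow-operator-webhook default 172.16.0.0/28 gke-my-cluster-node
```

For a private cluster, the control plane CIDR is listed as masterIpv4CidrBlock in the output of gcloud container clusters describe.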

Configure Persistent Storage Correctly

If no persistent volumes are available when you create a resource, the resulting Pod stays in a transient state and the Kubernetes Operator fails (after 20 retries) with the following error:

Failed to update Ops Manager automation config: Some agents failed to register

To prevent this error, make sure that persistent volumes are available before you create the resource.

For testing only, you may also set persistent: false. Do not use this setting in production, because data is not preserved between restarts.
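For a test deployment, the setting could look like the sketch below. The resource name, version, ConfigMap name, and credentials Secret name are hypothetical placeholders; only the persistent field comes from the text above. The function prints the manifest and applies nothing by itself.

```shell
# Print a test-only MongoDB resource with persistence disabled.
# Names and version below are placeholders, not values from this page.
print_test_manifest() {
  cat <<'EOF'
apiVersion: mongodb.com/v1
kind: MongoDB
metadata:
  name: my-test-replica-set      # hypothetical resource name
spec:
  type: ReplicaSet
  members: 3
  version: 4.2.2-ent             # hypothetical version
  opsManager:
    configMapRef:
      name: my-project           # hypothetical ConfigMap
  credentials: my-credentials    # hypothetical Secret
  persistent: false              # testing only: data is lost on restart
EOF
}

# To check for available volumes first:  kubectl get pv
# To apply the manifest:                 print_test_manifest | kubectl apply -f -
```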

Remove Resources before Removing Kubernetes

The state shown in Ops Manager can diverge from the state in Kubernetes. This mostly occurs when Kubernetes resources are removed manually. Ops Manager can keep displaying an Automation Agent that has been shut down.

If you want to remove deployments of MongoDB on Kubernetes, use the resource specification to delete resources first so no dead Automation Agents remain.

Create Separate Namespaces for Kubernetes Operator and MongoDB Resources

Create the Kubernetes Operator and the MongoDB resources in different namespaces so that the following operations work correctly:

kubectl delete pods --all

or

kubectl delete namespace mongodb

If the Kubernetes Operator and the resources sit in the same mongodb namespace, the Kubernetes Operator is also removed in the same operation. It then cannot clean up the configurations, and you must clean them up in the Ops Manager Application instead.
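A minimal sketch of the two-namespace layout follows. The namespace name mongodb-operator is an assumption made for illustration; mongodb matches the commands above. The sketch does not show how to configure the Kubernetes Operator to watch the resource namespace, which you must also do.

```shell
# Hypothetical namespace names; adjust to your environment.
OPERATOR_NS="mongodb-operator"   # holds only the Kubernetes Operator
RESOURCES_NS="mongodb"           # holds the MongoDB resources

# Create both namespaces.
create_namespaces() {
  kubectl create namespace "$OPERATOR_NS" &&
  kubectl create namespace "$RESOURCES_NS"
}

# Delete all Pods of the MongoDB resources. The Operator Pod lives in
# $OPERATOR_NS, so this command does not touch it.
delete_resource_pods() {
  kubectl delete pods --all --namespace "$RESOURCES_NS"
}
```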

HTTPS Enabled After Deployment

We recommend that you enable HTTPS before deploying your Ops Manager resources. However, if you enable HTTPS after deployment, your managed resources can no longer communicate with Ops Manager and the Kubernetes Operator reports your resources’ status as Failed.

To resolve this issue, you must delete your Pods by running the following command for each Pod:

kubectl delete pod <replicaset-pod-name>

After deletion, Kubernetes automatically restarts the deleted Pods. During this period, the resource is unreachable and incurs downtime.
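The per-Pod deletion can be scripted roughly as follows. This sketch assumes that the replacement Pod comes back under the same name, and the wait step, timeout, and retry count are additions for illustration rather than part of the documented procedure. The resource is still unreachable while its Pods restart.

```shell
# Delete each named Pod in turn and wait until its replacement is Ready.
# Usage (Pod names are placeholders): restart_pods my-replica-set-0 my-replica-set-1
restart_pods() {
  for pod in "$@"; do
    kubectl delete pod "$pod" || return 1
    attempts=0
    # Right after deletion the new Pod may not exist yet, so retry a few times.
    until kubectl wait --for=condition=Ready "pod/$pod" --timeout=120s; do
      attempts=$((attempts + 1))
      if [ "$attempts" -ge 5 ]; then
        echo "Pod $pod did not become Ready" >&2
        return 1
      fi
      sleep 5
    done
  done
}
```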

Difficulties with Updates

In some cases, the Kubernetes Operator can stop receiving change events. Because this problem is hard to reproduce, the recommended workaround is to delete the Kubernetes Operator Pod. Kubernetes automatically starts a new Kubernetes Operator Pod, which then works correctly:

kubectl get pods
kubectl delete pod mongodb-enterprise-operator-<podId>
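The two commands can be combined so that the Pod ID does not have to be copied by hand. This is a sketch that assumes the Kubernetes Operator Pod name contains mongodb-enterprise-operator, as in the command above; the function name and the namespace argument are illustrative.

```shell
# Find the Kubernetes Operator Pod in the given namespace and delete it,
# so that Kubernetes starts a fresh one.
restart_operator() {
  namespace="$1"   # namespace the Kubernetes Operator runs in
  # "-o name" prints entries such as pod/mongodb-enterprise-operator-<podId>
  pod="$(kubectl get pods --namespace "$namespace" -o name \
    | grep 'mongodb-enterprise-operator' | head -n 1)"
  if [ -z "$pod" ]; then
    echo "no Kubernetes Operator Pod found in $namespace" >&2
    return 1
  fi
  kubectl delete --namespace "$namespace" "$pod"
}
```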

Unable to Update the MongoDB Agent on Application Database Pods

You can’t use Ops Manager to upgrade the MongoDB Agents that run on the Application Database Pods. The MongoDB Agent version that runs on these Pods is embedded in the Application Database Docker image.

You can use the Kubernetes Operator to upgrade the MongoDB Agent version on Application Database Pods as MongoDB publishes new images.

Machine Memory vs. Container Memory

MongoDB versions older than 3.6.13, 4.0.9, and 4.1.9 report host system RAM, not container RAM.
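To check whether a given version is affected, the thresholds above can be encoded in a small helper. This sketch reads the statement as one threshold per release series (3.6.13 for 3.6, 4.0.9 for 4.0, 4.1.9 for 4.1) and assumes that series older than 3.6 are affected and series newer than 4.1 are not; both readings are assumptions, and the function name is illustrative.

```shell
# Exit status 0 if the MongoDB version reports host RAM instead of
# container RAM, 1 otherwise. Accepts versions such as 4.0.8 or 4.2.2-ent.
reports_host_ram() {
  version="${1%%-*}"        # drop a suffix such as "-ent"
  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"
  case "$major.$minor" in
    3.6)     [ "$patch" -lt 13 ] ;;
    4.0|4.1) [ "$patch" -lt 9 ] ;;
    *)       # assumption: series before 3.6 are affected, later ones are not
             [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 6 ]; } ;;
  esac
}

# Example:
#   if reports_host_ram 4.0.8; then echo "reports host RAM"; fi
```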