This version of the documentation is archived and no longer supported. To learn how to upgrade your version of MongoDB Kubernetes Operator, refer to the upgrade documentation.

Troubleshooting the Kubernetes Operator

Get Status of a Deployed Resource

To find the status of a resource deployed with the Kubernetes Operator, invoke one of the following commands:

  • For Ops Manager resource deployments:

    kubectl get om <resource-name> -n <namespace> -o yaml
    
    • The status.applicationDatabase.phase field displays the Application Database resource deployment status.
    • The status.backup.phase field displays the backup daemon resource deployment status.
    • The status.opsManager.phase field displays the Ops Manager resource deployment status.

    Note

    The opsManager controller also watches the database resources referenced in the Ops Manager resource's backup settings.

  • For MongoDB resource deployments:

    kubectl get mdb <resource-name> -n <namespace> -o yaml
    

    The status.phase field displays the MongoDB resource deployment status.

The following key-value pairs describe the resource deployment statuses:

  • message: Message explaining why the resource is in a Pending or Failed state.
  • phase: One of the following statuses:
    • Pending: The Kubernetes Operator is unable to reconcile the resource deployment state. This happens when a reconciliation times out or when the Kubernetes Operator requires you to take action for the resource to enter a running state. If a resource is pending because a reconciliation timed out, the Kubernetes Operator attempts to reconcile the resource state in 10 seconds.
    • Reconciling: The Kubernetes Operator is reconciling the resource state. Resources enter this state after you create or update them, or when the Kubernetes Operator attempts to reconcile a resource previously in a Pending or Failed state. The Kubernetes Operator attempts to reconcile the resource state in 10 seconds.
    • Running: The resource is running properly.
    • Failed: The resource is not running properly. The message field provides additional details. The Kubernetes Operator attempts to reconcile the resource state in 10 seconds.
  • lastTransition: Timestamp in ISO 8601 date and time format in UTC when the last reconciliation happened.
  • link: Deployment URL in Ops Manager.
  • Resource-specific fields: For descriptions of these fields, see MongoDB Database Resource Specification.

Example

To see the status of a replica set named my-replica-set in the developer namespace, run:

kubectl get mdb my-replica-set -n developer -o yaml

If my-replica-set is running, you should see:

status:
  lastTransition: "2019-01-30T10:51:40Z"
  link: http://ec2-3-84-128-187.compute-1.amazonaws.com:9080/v2/5c503a8a1b90141cbdc60a77
  members: 1
  phase: Running
  version: 4.2.2-ent

If my-replica-set is not running, you should see:

status:
  lastTransition: 2019-02-01T13:00:24Z
  link: http://ec2-34-204-36-217.compute-1.amazonaws.com:9080/v2/5c51c040d6853d1f50a51678
  members: 1
  message: 'Failed to create/update replica set in Ops Manager: Status: 400 (Bad Request),
    Detail: Something went wrong validating your Automation Config. Sorry!'
  phase: Failed
  version: 4.2.2-ent
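When you script health checks against this output, you can extract individual status fields with standard text tools. The following is a minimal sketch that parses a saved copy of the status; the /tmp/status.yaml path and its contents are hypothetical, mirroring the Failed example above. In a live cluster you would pipe kubectl get mdb my-replica-set -n developer -o yaml instead of reading a file.

```shell
# Save a sample status document (mirrors the Failed example above)
cat > /tmp/status.yaml <<'EOF'
status:
  lastTransition: 2019-02-01T13:00:24Z
  members: 1
  phase: Failed
EOF

# Pull out the phase field to decide whether intervention is needed
phase=$(grep '^  phase:' /tmp/status.yaml | awk '{print $2}')
echo "$phase"   # prints: Failed
```

With a live cluster, kubectl can also return the field directly with a JSONPath query, for example kubectl get mdb my-replica-set -n developer -o jsonpath='{.status.phase}'.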

Review the Logs

Review Logs from the Kubernetes Operator

To review the Kubernetes Operator logs, invoke this command:

kubectl logs -f deployment/mongodb-enterprise-operator -n <metadata.namespace>

You can also check the Ops Manager logs to see whether any issues were reported to Ops Manager.

Find a Specific Pod

To find which pods are available, invoke this command first:

kubectl get pods -n <metadata.namespace>

See also

Kubernetes documentation on kubectl get.

Review Logs from Specific Pod

To narrow your review to a specific pod, invoke this command:

kubectl logs <podName> -n <metadata.namespace>

Example

If your replica set is named myrs, invoke the log command for its first pod as:

kubectl logs myrs-0 -n <metadata.namespace>

This returns the Automation Agent log for that replica set member.
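Because StatefulSet pods are numbered from zero, you can generate the log command for every member of a replica set in a loop. This sketch assumes a hypothetical three-member replica set named myrs and only prints the commands, so it runs without a cluster; pipe its output to sh to actually run them.

```shell
# Print the per-pod log command for each member of a three-member
# replica set named myrs (replace <metadata.namespace> with yours)
for i in 0 1 2; do
  echo "kubectl logs myrs-$i -n <metadata.namespace>"
done
```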

View All MongoDB Kubernetes resource Specifications

To view all MongoDB Kubernetes resource specifications in the provided namespace:

kubectl get mdb -n <namespace>

Example

To read details about the dublin standalone resource, invoke this command:

kubectl get mdb dublin -n <namespace> -o yaml

This returns the following response:

apiVersion: mongodb.com/v1
kind: MongoDB
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"mongodb.com/v1","kind":"MongoDB","metadata":{"annotations":{},"name":"dublin","namespace":"mongodb"},"spec":{"credentials":"credentials","persistent":false,"podSpec":{"memory":"1Gi"},"project":"my-om-config","type":"Standalone","version":"4.0.0-ent"}}
  clusterDomain: ""
  creationTimestamp: 2018-09-12T17:15:32Z
  generation: 1
  name: dublin
  namespace: mongodb
  resourceVersion: "337269"
  selfLink: /apis/mongodb.com/v1/namespaces/mongodb/mongodbstandalones/dublin
  uid: 7442095b-b6af-11e8-87df-0800271b001d
spec:
  credentials: my-credentials
  type: Standalone
  persistent: false
  podSpec:
    memory: 1Gi
  project: my-om-config
  version: 4.2.2-ent

Restore StatefulSet that Failed to Deploy

A StatefulSet pod may hang with a status of Pending if it encounters an error during deployment.

Pending pods do not automatically terminate, even if you make and apply configuration changes to resolve the error.

To return the StatefulSet to a healthy state, apply the configuration changes to the MongoDB resource in the Pending state, then delete those pods.

Example

A host system has a number of running pods:

kubectl get pods

NAME               READY   STATUS    RESTARTS   AGE
my-replica-set-0   1/1     Running   2          2h
my-replica-set-1   1/1     Running   2          2h
my-replica-set-2   0/1     Pending   0          2h

my-replica-set-2 is stuck in the Pending state. To gather more data on the error, run the following:

kubectl describe pod my-replica-set-2

<describe output omitted>

Warning FailedScheduling 15s (x3691 over 3h) default-scheduler 0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 Insufficient memory.

The output indicates an error in memory allocation.
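If the describe output is long, you can isolate the scheduler's complaint with grep. The sketch below works on a saved copy of the warning; the /tmp/describe.txt path is hypothetical, and in practice you would pipe kubectl describe pod my-replica-set-2 straight into grep.

```shell
# Save the scheduler warning from the describe output above
cat > /tmp/describe.txt <<'EOF'
Warning FailedScheduling 15s (x3691 over 3h) default-scheduler 0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 Insufficient memory.
EOF

# Keep only the scheduler's reasons for refusing to place the pod
grep -o 'nodes are available:.*' /tmp/describe.txt
```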

Updating the memory allocations in the MongoDB resource is insufficient, as the pod does not terminate automatically after applying configuration updates.

To remedy this issue, update the configuration, apply the configuration, then delete the hung pod:

vi <my-replica-set>.yaml

kubectl apply -f <my-replica-set>.yaml

kubectl delete pod my-replica-set-2

Once the hung pod is deleted, the remaining pods restart with your new configuration as part of a rolling upgrade of the StatefulSet.

Note

To learn more about this issue, see Kubernetes Issue 67250.

Replace a ConfigMap to Reflect Changes

If you cannot modify or redeploy an already-deployed ConfigMap file using the kubectl apply command, invoke the following:

kubectl replace -f <my-config-map>.yaml

This command deletes and re-creates the ConfigMap resource. It is useful when you need to update resource files that cannot be updated after initialization, or when you want to make an immediate recursive change.

Remove Kubernetes Components

Important

To remove any component, you need the following permissions:

Cluster Roles
  • mongodb-enterprise-operator-mongodb-webhook
  • mongodb-enterprise-operator-mongodb-certs
Cluster Role Bindings
  • mongodb-enterprise-operator-mongodb-webhook-binding
  • mongodb-enterprise-operator-mongodb-certs

Remove a MongoDB Kubernetes resource

To remove any instance that Kubernetes deployed, you must use Kubernetes.

Important

You can only use the Kubernetes Operator to remove Kubernetes-deployed instances. If you use Ops Manager to remove the instance, Ops Manager throws an error.

Example

To remove a single MongoDB instance you created using Kubernetes:

kubectl delete mdb <name> -n <metadata.namespace>

To remove all MongoDB instances you created using Kubernetes:

kubectl delete mdb --all -n <metadata.namespace>

Remove the Kubernetes Operator

To remove the Kubernetes Operator:

  1. Remove all Kubernetes resources:

    kubectl delete mdb --all -n <metadata.namespace>
    
  2. Remove the Kubernetes Operator:

    kubectl delete deployment mongodb-enterprise-operator -n <metadata.namespace>
    

Remove the CustomResourceDefinitions

To remove the CustomResourceDefinitions:

  1. Remove all Kubernetes resources:

    kubectl delete mdb --all -n <metadata.namespace>
    
  2. Remove the CustomResourceDefinitions:

    kubectl delete crd mongodb.mongodb.com
    kubectl delete crd mongodbusers.mongodb.com
    kubectl delete crd opsmanagers.mongodb.com
    

Remove the namespace

To remove the namespace:

  1. Remove all Kubernetes resources:

    kubectl delete mdb --all -n <metadata.namespace>
    
  2. Remove the namespace:

    kubectl delete namespace <metadata.namespace>
    

Disable Ops Manager Feature Controls

When you manage an Ops Manager project through the Kubernetes Operator, the Kubernetes Operator places the EXTERNALLY_MANAGED_LOCK feature control policy on the project. This policy disables certain features in the Ops Manager application that might compromise your Kubernetes Operator configuration. If you need to use these blocked features, you can remove the policy through the feature controls API, make changes in the Ops Manager application, and then restore the original policy through the API.

Warning

The following procedure enables you to use features in the Ops Manager application that are otherwise blocked by the Kubernetes Operator.

  1. Retrieve the feature control policies for your Ops Manager project.

    curl --user "{USERNAME}:{APIKEY}" --digest \
         --header "Accept: application/json" \
         --header "Content-Type: application/json" \
         --include \
         --request GET "https://{OPSMANAGER-HOST}:{PORT}/api/public/v1.0/groups/{PROJECT-ID}/controlledFeature?pretty=true"
    

    Save the response that the API returns. After you make changes in the Ops Manager application, you must add these policies back to the project.

    Important

    Note the following fields and values in the sample response. You must send these same fields and values in later steps when you remove and add feature control policies.

    The externalManagementSystem.version field corresponds to the Kubernetes Operator version. You must send the exact same field value in your requests later in this task.

    Your response should be similar to:

    {
     "created": "2020-02-25T04:09:42Z",
     "externalManagementSystem": {
       "name": "mongodb-enterprise-operator",
       "systemId": null,
       "version": "1.4.2"
     },
     "policies": [
       {
         "disabledParams": [],
         "policy": "EXTERNALLY_MANAGED_LOCK"
       },
       {
         "disabledParams": [],
         "policy": "DISABLE_AUTHENTICATION_MECHANISMS"
       }
     ],
     "updated": "2020-02-25T04:10:12Z"
    }
    
  2. Update the policies array with an empty list:

    Note

    The values you provide for the externalManagementSystem object, like the externalManagementSystem.version field, must match values that you received in the response in Step 1.

    curl --user "{USERNAME}:{APIKEY}" --digest \
         --header "Accept: application/json" \
         --header "Content-Type: application/json" \
         --include \
         --request PUT "https://{OPSMANAGER-HOST}:{PORT}/api/public/v1.0/groups/{PROJECT-ID}/controlledFeature?pretty=true" \
         --data '{
           "externalManagementSystem": {
             "name": "mongodb-enterprise-operator",
             "systemId": null,
             "version": "1.4.2"
           },
           "policies": []
         }'
    

    The previously blocked features are now available in the Ops Manager application.

  3. Make your changes in the Ops Manager application.

  4. Update the policies array with the original feature control policies:

    Note

    The values you provide for the externalManagementSystem object, like the externalManagementSystem.version field, must match values that you received in the response in Step 1.

    curl --user "{USERNAME}:{APIKEY}" --digest \
         --header "Accept: application/json" \
         --header "Content-Type: application/json" \
         --include \
         --request PUT "https://{OPSMANAGER-HOST}:{PORT}/api/public/v1.0/groups/{PROJECT-ID}/controlledFeature?pretty=true" \
         --data '{
           "externalManagementSystem": {
             "name": "mongodb-enterprise-operator",
             "systemId": null,
             "version": "1.4.2"
           },
           "policies": [
             {
               "disabledParams": [],
               "policy": "EXTERNALLY_MANAGED_LOCK"
             },
             {
               "disabledParams": [],
               "policy": "DISABLE_AUTHENTICATION_MECHANISMS"
             }
           ]
         }'
    

    The features are now blocked again, preventing you from making further changes through the Ops Manager application. However, the Kubernetes Operator retains any changes you made in the Ops Manager application while features were available.
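Before restoring the policies in step 4, you may want to confirm which policy names your saved step 1 response contained. The sketch below greps them out of a saved copy; the /tmp/controlledFeature.json path and its contents are hypothetical, mirroring the sample response above.

```shell
# Save the controlledFeature response captured in step 1
cat > /tmp/controlledFeature.json <<'EOF'
{
  "externalManagementSystem": {
    "name": "mongodb-enterprise-operator",
    "systemId": null,
    "version": "1.4.2"
  },
  "policies": [
    {"disabledParams": [], "policy": "EXTERNALLY_MANAGED_LOCK"},
    {"disabledParams": [], "policy": "DISABLE_AUTHENTICATION_MECHANISMS"}
  ]
}
EOF

# List the policy names that must appear in the step 4 request body
grep -o '"policy": "[A-Z_]*"' /tmp/controlledFeature.json
```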

Debugging a Failing Container

A container might fail with an error that results in Kubernetes restarting that container in a loop.

You may need to interact with that container to inspect files or run commands. This requires you to prevent the container from restarting.

  1. In your preferred text editor, open the MongoDB resource you need to repair.

  2. To this resource, add a podSpec collection that resembles the following.

    podSpec:
      podTemplate:
        spec:
          containers:
          - name: mongodb-enterprise-database
            command: ['sh', '-c', 'echo "Hello!" && sleep 3600' ]
    

    The sleep command in spec.podSpec.podTemplate.spec instructs the container to wait for the number of seconds you specify. In this example, the container waits for one hour.

  3. Apply this change to the resource.

    kubectl apply -f <resource>.yaml
    
  4. Invoke the shell inside the container.

    kubectl exec -it <pod-name> -- bash
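The command array in the podSpec override replaces the image's normal entrypoint, which is why the container idles instead of starting its usual database processes. You can reproduce the same behavior locally with a shortened sleep; this sketch is not tied to any cluster:

```shell
# Same shape as the container command above: print a message, then block
sh -c 'echo "Hello!" && sleep 1'
```

When you finish debugging, remove the override from the resource and reapply it so the container runs its normal entrypoint again.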