Introducing the MongoDB Enterprise Operator for Kubernetes

Today more DevOps teams are leveraging the power of containerization and technologies like Kubernetes to manage containerized database clusters. To support teams building cloud-native apps with Kubernetes, MongoDB has made the MongoDB Enterprise Operator for Kubernetes generally available. This native Kubernetes Operator enables a user to deploy and manage MongoDB clusters from within the Kubernetes API.

With the MongoDB Enterprise Operator for Kubernetes you can consistently and effortlessly deploy and manage MongoDB workloads through a simple, declarative configuration. Kubernetes Operators are application-specific controllers that extend the Kubernetes API to create, configure, and manage instances of stateful applications such as databases. On self-managed infrastructure – whether on-premises or in the cloud – Kubernetes users can pair the MongoDB Enterprise Operator for Kubernetes with MongoDB Ops Manager to automate and manage MongoDB clusters, giving you full control over your MongoDB deployment from a single Kubernetes control plane. You can use the operator with upstream Kubernetes or with popular distributions such as Red Hat OpenShift and Pivotal Container Service (PKS).

The MongoDB Enterprise Operator for Kubernetes works with MongoDB Ops Manager or MongoDB Cloud Manager and provides your Kubernetes environment with the following benefits:

  • Deploy, scale, and automate MongoDB clusters of any type or size, from standalone instances to sharded clusters
  • Enforce standardized cluster configurations such as security settings, resilience, resource limits, and more
  • Centralize logging so you can collect, analyze, and store logs from across the cluster in the tool of your choice

In this blog, we’ll cover the following:

  • MongoDB Ops Manager Overview
  • MongoDB Enterprise Operator for Kubernetes Overview
  • How to install and configure the MongoDB Enterprise Operator for Kubernetes
  • Persistent Storage
  • Connecting your application
  • Troubleshooting
  • Where to go for more information

To properly understand the functionality of the MongoDB Enterprise Operator for Kubernetes, you need a solid grasp of what MongoDB Ops Manager does, because everything the Operator does is carried out by Ops Manager on the Operator's behalf.

MongoDB Ops Manager Overview

Ops Manager is an enterprise-class management platform for MongoDB clusters that you run on your own infrastructure. The capabilities of Ops Manager include monitoring, alerting, disaster recovery, scaling, deploying and upgrading of replica sets and sharded clusters, and managing other MongoDB products such as the BI Connector. While a thorough discussion of Ops Manager is out of scope for this blog, it is important to understand the basic components that make up Ops Manager, as they will be used by the Kubernetes Operator to create your deployments.

Ops Manager Deployment Screen
Figure 10: MongoDB Ops Manager deployment screen

A simplified Ops Manager architecture is shown in figure 11 below. For complete information on MongoDB Ops Manager architecture see the online documentation found at the following URL: https://docs.opsmanager.mongodb.com/current/

Simplified Ops Manager deployment
Figure 11: Simplified Ops Manager deployment

The MongoDB HTTP Service provides a web application for administration. These pages are simply a front end to a robust set of Ops Manager REST APIs that are hosted in the Ops Manager HTTP Service. It is through these REST APIs that the Kubernetes Operator will interact with Ops Manager.
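To get a feel for this interface, here is a minimal sketch of how a client such as the Operator might call the Ops Manager public REST API; the hostname is hypothetical, and the user and API key placeholders correspond to the values gathered later in this post:

# List the Ops Manager Projects (groups) visible to this user.
# The public API uses HTTP Digest authentication with an Ops Manager username and API key.
curl --user "<<User>>:<<Public API Key>>" --digest \
  "http://opsmanager.example.com:8080/api/public/v1.0/groups?pretty=true"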

If you are using Cloud Manager or Ops Manager version 4.1 and above, the Monitoring Agent, Backup Agent, and Automation Agent have been combined into a single agent, the MongoDB Agent, which performs all three functions.

MongoDB Automation Agent

With a typical Ops Manager deployment there are many management options, including upgrading the cluster to a different version, adding secondaries to an existing replica set, and converting an existing replica set into a sharded cluster. So how does Ops Manager go about upgrading each node of a cluster or spinning up new mongod instances?

It does this by relying on a locally installed service, the Ops Manager Automation Agent, which runs on every MongoDB node in the cluster. This lightweight service is available for multiple operating systems, so regardless of whether your MongoDB nodes are running in a Linux container, a Windows Server virtual machine, or an on-premises PowerPC server, there is an Automation Agent available for that platform. The Automation Agents receive instructions from the Ops Manager REST APIs to perform work on the cluster node.

MongoDB Monitoring Agent

When Ops Manager shows statistics such as database size and inserts per second, it is receiving this telemetry from the individual nodes running MongoDB. Ops Manager relies on the Monitoring Agent to connect to your MongoDB processes, collect data about the state of your deployment, and then send that data to Ops Manager. There can be one or more Monitoring Agents deployed in your infrastructure for reliability, but only one primary agent per Ops Manager Project collects data.

Ops Manager is all about automation: as soon as you have the Automation Agent deployed, other supporting agents such as the Monitoring Agent are deployed for you. In the scenario where the Kubernetes Operator has issued a command to deploy a new MongoDB cluster in a new project, Ops Manager takes care of deploying the Monitoring Agent into the containers running your new MongoDB cluster.

MongoDB Enterprise Operator for Kubernetes Overview

Through the use of Custom Resources, Kubernetes enables users to add custom objects to the Kubernetes cluster and work with them like any other native Kubernetes object. These custom resources are defined and managed by Kubernetes Operators. The MongoDB Enterprise Operator for Kubernetes exposes two new custom resources: MongoDB and MongoDB User. This capability allows third-party resources like MongoDB to be managed through a single Kubernetes API.

Below is an example of deploying a MongoDB replica set using the MongoDB custom resource:

apiVersion: mongodb.com/v1
kind: MongoDB
metadata:
  name: my-replica-set-name
spec:
  members: 3
  version: 4.0.10-ent
  type: ReplicaSet

  project: my-project
  credentials: my-credentials

  persistent: true
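As with any other Kubernetes object, you save this definition to a file and hand it to kubectl. As a sketch, assuming the file is named replica-set.yaml and the operator is installed in the mongodb namespace used later in this post:

# Create (or update) the replica set described above
kubectl apply -f replica-set.yaml -n mongodb

# List the MongoDB custom resources the Operator is managing
kubectl get mongodb -n mongodb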

Granular control

While the above example is simple in nature, the MongoDB Enterprise Operator for Kubernetes gives you granular control over your deployments. You can control how many resources are provisioned and allocated for each Pod by defining the podSpec section of the custom resource definition file. For example, you can define the CPU, memory, and persistent storage configuration parameters as shown in the following example:

  podSpec:
    cpu: '0.25'
    memory: 512M
    persistence:
      multiple:
        data:
          storage: 10Gi
        journal:
          storage: 1Gi
          labelSelector:
            matchLabels:
              app: "my-app"
        logs:
          storage: 500M
          storageClass: standard

Changing any of these parameters will trigger the MongoDB Enterprise Operator for Kubernetes to automatically change the MongoDB Cluster or Pods to correctly match the new requirements. Changes are performed as a rolling update without downtime to your database and application.

Consistency

Every MongoDB cluster is continuously monitored. The MongoDB Enterprise Operator for Kubernetes continuously collects information about each MongoDB deployment and assesses its state. Any deviation from the desired state triggers automatic recovery to ensure MongoDB operates within its desired configuration. Consider the scenario where a pod fails: the operator will connect to the Kubernetes cluster controller and deploy a new pod, remount any persistent volumes, and reinitialize the MongoDB node configuration.
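You can observe this self-healing behavior yourself. As a sketch, assuming a replica set named my-replica-set deployed in the mongodb namespace, deleting one of its pods will cause Kubernetes and the Operator to bring the deployment back to its desired state:

# Simulate a pod failure by deleting one member of the replica set
kubectl delete pod my-replica-set-0 -n mongodb

# Watch the replacement pod get created and rejoin the replica set
kubectl get pods -n mongodb -w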

The ability to keep the configuration of a MongoDB cluster consistent is not only important from an infrastructure perspective but also from a security perspective. For example, MongoDB publishes a security checklist that lists recommendations for settings on MongoDB clusters. The operator can help you enforce these recommended security settings across your MongoDB estate. The Custom Resource file below defines the use of TLS for a cluster.

apiVersion: mongodb.com/v1
kind: MongoDB
metadata:
  name: my-tls-enabled-rs
spec:
  members: 3
  version: 4.0.11

  project: my-project
  credentials: my-credentials
  security:
    tls:
      enabled: true
      authenticationMode: x509

Log anywhere

Kubernetes provides a centralized logging infrastructure for all applications and services to funnel logs into a single pipeline. This allows users to have a centralized subsystem that can collect, analyze and store logs from across the Kubernetes cluster and all applications running in it. Each MongoDB cluster can output its logs into this pipeline making it available for your favorite log aggregators. You can also push events into MongoDB Ops Manager for viewing, alerting and making performance tuning recommendations.

Runs on any upstream Kubernetes Distribution

The MongoDB Enterprise Operator for Kubernetes can run on the majority of Kubernetes distributions, version 1.11 and higher. Any Kubernetes distribution built from upstream will support the MongoDB Enterprise Operator for Kubernetes, including Amazon Elastic Container Service for Kubernetes (EKS), Google Kubernetes Engine (GKE), Red Hat OpenShift, and Pivotal Container Service (PKS).

Getting started with MongoDB Enterprise Operator for Kubernetes

Ops Manager is an integral part of automating a MongoDB cluster with Kubernetes. To get started you will need access to an Ops Manager 4.0.11+ environment or to MongoDB Cloud Manager, the fully managed, cloud-hosted version of Ops Manager.

The MongoDB Enterprise Operator for Kubernetes is compatible with Kubernetes v1.11 and above. You will need access to a Kubernetes environment. If you do not have access to a Kubernetes environment, or just want to stand up a test environment, you can use minikube which deploys a local single node Kubernetes cluster on your machine. For additional information and setup instructions check out the following URL: https://kubernetes.io/docs/setup/minikube.
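For example, a minikube test environment might be prepared along these lines; the resource sizes are only illustrative, and the mongodb namespace is the one used throughout the rest of this post:

# Start a single-node Kubernetes cluster with enough resources for a small replica set
minikube start --cpus 4 --memory 8192

# Create the namespace that the operator and MongoDB resources will live in
kubectl create namespace mongodb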

The following sections will cover the four-step installation and configuration of the MongoDB Enterprise Operator for Kubernetes. The installation order is as follows:

  • Step 1: Install the MongoDB Enterprise Operator via Helm or a yaml file
  • Step 2: Create and apply a Kubernetes ConfigMap file
  • Step 3: Create the Kubernetes Secret object which will store the Ops Manager API Key
  • Step 4: Deploy a MongoDB replica set

Step 1: Installing MongoDB Enterprise Operator for Kubernetes

To install the MongoDB Enterprise Operator for Kubernetes you can use Helm, the Kubernetes package manager, or pass a yaml file to kubectl. Before we do either, we need to let Kubernetes know about the new Custom Resources that the Operator will manage by applying the Custom Resource Definitions (CRDs):

kubectl apply -f crds.yaml
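If you have not yet cloned the operator repository, the same CRD definitions can be applied directly from GitHub, assuming the file is still published at this path:

kubectl apply -f https://raw.githubusercontent.com/mongodb/mongodb-enterprise-kubernetes/master/crds.yaml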

The instructions for both of these methods are as follows; pick one and continue to step 2.

To install the operator via Helm:

To install with Helm you will first need to clone the public repository:

git clone https://github.com/mongodb/mongodb-enterprise-kubernetes.git

Change directory into the local copy and run the following command:

helm install helm_chart/ --name mongodb-enterprise

To install the operator via a yaml file:

To install using yaml, run the following command from the command line:

kubectl apply -f https://raw.githubusercontent.com/mongodb/mongodb-enterprise-kubernetes/master/mongodb-enterprise.yaml

This retrieves the operator configuration from the MongoDB GitHub repository and applies it to your Kubernetes cluster.
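Whichever method you chose, you can verify that the operator is up before moving on; the deployment name below is the same one used later in the troubleshooting section:

# The operator runs as a Deployment in the mongodb namespace
kubectl get deployment mongodb-enterprise-operator -n mongodb

# Or check the operator pod directly
kubectl get pods -n mongodb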

At this point the MongoDB Enterprise Operator for Kubernetes is installed and now needs to be configured. First, we must create and apply a Kubernetes ConfigMap file. A Kubernetes ConfigMap holds key-value pairs of configuration data that can be consumed by pods. In this use case, the ConfigMap file will store configuration information about the Ops Manager deployment we want to use.

Step 2: Creating the Kubernetes ConfigMap file

For the Kubernetes Operator to know which Ops Manager deployment you want to use, you will need to obtain some properties from the Ops Manager console and place them in a ConfigMap file. These properties are as follows:

Value            Description
Base Url         The URL of your Ops Manager or Cloud Manager deployment.
Project Id       The ID of the Ops Manager Project that the Kubernetes Operator will deploy into.
User             An existing Ops Manager username.
Public API Key   Used by the Kubernetes Operator to connect to the Ops Manager REST API endpoint.

If you already know how to obtain these values, copy them down and proceed to Step 3.

Base Url

The Base Url is the URL of your Ops Manager or Cloud Manager.

Note: If you are using Cloud Manager the Base Url is https://cloud.mongodb.com

To obtain the Base Url in Ops Manager, copy the URL used to connect to your Ops Manager server from your browser's navigation bar. It should be something similar to http://servername:8080. You can also do the following:

Log in to Ops Manager and click on the Admin button. Next, select the "Ops Manager Config" menu item. You will be presented with a screen similar to the figure below:

Ops Manager Config Page
Figure 1: Ops Manager Config page

Copy down the value displayed in the URL To Access Ops Manager input field. Note that if you don't have access to the Admin drop-down, you will have to copy the URL used to connect to your Ops Manager server from your browser's navigation bar.

Project Id

The Project Id is the ID of the Ops Manager Project that the Kubernetes Operator will deploy into.

An Ops Manager Project is a logical organization of MongoDB clusters and also provides a security boundary. One or more Projects make up an Ops Manager Organization. If you need to create an Organization, click on your user name at the upper right side of the screen and select "Organizations". Next, click on the "+ New Organization" button and provide a name for your Organization. Once you have an Organization you can create a Project.

Ops Manager Organizations
Figure 2: Ops Manager Organizations page

To create a new Project, click on your Organization name. This will bring you to the Projects page, and from here click on the "+ New Project" button and provide a unique name for your Project. If you are not an Ops Manager administrator, you may not have this option and will have to ask your administrator to create a Project.

Once the Project is created, or if you already have a Project created on your behalf by an administrator, you can obtain the Project Id by clicking on the Settings menu option as shown below.

Ops Manager Project Settings
Figure 3: Project Settings page

Copy the Project ID.

User

The User is an existing Ops Manager username.

To see the list of Ops Manager users, return to the Project and click on the "Users & Teams" menu. You can use any Ops Manager user who has at least Project Owner access. If you'd like to create another username, click on the "Add Users & Team" button as shown in figure 4.

Users and Teams page
Figure 4: Users & Teams page

Copy down the email address of the user you would like the Kubernetes Operator to use when connecting to Ops Manager.

Public API Key

The Ops Manager API Key is used by the Kubernetes Operator to connect to the Ops Manager REST API endpoint.

Note: If you are using Ops Manager version 4.1 and above or Cloud Manager follow the steps in the Programmatic API key documentation to create the API key.

You can create an API Key by clicking on your username in the upper right hand corner of the Ops Manager console and selecting "Account" from the drop-down menu. This will open the Account Settings page as shown in figure 5.

Public API Access page
Figure 5: Public API Access page

Click on the “Public API Access” tab. To create a new API key click on the “Generate” button and provide a description. Upon completion you will receive an API key as shown in figure 6.

Confirm API Key dialog
Figure 6: Confirm API Key dialog

Be sure to copy the API Key, as it will be used later as a value in a configuration file. It is important to copy this value while the dialog is visible, because you will not be able to retrieve it once you close the dialog. If you did not make a note of the value, you will have to delete the API Key and create a new one.

Note: If you are using MongoDB Cloud Manager or have Ops Manager deployed in a secured network you may need to whitelist the IP range of your Kubernetes cluster so that the Operator can make requests to Ops Manager using this API Key.

Creating the ConfigMap file

Now that we have acquired the necessary Ops Manager configuration information, we need to create a Kubernetes ConfigMap file for the Operator to use. To do this, open a text editor of your choice and create the following yaml file, substituting the placeholders with the values you obtained in the Ops Manager console. For sample purposes we can call this file "my-project.yaml".

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: <<Name i.e. name of OpsManager Project>>
  namespace: mongodb
data:
  projectId: <<Project ID>>
  baseUrl: <<OpsManager URL>>
Figure 7: Sample ConfigMap file

Note: If you ever notice errors in the Operator logs, or clusters that remain in a Pending or Failed state, be sure to check the MongoDB documentation for the latest ConfigMap file format. The format of the ConfigMap file may change over time as features and capabilities are added to the Operator.

Once you create this file you can apply the ConfigMap to Kubernetes using the following command:

kubectl apply -f my-project.yaml
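To confirm that Kubernetes stored the values you expect, you can inspect the ConfigMap; the name my-project below is illustrative and must match whatever you put in metadata.name:

# Show the projectId and baseUrl held in the ConfigMap
kubectl describe configmap my-project -n mongodb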

Step 3: Creating the Kubernetes Secret

For a user to be able to create or update objects in an Ops Manager Project, they need a Public API Key. Earlier in this section we created a new API Key and, hopefully, you wrote it down. This API Key will be held by Kubernetes as a Secret object.

You can create this Secret with the following command:

kubectl -n mongodb create secret generic <<Name of credentials>> --from-literal="user=<<User>>" --from-literal="publicApiKey=<<public-api-key>>"

Make sure you replace the User and Public API key values with those you obtained from your Ops Manager console. You can pick any name for the credentials – just make a note of it as you will need it later when you start creating MongoDB clusters.
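For illustration only, a filled-in version of this command might look like the following; the username, API key, and Secret name are placeholders for your own values:

# Create the Secret that the Operator will use to authenticate against Ops Manager
kubectl -n mongodb create secret generic my-credentials \
  --from-literal="user=jane.doe@example.com" \
  --from-literal="publicApiKey=my-public-api-key"

# Verify the Secret exists (the stored values themselves are not displayed)
kubectl describe secret my-credentials -n mongodb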

Now we're ready to start deploying MongoDB!

Step 4: Deploying a MongoDB Replica Set

Kubernetes can deploy a MongoDB standalone, replica set, or sharded cluster. To deploy a three-node replica set, create the following yaml file:

---
apiVersion: mongodb.com/v1
kind: MongoDB
metadata:
  name: <<Name of your new MongoDB replica set>>
  namespace: mongodb
spec:
  members: 3
  type: ReplicaSet
  version: 4.0.11

  persistent: false

  project: <<Name value specified in metadata.name of ConfigMap file>>
  credentials: <<Name of credentials secret>>
Figure 8: simple-rs.yaml file describing a three node replica set

The name of your new cluster can be any name you choose. The name of the Ops Manager Project ConfigMap and the name of the credentials Secret were defined previously.

To submit the request for Kubernetes to create this cluster simply pass the name of the yaml file you created to the following kubectl command:

kubectl apply -f simple-rs.yaml
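While you wait for the cluster to come up, you can also follow the progress from the Kubernetes side; this is a sketch using the same placeholder names as the yaml file above:

# The Operator creates a StatefulSet with one pod per replica set member
kubectl get statefulsets -n mongodb

# Inspect the MongoDB custom resource that the Operator is reconciling
kubectl describe mongodb <<Name of your new MongoDB replica set>> -n mongodb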

After a few minutes your new cluster will show up in Ops Manager as shown in Figure 9.

Servers tab of the Deployment page in Ops Manager
Figure 9: Servers tab of the Deployment page in Ops Manager

Notice that Ops Manager installed not only the Automation Agent on these three containers running MongoDB, but also the Monitoring and Backup Agents.

Starting with Ops Manager 4.2, the Automation Agent, Backup Agent, and Monitoring Agent have been combined into a single agent, the MongoDB Agent.

A word on persistent storage

What good would a database be if any time the container died your data went to the grave as well? Probably not a good situation, and maybe one where tuning up the resumé might be a good idea too. Historically, the lack of persistent storage and consistent DNS mappings were major obstacles to running databases within containers. The Kubernetes ecosystem addressed these concerns with features like PersistentVolumes and StatefulSets that allow you to deploy databases like MongoDB without worrying about losing data because of a hardware failure or because a container was moved elsewhere in your datacenter.

Additional configuration of the storage is required on the Kubernetes cluster before you can deploy a MongoDB cluster that uses persistent storage. In Kubernetes there are two ways to provision persistent volumes: statically and dynamically. The Kubernetes Operator can provision MongoDB objects (i.e. standalone, replica set, and sharded clusters) using either type.

For the best MongoDB performance, you can use persistent local storage. Local persistent volumes, promoted to general availability in Kubernetes 1.14, represent a local disk directly attached to a single Kubernetes node.
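As an illustration only, a statically provisioned local volume for one MongoDB pod might be declared along these lines; the storage class name, disk path, and node name are hypothetical and need to match your own cluster:

---
# A StorageClass with no provisioner: volumes are created manually (static provisioning)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
# A local PersistentVolume bound to a specific disk on a specific Kubernetes node
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongodb-data-node1
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/mongodb
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-node-1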

Connecting your application

Connecting to MongoDB deployments in Kubernetes is no different than other deployment topologies. However, it is likely that you'll need to address the network specifics of your Kubernetes configuration. To abstract the deployment specific information such as hostnames and ports of your MongoDB deployment, the MongoDB Enterprise Operator for Kubernetes uses Kubernetes Services.

Services

Each MongoDB deployment type will have two Kubernetes Services generated automatically during provisioning. For example, suppose we have a three-node replica set called "my-replica-set"; you can then enumerate its Kubernetes objects using the following statement:

kubectl get all -n mongodb --selector=app=my-replica-set-svc

This statement yields the following results:

NAME                   READY     STATUS    RESTARTS   AGE
pod/my-replica-set-0   1/1       Running   0          29m
pod/my-replica-set-1   1/1       Running   0          29m
pod/my-replica-set-2   1/1       Running   0          29m

NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
service/my-replica-set-svc            ClusterIP   None             <none>        27017/TCP         29m
service/my-replica-set-svc-external   NodePort    10.103.220.236   <none>        27017:30057/TCP   29m

NAME                              DESIRED   CURRENT   AGE
statefulset.apps/my-replica-set   3         3         29m

Note the string "-svc" appended to the name of the replica set.

The service with the "-external" suffix is a NodePort service, which means it is exposed on a port (here, 30057) on every Kubernetes node, making the replica set reachable from outside the Kubernetes cluster.

Note: If you are using Minikube you can obtain the IP address of the running replica set by issuing the following: minikube service list

In our example, which used minikube, the result set contained the following information:

 mongodb     my-replica-set-svc-external   http://192.168.39.95:30057 

Now that we know the IP address and port of our MongoDB cluster, we can connect using the mongo shell or whatever application or tool you would like to use.
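For example, from outside the Kubernetes cluster you could connect through the NodePort shown above, while applications running inside the cluster would normally use the headless service DNS names instead; the hostnames below follow the standard Kubernetes naming pattern and assume the mongodb namespace used in this post:

# From outside the cluster, via the NodePort service (minikube IP and port from the example above)
mongo "mongodb://192.168.39.95:30057"

# From inside the cluster, via the headless service, listing every replica set member
mongo "mongodb://my-replica-set-0.my-replica-set-svc.mongodb.svc.cluster.local:27017,my-replica-set-1.my-replica-set-svc.mongodb.svc.cluster.local:27017,my-replica-set-2.my-replica-set-svc.mongodb.svc.cluster.local:27017/?replicaSet=my-replica-set"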

Basic Troubleshooting

If you are having problems submitting a deployment, you should read the logs. Authentication failures and other common problems are easy to spot in the log files. You can view the MongoDB Enterprise Operator for Kubernetes log files with the following command:

kubectl logs -f deployment/mongodb-enterprise-operator -n mongodb

You can also use kubectl to see the logs of the database pods. The main container process continually tails the Automation Agent and mongod logs, which can be seen with the following statement:

kubectl logs <<name of pod>> -n mongodb 

It's worth knowing that you can also enumerate the list of pods using kubectl get pods -n mongodb

Another common troubleshooting technique is to open a shell inside one of the containers running MongoDB. Once there, you can use common Linux tools to view the processes, troubleshoot, or even check mongo shell connections (sometimes helpful in diagnosing network issues):

kubectl exec -it <<name of pod>> -n mongodb -- /bin/bash

Once inside the container, listing the running processes (for example with ps -ef) produces output similar to the following:

UID        PID  PPID  C STIME TTY          TIME CMD
mongodb      1     0  0 16:23 ?        00:00:00 /bin/sh -c supervisord -c /mongo
mongodb      6     1  0 16:23 ?        00:00:01 /usr/bin/python /usr/bin/supervi
mongodb      9     6  0 16:23 ?        00:00:00 bash /mongodb-automation/files/a
mongodb     25     9  0 16:23 ?        00:00:00 tail -n 1000 -F /var/log/mongodb
mongodb     26     1  4 16:23 ?        00:04:17 /mongodb-automation/files/mongod
mongodb     45     1  0 16:23 ?        00:00:01 /var/lib/mongodb-mms-automation/
mongodb     56     1  0 16:23 ?        00:00:44 /var/lib/mongodb-mms-automation/
mongodb     76     1  1 16:23 ?        00:01:23 /var/lib/mongodb-mms-automation/
mongodb   8435     0  0 18:07 pts/0    00:00:00 /bin/bash

From inside the container we can make a connection to the local MongoDB node easily by running the mongo shell via the following command:

/var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.6.5/bin/mongo --port 27017

Note: The version of the automation agent may be different from the one in the above example; be sure to check the directory path. As of Ops Manager 4.2, all of the agents have been combined into a single agent, the MongoDB Agent.

Where to go for more information

For more information check out the MongoDB Enterprise Operator for Kubernetes documentation website.

MongoDB Community Slack: #enterprise-kubernetes

GitHub: https://github.com/mongodb/mongodb-enterprise-kubernetes

To learn more about Kubernetes Operators: https://coreos.com/operators/

To see all MongoDB operations best practices, download our whitepaper: https://www.mongodb.com/collateral/mongodb-operations-best-practices