Building Your First Operator

by Adams Ombonga

Full-Stack Engineer, Red Hat Marketplace

A Peek into the World of Operators in Kubernetes

Over the past decade, Kubernetes, containers, and microservices have changed how we build, deploy, and manage applications at scale. These technologies, along with several other key innovations along the way, have transformed the cloud landscape, providing greater value through improved resource management, development speed, cost management, and more.

While all of this was game-changing, one thing that was lacking was robust management of stateful applications, such as databases, which usually required additional intervention for lifecycle management or even deployment tasks; this is where Operators come in. In this article, we will explore how to use the Operator pattern, with the help of the Operator SDK, to manage stateful applications within Kubernetes in a more automated fashion. In our example, we will build an Operator that deploys an Arcade web application containing a single game, Snake, onto a Kubernetes cluster.

What is an Operator?

In the Kubernetes (k8s) world, Operators are software that run in an OpenShift (or plain k8s) cluster and allow administrators to automate the deployment and lifecycle management of complex stateful applications on top of k8s. In short, Operators let us encode an administrator's operational knowledge, which is then used to manage a given application's lifecycle; this can be critical for Day 2 operations, beyond just ensuring a deployment has occurred. In this article we will focus on Level 1 (Basic Install) of an Operator's capabilities to keep things simple, but Operators can do a whole lot more, including failure recovery, metrics analysis, and auto-healing; the levels and requirements for these different types of features are defined through the Operator Maturity / Capabilities model.

We won’t be diving too deep into the technical details of Operators in this article, but you can learn more about Operators including their benefits, use-cases, and high level concepts from the OpenShift documentation.

Resources

Below is a list of additional online resources to learn more about Operators or the operator-sdk:

  1. Operator SDK
  2. Operator Pattern
  3. Kubebuilder Book

Environment Setup

Go will be our language of choice for building the Operator, since the Operator SDK provides scaffolding for Go, Ansible, and Helm; that said, Operators themselves can be written in any language.

You can find instructions for how to install Go on your system here.

Be sure to also configure the following variables in your environment:

export GOPATH=/your/preferred/path/
export GO111MODULE=on

GO111MODULE=on is required for Go module support

Cluster Prerequisites

  • Kubernetes (k8s) v1.11.3+
  • Kubernetes (k8s) cluster with admin privileges (crc, minikube, or minishift can also work)

Operator SDK Setup

We will use the Operator SDK to scaffold our Operator; it abstracts away most of the boilerplate required to build an Operator, leaving us to write the custom operational logic needed to manage our application.

Installation instructions for the Operator SDK can be found here. Alternatives include Kubebuilder, the scaffolding engine that the Operator SDK v1.0.0+ uses, or kopf for Python-based k8s Operators.

Building the Operator

Scaffold

Let’s begin by creating an empty directory for our Operator; this will be our project’s root directory:

mkdir arcade-operator
cd arcade-operator

We can then use the Operator SDK to scaffold the project with some boilerplate code by running the command below from our project’s root directory; note that the domain does not have to be example.com.

Alternatively, you can simply run operator-sdk init and the CLI will prompt you for any necessary information (domain, repo, etc.).

operator-sdk init --domain=example.com --repo=github.com/example-inc/arcade-operator

You should now have a directory structure similar to the one below, representing the boilerplate for our Operator’s scaffolding:

.
├── bin
├── config
│   ├── certmanager
│   ├── default
│   ├── manager
│   ├── prometheus
│   ├── rbac
│   ├── scorecard
│   │   ├── bases
│   │   └── patches
│   └── webhook
└── hack

As part of the scaffolding provided by the operator-sdk, a Makefile with several targets is generated in the project’s root directory to aid with development and deployment.

API

The meat and potatoes of an Operator lies within its controller logic: here we define which resources we want to watch and what to do when we detect changes. Before we can do that, however, we need to define an API that represents our application in a schema format that k8s understands, better known as a Custom Resource Definition (CRD).

The API we create extends the k8s API, letting k8s know about our application; it is also what our Operator uses to reconcile changes toward a desired state whenever a Custom Resource (CR) matching our CRD is deployed. The API can be created using the following command:

operator-sdk create api --version v1alpha1 --kind Arcade --group games --resource=true --controller=true

This command creates a new resource type under api/v1alpha1/arcade_types.go, representing the API types for our Operator. Since we also passed the --controller flag as true, you should also have a new controllers folder in the project root containing an arcade_controller.go file and a suite_test.go file for test setup; arcade_controller.go holds the controller logic for our Operator.
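After scaffolding, api/v1alpha1/arcade_types.go will contain roughly the following boilerplate (trimmed here for brevity; the exact contents depend on your operator-sdk version):

package v1alpha1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ArcadeSpec defines the desired state of Arcade
type ArcadeSpec struct {
    // Foo is an example field of Arcade. Edit arcade_types.go to remove/update
    Foo string `json:"foo,omitempty"`
}

// ArcadeStatus defines the observed state of Arcade
type ArcadeStatus struct {
}

// Arcade is the Schema for the arcades API
type Arcade struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   ArcadeSpec   `json:"spec,omitempty"`
    Status ArcadeStatus `json:"status,omitempty"`
}

// ArcadeList contains a list of Arcade
type ArcadeList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []Arcade `json:"items"`
}

func init() {
    // Register the new types with the scheme so the manager can serve them
    SchemeBuilder.Register(&Arcade{}, &ArcadeList{})
}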

Note: the arcade prefix for the generated files comes from the kind specified when creating the API.

Your directory structure should now look similar to the one below, representing the boilerplate after creating the API:

.
├── api
│   └── v1alpha1
├── bin
├── config
│   ├── certmanager
│   ├── crd
│   │   └── patches
│   ├── default
│   ├── manager
│   ├── prometheus
│   ├── rbac
│   ├── samples
│   ├── scorecard
│   │   ├── bases
│   │   └── patches
│   └── webhook
├── controllers
└── hack

arcade_types.go is where we define the Spec fields for our Arcade Custom Resource (CR) by modifying the type definitions; these fields can include configuration settings for our application or settings for how we want k8s to manage it. We can also define Status field types to indicate the state of the application at any point during lifecycle management; for example, if our application requires deploying a database before coming online, we can communicate the status of the database and use it within the Operator’s logic to gracefully handle situations where the database may not be available.

Be sure to run make generate whenever you modify the type definitions in arcade_types.go, in order to update the auto-generated code for the resource type.

Kubebuilder annotation markers for the API types can also be used to specify defaults or other properties for given spec fields, as shown below.

Snippet from arcade_types.go showing a simple spec field, with a few annotations.

// ArcadeSpec defines the desired state of Arcade, through defined fields
type ArcadeSpec struct {
    // Size field used to determine total number of Arcade deployments. This field is optional
    // +optional
    // +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="Size",xDescriptors="urn:alm:descriptor:io.kubernetes:size"
    Size int32 `json:"size,omitempty"`
}
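In a similar way, Status fields can be added to the same file. As a purely illustrative sketch (the Nodes field below is hypothetical, not part of the scaffolding), we might surface which pods are currently running our arcade workload:

// ArcadeStatus defines the observed state of Arcade
type ArcadeStatus struct {
    // Nodes holds the names of the pods currently running the arcade workload
    // +optional
    Nodes []string `json:"nodes,omitempty"`
}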

You can find a more complete list of possible kubebuilder annotations here and operator-sdk markers here.

Operator Logic

As mentioned before, Operators work by watching k8s resources and reacting to specific changes. This process is known as the reconcile loop, and it is where most of the Operator’s business logic lives, such as which resources to monitor and what actions to take when a change occurs.
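Concretely, this logic lives in the Reconcile method of the generated ArcadeReconciler in arcade_controller.go. The sketch below shows a typical first step, fetching the Arcade instance that triggered the request; gamev1alpha1 is an assumed import alias for our api/v1alpha1 package, and the exact method signature can vary slightly between operator-sdk versions:

func (r *ArcadeReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Fetch the Arcade instance that triggered this reconcile request
    arcade := &gamev1alpha1.Arcade{}
    if err := r.Get(ctx, req.NamespacedName, arcade); err != nil {
        if errors.IsNotFound(err) {
            // The CR was deleted after the request was queued; nothing left to do
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, err
    }

    // ... ensure owned resources (Deployments, Services, etc.) match arcade.Spec

    return ctrl.Result{}, nil
}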

For our Operator’s business logic, we want to ensure that a Deployment exists whenever an instance of our Arcade CR is deployed. Below is a small snippet of what this logic looks like; a more complete example can be found here.

// ... other logic
deployment := &appsv1.Deployment{}
err = r.Get(ctx, types.NamespacedName{Name: arcade.Name, Namespace: arcade.Namespace}, deployment)
if err != nil && errors.IsNotFound(err) {
    dep := r.deploymentForArcade(arcade)
    log.Info("Creating a new Deployment", "Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
    err = r.Create(ctx, dep)
    if err != nil {
        log.Error(err, "Failed to create new Deployment", "Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
        return ctrl.Result{}, err
    }
    return ctrl.Result{Requeue: true}, nil
} else if err != nil {
    log.Error(err, "Failed to get Deployment")
    return ctrl.Result{}, err
}
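The deploymentForArcade helper referenced above is not generated for us; it is where we describe the Deployment we want k8s to maintain. A minimal hand-written sketch, assuming a placeholder container image, might look like this:

// deploymentForArcade returns a Deployment for the arcade application, owned by the given CR
func (r *ArcadeReconciler) deploymentForArcade(arcade *gamev1alpha1.Arcade) *appsv1.Deployment {
    labels := map[string]string{"app": "arcade", "arcade_cr": arcade.Name}
    replicas := arcade.Spec.Size

    dep := &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{
            Name:      arcade.Name,
            Namespace: arcade.Namespace,
        },
        Spec: appsv1.DeploymentSpec{
            Replicas: &replicas,
            Selector: &metav1.LabelSelector{MatchLabels: labels},
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{Labels: labels},
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{{
                        Name:  "arcade",
                        Image: "quay.io/example/arcade:latest", // placeholder image for illustration
                        Ports: []corev1.ContainerPort{{ContainerPort: 8080}},
                    }},
                },
            },
        },
    }
    // Set the Arcade CR as owner so the Deployment is garbage collected with it
    // (error handling omitted for brevity)
    ctrl.SetControllerReference(arcade, dep, r.Scheme)
    return dep
}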

Similar logic can be adapted for any other resources our Operator requires, such as Services, ReplicaSets, or Routes.
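Which resources the controller actually watches is declared in SetupWithManager, also in arcade_controller.go; adding an Owns clause for a secondary resource (such as the Deployment we create above) makes changes to that resource trigger a reconcile as well. A minimal sketch, using the same assumed gamev1alpha1 alias:

// SetupWithManager registers the controller and the resources it watches
func (r *ArcadeReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&gamev1alpha1.Arcade{}).   // primary resource: our Arcade CR
        Owns(&appsv1.Deployment{}).    // secondary resource created by the Operator
        Complete(r)
}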

Running locally

Once the API is defined, the full schema definition of our Custom Resource (the CRD, or CustomResourceDefinition) can be generated with make manifests; this generates a CRD (with validation via an OpenAPI v3 schema) under config/crd/bases/<group>.<domain>_<kind>.yaml. The generated schema is what the k8s API validates against when Custom Resources (CRs) are created in the cluster; CRs created with spec fields not defined in the CRD will either be rejected or have those fields ignored.
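One way to tighten that validation is with kubebuilder validation markers on the spec fields; for example, we could bound our Size field before re-running make manifests (the specific limits below are only an illustration):

type ArcadeSpec struct {
    // Size field used to determine total number of Arcade deployments
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=10
    // +optional
    Size int32 `json:"size,omitempty"`
}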

At this point, if you have access to a k8s or OpenShift cluster and are logged in as a user with the correct privileges, we can install our CRD by running make install. Note that we have merely installed the CRD so our k8s cluster knows about the Custom Resource; nothing has been deployed yet.

To test our Operator on our local machine without installing it in the cluster, we can invoke make run in a terminal from our project’s directory. You should see a few logs indicating that our Operator has started and is watching for changes.

Finally, we’ll create a simple definition for our Custom Resource (or use the sample under config/samples/) and watch the Operator logs as it reconciles, in the same terminal where we invoked make run.

Example Custom Resource YAML:

apiVersion: games.example.com/v1alpha1
kind: Arcade
metadata:
  name: arcade-sample
spec:
  # Additional spec fields below
  size: 1

We can save the example YAML above under config/samples/games_v1alpha1_arcade.yaml, then invoke the following command to deploy the CR to the cluster:

kubectl apply -f config/samples/games_v1alpha1_arcade.yaml

Testing

No software is complete without tests. Since our Operator is written in Go, testing can be done using ginkgo and gomega, a BDD-style testing framework and matcher library for Go. They are not the only testing tools that can be used, but you may notice that the operator-sdk already provides some basic test setup using ginkgo and gomega in the suite_test.go file. Let’s add a few tests for the controller we created earlier, starting with a new test file, arcade_controller_test.go:

vi arcade_controller_test.go

We’ll add a test that verifies our Deployment reflects the desired state after a reconcile:

// ...imports
var _ = Describe("Arcade Controller", func() {
    It("creates a Deployment for the Arcade custom resource", func() {
        ctx := context.Background()

        By("Verifying the Deployment was created")
        dep := &appsv1.Deployment{}
        // key, timeout, and interval are defined as part of the test setup
        Eventually(func() bool {
            err := k8sClient.Get(ctx, key, dep)
            return err == nil
        }, timeout, interval).Should(BeTrue())

        Expect(dep.Spec.Template.Spec.Containers[0].Name).To(Equal("arcade"))
        Expect(dep.Spec.Template.Spec.Containers[0].Image).To(ContainSubstring("arcade"))
        // ...more assertions
    })
})
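Note that key, timeout, and interval are not part of the generated scaffolding; they would be defined alongside the test or in suite_test.go. For example (the name and namespace below are assumptions matching our sample CR):

const (
    timeout  = time.Second * 10
    interval = time.Millisecond * 250
)

// key identifies the Deployment we expect the Operator to create
var key = types.NamespacedName{Name: "arcade-sample", Namespace: "default"}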

We can run our tests by calling the test Make target:

make test

You can view a more complete example of this test and the entire Operator code here.

Next Steps

By now, we should have a simple Operator that deploys an application and ensures that the application’s state reflects what we defined in our sample spec. This article only covers Level 1 of the Operator Maturity / Capabilities model, Basic Install and configuration; however, Operators can encompass the entire lifecycle of an application, from seamless upgrades to auto-pilot, where applications are auto-scaled to meet demand. You can build on top of the existing business logic in our controller (or in separate controllers) to achieve additional levels of the maturity / capabilities model. Lastly, to make your Operator more widely available, you can explore having it certified by Red Hat and listed on OpenShift’s OperatorHub, or certified for Red Hat Marketplace and listed as part of the Red Hat Marketplace catalog.

Certification Process

Operators can be certified through Red Hat to help indicate conformance with Kubernetes plugin APIs like the Container Storage Interface (CSI) or Container Network Interface (CNI). View the Red Hat Badge Certification documentation for more information.

In addition, Operators can be on-boarded to the Red Hat Marketplace, a marketplace for certified Operators; please see the RHM on-boarding process documentation for additional information.