A quick overview of OPA Gatekeeper
Recently I had a chance to play with OPA Gatekeeper inside a Kubernetes cluster. While this was an interesting activity, I found the documentation to be spread across different places, so this is my attempt to put the basics in one place for anyone who is just starting and wants a high-level overview of what’s underneath.
This article doesn’t show you any examples of how to implement OPA Gatekeeper policies, as I think the examples in the documentation are good enough.
Disclaimer: At the time of writing, the versions of the mentioned tools are:
OPA Gatekeeper v3.2.1
OPA v0.24.0
Kubernetes v1.19
Problem
I want to enforce some rules when the Kubernetes API server creates an object like a Pod, Namespace, etc. I want a central place where I can write my rules and those get automatically enforced.
Solution
Use OPA Gatekeeper :)
OPA
Open Policy Agent (OPA) is an open-source framework that provides a rule engine and a policy language in which you write your rules or policies. It is framework-agnostic; in other words, it doesn’t depend on any other tool. Think of it as a language runtime: it parses policies written in a specific language (discussed next), reads the input, optionally uses some context apart from the input data, applies all of this to the rules, and produces an output.
The great thing about OPA is that it’s unbiased and can be used as a general-purpose rule engine. The only requirement is that the input, output, and context (also called data) are in JSON format. Whatever stack you’re running, if you need a rule engine that works this way, OPA can be used.
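To make the "input + data + rules = output" idea concrete, here is a tiny Python sketch. It is only an analogy and not OPA's API: the evaluate function, the allow_by_role rule, and the JSON documents are all made up for illustration.

```python
import json

def evaluate(rule, input_doc, data):
    """Apply one rule to the input and the context data and return a JSON-friendly decision."""
    return rule(input_doc, data)

# Hypothetical rule: the request is allowed only if the caller's role
# appears in the list of allowed roles kept in the context data.
def allow_by_role(input_doc, data):
    allowed = input_doc.get("role") in data.get("allowed_roles", [])
    return {"allow": allowed}

# Input and context both arrive as JSON, and the decision leaves as JSON.
input_doc = json.loads('{"user": "alice", "role": "admin"}')
data = json.loads('{"allowed_roles": ["admin", "ops"]}')

decision = evaluate(allow_by_role, input_doc, data)
print(json.dumps(decision))
```

The point is that the rule itself knows nothing about where the input came from, which is what lets the same engine sit behind very different systems.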
Rego
Rego is the language in which you write your rules. It is an advanced declarative language designed for expressing complex rules. I found it has a bit of a learning curve, since it works differently from most other declarative languages I’ve seen, and it was difficult for me to understand the examples related to Kubernetes policies. So my recommendation would be to check out the language documentation before you look at sample policies.
Rego also has a nice online playground where you can try out your policies before you use them. You can provide the necessary input and context and check whether the policy evaluates as you expect.
Kubernetes Admission Controller
The Kubernetes API server is the single place through which Kubernetes resources are managed. A Kubernetes admission controller allows us to intercept a request made to the server and act on it. Refer to the following image:
Kubernetes provides a bunch of built-in admission controllers. These controllers take effect after a request is authenticated and authorized. When a request reaches an admission controller, the controller can allow, deny, or mutate it. An allowed or mutated request moves on to the next admission controller. If any controller denies the request, it is rejected entirely; if all of them accept it, the Kubernetes resource is created or updated.
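The allow/deny/mutate flow can be sketched in a few lines of Python. This is a simplified model of the idea, not how the API server is implemented; the two controllers (add_default_label and deny_latest_tag) are hypothetical examples I made up.

```python
import copy

def add_default_label(obj):
    # Mutating step (hypothetical): add an "env" label when it is missing.
    obj = copy.deepcopy(obj)
    obj.setdefault("metadata", {}).setdefault("labels", {}).setdefault("env", "dev")
    return True, obj, ""

def deny_latest_tag(obj):
    # Validating step (hypothetical): reject containers that use the "latest" tag.
    for container in obj.get("spec", {}).get("containers", []):
        if container.get("image", "").endswith(":latest"):
            return False, obj, "container " + container.get("name", "?") + " uses the latest tag"
    return True, obj, ""

def run_admission_chain(obj, controllers):
    """Pass the object through each controller; stop at the first denial."""
    for controller in controllers:
        allowed, obj, reason = controller(obj)
        if not allowed:
            return False, obj, reason
    return True, obj, ""

chain = [add_default_label, deny_latest_tag]
good_pod = {"metadata": {"name": "web"},
            "spec": {"containers": [{"name": "app", "image": "nginx:1.19"}]}}
bad_pod = {"metadata": {"name": "web"},
           "spec": {"containers": [{"name": "app", "image": "nginx:latest"}]}}

print(run_admission_chain(good_pod, chain))
print(run_admission_chain(bad_pod, chain))
```

Note that a single denial anywhere in the chain is enough to reject the whole request, while a mutation is carried forward to the controllers that follow.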
Among the available admission controllers, two are special: MutatingAdmissionWebhook and ValidatingAdmissionWebhook (in beta since v1.9 and stable since v1.16). They are extremely flexible because they let users register their own webhooks, which the controllers then call. The one we are interested in is ValidatingAdmissionWebhook. Kubernetes allows us to register our own ValidatingWebhookConfiguration objects, which tell this admission controller which webhooks to call.
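For a feel of what such a webhook has to do, here is a minimal Python sketch of the request/response handling only, assuming the admission.k8s.io/v1 AdmissionReview format. It leaves out the HTTPS server and the ValidatingWebhookConfiguration, and the require_owner_label check is a hypothetical rule I made up.

```python
def require_owner_label(obj):
    # Hypothetical validation: every object must carry an "owner" label.
    labels = obj.get("metadata", {}).get("labels", {})
    if "owner" not in labels:
        return "missing required label: owner"
    return None  # None means the object is valid

def review(admission_review, validate):
    """Build the AdmissionReview response for one incoming AdmissionReview request."""
    request = admission_review["request"]
    message = validate(request["object"])
    response = {"uid": request["uid"], "allowed": message is None}
    if message is not None:
        # The message is what the user sees when the request is rejected.
        response["status"] = {"message": message}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }

incoming = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "request": {
        "uid": "example-uid",  # placeholder value
        "object": {"kind": "Namespace", "metadata": {"name": "team-a"}},
    },
}
print(review(incoming, require_owner_label))
```

The response must echo the uid of the request, which is how the API server pairs the answer with the question.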
Now that sounds like an interesting opportunity to introduce our OPA framework, doesn’t it? Indeed, and how to configure it can be seen here. If you read the steps, it looks like a lot of tedious configuration. That’s where OPA Gatekeeper comes in!
Gatekeeper
Think of OPA Gatekeeper as a wrapper that provides you a predefined pattern to help you set up OPA policies. This basically means configuring OPA for Kubernetes is as simple as applying a single (large) YAML file that has all the necessary components covered. Behind the scenes, it will create a Namespace, a Service which will serve the webhooks, a ValidatingWebhookConfiguration, and the Custom Resource Definitions it needs. Two special resource types built on them, ConstraintTemplate and Constraint, are what make writing policies easy for you as a user.
Gatekeeper also provides a few additional features on top of manual OPA configuration, such as auditing, dry-run policies, and debugging help. It cleverly uses the Constraint resource object itself to store any violations of that constraint. I was impressed by this!
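The difference between an enforced policy and a dry-run one can be sketched like this in Python. It is a simplified model, not Gatekeeper's code: the flat constraint dictionaries are my own simplification, although enforcementAction with the values deny and dryrun is the field a real Constraint uses in its spec.

```python
def enforce(constraint, messages):
    """Return (allowed, recorded_violations) for one constraint and its violation messages."""
    if not messages:
        return True, []
    if constraint.get("enforcementAction", "deny") == "dryrun":
        # Dry run: violations are recorded for review, but the request still goes through.
        return True, messages
    return False, messages

strict = {"name": "must-have-owner"}  # hypothetical constraint, defaults to deny
trial = {"name": "must-have-owner", "enforcementAction": "dryrun"}

print(enforce(strict, ["missing required label: owner"]))
print(enforce(trial, ["missing required label: owner"]))
```

This is what makes dry run useful for rolling out a new policy: you can see what it would have rejected before it rejects anything.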
Apart from the features mentioned above, Gatekeeper also provides a mechanism to sync information about other Kubernetes resources, which acts as the “data” part for the policy engine. This synced copy also works as a cache, limiting the number of calls Gatekeeper itself makes to the Kubernetes API server. This is pretty handy if you are writing complex rules.
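Here is a small Python sketch of why synced data matters: some rules can’t be decided from the incoming object alone. The example checks a new Ingress against a cached list of existing ones. The function name and the flat list used as the cache are assumptions for illustration and do not reflect how Gatekeeper lays out its synced data.

```python
def host_conflicts(new_ingress, cached_ingresses):
    """Return messages for hosts in new_ingress that another cached Ingress already uses."""
    new_hosts = {rule["host"] for rule in new_ingress["spec"]["rules"]}
    new_id = (new_ingress["metadata"]["namespace"], new_ingress["metadata"]["name"])
    messages = []
    for other in cached_ingresses:
        other_id = (other["metadata"]["namespace"], other["metadata"]["name"])
        if other_id == new_id:
            continue  # an update to the same object is not a conflict
        for rule in other["spec"]["rules"]:
            if rule["host"] in new_hosts:
                messages.append(
                    "host " + rule["host"] + " is already used by " + other_id[0] + "/" + other_id[1]
                )
    return messages

# The cache stands in for the synced copy of cluster state (hypothetical contents).
cache = [
    {"metadata": {"namespace": "shop", "name": "web"},
     "spec": {"rules": [{"host": "shop.example.com"}]}},
]
incoming = {"metadata": {"namespace": "blog", "name": "web"},
            "spec": {"rules": [{"host": "shop.example.com"}]}}

print(host_conflicts(incoming, cache))
```

Because the check reads from the cache, it doesn’t need a round trip to the API server for every admission request.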
By default, Gatekeeper doesn’t handle DELETE operations in the admission webhook, but you can include them in the large configuration file I mentioned above.
Constraint Framework Pattern
The Constraint framework is a simple pattern to write and manage multiple policies in our cluster. According to it, if you want to write any policies you need to create just two types of objects. These are already defined in the cluster for you.
ConstraintTemplate: Use this to define which arguments your constraint accepts (making it “templatable”) and the actual Rego policy that needs to be applied. The Rego policy just needs to define a special rule (similar to a function in Rego) called “violation” with a predefined set of arguments. Your actual logic goes inside the body: the rule is undefined for a valid configuration, or produces a message that describes the error. That’s it!
Constraint: Use it to “implement” the template with particular arguments and to define the scope in which the policy applies, such as API group, kind, and included/excluded namespaces.
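The split between the two objects can be modelled in Python like this. It is an analogy only: real templates are written in Rego and real constraints are Kubernetes resources. The dictionary keys (kinds, excluded_namespaces, parameters) and the function names are hypothetical.

```python
def required_labels_violation(obj, parameters):
    """The 'template': reusable logic that returns messages; an empty list means valid."""
    present = set(obj.get("metadata", {}).get("labels", {}))
    missing = sorted(set(parameters["labels"]) - present)
    if not missing:
        return []
    return ["missing required labels: " + ", ".join(missing)]

# The 'constraint': one use of the template with concrete arguments and a scope.
pods_need_owner = {
    "kinds": ["Pod"],
    "excluded_namespaces": ["kube-system"],
    "parameters": {"labels": ["owner", "team"]},
}

def matches(constraint, obj):
    """Scope check: does this constraint apply to the object at all?"""
    namespace = obj.get("metadata", {}).get("namespace")
    return obj.get("kind") in constraint["kinds"] and namespace not in constraint["excluded_namespaces"]

def check(constraint, template, obj):
    if not matches(constraint, obj):
        return []  # out of scope, so nothing to report
    return template(obj, constraint["parameters"])

pod = {"kind": "Pod", "metadata": {"namespace": "shop", "labels": {"owner": "alice"}}}
print(check(pods_need_owner, required_labels_violation, pod))
```

The same template can back many constraints, for example one that requires an owner label on Pods and another that requires a cost-center label on Namespaces, without touching the logic again.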
At this point, you can take a look at the example policies, and they should give you a fair idea of what’s possible. One of the simplest examples, requiring certain labels on every resource, can have a huge impact on adopting better practices for creating and managing a large number of resources within a cluster.
A few other interesting use cases involve compliance, such as allowing only preapproved container registries or enforcing a cost budget per environment.
Bonus:
I was thinking of making this work across multiple clusters at the same time without repeating myself (read: at scale), and I thought of using Rancher Fleet, which provides a GitOps way to implement this. But that’s for another time.