Securing tenant services while using Chaos Mesh with OPA
Introduction:
Your Kubernetes cluster is used by multiple tenant services. You already follow Kubernetes security best practices: each tenant service runs in its own namespace, users of these tenant services have access only to their respective namespaces, and so on.
Now you have installed and configured Chaos Mesh (a cloud-native Chaos Engineering platform that orchestrates chaos on Kubernetes environments) on this cluster so that tenant services can make use of it. You have given the tenant service users a few additional Chaos Mesh specific rights so that they can create Chaos Mesh resources. Even though users cannot create Chaos Mesh resources in other namespaces, they can still impact services running in other namespaces, because Chaos Mesh (version 1.0.1) does not enforce any such restriction.
Let's try to understand this with a simple Chaos Mesh YAML file:
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-example
  namespace: chaos-testing
spec:
  action: pod-kill
  mode: one
  selector:
    namespaces:
      - tidb-cluster-demo
    labelSelectors:
      'app.kubernetes.io/component': 'tikv'
  scheduler:
    cron: '@every 1m'
Suppose the user has the required rights on the chaos-testing namespace and has created the above Chaos Mesh resource there. As we can see, the selector section names a different namespace (tidb-cluster-demo). This means the pods selected for the chaos operation will come from tidb-cluster-demo, and not from chaos-testing, the only namespace the user has access to. That is the problem!
Solution:
So you, as the cluster owner/admin, need to make sure that tenant users cannot impact any other tenant's services.
How can we solve this problem? We can solve it using OPA (Open Policy Agent). Let's understand with an example.
At a high level, we want to make sure the namespace mentioned in the selector section is the same as the one in the metadata section. For network chaos, a target selector section can also be specified, so we have to make sure that the namespace in the target section matches the metadata namespace as well. Let's understand it from the flow chart as well.

In the flow chart, ns means namespace.
The first check can be done via an RBAC RoleBinding, which I am assuming is already in place since you follow security best practices and the principle of least privilege.
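As a rough sketch of what that first check could look like, the Role and RoleBinding below grant one tenant user rights on Chaos Mesh resources only inside the chaos-testing namespace. The names chaos-mesh-editor, chaos-mesh-editor-binding and tenant-a-user are hypothetical, and the verb list is an assumption; adjust both to your cluster.

```yaml
# Hypothetical Role: allows managing Chaos Mesh resources in chaos-testing only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: chaos-mesh-editor          # hypothetical name
  namespace: chaos-testing
rules:
  - apiGroups: ["chaos-mesh.org"]
    resources: ["podchaos", "networkchaos", "iochaos", "timechaos", "kernelchaos", "stresschaos"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]  # assumed verb set
---
# Hypothetical RoleBinding: binds the Role above to a single tenant user.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: chaos-mesh-editor-binding  # hypothetical name
  namespace: chaos-testing
subjects:
  - kind: User
    name: tenant-a-user            # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: chaos-mesh-editor
  apiGroup: rbac.authorization.k8s.io
```

Because both objects are namespaced, this only controls where the user can create Chaos Mesh objects. It says nothing about which namespaces the selector inside those objects may point to.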
The second and third checks can be implemented using the following OPA policy.
package kubernetes.admission

kind = {"IoChaos", "KernelChaos", "NetworkChaos", "PodChaos", "StressChaos", "TimeChaos"}

# isCreateOrUpdate is assumed to be defined as below
# (true for CREATE and UPDATE admission requests).
isCreateOrUpdate { input.request.operation == "CREATE" }
isCreateOrUpdate { input.request.operation == "UPDATE" }

#-------------------------------------------------------------------
# Deny creation of chaos-mesh resources unless the
# spec.selector.namespaces and the object.metadata.namespace are same
#-------------------------------------------------------------------
deny[msg] {
    isCreateOrUpdate
    kind[input.request.kind.kind]
    namespace := input.request.object.spec.selector.namespaces[_]
    namespace != input.request.object.metadata.namespace
    msg = sprintf("chaos selector namespace[] not matching chaos object namespace: %s", [namespace])
}

#-------------------------------------------------------------------
# Deny creation of chaos-mesh resources if the
# spec.target.selector.namespaces is present and the
# object.metadata.namespace are not same
#-------------------------------------------------------------------
deny[msg] {
    isCreateOrUpdate
    kind[input.request.kind.kind]
    input.request.object.spec.target
    namespace := input.request.object.spec.target.selector.namespaces[_]
    namespace != input.request.object.metadata.namespace
    msg = sprintf("chaos target namespace[] not matching chaos object namespace: %s", [namespace])
}

#-------------------------------------------------------------------
# Deny creation of chaos-mesh resources if
# input.request.object.spec.target is present and the
# input.request.object.spec.target.selector.namespaces is missing
#-------------------------------------------------------------------
deny[msg] {
    isCreateOrUpdate
    kind[input.request.kind.kind]
    input.request.object.spec.target
    not input.request.object.spec.target.selector.namespaces
    msg = "chaos target namespace[] should be present"
}

#-------------------------------------------------------------------
# Deny creation of chaos-mesh resources if
# input.request.object.spec.selector.namespaces is missing
#-------------------------------------------------------------------
deny[msg] {
    isCreateOrUpdate
    kind[input.request.kind.kind]
    not input.request.object.spec.selector.namespaces
    msg = "chaos selector namespace[] should be present"
}

#-------------------------------------------------------------------
# Deny creation of chaos-mesh resources if
# input.request.object.spec.selector.pods is not having namespace
# mentioned in metadata.namespace
#-------------------------------------------------------------------
deny[msg] {
    isCreateOrUpdate
    kind[input.request.kind.kind]
    input.request.object.spec.selector.pods
    namespace := input.request.object.metadata.namespace
    pods_ns := input.request.object.spec.selector.pods
    not input.request.object.spec.selector.pods[namespace]
    msg = sprintf("namespace mentioned at spec.selector.pods: %v is not matching with metadata namespace: %s", [pods_ns, namespace])
}

#-------------------------------------------------------------------
# Deny creation of chaos-mesh resources if
# input.request.object.spec.selector.pods is having multiple namespaces
#-------------------------------------------------------------------
deny[msg] {
    isCreateOrUpdate
    kind[input.request.kind.kind]
    input.request.object.spec.selector.pods
    pods_ns := input.request.object.spec.selector.pods
    namespaces := count(input.request.object.spec.selector.pods)
    namespaces > 1
    msg = sprintf("can't specify more than one namespace at spec.selector.pods: %v", [pods_ns])
}

#-------------------------------------------------------------------
# Deny creation of chaos-mesh resources if
# input.request.object.spec.target.selector.pods is not having
# namespace mentioned in metadata.namespace
#-------------------------------------------------------------------
deny[msg] {
    isCreateOrUpdate
    kind[input.request.kind.kind]
    input.request.object.spec.target.selector.pods
    namespace := input.request.object.metadata.namespace
    pods_ns := input.request.object.spec.target.selector.pods
    not input.request.object.spec.target.selector.pods[namespace]
    msg = sprintf("namespace mentioned at spec.target.selector.pods: %v is not matching with metadata namespace: %s", [pods_ns, namespace])
}

#-------------------------------------------------------------------
# Deny creation of chaos-mesh resources if
# input.request.object.spec.target.selector.pods is having multiple
# namespaces
#-------------------------------------------------------------------
deny[msg] {
    isCreateOrUpdate
    kind[input.request.kind.kind]
    input.request.object.spec.target.selector.pods
    pods_ns := input.request.object.spec.target.selector.pods
    namespaces := count(input.request.object.spec.target.selector.pods)
    namespaces > 1
    msg = sprintf("can't specify more than one namespace at spec.target.selector.pods: %v", [pods_ns])
}
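To see which parts of the request these rules read, here is a trimmed sketch of the admission request for the PodChaos example from the introduction. It is written as YAML for readability (the API server sends it to OPA as JSON), it assumes OPA receives the AdmissionReview as input, and it assumes isCreateOrUpdate is true for CREATE and UPDATE operations. Only the fields the policy touches are shown.

```yaml
# Trimmed AdmissionReview, i.e. what the policy sees as `input`.
apiVersion: admission.k8s.io/v1
kind: AdmissionReview
request:
  operation: CREATE                 # checked by isCreateOrUpdate (assumed helper)
  kind:
    group: chaos-mesh.org
    version: v1alpha1
    kind: PodChaos                  # input.request.kind.kind, must be in the `kind` set
  object:
    metadata:
      name: pod-kill-example
      namespace: chaos-testing      # input.request.object.metadata.namespace
    spec:
      action: pod-kill
      mode: one
      selector:
        namespaces:
          - tidb-cluster-demo       # differs from metadata.namespace
        labelSelectors:
          'app.kubernetes.io/component': 'tikv'
```

With this input, the first deny rule should match, because tidb-cluster-demo in spec.selector.namespaces is not equal to chaos-testing in metadata.namespace.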
You also need to add rules for the Chaos Mesh resources to the ValidatingWebhookConfiguration resource of OPA, like:
- operations: ["CREATE", "UPDATE"]
  apiGroups: ["chaos-mesh.org"]
  apiVersions: ["v1alpha1"]
  resources: ["networkchaos", "podchaos", "stresschaos", "iochaos", "timechaos", "kernelchaos"]
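For context, a minimal sketch of where such a rule sits inside a ValidatingWebhookConfiguration is shown below. The webhook name, the opa namespace and service, and the namespaceSelector label follow the layout commonly used in the OPA Kubernetes admission control tutorial and are assumptions here; the caBundle value is a placeholder you must fill in for your own deployment.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: opa-validating-webhook            # assumed name
webhooks:
  - name: validating-webhook.openpolicyagent.org
    admissionReviewVersions: ["v1"]
    sideEffects: None
    namespaceSelector:                    # assumed: skip namespaces labelled to be ignored
      matchExpressions:
        - key: openpolicyagent.org/webhook
          operator: NotIn
          values: ["ignore"]
    rules:
      # Send every create/update of a Chaos Mesh resource to OPA.
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["chaos-mesh.org"]
        apiVersions: ["v1alpha1"]
        resources: ["networkchaos", "podchaos", "stresschaos", "iochaos", "timechaos", "kernelchaos"]
    clientConfig:
      caBundle: "<base64-encoded CA certificate>"   # placeholder
      service:
        namespace: opa                    # assumed namespace of the OPA service
        name: opa                         # assumed service name
```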
That's it.
Now if a user mentions any other namespace in the selector or in the target selector section, the OPA policy will not allow the Chaos Mesh resource to be created or updated, and will show an appropriate error message.
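For contrast, here is a sketch of a NetworkChaos resource that the policy should accept, since both the selector and the target selector name the same namespace as metadata.namespace. The resource name and the app labels are hypothetical; the field layout follows the Chaos Mesh 1.x NetworkChaos format.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-partition-example   # hypothetical name
  namespace: chaos-testing
spec:
  action: partition
  mode: one
  selector:
    namespaces:
      - chaos-testing               # same as metadata.namespace
    labelSelectors:
      'app': 'frontend'             # hypothetical label
  direction: to
  target:
    selector:
      namespaces:
        - chaos-testing             # target must also stay in the same namespace
      labelSelectors:
        'app': 'backend'            # hypothetical label
    mode: one
  duration: '10s'
  scheduler:
    cron: '@every 15s'
```

Changing either namespaces entry to another tenant's namespace, or leaving the list out, is what the deny rules are meant to catch.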
Conclusion:
If you are going to enable Chaos Mesh in your Kubernetes cluster and want to make sure users cannot inject chaos into other namespaces, you need to have an OPA policy in place.
Since the release of Chaos Mesh 1.1.3, this security flaw has been fixed with restricted authorization. Check out my blog for more details.
To learn more about how OPA policies work, please refer to the blog.